Compressing audio waveforms using neural networks and vector quantizers

Invention Grant

US11600282B2 Compressing audio waveforms using neural networks and vector quantizers (legal status: in force)

Patent Title: Compressing audio waveforms using neural networks and vector quantizers
Application No.: US17856856

Application Date: 2022-07-01
Publication No.: US11600282B2

Publication Date: 2023-03-07
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Fish & Richardson P.C.
Main IPC: G10L19/038
IPC: G10L19/038; G10L25/30; G10L19/00; G06N3/08; G06N3/04

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compressing audio waveforms. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
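The abstract does not say how the code vectors chosen from the different codebooks combine into the quantized representation. One common way to let several vector quantizers jointly describe a single feature vector is residual vector quantization: each quantizer encodes whatever the previous quantizers left unexplained, and the quantized vector is the sum of the selected code vectors. The sketch below assumes that scheme. It substitutes random vectors for both the encoder output and the (normally learned) codebooks, and it uses fixed-width bit packing as a stand-in for the final compression step. The function names (`rvq_encode`, `rvq_decode`, `pack_indices`, `unpack_indices`) are hypothetical, and none of this is taken from the patent's claims.

```python
import numpy as np


def rvq_encode(features, codebooks):
    """Assumed residual VQ: each quantizer picks the code vector nearest to what is left.

    features:  (T, D) array, one feature vector per frame (stand-in for encoder output).
    codebooks: list of (K, D) arrays, one codebook per vector quantizer.
    Returns a (T, Q) int array: one code index per quantizer per frame.
    """
    residual = features.copy()
    indices = []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every code vector: (T, K).
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        indices.append(idx)
        # Subtract the chosen code vectors; the next quantizer sees only the remainder.
        residual = residual - cb[idx]
    return np.stack(indices, axis=1)


def rvq_decode(indices, codebooks):
    """Quantized representation: sum of the selected code vector from each codebook."""
    return sum(cb[indices[:, q]] for q, cb in enumerate(codebooks))


def pack_indices(indices, bits_per_index):
    """Hypothetical compression step: write every index with a fixed number of bits (MSB first)."""
    shifts = np.arange(bits_per_index)[::-1]
    bits = ((indices.reshape(-1, 1) >> shifts) & 1).astype(np.uint8)
    return np.packbits(bits.ravel()).tobytes()


def unpack_indices(data, bits_per_index, shape):
    """Inverse of pack_indices; drops the zero padding added by packbits."""
    n = shape[0] * shape[1]
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[: n * bits_per_index]
    weights = 1 << np.arange(bits_per_index)[::-1]
    return (bits.reshape(n, bits_per_index) @ weights).reshape(shape)


# Illustrative sizes (assumptions): frames, feature dim, codebook size, number of quantizers.
rng = np.random.default_rng(0)
T, D, K, Q = 50, 8, 16, 4
features = rng.normal(size=(T, D))
codebooks = [rng.normal(scale=0.5 ** q, size=(K, D)) for q in range(Q)]

codes = rvq_encode(features, codebooks)          # shape (50, 4)
payload = pack_indices(codes, bits_per_index=4)  # log2(16) = 4 bits per index
restored = unpack_indices(payload, 4, codes.shape)
reconstruction = rvq_decode(restored, codebooks)  # shape (50, 8)
print(len(payload))  # 100
```

With fixed-width packing, each frame costs Q·log2(K) bits (here 4 × 4 = 16 bits, so 50 frames take 800 bits, or 100 bytes), which means the bitrate grows linearly with the number of quantizers used. An entropy coder over the indices could shrink this further when the index distribution is non-uniform.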

Published/granted literature

US20230019128A1 Compressing audio waveforms using neural networks and vector quantizers (publication date: 2023-01-19)

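IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L19/00	Speech or audio signal analysis-synthesis techniques for redundancy reduction (e.g. in vocoders); coding or decoding of speech or audio signals using source-filter models or psychoacoustic analysis (in musical instruments, G10H)
G10L19/02	. using spectral analysis, e.g. transform vocoders or subband vocoders
G10L19/032	.. Quantization or dequantization of spectral components
G10L19/038	... Vector quantization, e.g. TwinVQ audio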