Methods of encoding and decoding speech signal using neural network model recognizing sound sources, and encoding and decoding apparatuses for performing the same

Invention Grant

US11664037B2 (status: in force)


Patent Title: Methods of encoding and decoding speech signal using neural network model recognizing sound sources, and encoding and decoding apparatuses for performing the same
Application No.: US17326035

Application Date: 2021-05-20
Publication No.: US11664037B2

Publication Date: 2023-05-30
Inventors: Woo-taek Lim, Seung Kwon Beack, Jongmo Sung, Mi Suk Lee, Tae Jin Lee, Inseon Jang, Minje Kim, Haici Yang
Applicants: Electronics and Telecommunications Research Institute, The Trustees of Indiana University
Applicant Address: KR IN Daejeon
Assignees: Electronics and Telecommunications Research Institute, The Trustees of Indiana University
Current Assignees: Electronics and Telecommunications Research Institute, The Trustees of Indiana University
Current Assignee Address: KR Daejeon; US IN Indianapolis
Agency: William Park & Associates Ltd.
Priority: KR 20210053581, 2021-04-26
Main IPC: G10L19/032
IPC: G10L19/032; G10L21/0272


Abstract:

Methods of encoding and decoding a speech signal using a neural network model that recognizes sound sources, and encoding and decoding apparatuses for performing the methods are provided. A method of encoding a speech signal includes identifying an input signal for a plurality of sound sources; generating a latent signal by encoding the input signal; obtaining a plurality of sound source signals by separating the latent signal for each of the plurality of sound sources; determining a number of bits used for quantization of each of the plurality of sound source signals according to a type of each of the plurality of sound sources; quantizing each of the plurality of sound source signals based on the determined number of bits; and generating a bitstream by combining the plurality of quantized sound source signals.
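The encoding steps in the abstract (latent encoding, per-source separation, type-dependent bit allocation, quantization, bitstream assembly) can be illustrated with a minimal sketch. This is not the patented implementation: the identity "encoder", the element-wise mask separation, the uniform scalar quantizer, and the bit table `BITS_BY_SOURCE_TYPE` are all hypothetical stand-ins, and the "bitstream" is a string of '0'/'1' characters for readability.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL bit budget per sound-source type; the patent does not publish these values.
BITS_BY_SOURCE_TYPE: Dict[str, int] = {"speech": 8, "music": 6, "noise": 2}


def encode_latent(signal: Sequence[float]) -> List[float]:
    """Stand-in for the neural encoder: an identity mapping (assumption)."""
    return [float(x) for x in signal]


def separate_latent(latent: Sequence[float],
                    masks: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Split the shared latent into one signal per source using
    element-wise masks (a hypothetical separation scheme)."""
    return {name: [m * z for m, z in zip(mask, latent)]
            for name, mask in masks.items()}


def quantize(values: Sequence[float], num_bits: int, max_abs: float = 1.0) -> List[int]:
    """Uniform scalar quantizer mapping [-max_abs, max_abs] to 2**num_bits levels."""
    top = (1 << num_bits) - 1  # largest code that fits in num_bits
    codes = []
    for v in values:
        v = max(-max_abs, min(max_abs, v))  # clip to the quantizer range
        codes.append(int(round((v + max_abs) / (2 * max_abs) * top)))
    return codes


def pack_bits(codes: Sequence[int], num_bits: int) -> str:
    """Write each code as a fixed-width binary string."""
    return "".join(format(c, f"0{num_bits}b") for c in codes)


def encode(signal: Sequence[float],
           masks: Dict[str, Sequence[float]],
           source_types: Dict[str, str]) -> str:
    latent = encode_latent(signal)                      # 1. latent signal
    sources = separate_latent(latent, masks)            # 2. per-source signals
    parts = []
    for name in sorted(sources):                        # fixed order so a decoder can parse
        bits = BITS_BY_SOURCE_TYPE[source_types[name]]  # 3. bits chosen by source type
        codes = quantize(sources[name], bits)           # 4. quantize with that budget
        parts.append(pack_bits(codes, bits))
    return "".join(parts)                               # 5. combine into one bitstream


if __name__ == "__main__":
    stream = encode([0.5, -0.5, 1.0, 0.0],
                    masks={"voice": [0.8] * 4, "background": [0.2] * 4},
                    source_types={"voice": "speech", "background": "noise"})
    print(len(stream))  # prints 40: 4 samples * 8 bits + 4 samples * 2 bits
```

The point of the sketch is the allocation step: a source labelled as speech receives more bits per sample than one labelled as noise, so the bitstream spends its budget where it matters most perceptually. How the actual invention chooses bit counts is not stated in this abstract.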

Published/granted documents:

US20210366497A1 METHODS OF ENCODING AND DECODING SPEECH SIGNAL USING NEURAL NETWORK MODEL RECOGNIZING SOUND SOURCES, AND ENCODING AND DECODING APPARATUSES FOR PERFORMING THE SAME (publication date: 2021-11-25)


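IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L19/00	Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (for musical instruments, see G10H)
G10L19/02	. using spectral analysis, e.g. transform vocoders or subband vocoders
G10L19/032	.. Quantisation or dequantisation of spectral components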