-
Publication No.: US12223970B2
Publication Date: 2025-02-11
Application No.: US18103993
Filing Date: 2023-01-31
Inventor: Jongmo Sung , Seung Kwon Beack , Tae Jin Lee , Woo-taek Lim , Inseon Jang , Byeongho Cho
IPC: G10L19/087 , G10L19/038 , G10L19/13 , G10L25/30 , G10L19/02
Abstract: An encoding method, a decoding method, an encoder for performing the encoding method, and a decoder for performing the decoding method are provided. The encoding method includes outputting an LP coefficient bitstream and a residual signal by performing an LP analysis on an input signal; outputting, using a first neural network module, a first latent signal obtained by encoding a periodic component of the residual signal, a second latent signal obtained by encoding a non-periodic component of the residual signal, and a weight vector for each of the first latent signal and the second latent signal; and outputting, using a quantization module, a first bitstream obtained by quantizing the first latent signal, a second bitstream obtained by quantizing the second latent signal, and a weight bitstream obtained by quantizing the weight vector.
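The LP analysis step named in this abstract can be illustrated with the textbook autocorrelation method and the Levinson-Durbin recursion. This is a minimal sketch, not the patented encoder: the function names, the single-frame handling, and the choice of the autocorrelation method are assumptions made for illustration, and the neural network and quantization stages are left out.

```python
# Illustrative LP analysis (assumed textbook method, not taken from the patent).

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns a[1..order] such that x[n] is predicted as sum_k a[k] * x[n-k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x, coeffs):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-k]; samples before the frame are zero."""
    res = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        res.append(x[n] - pred)
    return res
```

In the method described above, the residual returned by a step like `lp_residual` would be what the first neural network module splits into periodic and non-periodic latent signals.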
-
Publication No.: US12223426B2
Publication Date: 2025-02-11
Application No.: US18166407
Filing Date: 2023-02-08
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE , YONSEI UNIVERSITY WONJU INDUSTRY-ACADEMIC COOPERATION FOUNDATION
Inventor: Jongmo Sung , Seung Kwon Beack , Tae Jin Lee , Woo-taek Lim , Inseon Jang , Byeongho Cho , Young Cheol Park , Joon Byun , Seungmin Shin
IPC: G10L19/00 , G06N3/08 , G10L19/028 , G10L19/038 , G10L25/30 , G10L25/60 , G10L25/69 , G06N3/084 , G10L15/00 , G10L19/22
Abstract: Provided are a method and an apparatus for designing and testing an audio codec using quantization based on white noise modeling. A neural network-based audio encoder design method includes generating a quantized latent vector and a reconstructed signal corresponding to an input signal by using a white noise modeling-based quantization process, computing a total loss for training a neural network-based audio codec based on the input signal, the reconstructed signal, and the quantized latent vector, training the neural network-based audio codec by using the total loss, and validating the trained neural network-based audio codec to select the best neural network-based audio codec.
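One common way to read "quantization based on white noise modeling" is to replace hard rounding with additive uniform noise during training, so the quantizer stays differentiable, and to apply real rounding only at inference. The sketch below shows that general technique only. Whether the patent uses exactly this form is an assumption, and the function names and the `step` parameter are hypothetical.

```python
import random

# Assumed technique: additive uniform noise as a training-time stand-in for rounding.

def quantize_train(latent, step, rng):
    """Training surrogate: add white noise drawn uniformly from [-step/2, step/2]."""
    return [z + rng.uniform(-step / 2, step / 2) for z in latent]


def quantize_infer(latent, step):
    """Inference: hard quantization to the nearest multiple of step."""
    return [step * round(z / step) for z in latent]
```

Both functions perturb each latent value by at most half a step, which is why the noisy version is usually treated as a reasonable proxy for the rounded one when computing a training loss.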
-
Publication No.: US11694703B2
Publication Date: 2023-07-04
Application No.: US17672041
Filing Date: 2022-02-15
Inventor: Woo-taek Lim , Seung Kwon Beack , Jongmo Sung , Tae Jin Lee , Inseon Jang
Abstract: An audio signal encoding and decoding method using a learning model, a training method of the learning model, and an encoder and decoder that perform the method are disclosed. The audio signal decoding method may include extracting a first residual signal and a first linear prediction coefficient by decoding a bitstream received from an encoder, generating a first audio signal from the first residual signal using the first linear prediction coefficient, generating a second linear prediction coefficient and a second residual signal from the first audio signal, obtaining a third linear prediction coefficient by inputting the second linear prediction coefficient into a trained learning model, and generating a second audio signal from the second residual signal using the third linear prediction coefficient.
-
Publication No.: US11651778B2
Publication Date: 2023-05-16
Application No.: US17520895
Filing Date: 2021-11-08
Inventor: Woo-taek Lim , Seung Kwon Beack , Jongmo Sung , Tae Jin Lee , Inseon Jang , Jong-Won Seok , Yunsu Kim
IPC: G10L19/16 , G10L19/02 , G10L25/30 , G10L19/038
CPC classification number: G10L19/038 , G10L19/02 , G10L19/167 , G10L25/30
Abstract: Disclosed are methods of encoding and decoding an audio signal, and an encoder and a decoder for performing the methods. The method of encoding an audio signal includes identifying an input signal corresponding to a low frequency band of the audio signal, windowing the input signal, generating a first latent vector by inputting the windowed input signal to a first encoding model, transforming the windowed input signal into a frequency domain, generating a second latent vector by inputting the transformed input signal to a second encoding model, generating a final latent vector by combining the first latent vector and the second latent vector, and generating a bitstream corresponding to the final latent vector.
-
Publication No.: US20180144757A1
Publication Date: 2018-05-24
Application No.: US15820852
Filing Date: 2017-11-22
Inventor: Seung Kwon Beack , Jongmo Sung , Mi Suk Lee , Young Ho Jeong , Tae Jin Lee , Sang Won Suh
CPC classification number: G10L19/167 , G06F17/2252
Abstract: Disclosed is a bitstream generation method performed by an acoustic data transmission (ADT) encoder, the method including receiving a first audio signal, receiving additional information converted into a bitstream, and transmitting a second audio signal obtained by inserting the bitstream into the first audio signal, to an ADT decoder.
-
Publication No.: US11862183B2
Publication Date: 2024-01-02
Application No.: US17368390
Filing Date: 2021-07-06
Inventor: Jongmo Sung , Seung Kwon Beack , Mi Suk Lee , Tae Jin Lee , Woo-taek Lim , Inseon Jang
IPC: G10L19/032
CPC classification number: G10L19/032
Abstract: An audio signal encoding and decoding method using a neural network model, a method of training the neural network model, and an encoder and decoder performing the methods are disclosed. The encoding method includes computing the first feature information of an input signal using a recurrent encoding model, computing an output signal from the first feature information using a recurrent decoding model, calculating a residual signal by subtracting the output signal from the input signal, computing the second feature information of the residual signal using a nonrecurrent encoding model, and converting the first feature information and the second feature information to a bitstream.
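The data flow of this two-stage encoding method can be sketched with the models passed in as plain callables. This is an illustration under stated assumptions: `encode_cascade` and `make_scalar_codec` are hypothetical names, and the scalar quantizers merely stand in for the recurrent and nonrecurrent neural models, which the abstract does not specify.

```python
# Hypothetical sketch of the cascade: stage 1 codes the input, stage 2 codes what is left.

def encode_cascade(x, enc1, dec1, enc2):
    """Return (first feature information, second feature information)."""
    f1 = enc1(x)                                  # stands in for the recurrent encoding model
    y = dec1(f1)                                  # stands in for the recurrent decoding model
    residual = [a - b for a, b in zip(x, y)]      # input minus first-stage output
    f2 = enc2(residual)                           # stands in for the nonrecurrent encoding model
    return f1, f2


def make_scalar_codec(step):
    """Stand-in (assumed) encoder/decoder pair: uniform scalar quantization."""
    def enc(sig):
        return [round(v / step) for v in sig]

    def dec(idx):
        return [i * step for i in idx]

    return enc, dec
```

With a coarse first stage and a finer second stage, adding the decoded residual to the first-stage output gives a closer reconstruction than the first stage alone, which is the usual motivation for coding the residual separately.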
-
Publication No.: US11790926B2
Publication Date: 2023-10-17
Application No.: US17156006
Filing Date: 2021-01-22
Applicant: Electronics and Telecommunications Research Institute , The Trustees of Indiana University
Inventor: Mi Suk Lee , Seung Kwon Beack , Jongmo Sung , Tae Jin Lee , Jin Soo Choi , Minje Kim , Kai Zhen
IPC: G10L19/038 , G10L19/028 , G10L25/18 , G10L25/21 , G10L25/30
CPC classification number: G10L19/038 , G10L19/028 , G10L25/18 , G10L25/21 , G10L25/30
Abstract: A method and apparatus for processing an audio signal are disclosed. According to an example embodiment, a method of processing an audio signal may include acquiring a final audio signal for an initial audio signal using a plurality of neural network models generating output audio signals by encoding and decoding input audio signals, calculating a difference between the initial audio signal and the final audio signal in a time domain, converting the initial audio signal and the final audio signal into Mel-spectra, calculating a difference between the Mel-spectra of the initial audio signal and the final audio signal in a frequency domain, training the plurality of neural network models based on results calculated in the time domain and the frequency domain, and generating a new final audio signal distinguished from the final audio signal from the initial audio signal using the trained neural network models.
-
Publication No.: US11664037B2
Publication Date: 2023-05-30
Application No.: US17326035
Filing Date: 2021-05-20
Applicant: Electronics and Telecommunications Research Institute , The Trustees of Indiana University
Inventor: Woo-taek Lim , Seung Kwon Beack , Jongmo Sung , Mi Suk Lee , Tae Jin Lee , Inseon Jang , Minje Kim , Haici Yang
IPC: G10L19/032 , G10L21/0272
CPC classification number: G10L19/032 , G10L21/0272
Abstract: Methods of encoding and decoding a speech signal using a neural network model that recognizes sound sources, and encoding and decoding apparatuses for performing the methods are provided. A method of encoding a speech signal includes identifying an input signal for a plurality of sound sources; generating a latent signal by encoding the input signal; obtaining a plurality of sound source signals by separating the latent signal for each of the plurality of sound sources; determining a number of bits used for quantization of each of the plurality of sound source signals according to a type of each of the plurality of sound sources; quantizing each of the plurality of sound source signals based on the determined number of bits; and generating a bitstream by combining the plurality of quantized sound source signals.
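The last three steps of this method (choosing a bit depth per source type, quantizing, and combining into one bitstream) can be sketched as follows. The source types, the bit counts in `BITS_BY_TYPE`, the [-1, 1] signal range, and all function names are assumptions for illustration. The neural encoding and source separation steps are not shown.

```python
# Hypothetical bit budget per source type; the values are illustrative only.
BITS_BY_TYPE = {"speech": 6, "noise": 2}


def quantize_uniform(signal, bits, lo=-1.0, hi=1.0):
    """Map each sample to an integer index in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    idx = []
    for v in signal:
        v = min(max(v, lo), hi)  # clip to the assumed signal range
        idx.append(round((v - lo) / (hi - lo) * levels))
    return idx


def encode_sources(sources):
    """sources: list of (source_type, samples). Returns one combined bit string."""
    parts = []
    for source_type, samples in sources:
        bits = BITS_BY_TYPE[source_type]
        for i in quantize_uniform(samples, bits):
            parts.append(format(i, "0{}b".format(bits)))  # fixed-width binary field
    return "".join(parts)
```

Under these assumed settings, three speech samples and three noise samples cost 3 × 6 + 3 × 2 = 24 bits, compared with 36 bits if both sources were quantized at the speech bit depth.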
-
Publication No.: US11508386B2
Publication Date: 2022-11-22
Application No.: US16843649
Filing Date: 2020-04-08
Applicant: Electronics and Telecommunications Research Institute , Kwangwoon University Industry-Academic Collaboration Foundation
Inventor: Hochong Park , Seung Kwon Beack , Jongmo Sung , Seong-Hyeon Shin , Mi Suk Lee , Tae Jin Lee , Jin Soo Choi
Abstract: An inventive concept relates to an audio coding method to which CNN-based frequency spectrum recovery is applied. The inventive concept transmits a part of the frequency spectral coefficients generated in transform coding to a decoder, and the decoder recovers the frequency spectral coefficients that were not transmitted. Furthermore, the signs of the frequency spectral coefficients are transmitted from an encoder to the decoder depending on a sign transmission rule.
-
Publication No.: US11837220B2
Publication Date: 2023-12-05
Application No.: US17308800
Filing Date: 2021-05-05
Applicant: Electronics and Telecommunications Research Institute , The Trustees of Indiana University
Inventor: Minje Kim , Mi Suk Lee , Seung Kwon Beack , Jongmo Sung , Tae Jin Lee , Jin Soo Choi , Kai Zhen
Abstract: Disclosed is a speech processing apparatus and method using a densely connected hybrid neural network. The speech processing method includes inputting a time domain sample of N*1 dimension for an input speech into a densely connected hybrid network; passing the time domain sample through a plurality of dense blocks in a densely connected hybrid network; reshaping the time domain samples into M subframes by passing the time domain samples through the plurality of dense blocks; inputting the M subframes into gated recurrent unit (GRU) components of N/M-dimension; outputting clean speech from which noise is removed from the input speech by passing the M subframes through GRU components.