-
Publication No.: US20240347066A1
Publication Date: 2024-10-17
Application No.: US18643227
Filing Date: 2024-04-23
Inventors: Erik NORVELL, Fredrik JANSSON
IPC Classification: G10L19/012, G10L19/00, G10L19/008, G10L19/032, G10L19/04, G10L19/06, H04W76/28
CPC Classification: G10L19/012, G10L19/0017, G10L19/008, G10L19/032, G10L19/04, G10L19/06, H04W76/28
Abstract: A method, system, and computer program to encode and decode a channel coherence parameter applied on a frequency band basis, where the coherence parameters of each frequency band form a coherence vector. The coherence vector is encoded and decoded using a predictive scheme followed by a variable bit rate entropy coding.
-
Publication No.: US20240221765A1
Publication Date: 2024-07-04
Application No.: US18604374
Filing Date: 2024-03-13
Inventors: Sascha DISCH, Franz REUTELHUBER, Jan BÜTHE, Markus MULTRUS, Bernd EDLER
IPC Classification: G10L19/02, G10L19/008, G10L19/04, G10L19/16, G10L21/0232, G10L21/038, H04L65/70
CPC Classification: G10L19/02, G10L21/0232, G10L21/038, H04L65/70, G10L19/008, G10L19/04, G10L19/16
Abstract: An apparatus for encoding an audio signal includes: a core encoder for core encoding first audio data in a first spectral band; a parametric coder for parametrically coding second audio data in a second spectral band being different from the first spectral band, wherein the parametric coder includes: an analyzer for analyzing first audio data in the first spectral band to obtain a first analysis result and for analyzing second audio data in the second spectral band to obtain a second analysis result; a compensator for calculating a compensation value using the first analysis result and the second analysis result; and a parameter calculator for calculating a parameter from the second audio data in the second spectral band using the compensation value.
-
Publication No.: US12014747B2
Publication Date: 2024-06-18
Application No.: US18308293
Filing Date: 2023-04-27
IPC Classification: G10L19/26, G10L19/02, G10L19/028, G10L19/03, G10L19/032, G10L19/04, G10L19/12, G10L19/16, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L21/038, G10L25/15, G10L25/18
CPC Classification: G10L19/265, G10L19/0204, G10L19/03, G10L19/032, G10L19/12, G10L19/16, G10L19/26, G10L21/007, G10L21/02, G10L21/0208, G10L21/0324, G10L25/15, G10L25/18, G10L19/02, G10L19/028, G10L19/04, G10L21/038
Abstract: An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.
-
Publication No.: US11978464B2
Publication Date: 2024-05-07
Application No.: US17757122
Filing Date: 2021-01-22
Applicant: GOOGLE LLC
IPC Classification: G10L19/00, G10L19/038, G10L19/04, G10L21/02, G06N3/02
CPC Classification: G10L19/038, G10L19/04, G10L21/02, G06N3/02, G10L19/00
Abstract: A method includes receiving sampled audio data corresponding to utterances and training a machine learning (ML) model, using the sampled audio data, to generate a high-fidelity audio stream from a low bitrate input bitstream. The training of the ML model includes de-emphasizing the influence of low-probability distortion events in the sampled audio data on the trained ML model, where the de-emphasizing of the distortion events is achieved by the inclusion of a term in an objective function of the ML model, which term encourages low-variance predictive distributions of a next sample in the sampled audio data, based on previous samples of the audio data.
-
Publication No.: US11978460B2
Publication Date: 2024-05-07
Application No.: US17817251
Filing Date: 2022-08-03
Inventors: Erik Norvell, Fredrik Jansson
IPC Classification: G10L19/012, G10L19/00, G10L19/008, G10L19/032, G10L19/04, G10L19/06, H04W76/28
CPC Classification: G10L19/012, G10L19/0017, G10L19/008, G10L19/032, G10L19/04, G10L19/06, H04W76/28
Abstract: A method, system, and computer program to encode and decode a channel coherence parameter applied on a frequency band basis, where the coherence parameters of each frequency band form a coherence vector. The coherence vector is encoded and decoded using a predictive scheme followed by a variable bit rate entropy coding.
-
Publication No.: US11929084B2
Publication Date: 2024-03-12
Application No.: US18158035
Filing Date: 2023-01-23
Inventors: Sascha Disch, Martin Dietz, Markus Multrus, Guillaume Fuchs, Emmanuel Ravelli, Matthias Neusinger, Markus Schnell, Benjamin Schubert, Bernhard Grill
IPC Classification: G10L19/18, G10L19/02, G10L19/028, G10L19/032, G10L19/04, G10L19/06, G10L19/24, G10L19/26, G10L21/038, G10L19/20
CPC Classification: G10L19/18, G10L19/028, G10L19/032, G10L19/06, G10L19/265, G10L19/02, G10L19/04, G10L19/20, G10L19/24, G10L21/038
Abstract: An audio encoder for encoding an audio signal has: a first encoding processor for encoding a first audio signal portion in a frequency domain, having: a time frequency converter for converting the first audio signal portion into a frequency domain representation; an analyzer for analyzing the frequency domain representation to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution; and a spectral encoder for encoding the first spectral portions with the first spectral resolution and encoding the second spectral portions with the second spectral resolution; a second encoding processor for encoding a second different audio signal portion in the time domain; a controller for analyzing and determining which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal having a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion.
-
Publication No.: US20240055008A1
Publication Date: 2024-02-15
Application No.: US18383953
Filing Date: 2023-10-26
IPC Classification: G10L19/012, G10L19/008, G10L19/032, G10L19/04, G10L19/06, G10L19/00
CPC Classification: G10L19/012, G10L19/008, G10L19/032, G10L19/04, G10L19/06, G10L19/0017, H04W76/28
Abstract: A method and a transmitting node for supporting generation of comfort noise for at least two audio channels at a receiving node. The method is performed by a transmitting node. The method comprises determining spectral characteristics of audio signals on at least two input audio channels and determining a spatial coherence between the audio signals. The spatial coherence is associated with perceptual importance measures. A compressed representation of the spatial coherence is determined per frequency band by weighting the spatial coherence within each frequency band according to the perceptual importance measures. Information about the spectral characteristics and the compressed representation of the spatial coherence per frequency band is signaled to the receiving node for enabling the generation of the comfort noise at the receiving node.
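Illustrative sketch (not taken from the patent): one simple reading of "weighting the spatial coherence within each frequency band according to the perceptual importance measures" is an importance-weighted mean per band. The function name `band_coherence`, the per-bin inputs, and the `band_edges` layout are hypothetical; the patent may define the weighting differently.

```python
def band_coherence(coherence, importance, band_edges):
    """Collapse per-bin coherence into one value per frequency band.

    coherence  -- per-bin spatial coherence values
    importance -- per-bin perceptual importance weights (non-negative)
    band_edges -- bin indices delimiting the bands, e.g. [0, 2, 4]
    """
    out = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        c = coherence[lo:hi]
        w = importance[lo:hi]
        total = sum(w)
        if total > 0:
            # Importance-weighted mean: bins that matter more dominate the band value.
            out.append(sum(wi * ci for wi, ci in zip(w, c)) / total)
        else:
            # Assumed fallback: plain mean when every weight in the band is zero.
            out.append(sum(c) / len(c))
    return out
```

For example, `band_coherence([1.0, 0.0, 0.5, 0.5], [3, 1, 1, 1], [0, 2, 4])` yields one value per band, with the first band pulled toward the heavily weighted bin.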
-
Publication No.: US20230386486A1
Publication Date: 2023-11-30
Application No.: US18248294
Filing Date: 2021-10-15
Inventors: Cong ZHOU, Grant A. DAVIDSON, Mark S. VINTON
IPC Classification: G10L19/022, G10L25/30, G10L19/032, G10L19/04
CPC Classification: G10L19/022, G10L19/04, G10L19/032, G10L25/30
Abstract: The present invention relates to a method for predicting transform coefficients representing frequency content of an adaptive block length media signal, by receiving a frame and receiving block length information indicating a number of quantized transform coefficients for each block in the frame, the number of quantized transform coefficients being one of a first or second number, wherein the first number is greater than the second number, determining that a first block has the second number of quantized transform coefficients, converting the first block into a converted block having the first number of quantized transform coefficients, conditioning a main neural network trained to predict at least one output variable given at least one conditioning variable, the at least one conditioning variable being based on information regarding the converted block and block length information for the first block, and providing at least one predicted transform coefficient from an output stage of the main neural network.
-
Publication No.: US20230298603A1
Publication Date: 2023-09-21
Application No.: US18150126
Filing Date: 2023-01-04
Inventors: In Seon JANG, Seung Kwon BEACK, Jong Mo SUNG, Tae Jin LEE, Woo Taek LIM, Byeong Ho CHO
IPC Classification: G10L19/032, G10L25/30, G10L19/04, G06N7/01
CPC Classification: G10L19/032, G10L25/30, G10L19/04, G06N7/01
Abstract: A method for encoding an input signal using N flow blocks (N is a natural number greater than or equal to 2) and (N−1) split block(s), which is performed by a processor, may comprise: transmitting, by a k-th flow block (k is a natural number greater than or equal to 1 and less than or equal to N−1) among the N flow blocks, a k-th transformation signal obtained by transforming a received signal into a latent representation to a k-th split block among the (N−1) split block(s); splitting, by the k-th split block, the k-th transformation signal by a predetermined ratio, into a first split signal and a second split signal; transmitting, by the k-th split block, the first split signal to a (k+1)-th flow block; and quantizing a signal transformed by an N-th flow block and the second split signals using a quantization block.
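Illustrative sketch (not taken from the patent): the data flow in the abstract, N flow blocks interleaved with N−1 split blocks and a final quantization block, can be traced with a few lines of Python. The flow blocks here are placeholder callables rather than trained invertible networks, and the names `encode`, `ratio`, and `step` as well as the uniform rounding quantizer are assumptions.

```python
def encode(signal, flows, ratio=0.5, step=0.25):
    """Run a signal through N flow blocks and N-1 split blocks, then quantize.

    flows -- list of N callables, each mapping a list of floats to a list of
             the same length (stand-ins for learned flow blocks)
    ratio -- fraction of each transformation signal passed on to the next flow
    step  -- quantizer step size (hypothetical uniform quantizer)
    """
    n = len(flows)
    if n < 2:
        raise ValueError("the scheme uses at least two flow blocks")

    def quantize(values):
        return [round(v / step) for v in values]

    x = list(signal)
    second_splits = []
    for k in range(n - 1):
        z = flows[k](x)                 # k-th transformation signal
        cut = int(len(z) * ratio)       # k-th split block
        first, second = z[:cut], z[cut:]
        second_splits.append(second)    # held back for quantization
        x = first                       # forwarded to the (k+1)-th flow block
    z_last = flows[n - 1](x)            # output of the N-th flow block

    # Quantization block: the final latent plus every second split signal.
    return quantize(z_last), [quantize(s) for s in second_splits]
```

With three placeholder flows that each double their input and an 8-sample input, the first split holds back 4 values, the second holds back 2, and the last flow block emits 2, so the total number of quantized values equals the input length.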
-
Publication No.: US11705137B2
Publication Date: 2023-07-18
Application No.: US16925946
Filing Date: 2020-07-10
Applicants: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, Kwangwoon University Industry-Academic Collaboration Foundation
Inventors: Tae Jin Lee, Seung-Kwon Baek, Min Je Kim, Dae Young Jang, Jeongil Seo, Kyeongok Kang, Jin-Woo Hong, Hochong Park, Young-Cheol Park
Abstract: Provided is an encoding apparatus for integrally encoding and decoding a speech signal and an audio signal. The apparatus may include: an input signal analyzer to analyze a characteristic of an input signal; a stereo encoder to down mix the input signal to a mono signal when the input signal is a stereo signal, and to extract stereo sound image information; a frequency band expander to expand a frequency band of the input signal; a sampling rate converter to convert a sampling rate; a speech signal encoder to encode the input signal using a speech encoding module when the input signal is a speech characteristic signal; an audio signal encoder to encode the input signal using an audio encoding module when the input signal is an audio characteristic signal; and a bitstream generator to generate a bitstream.