METHOD AND APPARATUS FOR DETERMINING PARAMETERS OF A GENERATIVE NEURAL NETWORK

    Publication number: US20230229892A1

    Publication date: 2023-07-20

    Application number: US17927929

    Filing date: 2021-05-31

    CPC classification number: G06N3/0455 G10L19/26 G10L25/30 G10L25/69 G06N3/082

    Abstract: Described herein is a method of determining parameters for a generative neural network for processing an audio signal, wherein the generative neural network includes an encoder stage mapping to a coded feature space and a decoder stage, each stage including a plurality of convolutional layers with one or more weight coefficients, the method comprising a plurality of cycles with sequential processes of: pruning the weight coefficients of either or both stages based on pruning control information, the pruning control information determining the number of weight coefficients that are pruned for respective convolutional layers; training the pruned generative neural network based on a set of training data; determining a loss for the trained and pruned generative neural network based on a loss function; and determining updated pruning control information based on the determined loss and a target loss. Further described are corresponding apparatus, programs, and computer-readable storage media.
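The prune, train, evaluate, update cycle described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: magnitude-based pruning, the fixed `step` update rule, the 0.95 cap, and the `train_and_eval` callback are all hypothetical choices, and layers are plain lists of floats rather than convolutional kernels.

```python
from typing import Callable, Dict, List, Tuple

Layers = Dict[str, List[float]]


def prune_layer(weights: List[float], fraction: float) -> List[float]:
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * fraction)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def update_fractions(fractions: Dict[str, float], loss: float,
                     target_loss: float, step: float = 0.05) -> Dict[str, float]:
    """Assumed update rule: prune more while the loss meets the target, else back off."""
    delta = step if loss <= target_loss else -step
    return {name: min(0.95, max(0.0, f + delta)) for name, f in fractions.items()}


def run_cycles(layers: Layers, fractions: Dict[str, float],
               train_and_eval: Callable[[Layers], Tuple[Layers, float]],
               target_loss: float, n_cycles: int) -> Tuple[Layers, Dict[str, float]]:
    """Run n_cycles of: prune -> train -> determine loss -> update pruning control."""
    for _ in range(n_cycles):
        layers = {name: prune_layer(w, fractions[name]) for name, w in layers.items()}
        # train_and_eval is a hypothetical callback returning (trained layers, loss).
        layers, loss = train_and_eval(layers)
        fractions = update_fractions(fractions, loss, target_loss)
    return layers, fractions
```

Here the per-layer pruning fractions play the role of the pruning control information; a real system would presumably use a richer update than a fixed step.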

    METHOD AND APPARATUS FOR PROCESSING OF AUDIO DATA USING A PRE-CONFIGURED GENERATOR

    Publication number: US20240055006A1

    Publication date: 2024-02-15

    Application number: US18256967

    Filing date: 2021-12-15

    Inventor: Arijit BISWAS

    CPC classification number: G10L19/008 G10L25/30

    Abstract: Described herein is a method for setting up a decoder for generating processed audio data from an audio bitstream, the decoder comprising a Generator of a Generative Adversarial Network, GAN, for processing of the audio data, wherein the method includes the steps of (a) pre-configuring the Generator for processing of audio data with a set of parameters for the Generator, the parameters being determined by training, at training time, the Generator using the full concatenated distribution; and (b) pre-configuring the decoder to determine, at decoding time, a truncation mode for modifying the concatenated distribution and to apply the determined truncation mode to the concatenated distribution. Further described are a method of generating processed audio data from an audio bitstream using a Generator of a Generative Adversarial Network, GAN, for processing of the audio data, and a respective apparatus. Respective systems and computer program products are also described.

    DIALOG ENHANCEMENT COMPLEMENTED WITH FREQUENCY TRANSPOSITION

    Publication number: US20180160236A1

    Publication date: 2018-06-07

    Application number: US15567270

    Filing date: 2016-05-04

    Inventor: Arijit BISWAS

    CPC classification number: H04R25/353 H04R25/505 H04R2225/43

    Abstract: A method, a system and a computer program product are disclosed for enhancing an audio signal in relation to a hearing impairment. An input signal is obtained comprising input sub-band signals in a frequency range comprising a source range and a target range. The input sub-band signals in the source range are selectively transposed into transposed sub-band signals in the target range according to a predefined transposing rule. A masking threshold is determined based on a predefined perceptual model and perceptually relevant sub-band signals of the transposed sub-band signals in the target range exceeding the masking threshold are detected. Input sub-band signals in the target range are selectively replaced with corresponding detected perceptually relevant sub-band signals of the transposed sub-band signals in the target range.

    Decoder-Provided Time Domain Aliasing Cancellation During Lossy/Lossless Transitions

    Publication number: US20190066702A1

    Publication date: 2019-02-28

    Application number: US16115795

    Filing date: 2018-08-29

    Inventor: Arijit BISWAS

    CPC classification number: G10L19/0017 G10L19/02 G10L19/0212 G10L19/18

    Abstract: Systems and methods are described for switching between lossy coded time segments and a lossless stream of the same source audio. A decoder may receive lossy coded time segments that include audio encoded using frequency-domain lossy coding. The decoder may also receive a lossless stream, which the decoder plays back, that includes audio from the same source encoded using lossless coding. In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously-decoded frame of the lossless stream, which may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight caused by an encoding window. Audio playback of the lossy coded time segments may then be provided, beginning with the aliasing-canceled transition frame.

    Signal Adaptive FIR/IIR Predictors for Minimizing Entropy
    Type: Invention application (in force)

    Publication number: US20150317985A1

    Publication date: 2015-11-05

    Application number: US14649477

    Filing date: 2013-12-19

    Inventor: Arijit BISWAS

    CPC classification number: G10L19/0017 G10L19/04

    Abstract: The present document relates to coding. In particular, the present document relates to coding using linear prediction in combination with entropy encoding. A method (600) for determining a general prediction filter for a frame of an input signal (111) is described. The z-transform of the general prediction filter comprises an approximation to the z-transform of a finite impulse response, referred to as FIR, filter with the z variable of the FIR filter being replaced by the z-transform of an allpass filter. The FIR filter comprises a plurality of FIR coefficients (412). The allpass filter exhibits a pole defined by an adjustable pole parameter. The method (600) comprises determining the pole parameter and the plurality of FIR coefficients, such that an entropy of a frame of a prediction error signal (414) which is derived from the frame of the input signal (111) using the general prediction filter defined by the pole parameter and the plurality of FIR coefficients (412) is reduced.

    CODING DENSE TRANSIENT EVENTS WITH COMPANDING

    Publication number: US20220270624A1

    Publication date: 2022-08-25

    Application number: US17270035

    Filing date: 2019-08-21

    Abstract: Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A method of processing an audio signal includes the following operations. A system receives an audio signal. The system determines that a first frame of the audio signal includes a sparse transient signal. The system determines that a second frame of the audio signal includes a dense transient signal. The system compresses/expands (compands) the audio signal using a companding rule that applies a first companding exponent to the first frame of the audio signal and applies a second companding exponent to the second frame of the audio signal, each companding exponent being used to derive a respective degree of dynamic range compression and expansion for a corresponding frame. The system then provides the companded audio signal to a downstream device.

    Speech/Dialog Enhancement Controlled by Pupillometry

    Publication number: US20210264928A1

    Publication date: 2021-08-26

    Application number: US17213088

    Filing date: 2021-03-25

    Inventor: Arijit BISWAS

    Abstract: The present disclosure relates to methods for processing a decoded audio signal and for selectively applying speech/dialog enhancement to the decoded audio signal. The present disclosure also relates to a method of operating a headset for computer-mediated reality. A method of processing a decoded audio signal comprises obtaining a measure of a cognitive load of a listener that listens to a rendering of the audio signal, determining whether speech/dialog enhancement shall be applied based on the obtained measure of the cognitive load, and performing speech/dialog enhancement based on the determination. A method of operating a headset for computer-mediated reality comprises obtaining eye-tracking data of a wearer of the headset, determining a measure of a cognitive load of the wearer of the headset based on the eye-tracking data, and outputting an indication of the cognitive load of the wearer of the headset. The present disclosure further relates to corresponding apparatus and systems, and to methods of operating such apparatus and systems.

    MDCT-Domain Error Concealment
    Type: Invention application

    Publication number: US20200013413A1

    Publication date: 2020-01-09

    Application number: US16571430

    Filing date: 2019-09-16

    Abstract: An error-concealing audio decoding method comprises: receiving a packet comprising a set of MDCT coefficients encoding a frame of time-domain samples of an audio signal; identifying the received packet as erroneous; generating estimated MDCT coefficients to replace the set of MDCT coefficients of the erroneous packet, based on corresponding MDCT coefficients associated with a received packet directly preceding the erroneous packet; assigning signs of a first subset of MDCT coefficients of the estimated MDCT coefficients, wherein the first subset comprises such MDCT coefficients that are associated with tonal-like spectral bins, to coincide with signs of corresponding MDCT coefficients of said preceding packet; randomly assigning signs of a second subset of MDCT coefficients of the estimated MDCT coefficients, wherein the second subset comprises MDCT coefficients associated with noise-like spectral bins; replacing the erroneous packet by a concealment packet containing the estimated MDCT coefficients and the signs assigned.
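The sign-assignment step in this abstract can be illustrated with a short sketch. It assumes that the estimated magnitudes are simply copied from the preceding packet and that the tonal/noise-like classification of each bin is already available as a boolean list; both are simplifications made for illustration, and the function name `conceal_mdct` is hypothetical.

```python
import random
from typing import List, Optional


def conceal_mdct(prev_coeffs: List[float], tonal: List[bool],
                 rng: Optional[random.Random] = None) -> List[float]:
    """Build replacement MDCT coefficients from the preceding packet.

    Tonal-like bins keep the sign of the previous coefficient;
    noise-like bins receive a random sign.
    """
    rng = rng or random.Random()
    estimated = []
    for coeff, is_tonal in zip(prev_coeffs, tonal):
        magnitude = abs(coeff)  # assumption: magnitude reused unchanged
        if is_tonal:
            sign = -1.0 if coeff < 0 else 1.0
        else:
            sign = rng.choice((-1.0, 1.0))
        estimated.append(sign * magnitude)
    return estimated
```

Passing a seeded `random.Random` makes the noise-like signs reproducible, which is convenient when testing.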

    ESTIMATING A TEMPO METRIC FROM AN AUDIO BIT-STREAM
    Type: Invention application (in force)

    Publication number: US20160351177A1

    Publication date: 2016-12-01

    Application number: US15118044

    Filing date: 2015-02-18

    Inventor: Arijit BISWAS

    Abstract: The invention relates to estimating tempo information directly from a bitstream encoding audio information, preferably music. Said tempo information is derived from at least one periodicity derived from a detection of at least two onsets included in the audio information. Such onsets are detected via a detection of long to short block transitions (in the bitstream) and/or via a detection of a changing bit allocation (change of cost) regarding encoding/transmitting the exponents of transform coefficients encoded in the bitstream.
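As a rough illustration of the idea, the sketch below marks an onset wherever a frame switches from a long block to a short block and converts the median inter-onset interval into beats per minute. The per-frame `short_block_flags` input, the fixed `frame_duration`, and the use of the median as the periodicity estimate are assumptions made for this example; the bit-allocation cue mentioned in the abstract is not modeled.

```python
from statistics import median
from typing import List


def onsets_from_block_flags(short_block_flags: List[bool],
                            frame_duration: float) -> List[float]:
    """Return onset times (seconds) at long-to-short block transitions."""
    onsets = []
    for i in range(1, len(short_block_flags)):
        if short_block_flags[i] and not short_block_flags[i - 1]:
            onsets.append(i * frame_duration)
    return onsets


def tempo_bpm(onset_times: List[float]) -> float:
    """Estimate tempo from the median interval between consecutive onsets."""
    times = sorted(onset_times)
    intervals = [b - a for a, b in zip(times, times[1:]) if b > a]
    if not intervals:
        raise ValueError("need at least two distinct onsets")
    return 60.0 / median(intervals)
```

For example, onsets spaced 0.5 s apart correspond to a tempo of 120 BPM.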

    COMPANDING APPARATUS AND METHOD TO REDUCE QUANTIZATION NOISE USING ADVANCED SPECTRAL EXTENSION
    Type: Invention application (in force)

    Publication number: US20160019908A1

    Publication date: 2016-01-21

    Application number: US14762690

    Filing date: 2014-04-01

    Abstract: Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A compression process reduces the original dynamic range of an initial audio signal by dividing the initial audio signal into a plurality of segments using a defined window shape, calculating a wideband gain in the frequency domain using a non-energy-based average of frequency-domain samples of the initial audio signal, and applying individual gain values to amplify segments of relatively low intensity and attenuate segments of relatively high intensity. The compressed audio signal is then expanded back to substantially the original dynamic range by an expansion process that applies inverse gain values to amplify segments of relatively high intensity and attenuate segments of relatively low intensity. A QMF filterbank is used to analyze the initial audio signal to obtain a frequency domain representation.
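The compress/expand symmetry can be sketched per segment as follows. This is an illustrative simplification, not the patented system: it works on plain lists of samples, omits the windowing and the QMF filterbank, uses the mean absolute value as the non-energy-based average, and takes a companding exponent `gamma` of 0.5 as an assumed default.

```python
from typing import List


def mean_abs(segment: List[float]) -> float:
    """Non-energy-based level estimate: the mean absolute value."""
    return sum(abs(x) for x in segment) / len(segment)


def compress(segment: List[float], gamma: float = 0.5) -> List[float]:
    """Apply gain level**(gamma - 1): boosts quiet segments, attenuates loud ones."""
    level = mean_abs(segment)
    if level == 0.0:
        return list(segment)
    gain = level ** (gamma - 1.0)
    return [gain * x for x in segment]


def expand(segment: List[float], gamma: float = 0.5) -> List[float]:
    """Invert compress() using only the level of the compressed segment."""
    level = mean_abs(segment)  # equals original level ** gamma
    if level == 0.0:
        return list(segment)
    gain = level ** ((1.0 - gamma) / gamma)  # equals original level ** (1 - gamma)
    return [gain * x for x in segment]
```

Because the compressed level equals the original level raised to `gamma`, the expander can recover the inverse gain without side information, so `expand(compress(s))` returns `s` up to rounding.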
