Abstract:
Described herein is a method of determining parameters for a generative neural network for processing an audio signal, wherein the generative neural network includes an encoder stage mapping to a coded feature space and a decoder stage, each stage including a plurality of convolutional layers with one or more weight coefficients, the method comprising a plurality of cycles, each cycle comprising the sequential processes of: pruning the weight coefficients of either or both stages based on pruning control information, the pruning control information determining the number of weight coefficients that are pruned for respective convolutional layers; training the pruned generative neural network based on a set of training data; determining a loss for the trained and pruned generative neural network based on a loss function; and determining updated pruning control information based on the determined loss and a target loss. Further described are corresponding apparatus, programs, and computer-readable storage media.
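Below is a minimal Python sketch of one such cycle; it is illustrative only, and the helper callables (train_fn, loss_fn) as well as the simple per-layer update rule for the pruning control information are assumptions rather than the claimed method.

    import numpy as np

    def prune_layer(weights, keep_fraction):
        # Zero out the smallest-magnitude weight coefficients of one convolutional layer.
        flat = np.abs(weights).ravel()
        n_prune = int(flat.size * (1.0 - keep_fraction))
        if n_prune <= 0:
            return weights
        threshold = np.partition(flat, n_prune - 1)[n_prune - 1]
        return np.where(np.abs(weights) > threshold, weights, 0.0)

    def pruning_cycle(layer_weights, keep_fractions, train_fn, loss_fn, target_loss, step=0.05):
        # One cycle: prune, train, determine the loss, and update the pruning control information.
        pruned = [prune_layer(w, k) for w, k in zip(layer_weights, keep_fractions)]
        trained = train_fn(pruned)              # train the pruned network on the training data
        loss = loss_fn(trained)                 # loss of the trained, pruned network
        if loss > target_loss:
            keep_fractions = [min(1.0, k + step) for k in keep_fractions]   # prune less next cycle
        else:
            keep_fractions = [max(0.0, k - step) for k in keep_fractions]   # prune more next cycle
        return trained, keep_fractions, loss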
Abstract:
Described herein is a method for setting up a decoder for generating processed audio data from an audio bitstream, the decoder comprising a Generator of a Generative Adversarial Network, GAN, for processing of the audio data, wherein the method includes the steps of (a) pre-configuring the Generator for processing of audio data with a set of parameters for the Generator, the parameters being determined by training, at training time, the Generator using the full concatenated distribution; and (b) pre-configuring the decoder to determine, at decoding time, a truncation mode for modifying the concatenated distribution and to apply the determined truncation mode to the concatenated distribution. Further described are a method of generating processed audio data from an audio bitstream using a Generator of a Generative Adversarial Network, GAN, for processing of the audio data, and a respective apparatus, as well as respective systems and computer program products.
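A possible reading of step (b), sketched in Python; the truncation mode (here, a clipping limit in standard deviations) and the callables generator and select_mode are hypothetical placeholders, not the claimed decoder.

    import numpy as np

    def truncate_distribution(z, mode):
        # Apply a truncation mode to the concatenated (coded features + random) input.
        if mode is None:
            return z                     # training time: the full distribution is used
        return np.clip(z, -mode, mode)   # decoding time: clip at +/- mode standard deviations

    def decode_with_truncation(generator, coded_features, select_mode):
        # Pre-configured decoding: determine a truncation mode, apply it, run the Generator.
        z = np.random.randn(*coded_features.shape)
        concatenated = np.concatenate([coded_features, z], axis=-1)
        mode = select_mode(coded_features)    # e.g. chosen per frame from bitstream metadata
        return generator(truncate_distribution(concatenated, mode))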
Abstract:
A method, a system and a computer program product are disclosed for enhancing an audio signal in relation to a hearing impairment. An input signal is obtained comprising input sub-band signals in a frequency range comprising a source range and a target range. The input sub-band signals in the source range are selectively transposed into transposed sub-band signals in the target range according to a predefined transposing rule. A masking threshold is determined based on a predefined perceptual model, and perceptually relevant sub-band signals among the transposed sub-band signals in the target range that exceed the masking threshold are detected. Input sub-band signals in the target range are selectively replaced with the corresponding detected perceptually relevant sub-band signals of the transposed sub-band signals in the target range.
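A compact Python sketch of the selective replacement step, assuming per-band magnitudes, a simple one-to-one source-to-target mapping, and a placeholder masking_threshold_fn standing in for the perceptual model.

    import numpy as np

    def enhance_bands(input_bands, source_idx, target_idx, masking_threshold_fn):
        # Transpose source-range sub-bands into the target range (1:1 mapping assumed).
        transposed = input_bands[source_idx]
        # Masking threshold per target sub-band from the (placeholder) perceptual model.
        thresholds = masking_threshold_fn(input_bands)[target_idx]
        # Keep only transposed sub-bands that exceed the masking threshold and use them
        # to replace the corresponding input sub-bands in the target range.
        relevant = np.abs(transposed) > thresholds
        output = input_bands.copy()
        output[target_idx] = np.where(relevant, transposed, input_bands[target_idx])
        return output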
Abstract:
Systems and methods are described for switching between lossy coded time segments and a lossless stream of the same source audio. A decoder may receive lossy coded time segments that include audio encoded using frequency-domain lossy coding. The decoder may also receive a lossless stream, which it plays back, that includes audio from the same source encoded using lossless coding. In response to receiving a determination that network bandwidth is constrained, the decoder may generate an aliasing cancellation component based on a previously decoded frame of the lossless stream, which may be added to a lossy time segment at a transition frame. The sum of the aliasing cancellation component and the lossy time segment may be normalized using a weight introduced by an encoding window. Audio playback of the lossy coded time segments may then be provided, beginning with the aliasing-canceled transition frame.
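The following Python sketch illustrates one way the aliasing cancellation at the transition frame could look under a specific set of assumptions (sine window, 50% overlap, analysis and synthesis windowing both applied); it is a simplified illustration, not the described decoder.

    import numpy as np

    def cancel_aliasing(lossy_first_half, prev_lossless_half, window_first_half):
        # A stand-alone inverse MDCT of the transition frame yields, over its first half,
        # approximately w(n)^2 * x(n) - w(n) * w(N-1-n) * x(N-1-n): signal plus aliasing.
        w = window_first_half
        # Rebuild the aliasing term from the previously decoded lossless samples and add it.
        cancellation = w * w[::-1] * prev_lossless_half[::-1]
        corrected = lossy_first_half + cancellation
        # Normalize by the weight introduced by the encoding (and synthesis) window.
        return corrected / np.maximum(w ** 2, 1e-9)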
Abstract:
The present document relates to coding. In particular, the present document relates to coding using linear prediction in combination with entropy encoding. A method (600) for determining a general prediction filter for a frame of an input signal (111) is described. The z-transform of the general prediction filter comprises an approximation to the z-transform of a finite impulse response (FIR) filter in which the z variable of the FIR filter is replaced by the z-transform of an allpass filter. The FIR filter comprises a plurality of FIR coefficients (412). The allpass filter exhibits a pole defined by an adjustable pole parameter. The method (600) comprises determining the pole parameter and the plurality of FIR coefficients such that an entropy of a frame of a prediction error signal (414), which is derived from the frame of the input signal (111) using the general prediction filter defined by the pole parameter and the plurality of FIR coefficients (412), is reduced.
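To make the structure of the general prediction filter concrete, here is a Python sketch of computing the prediction error when each unit delay of the FIR filter is replaced by a first-order allpass section A(z) = (z^-1 - a)/(1 - a z^-1) with pole parameter a; the entropy-driven choice of a and of the FIR coefficients is not shown, and the signal is assumed to be a float array.

    import numpy as np

    def allpass_chain(x, pole, order):
        # Outputs of a cascade of first-order allpass sections; stage 0 is the input itself.
        stages = [x]
        for _ in range(order):
            prev, out, s = stages[-1], np.zeros_like(x), 0.0
            for n in range(len(x)):
                out[n] = -pole * prev[n] + s        # y[n] = -a*x[n] + x[n-1] + a*y[n-1]
                s = prev[n] + pole * out[n]
            stages.append(out)
        return stages

    def warped_prediction_error(x, fir_coeffs, pole):
        # Prediction error of the general prediction filter: each z^-k of the FIR filter
        # is replaced by the k-th allpass stage output.
        stages = allpass_chain(x, pole, len(fir_coeffs))
        prediction = sum(c * stages[k + 1] for k, c in enumerate(fir_coeffs))
        return x - prediction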
Abstract:
Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A method of processing an audio signal includes the following operations. A system receives an audio signal. The system determines that a first frame of the audio signal includes a sparse transient signal. The system determines that a second frame of the audio signal includes a dense transient signal. The system compresses/expands (compands) the audio signal using a companding rule that applies a first companding exponent to the first frame of the audio signal and applies a second companding exponent to the second frame of the audio signal, each companding exponent being used to derive a respective degree of dynamic range compression and expansion for a corresponding frame. The system then provides the companded audio signal to a downstream device.
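A small Python sketch of applying different companding exponents per frame; the exponent values and the mean-magnitude gain rule are illustrative assumptions.

    import numpy as np

    def compand_frame(frame, exponent):
        # Per-frame gain from the mean magnitude; exponents below 1 compress dynamic range.
        mean_abs = np.mean(np.abs(frame)) + 1e-12
        return frame * mean_abs ** (exponent - 1.0)

    def compand_signal(frames, is_sparse_transient, exp_sparse=0.5, exp_dense=0.8):
        # Sparse-transient frames get one companding exponent, dense-transient frames another.
        return [compand_frame(f, exp_sparse if sparse else exp_dense)
                for f, sparse in zip(frames, is_sparse_transient)]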
Abstract:
The present disclosure relates to methods for processing a decoded audio signal and for selectively applying speech/dialog enhancement to the decoded audio signal. The present disclosure also relates to a method of operating a headset for computer-mediated reality. A method of processing a decoded audio signal comprises obtaining a measure of a cognitive load of a listener that listens to a rendering of the audio signal, determining whether speech/dialog enhancement shall be applied based on the obtained measure of the cognitive load, and performing speech/dialog enhancement based on the determination. A method of operating a headset for computer-mediated reality comprises obtaining eye-tracking data of a wearer of the headset, determining a measure of a cognitive load of the wearer of the headset based on the eye-tracking data, and outputting an indication of the cognitive load of the wearer of the headset. The present disclosure further relates to corresponding apparatus and systems, and to methods of operating such apparatus and systems.
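A minimal Python sketch of the decision logic, with estimate_load and enhance_dialog as placeholder callables and an assumed threshold:

    def process_decoded_audio(decoded_audio, eye_tracking_data, estimate_load,
                              enhance_dialog, load_threshold=0.7):
        # Estimate the listener's cognitive load (e.g. from pupil-dilation features).
        cognitive_load = estimate_load(eye_tracking_data)
        # Apply speech/dialog enhancement only when the load exceeds the threshold.
        if cognitive_load > load_threshold:
            return enhance_dialog(decoded_audio)
        return decoded_audio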
Abstract:
An error-concealing audio decoding method comprises: receiving a packet comprising a set of MDCT coefficients encoding a frame of time-domain samples of an audio signal; identifying the received packet as erroneous; generating estimated MDCT coefficients to replace the set of MDCT coefficients of the erroneous packet, based on corresponding MDCT coefficients associated with a received packet directly preceding the erroneous packet; assigning signs of a first subset of MDCT coefficients of the estimated MDCT coefficients, wherein the first subset comprises MDCT coefficients that are associated with tonal-like spectral bins, to coincide with signs of corresponding MDCT coefficients of said preceding packet; randomly assigning signs of a second subset of MDCT coefficients of the estimated MDCT coefficients, wherein the second subset comprises MDCT coefficients associated with noise-like spectral bins; and replacing the erroneous packet by a concealment packet containing the estimated MDCT coefficients and the assigned signs.
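A Python sketch of building the concealment coefficients, assuming the magnitudes are copied from the preceding packet and a boolean tonal_mask marks tonal-like spectral bins:

    import numpy as np

    def conceal_mdct(prev_mdct, tonal_mask, rng=None):
        rng = rng or np.random.default_rng()
        estimated = np.abs(prev_mdct).astype(float)                  # magnitudes from preceding packet
        signs_prev = np.where(prev_mdct >= 0, 1.0, -1.0)             # tonal bins: reuse previous signs
        signs_rand = rng.choice([-1.0, 1.0], size=prev_mdct.shape)   # noise-like bins: random signs
        signs = np.where(tonal_mask, signs_prev, signs_rand)
        return estimated * signs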
Abstract:
The invention relates to estimating tempo information directly from a bitstream encoding audio information, preferably music. Said tempo information is derived from at least one periodicity obtained from a detection of at least two onsets included in the audio information. Such onsets are detected via a detection of long-to-short block transitions (in the bitstream) and/or via a detection of a change in the bit allocation (change of cost) for encoding/transmitting the exponents of the transform coefficients encoded in the bitstream.
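A Python sketch of turning onsets detected from bitstream side information into a tempo estimate; the use of the median inter-onset interval is an assumption made for illustration.

    import numpy as np

    def estimate_tempo_bpm(onset_frame_indices, frames_per_second):
        # Onsets come from long-to-short block transitions and/or bit-allocation changes.
        if len(onset_frame_indices) < 2:
            return None
        onset_times = np.asarray(onset_frame_indices, dtype=float) / frames_per_second
        intervals = np.diff(onset_times)          # inter-onset intervals in seconds
        period = np.median(intervals)             # dominant periodicity
        return 60.0 / period                      # tempo in beats per minute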
Abstract:
Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A compression process reduces an original dynamic range of an initial audio signal by dividing the initial audio signal into a plurality of segments using a defined window shape, calculating a wideband gain in the frequency domain using a non-energy based average of frequency-domain samples of the initial audio signal, and applying individual gain values to amplify segments of relatively low intensity and attenuate segments of relatively high intensity. The compressed audio signal is then expanded back to substantially the original dynamic range by an expansion process that applies inverse gain values to amplify segments of relatively high intensity and attenuate segments of relatively low intensity. A QMF filterbank is used to analyze the initial audio signal to obtain a frequency-domain representation.
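A Python sketch of the gain rule, assuming QMF sub-band samples per time slot and a mean-magnitude (non-energy based) average; note that the expander recomputes its gain from the compressed signal itself, which is what allows the dynamic range to be restored without side information. The exponent value is illustrative.

    import numpy as np

    ALPHA = 0.5  # companding exponent (illustrative value)

    def compress_slot(qmf_slot):
        # Gain from a non-energy-based average of the sub-band samples of one time slot.
        mean_abs = np.mean(np.abs(qmf_slot)) + 1e-12
        return qmf_slot * mean_abs ** (ALPHA - 1.0)   # quiet slots amplified, loud slots attenuated

    def expand_slot(compressed_slot):
        # Inverse gain derived from the compressed slot; since its mean magnitude is roughly
        # the original mean magnitude raised to ALPHA, this restores the original range.
        mean_abs = np.mean(np.abs(compressed_slot)) + 1e-12
        return compressed_slot * mean_abs ** ((1.0 - ALPHA) / ALPHA)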