Abstract:
One or more attributes (e.g., pan, gain, etc.) associated with one or more objects (e.g., an instrument) of a stereo or multi-channel audio signal can be modified to provide remix capability.
Abstract:
A method and apparatus are disclosed for controlling a buffer in a digital audio broadcasting (DAB) communication system. The decoder buffer level limits are specified in terms of a maximum number of encoded frames (or a maximum duration). The transmitter can predict the number of encoded frames, Fpred, in the decoder buffer and transmit the value Fpred to the receiver with the audio data. If the transmitter determines that the decoder buffer level is becoming too high, the frames being generated by the encoder are too small, and additional bits are allocated to each frame for each of the N programs. Likewise, if the transmitter determines that the decoder buffer level is becoming too low, the frames being generated by the encoder are too big, and fewer bits are allocated to each frame for each of the N programs. The transmitted predicted buffer level, Fpred, can also be employed to (i) determine when the decoder should commence decoding frames and (ii) synchronize the transmitter and the receiver. When the decoder first starts up, or possibly when a new audio program is selected, the receiver fills the decoder buffer until Fpred frames are received before it commences decoding. The transmitter and receiver clocks may be synchronized by adjusting the receiver clock with a feedback loop that compares the actual level of the decoder buffer to the predicted value, Fpred, received from the transmitter: a higher number of encoded frames in the buffer indicates that the receiver clock is too slow and should be sped up, while a lower number indicates that the receiver clock is too fast and should be slowed down.
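The bit-allocation feedback and the clock-trim rule described above can be sketched as follows. This is a minimal illustration only: the function names, the fixed 64-bit step, and the integer ppm trim are assumptions, not the patented scheme.

```python
def adjust_bits_per_frame(bits_per_frame, f_pred, target_frames, step=64):
    """Encoder-side bit allocation driven by the predicted buffer level.

    A predicted frame count above target means frames are too small,
    so more bits are allocated per frame; below target means frames
    are too big, so fewer bits are allocated.
    """
    if f_pred > target_frames:
        return bits_per_frame + step                 # buffer too full
    if f_pred < target_frames:
        return max(step, bits_per_frame - step)      # buffer too empty
    return bits_per_frame


def clock_adjust(actual_frames, f_pred, ppm_step=10):
    """Receiver clock trim from comparing actual buffer level to Fpred.

    More frames buffered than predicted means the receiver clock is
    too slow and is sped up; fewer means it is slowed down.
    """
    if actual_frames > f_pred:
        return +ppm_step
    if actual_frames < f_pred:
        return -ppm_step
    return 0
```

In a running system both rules would be applied once per frame interval, closing the feedback loop between transmitter and receiver.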
Abstract:
A perceptually motivated spatial decomposition for two-channel stereo audio signals, capturing the information about the virtual sound stage, is proposed. The spatial decomposition allows audio signals to be re-synthesized for playback over sound systems other than two-channel stereo. With more front loudspeakers, the width of the virtual sound stage can be increased beyond +/−30° and the sweet-spot region is extended. Optionally, lateral independent sound components can be played back separately over loudspeakers on the two sides of a listener to increase listener envelopment. It is also explained how the spatial decomposition can be used with surround sound and wavefield-synthesis-based audio systems. According to the main embodiment of the invention, applying to multiple audio signals, multiple output audio signals (y1 . . . yM) are generated from multiple input audio signals (x1, . . . , xL), where the number of output signals is equal to or greater than the number of input signals. The method comprises the steps of: (1) by means of linear combinations of the input subbands X1(i), . . . , XL(i), computing one or more independent sound subbands representing signal components that are independent between the input subbands; (2) by means of linear combinations of the input subbands X1(i), . . . , XL(i), computing one or more localized direct sound subbands representing signal components that are contained in more than one of the input subbands, together with direction factors representing the ratios with which these signal components are contained in two or more input subbands; (3) generating the output subband signals Y1(i) . . . YM(i), where each output subband signal is a linear combination of the independent sound subbands and the localized direct sound subbands; and (4) converting the output subband signals Y1(i) . . . YM(i) to time-domain audio signals y1 . . . yM.
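To illustrate the decomposition step, the sketch below performs a crude direct/independent split of one subband of a two-channel signal: the localized direct sound in the second channel is modeled as a least-squares-scaled copy of the first channel, the scale playing the role of a direction factor and the residual the role of an independent component. This is only an approximation of the idea; the actual method decomposes both channels jointly, and all names here are assumptions.

```python
def split_subband(x1, x2):
    """Crude direct/independent split for one subband (illustrative only).

    Fits x2 ~= a * x1 in the least-squares sense; a is the direction
    factor, a*x1 the localized direct sound estimate, and the residual
    the independent sound component.
    """
    p11 = sum(v * v for v in x1)                  # power of x1
    p12 = sum(u * v for u, v in zip(x1, x2))      # cross term
    a = p12 / p11 if p11 else 0.0                 # direction factor
    direct = [a * v for v in x1]                  # localized direct sound
    independent = [v - d for v, d in zip(x2, direct)]
    return a, direct, independent
```

For a perfectly correlated pair the residual vanishes, i.e. all of the signal is classified as localized direct sound.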
Abstract:
A method and apparatus are disclosed for controlling a buffer in a digital audio broadcasting (DAB) communication system. An audio encoder marks a frame as “dropped” whenever a buffer overflow might occur. Only a small number of bits are utilized to process a lost frame, thereby preventing the buffer from overflowing and allowing the encoder buffer-level to quickly recover from the potential overflow condition. The audio encoder optionally sets a flag that provides an indication to the receivers that a frame has been lost. If a “frame lost” condition is detected by a receiver, the receiver can optionally employ mitigation techniques to reduce the impact of the lost frame(s).
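The drop-on-overflow behavior can be sketched as follows. The simple fill-and-drain buffer model, the 8-bit marker cost, and the function name are assumptions for illustration, not the patented bitstream format.

```python
def encode_with_drop(frame_sizes, capacity, drain, marker_bits=8):
    """Mark frames 'dropped' whenever they would overflow the buffer.

    Returns a list of (bits_sent, dropped) pairs. A dropped frame costs
    only marker_bits (the 'frame lost' flag), which lets the buffer
    level recover quickly from a potential overflow. Each step drains
    a fixed number of bits, modeling the constant-rate channel.
    """
    level, out = 0, []
    for size in frame_sizes:
        if level + size > capacity:
            out.append((marker_bits, True))   # send only the lost-frame flag
            level += marker_bits
        else:
            out.append((size, False))
            level += size
        level = max(0, level - drain)         # constant-rate channel drain
    return out
```

A receiver seeing the `dropped` flag could then apply mitigation (e.g., repeating or muting the frame) rather than misinterpreting the stream.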
Abstract:
A decoder (115) generates a multi-channel audio signal, such as a surround sound signal, from a received first signal. The multi-channel signal comprises a second set of audio channels and the first signal comprises a first set of audio channels. The decoder (115) comprises a receiver (401) which receives the first signal. The receiver (401) is coupled to an estimate processor (405) which generates estimated parametric data for the second set of audio channels in response to characteristics of the first set of audio channels. The estimated parametric data relates characteristics of the second set of audio channels to characteristics of the first set of audio channels. The decoder (115) furthermore comprises a spatial audio decoder (403) which decodes the first signal in response to the estimated parametric data to generate the multi-channel signal comprising the second set of channels. The invention allows use of spatial audio decoding with signals that are not encoded by a spatial audio encoder.
Abstract:
An auditory scene is synthesized by applying two or more different sets of one or more spatial parameters (e.g., an inter-ear level difference (ILD), inter-ear time difference (ITD), and/or head-related transfer function (HRTF)) to two or more different frequency bands of a combined audio signal, where each different frequency band is treated as if it corresponded to a single audio source in the auditory scene. In one embodiment, the combined audio signal corresponds to the combination of two or more different source signals, where each different frequency band corresponds to a region of the combined audio signal in which one of the source signals dominates the others. In this embodiment, the different sets of spatial parameters are applied to synthesize an auditory scene comprising the different source signals.
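The per-band application of spatial parameters can be sketched as below, treating each frequency band of the combined signal as a single source. This is a minimal illustration: HRTF filtering is omitted, the ITD is a plain integer sample delay on one channel, and the function and parameter names are assumptions.

```python
def spatialize_bands(bands, cues):
    """Apply a per-band (ILD in dB, ITD in samples) cue set.

    bands: list of frequency-band signals of one combined audio signal.
    cues:  one (ild_db, itd_samples) pair per band; each band is panned
           as if it were a single audio source in the auditory scene.
    Returns per-band left and right channel signals.
    """
    left, right = [], []
    for band, (ild_db, itd) in zip(bands, cues):
        g = 10.0 ** (ild_db / 20.0)                    # level ratio L/R
        left.append([g * v for v in band])             # louder channel
        right.append([0.0] * itd + list(band)[:len(band) - itd])  # delayed
    return left, right
```

Summing the bands of each channel would reconstruct the two-channel output for the synthesized scene.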
Abstract:
In a microphone signal, the signal component corresponding to, e.g., echo is suppressed using an echo control scheme that estimates the spectral envelope of the echo signal, without having to estimate the waveform for the echo signal. In one embodiment, the input signal (to be applied to a loudspeaker) and the microphone signal are spectrally decomposed into multiple subbands, where echo suppression processing is independently performed on each subband. The echo control of the present invention can be implemented with substantially reduced (1) computational complexity and (2) phase sensitivity, as compared to traditional acoustic echo cancellation, in which the waveform for the echo signal is estimated.
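The envelope-based suppression can be sketched per subband as follows: a gain is derived from magnitude envelopes alone, so no echo waveform (and hence no phase) is ever estimated. The spectral-subtraction-style gain rule, the gain floor, and all names are assumptions for illustration.

```python
def suppress_echo(mic_mag, far_mag, echo_gain, floor=0.1):
    """Subband echo suppression from spectral envelopes only.

    mic_mag / far_mag: per-subband magnitude envelopes of the
    microphone signal and the loudspeaker (far-end) input signal.
    echo_gain: per-subband estimate of the echo path magnitude
    response. Each subband is attenuated independently.
    """
    out = []
    for m, f, g in zip(mic_mag, far_mag, echo_gain):
        echo_est = g * f                                   # echo envelope
        gain = max(floor, (m - echo_est) / m) if m > 0 else floor
        out.append(gain * m)                               # suppressed band
    return out
```

Because only magnitudes enter the computation, the scheme is far cheaper and less phase-sensitive than waveform-based echo cancellation, which is the contrast the abstract draws.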
Abstract:
For a multi-channel audio signal, parametric coding is applied to different subsets of audio input channels for different frequency regions. For example, for a 5.1 surround sound signal having five regular channels and one low-frequency (LFE) channel, binaural cue coding (BCC) can be applied to all six audio channels for sub-bands at or below a specified cut-off frequency, but to only five audio channels (excluding the LFE channel) for sub-bands above the cut-off frequency. Such frequency-based coding of channels can reduce the encoding and decoding processing loads and/or size of the encoded audio bitstream relative to parametric coding techniques that are applied to all input channels over the entire frequency range.
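The channel-subset selection for the 5.1 example can be sketched in a few lines; the channel names and function signature are illustrative.

```python
def channels_for_subband(subband_hz, cutoff_hz, regular, lfe):
    """Select which channels are parametrically coded in a subband.

    At or below the cutoff frequency, all channels (including LFE)
    are coded together; above it, the LFE channel is excluded, so
    high-frequency subbands carry parameters for fewer channels.
    """
    return regular + lfe if subband_hz <= cutoff_hz else regular
```

Excluding the LFE channel above the cutoff is what shrinks the per-subband parameter set and thus the processing load and bitstream size.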
Abstract:
An apparatus constructs a multi-channel output signal using an input signal and parametric side information, the input signal including a first input channel and a second input channel derived from an original multi-channel signal, and the parametric side information describing interrelations between channels of the original multi-channel signal. For synthesizing first and second output channels on one side of an assumed listener position, the apparatus uses base channels that are different from each other. The base channels differ because of a coherence measure: coherence between the base channels (for example, the reconstructed left and left-surround channels) is reduced by calculating the base channel for one of those channels as a combination of the input channels, the combination being determined by the coherence measure. Thus, high subjective quality of the reconstruction can be obtained because the original front/back coherence is approximated.
Abstract:
A method and apparatus are disclosed for representing the masked threshold in a perceptual audio coder, using line spectral frequencies (LSF) or another representation of linear prediction (LP) coefficients. The present invention calculates LP coefficients for the masked threshold using known LPC analysis techniques. In one embodiment, the masked thresholds are optionally transformed to a non-linear frequency scale matched to auditory perception. The LP coefficients are converted to line spectral frequencies, or a similar representation in which they can be quantized, for transmission. In one implementation, a masked threshold is transmitted only if it differs significantly from the previously transmitted masked threshold. Between transmitted masked thresholds, the masked threshold is approximated using interpolation schemes. The present invention decides which masked thresholds to transmit based on the change between consecutive masked thresholds, rather than on the variation of the short-term spectra.
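The transmit-on-change decision and the decoder-side interpolation can be sketched as follows. Here thresholds are represented directly as per-band dB values rather than LSFs, and the 2 dB change criterion and all names are assumptions for illustration.

```python
def threshold_updates(thresholds, change_db=2.0):
    """Decide which masked thresholds to transmit.

    A threshold is sent only when some band differs from the last
    transmitted threshold by more than change_db; skipped thresholds
    are approximated at the decoder by interpolation. Returns the
    indices of the transmitted thresholds.
    """
    sent, last = [], None
    for i, t in enumerate(thresholds):
        if last is None or max(abs(a - b) for a, b in zip(t, last)) > change_db:
            sent.append(i)
            last = t                      # update reference for the change test
    return sent


def interpolate(t0, t1, alpha):
    """Linear interpolation between two transmitted thresholds (0<=alpha<=1)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(t0, t1)]
```

Note the change test compares against the last *transmitted* threshold, not the immediately preceding frame, so slow drifts still trigger an update eventually.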