Abstract:
A binaural cue coding (BCC) scheme involving one or more object-based cue codes, wherein an object-based cue code directly represents a characteristic of an auditory scene corresponding to the audio channels, the characteristic being independent of the number and positions of the loudspeakers used to create the auditory scene. Examples of object-based cue codes include the angle of an auditory event, the width of the auditory event, the degree of envelopment of the auditory scene, and the directionality of the auditory scene.
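The abstract does not say how an object-based cue such as the angle of an auditory event is derived. As one illustration, the sketch below assumes the encoder knows per-channel subband powers and the azimuths of the loudspeakers the input channels were mixed for, and applies an energy-vector sum. This is a standard localization estimate swapped in here for illustration, not the patent's method; the function name `object_cues` is hypothetical.

```python
import math

def object_cues(powers, speaker_azimuths_deg):
    """Estimate an event angle (degrees) and a directionality score in [0, 1]
    from per-channel subband powers, using an energy-vector sum (assumed method)."""
    total = sum(powers)
    if total <= 0:
        return 0.0, 0.0
    # Sum each loudspeaker's unit direction vector, weighted by its channel power
    x = sum(p * math.cos(math.radians(a)) for p, a in zip(powers, speaker_azimuths_deg))
    y = sum(p * math.sin(math.radians(a)) for p, a in zip(powers, speaker_azimuths_deg))
    angle = math.degrees(math.atan2(y, x))
    # Vector length relative to total power: near 1 = point-like, near 0 = diffuse
    directionality = math.hypot(x, y) / total
    return angle, directionality
```

For example, equal power in loudspeakers at +30 and -30 degrees yields an angle of 0 degrees (a phantom center) with directionality cos(30°) ≈ 0.87. Because the result is an angle rather than a per-channel level difference, a decoder could render it on a loudspeaker layout different from the one the input was mixed for.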
Abstract:
A method and apparatus are disclosed for controlling a buffer in a communication system, such as a digital audio broadcasting (DAB) communication system. Because a more consistent perceptual quality over time provides a more pleasing auditory experience for a listener, the disclosed bit allocation process determines, for each frame k, a distortion d[k] at which the frame is to be encoded. Generally, d[k] is chosen to minimize (i) the probability of a buffer overflow and (ii) the variation of perceived distortion over time. The buffer level is controlled by partitioning a signal into a sequence of successive frames, estimating a distortion rate for a number of frames, and selecting a distortion such that the variance of the buffer level is bounded by a specified value. In one implementation, a signal is coded by partitioning it into a sequence of successive frames; encoding each frame k at each of a plurality of distortions D_i to compute a frame bitrate; estimating an average bitrate R_i[k] for each distortion D_i from the current and past frame bitrates; interpolating between the pairs of values (D_i, R_i[k]) to approximate a function that maps a distortion to an estimated average bitrate; and encoding each frame with a distortion level determined from that function.
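The interpolation step can be sketched as follows. This is a minimal sketch under stated assumptions: R_i[k] is estimated by exponential smoothing of the per-frame bitrates (the abstract only says "given current and past frame bitrates"), the candidate distortions D_i are sorted ascending with non-increasing average rates, and the target is a fixed channel rate. The buffer-variance bound from the abstract is not modeled. Function names are hypothetical.

```python
def update_average_rates(avg_rates, frame_rates, alpha=0.9):
    """Update R_i[k] for every candidate distortion D_i from the bitrates of
    frame k encoded at each D_i (assumed exponential smoothing)."""
    if avg_rates is None:
        return list(frame_rates)
    return [alpha * r + (1 - alpha) * b for r, b in zip(avg_rates, frame_rates)]

def distortion_for_rate(distortions, avg_rates, target_rate):
    """Invert the piecewise-linear map D -> R(D) built from the pairs (D_i, R_i[k])."""
    # Best quality is affordable: use the smallest candidate distortion
    if target_rate >= avg_rates[0]:
        return distortions[0]
    # Even the coarsest candidate exceeds the target: clamp to the largest distortion
    if target_rate <= avg_rates[-1]:
        return distortions[-1]
    for i in range(len(distortions) - 1):
        r0, r1 = avg_rates[i], avg_rates[i + 1]
        if r1 <= target_rate <= r0:
            if r0 == r1:
                return distortions[i]
            # Linear interpolation between neighbouring (D_i, R_i[k]) points
            t = (r0 - target_rate) / (r0 - r1)
            return distortions[i] + t * (distortions[i + 1] - distortions[i])
    return distortions[-1]
```

For example, with D = [1.0, 2.0, 3.0], average rates [120.0, 80.0, 40.0] and a channel rate of 100.0, `distortion_for_rate` returns 1.5, halfway between the two candidates that bracket the target.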
Abstract:
Generic and specific C-to-E binaural cue coding (BCC) schemes are described, including those in which one or more of the input channels are transmitted as unmodified channels that are not downmixed at the BCC encoder and not upmixed at the BCC decoder. The specific BCC schemes described include 5-to-2, 6-to-5, 7-to-5, 6.1-to-5.1, 7.1-to-5.1, and 6.2-to-5.1, where “0.1” indicates a single low-frequency effects (LFE) channel and “0.2” indicates two LFE channels.
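The distinction between downmixed and unmodified channels can be illustrated with a small encoder-side sketch. Each transmitted channel is described by the group of input channels feeding it; a group with a single member is passed through unchanged and gets no cue codes. The channel names (for example a hypothetical back channel "Lb"), the plain unscaled sum, and the single full-band level-difference cue are simplifying assumptions; a practical BCC encoder computes cues per subband and per frame.

```python
import math

def energy(x):
    return sum(v * v for v in x)

def bcc_downmix(frames, groups, eps=1e-12):
    """frames: input channel name -> samples of one frame.
    groups: transmitted channel name -> list of input channel names."""
    transmitted, cues = {}, {}
    for out_name, members in groups.items():
        if len(members) == 1:
            # Unmodified channel: sent as-is, no cue codes, not upmixed at the decoder
            transmitted[out_name] = list(frames[members[0]])
            continue
        # Downmixed group: plain sample-wise sum (real encoders typically scale/equalize)
        n = len(frames[members[0]])
        transmitted[out_name] = [sum(frames[m][t] for m in members) for t in range(n)]
        # Full-band ICLD-style cue: level of each member relative to the first, in dB
        ref = energy(frames[members[0]]) + eps
        cues[out_name] = {m: 10 * math.log10((energy(frames[m]) + eps) / ref)
                          for m in members[1:]}
    return transmitted, cues
```

An LFE channel in a 6.1-to-5.1 scheme, for instance, would simply be a single-member group.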
Abstract:
A method (and apparatus) for coding an audio signal, the method comprising the steps of partitioning the audio signal into a sequence of successive frames; calculating one or more noise thresholds for each of a plurality of frames in the sequence, each noise threshold for a particular one of the frames corresponding to a different perceptual coding quality for the particular frame; estimating a bit demand for each of a corresponding one or more perceptual coding qualities for each frame, wherein each estimated bit demand comprises a number of bits which would be used to code a given frame at the corresponding perceptual coding quality; selecting one of the perceptual coding qualities for the coding of a particular frame based upon the estimated bit demand for the perceptual coding quality for the particular frame, and further based on one or more bit demands estimated for one or more other frames; and coding the particular frame based on the noise threshold corresponding to the selected perceptual coding quality for the particular frame. In particular, and in accordance with one illustrative embodiment of the present invention, the average bit demand for coding each of a plurality of frames at each of a plurality of different perceptual coding qualities is advantageously estimated, and based on these estimates, each frame is coded so as to maintain a relatively consistent perceptual coding quality from one frame to the next.
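The selection step, which uses bit demands estimated for other frames as well as the current one, can be sketched as a look-ahead rule. Assumptions: `bit_demands[j][q]` is the estimated number of bits to code frame j at quality index q (0 is coarsest, demands grow with q), and a quality is acceptable when its mean demand over a window of upcoming frames fits the per-frame budget plus any buffer slack spread over the window. The rule and the name `select_quality` are illustrative, not taken from the patent.

```python
def select_quality(bit_demands, k, window, bits_per_frame, buffer_slack=0.0):
    """Return the highest quality index q whose mean estimated bit demand over
    frames k .. k+window-1 fits the budget; falls back to q = 0 (coarsest).
    Assumes 0 <= k < len(bit_demands)."""
    frames = bit_demands[k:k + window]
    allowance = bits_per_frame + buffer_slack / len(frames)
    best = 0
    for q in range(len(frames[0])):
        mean_demand = sum(f[q] for f in frames) / len(frames)
        if mean_demand <= allowance:
            best = q
    return best
```

With demands [[100, 200, 300], [100, 200, 300], [100, 400, 900]] and a 250-bit budget, a two-frame window selects q = 1, while a three-frame window sees the expensive third frame coming and settles on q = 0, which keeps quality steadier from frame to frame. The frame would then be coded with the noise threshold belonging to the selected quality.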
Abstract:
Acoustic echo control and noise suppression in telecommunication systems. The proposed method of processing multichannel audio loudspeaker signals and at least one microphone signal comprises the steps of: transforming the input microphone signals (y_1(n), y_2(n), ..., y_M(n)) into input microphone short-time spectra; computing a combined loudspeaker signal short-time spectrum X(i,k) from the loudspeaker signals (x_1(n), x_2(n), ..., x_L(n)); computing a combined microphone signal short-time spectrum Y(i,k) from the input microphone signals (y_1(n), y_2(n), ..., y_M(n)); estimating a magnitude or power spectrum of the echo in the combined microphone signal short-time spectrum; computing a gain filter G(i,k) for magnitude modification of the input microphone short-time spectra; applying the gain filter to at least one of the input microphone spectra; and converting the filtered input microphone spectra into the time domain (e_1(n), e_2(n), ..., e_M(n)).
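The per-frame spectral steps can be sketched as follows for a single frame i. Assumptions not stated in the abstract: channels are combined by summing their powers per bin, the echo magnitude is modeled as a known per-bin coupling factor times |X(i,k)| with zero delay, and G(i,k) is obtained by magnitude spectral subtraction with a gain floor. The STFT analysis and the conversion back to the time domain are omitted. All function names are hypothetical.

```python
import math

def combined_magnitude(spectra):
    """Combine per-channel complex spectra of one frame into one magnitude spectrum:
    square root of the summed channel powers in each bin (assumed combination rule)."""
    nbins = len(spectra[0])
    return [math.sqrt(sum(abs(s[k]) ** 2 for s in spectra)) for k in range(nbins)]

def echo_gain_filter(X_mag, Y_mag, coupling, floor=0.1):
    """G(i,k) for one frame: echo magnitude estimated as coupling[k] * |X(i,k)|
    (assumed echo model), then magnitude spectral subtraction, clipped to [floor, 1]."""
    gains = []
    for x, y, c in zip(X_mag, Y_mag, coupling):
        echo = c * x
        g = 1.0 - echo / y if y > 0 else floor
        gains.append(max(floor, min(1.0, g)))
    return gains

def apply_gain(mic_spectrum, gains):
    """Scale each bin of one microphone channel's complex spectrum by the real gain."""
    return [g * s for g, s in zip(gains, mic_spectrum)]
```

Because the same G(i,k) is applied to every microphone channel that is filtered, the gain only modifies magnitudes and leaves the inter-channel phase relations intact.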
Abstract:
A preferred embodiment of an apparatus for computing filter coefficients for an adaptive filter for filtering a microphone signal so as to suppress an echo due to a loudspeaker signal includes an extractor for extracting a stationary component signal or a non-stationary component signal from the loudspeaker signal or from a signal derived from the loudspeaker signal, and a computer for computing the filter coefficients for the adaptive filter on the basis of the extracted stationary component signal or the extracted non-stationary component signal.
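One way to picture the extractor is to split the loudspeaker power spectrum into a slowly varying (stationary) estimate and a non-negative non-stationary remainder. The sketch assumes recursive averaging per frequency bin with smoothing factor `alpha`; this technique, the parameter value, and the name `split_stationary` are assumptions, and the subsequent computation of the adaptive filter coefficients is not shown.

```python
def split_stationary(power_frames, alpha=0.95):
    """power_frames: per-frame power spectra (lists of floats) of the loudspeaker signal.
    Returns per-frame stationary and non-stationary component spectra."""
    track = None
    stationary, nonstationary = [], []
    for frame in power_frames:
        if track is None:
            track = list(frame)  # initialize the slow estimate with the first frame
        else:
            # Slow recursive average per bin approximates the stationary component
            track = [alpha * s + (1 - alpha) * p for s, p in zip(track, frame)]
        stationary.append(list(track))
        # Whatever rises above the slow estimate is treated as non-stationary
        nonstationary.append([max(0.0, p - s) for p, s in zip(frame, track)])
    return stationary, nonstationary
```

A larger `alpha` makes the stationary estimate react more slowly, so short bursts in the loudspeaker signal end up almost entirely in the non-stationary component.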
Abstract:
In one embodiment, C input audio channels are encoded to generate E transmitted audio channel(s), where one or more cue codes are generated for two or more of the C input channels, and the C input channels are downmixed to generate the E transmitted channel(s), where C>E≧1. One or more of the C input channels and the E transmitted channel(s) are analyzed to generate a flag indicating whether or not a decoder of the E transmitted channel(s) should perform envelope shaping during decoding of the E transmitted channel(s). In one implementation, envelope shaping adjusts a temporal envelope of a decoded channel generated by the decoder to substantially match a temporal envelope of a corresponding transmitted channel.
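The abstract states only that analyzing the channels yields a flag; it does not give the criterion. As an illustration, the sketch below assumes that envelope shaping is worthwhile for transient frames and flags a frame when its peak sub-block energy exceeds a threshold multiple of the mean sub-block energy. The criterion, the default values, and the name `envelope_shaping_flag` are hypothetical.

```python
def envelope_shaping_flag(samples, num_blocks=8, threshold=4.0):
    """Return True if the frame looks transient: the peak sub-block energy exceeds
    `threshold` times the mean sub-block energy (assumed criterion)."""
    n = len(samples) // num_blocks
    if n == 0:
        return False
    # Coarse temporal envelope: energy of each of num_blocks equal sub-blocks
    energies = [sum(x * x for x in samples[b * n:(b + 1) * n]) for b in range(num_blocks)]
    mean = sum(energies) / num_blocks
    return mean > 0 and max(energies) > threshold * mean
```

A steady signal gives equal sub-block energies and no flag, while a single click at the end of an otherwise silent frame sets it.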
Abstract:
An input audio signal having an input temporal envelope is converted into an output audio signal having an output temporal envelope. The input temporal envelope of the input audio signal is characterized. The input audio signal is processed to generate a processed audio signal, wherein the processing de-correlates the input audio signal. The processed audio signal is adjusted based on the characterized input temporal envelope to generate the output audio signal, wherein the output temporal envelope substantially matches the input temporal envelope.
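The three steps (characterize the envelope, de-correlate, re-impose the envelope) can be sketched with block-wise RMS values. Assumptions: the temporal envelope is the RMS of fixed-length blocks, and a pure delay stands in for the de-correlator, whereas practical systems typically use all-pass or reverberation-like filters. Function names are hypothetical.

```python
import math

def block_rms(x, block):
    """Temporal envelope as the RMS value of consecutive blocks."""
    return [math.sqrt(sum(v * v for v in x[i:i + block]) / len(x[i:i + block]))
            for i in range(0, len(x), block)]

def decorrelate(x, delay=7):
    """Stand-in de-correlator: a pure delay (assumption). Requires delay < len(x)."""
    return [0.0] * delay + x[:len(x) - delay]

def shape_envelope(x, block=16, delay=7, eps=1e-12):
    env_in = block_rms(x, block)      # 1. characterize the input temporal envelope
    y = decorrelate(x, delay)         # 2. de-correlated (processed) signal
    env_y = block_rms(y, block)
    out = []
    for b, i in enumerate(range(0, len(y), block)):
        # 3. per-block gain that pulls the processed envelope toward the input envelope
        g = env_in[b] / (env_y[b] + eps)
        out.extend(g * v for v in y[i:i + block])
    return out
```

For an on/off/on input such as sixteen ones, sixteen zeros and sixteen ones, the delayed signal smears energy into the silent block, and the per-block gains restore a block envelope of approximately [1, 0, 1].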
Abstract:
The purpose of the invention is to bridge the gap between parametric multi-channel audio coding and matrixed-surround multi-channel coding by gradually improving the sound of an up-mix signal as the bit rate consumed by the side information is raised from 0 up to the bit rates of the parametric methods. More specifically, it provides a method of flexibly choosing an “operating point” anywhere between matrixed surround (no side information, limited audio quality) and fully parametric reconstruction (full side-information rate required, good quality). This operating point can be chosen dynamically (i.e., varying over time) in response to the permissible side-information rate, as dictated by the individual application.
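The abstract does not specify how the side information is scaled between the two extremes. One simple way to picture a continuum of operating points is to send parametric data only for as many frequency bands as the current side-information budget allows, lowest bands first, and to fall back to the matrixed-surround up-mix elsewhere. The band-priority rule, the fixed per-band cost, and the name `choose_operating_point` are assumptions for illustration.

```python
def choose_operating_point(num_bands, bits_per_band, budget_bits):
    """Per-band mode list for one frame: 'parametric' for bands whose side
    information fits the budget (lowest bands first, assumed priority),
    'matrix' (matrixed-surround up-mix, no side information) for the rest."""
    if bits_per_band <= 0:
        affordable = num_bands
    else:
        affordable = min(num_bands, budget_bits // bits_per_band)
    return ["parametric"] * affordable + ["matrix"] * (num_bands - affordable)
```

Calling it once per frame with the currently permitted budget gives the time-varying behavior described above: a budget of 0 yields pure matrixed surround, and a budget covering every band yields fully parametric reconstruction.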