Abstract:
Generic and specific C-to-E binaural cue coding (BCC) schemes are described, including those in which one or more of the input channels are transmitted as unmodified channels that are not downmixed at the BCC encoder and not upmixed at the BCC decoder. The specific BCC schemes described include 5-to-2, 6-to-5, 7-to-5, 6.1-to-5.1, 7.1-to-5.1, and 6.2-to-5.1, where “0.1” indicates a single low-frequency effects (LFE) channel and “0.2” indicates two LFE channels.
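The pass-through idea can be illustrated with a minimal sketch. It assumes a hypothetical 6-to-5 layout in which the last two input channels are averaged into one transmitted channel while the other four are sent unmodified. The function name, the grouping format, and the equal-weight average are illustrative assumptions, not the downmixing rule of the described schemes.

```python
def downmix_frame(samples, groups):
    """Map C input samples (one per channel) to E transmitted samples.

    groups is a list of E lists of input-channel indices (hypothetical format).
    A group with a single index is a pass-through (unmodified) channel;
    a group with several indices is averaged into one transmitted channel.
    """
    transmitted = []
    for group in groups:
        if len(group) == 1:
            # Unmodified channel: not downmixed at the encoder.
            transmitted.append(samples[group[0]])
        else:
            # Assumed equal-weight downmix of the grouped channels.
            transmitted.append(sum(samples[i] for i in group) / len(group))
    return transmitted


# Six input channels, five transmitted channels (C = 6, E = 5).
frame = [0.5, -0.25, 0.125, 0.2, 0.25, 0.75]
groups = [[0], [1], [2], [3], [4, 5]]
tx = downmix_frame(frame, groups)
```

Here the first four transmitted samples are identical to the inputs, and only the fifth is a downmix.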
Abstract:
Acoustic echo control and noise suppression are an important part of any “handsfree” telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented stereo or multi-channel telecommunication systems from being widely applied. The advantages of the proposed method are very low complexity, high robustness, scalability to multi-channel audio without a need for loudspeaker signal distortion, and efficient integration of echo and noise control in the same algorithm. The proposed method for processing audio signals comprises the steps of: receiving an input signal, wherein the input signal is applied to a loudspeaker; receiving a microphone signal generated by a microphone; estimating the delay between the loudspeaker and the microphone signals and obtaining a delayed loudspeaker signal; estimating coloration correction values of the echo path on the delayed loudspeaker signal; using information of the delayed loudspeaker signal, microphone signal, and coloration correction values to determine gain filter values; and applying the gain filter values to the microphone signal to remove the echo.
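The steps listed above can be sketched in a heavily simplified form. The sketch assumes a single broadband gain computed over the whole signal, whereas a practical system would work per frame and per frequency band. The delay estimator (cross-correlation peak), the coloration estimate (squared least-squares scale factor), and the subtraction-style gain rule are all illustrative assumptions, and every function name is hypothetical.

```python
import random


def estimate_delay(spk, mic, max_delay):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_delay + 1):
        corr = sum(spk[n - lag] * mic[n] for n in range(lag, len(mic)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def delay_signal(x, lag):
    """Delay x by lag samples, keeping the original length."""
    return [0.0] * lag + x[:len(x) - lag]


def coloration_correction(spk_delayed, mic):
    """Assumed power-domain echo-path scale: squared least-squares factor."""
    num = sum(x * y for x, y in zip(spk_delayed, mic))
    den = sum(x * x for x in spk_delayed)
    a = num / den if den > 0.0 else 0.0
    return a * a


def gain_filter(spk_delayed, mic, coloration, floor=0.0):
    """Subtraction-style gain: remove the estimated echo power from the mic."""
    p_mic = sum(y * y for y in mic)
    if p_mic == 0.0:
        return 1.0
    p_echo = coloration * sum(x * x for x in spk_delayed)
    return max(floor, (p_mic - p_echo) / p_mic) ** 0.5


# Synthetic test case: the microphone picks up only an attenuated,
# 3-sample-delayed copy of the loudspeaker signal (pure echo).
rng = random.Random(0)
spk = [rng.uniform(-1.0, 1.0) for _ in range(200)]
mic = [0.5 * v for v in delay_signal(spk, 3)]

lag = estimate_delay(spk, mic, max_delay=10)
spk_d = delay_signal(spk, lag)
c = coloration_correction(spk_d, mic)
g = gain_filter(spk_d, mic, c)
out = [g * y for y in mic]  # gain applied to the microphone signal
```

Because the synthetic microphone signal contains only echo, the gain collapses towards zero. With near-end speech present, the microphone power would exceed the estimated echo power and the gain would stay above zero.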
Abstract:
An apparatus for processing an audio signal and a method thereof are disclosed, by which a local dynamic range of an audio signal can be adaptively normalized as well as a maximum dynamic range of the audio signal. The present invention includes receiving, by an audio processing apparatus, a signal and feedback information estimated based on a normalizing gain; generating a noise estimation based on the signal; computing a gain filter for noise canceling, based on the noise estimation and the signal; and obtaining a restricted gain filter by applying the feedback information to the gain filter.
Abstract:
In one embodiment, C input audio channels are encoded to generate E transmitted audio channel(s), where one or more cue codes are generated for two or more of the C input channels, and the C input channels are downmixed to generate the E transmitted channel(s), where C>E≧1. One or more of the C input channels and the E transmitted channel(s) are analyzed to generate a flag indicating whether or not a decoder of the E transmitted channel(s) should perform envelope shaping during decoding of the E transmitted channel(s). In one implementation, envelope shaping adjusts a temporal envelope of a decoded channel generated by the decoder to substantially match a temporal envelope of a corresponding transmitted channel.
Abstract:
Embodiments of the present invention are directed to a binaural cue coding (BCC) scheme in which an externally provided audio signal (e.g., a studio-engineered audio signal) is transmitted, along with derived cue codes, to a receiver instead of an automatically downmixed audio signal. The cue codes are (adaptively) synchronized with the externally provided audio signal to compensate for time lags (and changes in those time lags) between the externally downmixed audio signal and the multi-channel signal used to generate the cue codes. If the receiver is a legacy receiver, then the studio-engineered audio signal will typically provide a higher-quality playback than would be provided by the automatically downmixed audio signal. If the receiver is a BCC-capable receiver, then the synchronization of the cue codes with the externally provided audio signal will typically improve the quality of the synthesized playback.
Abstract:
Surround sound recording is a tedious task requiring the use of many microphones. The invention aims at enabling the use of two-channel microphones (or stereo microphones) for multi-channel surround recording. A conventional stereo microphone, or a two-channel microphone specifically optimized for use with the proposed algorithm, is used to generate two signals. A post-processor is applied to the microphone-generated signals to convert them to multi-channel surround. This aim is achieved through a method to generate multiple output audio channels (y1, . . . , yM) from two microphone-generated audio channels (x1, x2), in which the number of output channels is equal to or greater than two, this method comprising the steps of: determining directions of sound components related to the microphone characteristics; determining compensation gains of sound components related to the microphone characteristics; and generating the output audio channels, y1, . . . , yM, by using the microphone-generated audio channels, x1, x2, the directions, and the compensation gains.
Abstract:
A plural-channel audio signal (e.g., stereo audio) is processed to modify a gain (e.g., a volume or loudness) of a speech component signal (e.g., dialogue spoken by actors in a movie) relative to an ambient component signal (e.g., reflected or reverberated sound) or other component signals. In one aspect, the speech component signal is identified and modified. In another aspect, the speech component signal is identified by assuming that the speech source (e.g., the actor currently speaking) is in the center of a stereo sound image of the plural-channel audio signal and by considering the spectral content of the speech component signal.
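The center-image assumption can be illustrated with a rough sketch. It takes the mid signal (L + R) / 2 as a crude stand-in for the speech component and rescales only that part. It ignores the spectral-content analysis mentioned above, and the function name and the mid-signal estimate are illustrative assumptions rather than the described identification method.

```python
def adjust_center_gain(left, right, gain):
    """Scale the center (mid) component of a stereo signal by `gain`.

    The mid signal 0.5 * (L + R) is used as a simple, assumed estimate of
    center-panned speech; whatever remains in each channel is left untouched.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        center = 0.5 * (l + r)
        # Replace the original center contribution with the rescaled one.
        out_left.append(l + (gain - 1.0) * center)
        out_right.append(r + (gain - 1.0) * center)
    return out_left, out_right


# First sample is fully center-panned; second sample has no center content.
left = [0.5, 0.25]
right = [0.5, -0.25]
boosted_left, boosted_right = adjust_center_gain(left, right, gain=2.0)
```

A center-panned sample is doubled, while a sample with opposite polarity in the two channels passes through unchanged.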
Abstract:
An apparatus constructs a multi-channel output signal using an input signal and parametric side information, the input signal including a first input channel and a second input channel derived from an original multi-channel signal, and the parametric side information describing interrelations between channels of the original multi-channel signal. The apparatus uses base channels, which are different from each other, for synthesizing first and second output channels on one side of an assumed listener position. The base channels are different from each other because of a coherence measure. Coherence between the base channels (for example, the left and the left surround reconstructed channels) is reduced by calculating a base channel for one of those channels by a combination of the input channels, the combination being determined by the coherence measure. Thus, a high subjective quality of the reconstruction can be obtained because the original front/back coherence is approximated.
Abstract:
A binaural cue coding scheme involving one or more object-based cue codes, wherein an object-based cue code directly represents a characteristic of an auditory scene corresponding to the audio channels, where the characteristic is independent of number and positions of loudspeakers used to create the auditory scene. Examples of object-based cue codes include the angle of an auditory event, the width of the auditory event, the degree of envelopment of the auditory scene, and the directionality of the auditory scene.
Abstract:
A method and apparatus are disclosed for controlling a buffer in a communication system, such as a digital audio broadcasting (DAB) communication system. A more consistent perceptual quality over time provides for a more pleasing auditory experience to a listener. Thus, the disclosed bit allocation process determines, for each frame, a distortion d[k] at which the frame is to be encoded. Generally, the distortion d[k] is determined to minimize (i) the probability for a buffer overflow, and (ii) the variation of perceived distortion over time. A buffer level is controlled by partitioning a signal into a sequence of successive frames; estimating a distortion rate for a number of frames; and selecting a distortion such that the variance of the buffer level is bounded by a specified value. In one implementation, a signal is coded by partitioning the signal into a sequence of successive frames; encoding each frame k for each of a plurality of distortions Di to compute a frame bitrate; estimating an average bitrate Ri[k] for each of said plurality of distortions Di given current and past frame bitrates; interpolating between each of said pair of values for said average bitrate Ri[k] and said plurality of distortions Di to obtain an approximation of a function that maps a distortion to an estimated average bitrate; and encoding each frame with a distortion level determined from said function.
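The interpolation step can be sketched as follows. The sketch assumes that the running average bitrate is tracked with an exponential moving average and that the rate-distortion function is approximated by piecewise-linear interpolation between the (Di, Ri[k]) pairs. Both choices, the smoothing factor alpha, and all function names are illustrative assumptions.

```python
def update_average(prev_avg, frame_rate, alpha=0.1):
    """Assumed running-average update from current and past frame bitrates."""
    return (1.0 - alpha) * prev_avg + alpha * frame_rate


def interpolate_rate(distortions, avg_rates, d):
    """Piecewise-linear estimate of the average bitrate at distortion d.

    distortions must be sorted in ascending order.
    """
    if d <= distortions[0]:
        return avg_rates[0]
    if d >= distortions[-1]:
        return avg_rates[-1]
    for i in range(len(distortions) - 1):
        d0, d1 = distortions[i], distortions[i + 1]
        if d0 <= d <= d1:
            t = (d - d0) / (d1 - d0)
            return avg_rates[i] + t * (avg_rates[i + 1] - avg_rates[i])


def distortion_for_rate(distortions, avg_rates, target_rate):
    """Invert the interpolated function; assumes strictly decreasing rates."""
    if target_rate >= avg_rates[0]:
        return distortions[0]
    if target_rate <= avg_rates[-1]:
        return distortions[-1]
    for i in range(len(distortions) - 1):
        r0, r1 = avg_rates[i], avg_rates[i + 1]
        if r1 <= target_rate <= r0:
            t = (r0 - target_rate) / (r0 - r1)
            return distortions[i] + t * (distortions[i + 1] - distortions[i])


# Hypothetical trial distortions Di and their estimated average bitrates Ri[k].
distortions = [1.0, 2.0, 4.0]
avg_rates = [128.0, 96.0, 64.0]

rate_mid = interpolate_rate(distortions, avg_rates, 1.5)
d_target = distortion_for_rate(distortions, avg_rates, 80.0)
new_avg = update_average(100.0, 200.0, alpha=0.5)
```

In this sketch a frame would be encoded at the distortion returned by `distortion_for_rate` for whatever target bitrate keeps the buffer level within bounds. How that target is derived from the buffer state is left out here.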