Abstract:
A perceptual audio coder is disclosed for encoding audio signals, such as speech or music, using cascaded filterbanks that provide different spectral and temporal resolutions for redundancy reduction and irrelevancy reduction. The disclosed perceptual audio coder includes a first analysis filterbank for performing irrelevancy reduction in accordance with a psychoacoustic model and a second analysis filterbank for performing redundancy reduction. The spectral/temporal resolution of the first filterbank can be optimized for irrelevancy reduction and the spectral/temporal resolution of the second filterbank can be optimized for maximum redundancy reduction. The disclosed perceptual audio coder also includes a scaling block between the cascaded filterbanks that scales the spectral coefficients based on the employed perceptual model.
Abstract:
A method and apparatus are disclosed for representing the masked threshold in a perceptual audio coder, using line spectral frequencies (LSF) or another representation for linear prediction (LP) coefficients. The present invention calculates LP coefficients for the masked threshold using known LPC analysis techniques. In one embodiment, the masked thresholds are transformed to a non-linear frequency scale suited to auditory properties. The LP coefficients are converted to line spectral frequencies or a similar representation in which they can be quantized for transmission. In one implementation, a masked threshold is transmitted only if it differs significantly from the previously transmitted masked threshold. Between transmitted masked thresholds, the masked threshold is approximated using interpolation schemes. The present invention decides which masked thresholds to transmit based on the change between consecutive masked thresholds, as opposed to the variation of short-term spectra.
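The transmit-on-change rule and the interpolation between transmitted thresholds can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the change criterion or the interpolation scheme, so the mean absolute difference in dB, the `min_change_db` parameter, the linear interpolation, and the function names are all hypothetical. The sketch also works directly on threshold values rather than on quantized LSF parameters.

```python
def select_thresholds(thresholds, min_change_db=1.0):
    """Return indices of the frames whose masked threshold would be sent.

    thresholds: list of per-frame masked thresholds, each a list of dB values.
    Assumed criterion: a frame is sent when its mean absolute difference
    from the last transmitted threshold exceeds min_change_db.
    """
    sent = [0]  # the first threshold is always transmitted
    for i in range(1, len(thresholds)):
        last = thresholds[sent[-1]]
        diff = sum(abs(a - b) for a, b in zip(thresholds[i], last)) / len(last)
        if diff > min_change_db:
            sent.append(i)
    return sent


def reconstruct(thresholds, sent):
    """Approximate every frame by linear interpolation between sent frames."""
    out = []
    for i in range(len(thresholds)):
        prev = max(s for s in sent if s <= i)
        later = [s for s in sent if s >= i]
        nxt = min(later) if later else prev
        if nxt == prev:
            # Frame was transmitted, or no later transmitted frame exists.
            out.append(list(thresholds[prev]))
        else:
            w = (i - prev) / (nxt - prev)
            out.append([(1 - w) * a + w * b
                        for a, b in zip(thresholds[prev], thresholds[nxt])])
    return out


frames = [[0.0, 0.0], [0.5, 0.5], [4.0, 4.0], [4.2, 4.2], [8.0, 8.0]]
sent = select_thresholds(frames)
approx = reconstruct(frames, sent)
```

With these example frames, only frames 0, 2 and 4 are selected, and the frames in between are filled in by interpolation at the decoder.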
Abstract:
A method of processing an audio signal is disclosed. The present invention includes receiving, by an audio processing apparatus, an input signal; receiving a user gain input; generating a linear gain factor and a non-linear gain factor using the user gain input; modifying the non-linear gain factor using the absolute threshold of hearing and the power of the input signal to generate a modified non-linear gain factor; and applying the linear gain factor and the modified non-linear gain factor to the audio signal.
Abstract:
A method of processing an audio signal is disclosed. The present invention includes receiving, by an audio processing apparatus, an input signal; estimating an indicator function using a signal power of the input signal; obtaining an adapted filter using the indicator function and an equalization filter; and generating an output signal by applying the adapted filter to the input signal.
Abstract:
A method of processing an audio signal is disclosed. The present invention comprises receiving a downmix signal including object signals, transforming the downmix signal per frequency band, determining a direction of an object signal from the transformed downmix signal, and determining blind information by estimating a level of the object signal corresponding to the direction. Accordingly, the present invention generates blind information when the encoder in use is incapable of generating object information, thereby enabling the gain and panning of an object to be controlled using the blind information.
Abstract:
A system and method for filtering an acoustic signal are provided, producing an output signal with an attenuated amount of diffuse sound in accordance with predetermined parameters of a desired output directional response and a required attenuation of diffuse sound. The system includes a filtration module and a filter generation module that includes a directional analysis module and a filter construction module.
Abstract:
Surround sound recording is a tedious task requiring the use of many microphones. The invention aims at enabling the use of two-channel microphones (or stereo microphones) for multi-channel surround recording. A conventional stereo microphone, or a two-channel microphone specifically optimized for use with the proposed algorithm, is used to generate two signals. A post-processor is applied to the microphone generated signals to convert them to multi-channel surround. This aim is achieved through a method to generate multiple output audio channels (y1, . . . , yM) from two microphone generated audio channels (x1, x2), in which the number of output channels is equal to or greater than two, the method comprising the steps of: determining directions of sound components related to the microphone characteristics; determining compensation gains of sound components related to the microphone characteristics; and generating the output audio channels, y1, . . . , yM, by using the microphone generated audio channels, x1, x2, the directions, and the compensation gains.
Abstract:
Embodiments of the present invention are directed to a binaural cue coding (BCC) scheme in which an externally provided audio signal (e.g., a studio engineered audio signal) is transmitted, along with derived cue codes, to a receiver instead of an automatically downmixed audio signal. The cue codes are (adaptively) synchronized with the externally provided audio signal to compensate for time lags (and changes in those time lags) between the externally provided audio signal and the multi-channel signal used to generate the cue codes. If the receiver is a legacy receiver, then the studio engineered audio signal will typically provide a higher-quality playback than would be provided by the automatically downmixed audio signal. If the receiver is a BCC-capable receiver, then the synchronization of the cue codes with the externally provided audio signal will typically improve the quality of the synthesized playback.
Abstract:
A method and apparatus are disclosed for controlling a buffer in a digital audio broadcasting (DAB) communication system. The decoder buffer level limits are specified in terms of a maximum number of encoded frames (or duration). The transmitter can predict the number of encoded frames, Fpred, in the decoder buffer and transmit the value, Fpred, to the receiver with the audio data. If the transmitter determines that the decoder buffer level is becoming too high, which indicates that the frames being generated by the encoder are too small, additional bits are allocated to each frame for each of the N programs. Likewise, if the transmitter determines that the decoder buffer level is becoming too low, which indicates that the frames being generated by the encoder are too big, fewer bits are allocated to each frame for each of the N programs. The transmitted predicted buffer level, Fpred, can also be employed to (i) determine when the decoder should commence decoding frames; and (ii) synchronize the transmitter and the receiver. When the decoder first starts up, or possibly when a new audio program is selected, the receiver fills the decoder buffer until Fpred frames are received before commencing decoding. The transmitter and receiver clocks may be synchronized by adjusting the clock at the receiver using a feedback loop that compares the actual level of the decoder buffer to the predicted value, Fpred, received from the transmitter (a higher number of encoded frames in the buffer indicates that the clock of the receiver is too slow and should be sped up, and a lower number of encoded frames in the buffer indicates that the clock of the receiver is too fast and should be slowed down).
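The three uses of Fpred described above (bit allocation at the transmitter, the start-up rule, and clock feedback at the receiver) can be sketched as below. This is a minimal illustration, not the disclosed implementation: the function names, the fixed `step` size, the limits `f_low` and `f_high`, and the proportional `gain` are all assumptions introduced here.

```python
def adjust_bits_per_frame(bits_per_frame, f_pred, f_low, f_high, step=64):
    """Transmitter side: nudge the per-frame bit allocation.

    A predicted buffer level above f_high means frames are too small,
    so more bits are allocated; a level below f_low means frames are
    too big, so fewer bits are allocated. step is a hypothetical increment.
    """
    if f_pred > f_high:
        return bits_per_frame + step
    if f_pred < f_low:
        return max(step, bits_per_frame - step)
    return bits_per_frame


def ready_to_decode(frames_received, f_pred):
    """Start-up rule: begin decoding once Fpred frames are buffered."""
    return frames_received >= f_pred


def adjust_receiver_clock(clock_hz, actual_frames, f_pred, gain=0.5):
    """Receiver side: proportional feedback on the buffer-level error.

    More frames than predicted: receiver clock is too slow, so speed up.
    Fewer frames than predicted: receiver clock is too fast, so slow down.
    gain is a hypothetical loop constant in Hz per frame of error.
    """
    return clock_hz + gain * (actual_frames - f_pred)


new_bits = adjust_bits_per_frame(1024, f_pred=12, f_low=4, f_high=10)
new_clock = adjust_receiver_clock(48000.0, actual_frames=12, f_pred=10)
```

A proportional correction is only one possible choice for the feedback loop; the abstract states that a feedback loop is used but does not say which kind.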
Abstract:
Acoustic echo control and noise suppression are an important part of any “handsfree” telecommunication system, such as telephony or audio or video conferencing systems. Bandwidth and computational complexity constraints have prevented stereo or multi-channel telecommunication systems from being widely applied. The advantages of the proposed method are very low complexity, high robustness, scalability to multi-channel audio without a need for loudspeaker signal distortion, and efficient integration of echo and noise control in the same algorithm. The proposed method for processing audio signals comprises the steps of: receiving an input signal, wherein the input signal is applied to a loudspeaker; receiving a microphone signal generated by a microphone; estimating the delay between the loudspeaker and the microphone signals and obtaining a delayed loudspeaker signal; estimating coloration correction values of the echo path on the delayed loudspeaker signal; using information of the delayed loudspeaker signal, the microphone signal, and the coloration correction values to determine gain filter values; and applying the gain filter values to the microphone signal to remove the echo.
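The delay estimation and gain filter steps listed above can be illustrated with the sketch below. It rests on assumptions not stated in the abstract: the delay is taken as the peak of a time-domain cross-correlation, the echo magnitude in each frequency band is modeled as the coloration correction value times the delayed loudspeaker magnitude, and the gain follows a spectral-subtraction style rule with a hypothetical `floor`. The function names are likewise illustrative.

```python
def estimate_delay(loudspeaker, microphone, max_lag):
    """Return the lag in samples that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the loudspeaker signal with the microphone signal
        # advanced by `lag` samples.
        score = sum(x * y for x, y in zip(loudspeaker, microphone[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def gain_filter(mic_mag, spk_mag_delayed, coloration, floor=0.1):
    """Per-band suppression gains computed from magnitude spectra.

    Assumed echo model: echo magnitude = coloration * delayed loudspeaker
    magnitude. The gain removes that estimate from the microphone
    magnitude and is limited from below by `floor`.
    """
    gains = []
    for y, x, c in zip(mic_mag, spk_mag_delayed, coloration):
        echo = c * x
        g = (y - echo) / y if y > 0 else floor
        gains.append(max(floor, g))
    return gains


def apply_gains(mic_mag, gains):
    """Scale each microphone band by its gain to suppress the echo."""
    return [g * y for g, y in zip(gains, mic_mag)]


lag = estimate_delay([0, 1, 0, 0, 0, 0], [0, 0, 0, 0.5, 0, 0], max_lag=4)
gains = gain_filter([1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [0.25, 0.5, 1.0])
```

Because the gains act only on the microphone signal, nothing in this sketch modifies the loudspeaker signal, which is consistent with the stated advantage of avoiding loudspeaker signal distortion.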