Abstract:
An encoding apparatus comprises a frame processor (105) which receives a multi channel audio signal comprising at least a first audio signal from a first microphone (101) and a second audio signal from a second microphone (103). An ITD processor (107) then determines an inter time difference between the first audio signal and the second audio signal, and a set of delays (109, 111) generates a compensated multi channel audio signal from the multi channel audio signal by delaying at least one of the first and second audio signals in response to the inter time difference. A combiner (113) then generates a mono signal by combining channels of the compensated multi channel audio signal, and a mono signal encoder (115) encodes the mono signal. The inter time difference may specifically be determined by an algorithm based on determining cross correlations between the first and second audio signals.
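The chain described in this abstract (cross-correlation-based inter time difference, delay compensation, mono downmix) can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the integer-sample lag search, the `max_lag` parameter, and the equal-weight average used as the combiner are all assumptions made for illustration.

```python
def estimate_itd(first, second, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `second` is a delayed copy of `first`.
    Assumption: a brute-force search over integer lags in [-max_lag, max_lag].
    """
    n = min(len(first), len(second))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += first[i] * second[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay(signal, samples):
    """Delay a signal by prepending zeros while keeping its length."""
    samples = max(0, min(samples, len(signal)))
    return [0.0] * samples + list(signal[:len(signal) - samples])


def compensate_and_downmix(first, second, max_lag=8):
    """Align the two channels and average them into a mono signal."""
    itd = estimate_itd(first, second, max_lag)
    # Delay whichever channel leads so that both channels line up.
    if itd > 0:
        first_c, second_c = delay(first, itd), list(second)
    else:
        first_c, second_c = list(first), delay(second, -itd)
    # Assumption: the combiner is a plain equal-weight average.
    mono = [0.5 * (a + b) for a, b in zip(first_c, second_c)]
    return itd, mono
```

Called on two channels where the second is the first delayed by three samples, `compensate_and_downmix` would report an inter time difference of 3 and a mono signal in which the two copies add coherently. The resulting mono signal would then be passed to whatever mono encoder is in use.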
Abstract:
Provided are an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that sound images can be localized at any desired position for each object audio signal. The audio decoding method includes extracting a downmix signal and object-based side information from an audio signal; generating channel-based side information based on object-based side information and control information for rendering the downmix signal; processing the downmix signal using a decorrelated channel signal; and generating a multi-channel audio signal using the processed downmix signal and the channel-based side information.
Abstract:
In a system or method for modeling a signal, such as a speech signal, harmonic frequencies and amplitudes are identified (106) and the harmonic magnitudes are interpolated (110) to obtain spectral magnitudes at a set of fixed frequencies. An inverse transform is applied (112) to the spectral magnitudes to obtain a pseudo auto-correlation sequence, from which linear prediction coefficients are calculated (114). From the linear prediction coefficients, model harmonic magnitudes are generated by sampling the spectral envelope (118) defined by the linear prediction coefficients. A set of scale factors is then calculated (120) as the ratio of the harmonic magnitudes to the model harmonic magnitudes and interpolated to obtain a second set of scale factors (122) at the set of fixed frequencies. The spectral envelope magnitudes at the set of fixed frequencies (124) are multiplied by the second set of scale factors (126) to obtain new spectral magnitudes, and the process is iterated to obtain final linear prediction coefficients.
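One pass of the procedure in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the patented method: linear interpolation, a uniform frequency grid on [0, pi], squaring the magnitudes before the inverse cosine transform, the Levinson-Durbin recursion for the linear prediction step, and all function names are choices made here. The iteration over scaled spectral magnitudes is left out for brevity.

```python
import math


def interpolate(freqs, mags, grid):
    """Linearly interpolate harmonic magnitudes onto fixed frequencies."""
    out = []
    for f in grid:
        if f <= freqs[0]:
            out.append(mags[0])
        elif f >= freqs[-1]:
            out.append(mags[-1])
        else:
            k = 1
            while freqs[k] < f:
                k += 1
            t = (f - freqs[k - 1]) / (freqs[k] - freqs[k - 1])
            out.append(mags[k - 1] + t * (mags[k] - mags[k - 1]))
    return out


def pseudo_autocorrelation(mags, order):
    """Inverse cosine transform (trapezoidal rule) of the squared magnitudes.

    Assumption: `mags` are samples on a uniform grid covering [0, pi].
    """
    n = len(mags)
    r = []
    for lag in range(order + 1):
        acc = 0.0
        for k, m in enumerate(mags):
            weight = 0.5 if k in (0, n - 1) else 1.0
            acc += weight * m * m * math.cos(math.pi * k * lag / (n - 1))
        r.append(acc / (n - 1))
    return r


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... and the prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def envelope_magnitude(a, err, w):
    """Sample the LPC spectral envelope sqrt(err) / |A(e^{jw})| at w."""
    re = sum(c * math.cos(w * j) for j, c in enumerate(a))
    im = -sum(c * math.sin(w * j) for j, c in enumerate(a))
    return math.sqrt(err) / math.hypot(re, im)


def fit_once(h_freqs, h_mags, grid_size=65, order=2):
    """One pass: interpolate, transform, fit LPC, compute scale factors."""
    grid = [math.pi * k / (grid_size - 1) for k in range(grid_size)]
    spectrum = interpolate(h_freqs, h_mags, grid)
    r = pseudo_autocorrelation(spectrum, order)
    a, err = levinson_durbin(r, order)
    model = [envelope_magnitude(a, err, w) for w in h_freqs]
    scale = [m / e for m, e in zip(h_mags, model)]
    return a, err, scale
```

In this sketch, harmonic frequencies are given in radians per sample. The returned `scale` list corresponds to the first set of scale factors (120); interpolating it onto the grid and multiplying the sampled envelope by it would yield the new spectral magnitudes for the next iteration.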
Abstract:
A speech coding algorithm groups speech frames into speech frame pairs and quantizes each frame of the pair according to a different algorithm. The spectral amplitudes of the second frame are quantized by dividing them into two portions, quantizing one portion, and then quantizing a difference between the two portions. The spectral amplitudes of the first frame of the pair are quantized by first converting to a fixed dimension, then interpolating between previous and subsequent frames, then selecting interpolated values in accordance with a mean squared error approach.
Abstract:
Speech is encoded by analyzing a digitized speech signal to determine excitation parameters for the digitized speech signal. The digitized speech signal is divided into at least two frequency bands. A nonlinear operation is performed on at least one of the frequency bands to produce a modified frequency band. A determination is made as to whether the modified frequency band is voiced or unvoiced.
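The last two steps of this abstract (a nonlinear operation on a frequency band, followed by a voiced/unvoiced determination) can be illustrated with a small sketch. The abstract does not say which nonlinear operation or decision rule is used, so everything specific here is an assumption: squaring as the nonlinearity, a normalized autocorrelation peak as the periodicity measure, the `threshold` value, and the function name. The band is assumed to have been isolated already by some band-pass filter that is not shown.

```python
def voicing_decision(band, min_lag, max_lag, threshold=0.5):
    """Return (is_voiced, best_lag) for one band-limited signal.

    Assumptions: squaring is the nonlinear operation, and the band is
    declared voiced when the normalized autocorrelation of the squared
    signal exceeds `threshold` at some lag in [min_lag, max_lag].
    Requires max_lag < len(band).
    """
    # Nonlinear operation: squaring moves energy from closely spaced
    # harmonics down to their difference frequency (the fundamental).
    modified = [x * x for x in band]
    mean = sum(modified) / len(modified)
    centered = [m - mean for m in modified]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        # A constant modified band carries no periodicity information.
        return False, 0
    n = len(centered)
    best_lag, best_score = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        acc = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        score = acc / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_score >= threshold, best_lag
```

For a band containing only the fifth and sixth harmonics of a signal with a 40-sample pitch period, the squared band contains a component at the fundamental, so this sketch would report the band as voiced with a best lag of 40.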