Abstract:
A method and apparatus for generating frame voicing decisions for an incoming speech signal having periods of active voice and non-active voice for a speech encoder in a speech communications system. A predetermined set of parameters is extracted from the incoming speech signal, including a pitch gain and a pitch lag. A frame voicing decision is made for each frame of the incoming speech signal according to values calculated from the extracted parameters. The predetermined set of parameters further includes a frame full band energy, and a set of spectral parameters called Line Spectral Frequencies (LSF).
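The abstract names the extracted parameters but not how the decision values are computed from them. The following is a minimal sketch of one possible per-frame decision, not the patented method. It assumes the frame is compared against a running background-noise estimate. The names `FrameParams` and `is_active_voice`, the decision rule, and all threshold values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FrameParams:
    """Parameters named in the abstract, extracted once per frame."""
    energy_db: float   # frame full-band energy, in dB
    lsf: List[float]   # Line Spectral Frequencies
    pitch_gain: float
    pitch_lag: int


def full_band_energy_db(samples: List[float]) -> float:
    """Mean-square energy of a frame in dB (small offset avoids log(0))."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square + 1e-12)


def is_active_voice(
    frame: FrameParams,
    prev_pitch_lag: int,
    noise_energy_db: float,
    noise_lsf: List[float],
    energy_margin_db: float = 6.0,      # assumed threshold
    lsf_dist_threshold: float = 0.05,   # assumed threshold
    pitch_gain_threshold: float = 0.6,  # assumed threshold
    lag_tolerance: int = 2,             # assumed threshold
) -> bool:
    """Hypothetical decision rule: any one strong cue marks the frame active."""
    # Cue 1: energy clearly above the background-noise estimate.
    energy_rise = frame.energy_db - noise_energy_db
    # Cue 2: spectral envelope far from the noise envelope (squared LSF distance).
    lsf_dist = sum((a - b) ** 2 for a, b in zip(frame.lsf, noise_lsf))
    # Cue 3: strong, stable periodicity (high pitch gain, lag close to previous lag).
    periodic = (
        frame.pitch_gain > pitch_gain_threshold
        and abs(frame.pitch_lag - prev_pitch_lag) <= lag_tolerance
    )
    return (
        energy_rise > energy_margin_db
        or lsf_dist > lsf_dist_threshold
        or periodic
    )
```

A frame whose energy, spectrum, and periodicity all resemble the noise estimate is classified as non-active voice in this sketch; a real encoder would also update the noise estimate and smooth the decision over several frames.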
Abstract:
A method of adjusting an echo canceller comprises obtaining a first cross-correlation between a far-end signal and an error signal, wherein the error signal is generated by subtracting an output signal of an adaptive filter from a local-end signal; determining whether the first cross-correlation is above a predetermined threshold; relocating the adaptive filter by a few samples if the first cross-correlation is determined to be above the predetermined threshold; calculating a first improvement indicator parameter after relocating the adaptive filter by the few samples; determining whether the first improvement indicator parameter indicates a performance improvement by the adaptive filter after the relocation; calculating a gain based on the local-end signal and the error signal if no performance improvement is determined; and multiplying the adaptive filter by the gain.
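The first two steps of this method (forming the error signal and testing its cross-correlation with the far-end signal against a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the abstract does not say whether the cross-correlation is normalized or at which lag it is taken, so the sketch assumes a zero-lag normalized magnitude, and the function names and the threshold value of 0.5 are hypothetical.

```python
import math
from typing import List


def error_signal(local_end: List[float], filter_output: List[float]) -> List[float]:
    """Error = local-end signal minus the adaptive filter's output."""
    return [d - y for d, y in zip(local_end, filter_output)]


def normalized_cross_correlation(far_end: List[float], error: List[float]) -> float:
    """Zero-lag normalized cross-correlation magnitude, in [0, 1] (assumed form)."""
    if len(far_end) != len(error):
        raise ValueError("signals must have the same length")
    numerator = sum(x * e for x, e in zip(far_end, error))
    denominator = math.sqrt(
        sum(x * x for x in far_end) * sum(e * e for e in error)
    )
    return 0.0 if denominator == 0.0 else abs(numerator) / denominator


def should_relocate(
    far_end: List[float],
    local_end: List[float],
    filter_output: List[float],
    threshold: float = 0.5,  # assumed value; the abstract only says "predetermined"
) -> bool:
    """True when residual echo is still correlated with the far-end signal."""
    err = error_signal(local_end, filter_output)
    return normalized_cross_correlation(far_end, err) > threshold
```

A high correlation means the error signal still contains far-end content, which is the condition under which the method relocates the filter. The improvement indicator and the gain computation are left out because the abstract does not specify them.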
Abstract:
The invention provides a speech coding system with a music classifier. An encoder is disposed to receive an input signal and provides a bitstream based upon a speech coding of a portion of the input signal. The encoder provides a classification of the input signal as one of noise, speech, and music. The music classifier analyzes or determines signal properties of the input signal. The music classifier compares the signal properties to thresholds to determine the classification of the input signal.
Abstract:
There is provided a method for use by an echo canceller to detect an echo path change and adjust to the echo path change. The method comprises determining a first bulk delay using a SPARSE foreground adaptive filter; configuring the foreground adaptive filter to an open-loop mode; canceling the echo signal based on the first bulk delay using the foreground adaptive filter; determining a second bulk delay of the echo signal using a SPARSE background adaptive filter; configuring the foreground adaptive filter to a closed-loop mode and continuing to cancel the echo signal based on the first bulk delay; configuring the background adaptive filter to the open-loop mode; measuring echo cancellation performance of the foreground adaptive filter and the background adaptive filter; and changing parameters of the foreground adaptive filter if the echo cancellation performance of the background adaptive filter is better than that of the foreground adaptive filter.
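The final two steps (measuring both filters and changing the foreground parameters when the background filter performs better) might look like the sketch below. The abstract does not define the performance measure or which parameters change, so this sketch assumes the measure is the input-to-residual power ratio in dB and that the foreground filter simply takes over the background filter's bulk delay and taps. `FilterState`, `attenuation_db`, `update_foreground`, and `margin_db` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class FilterState:
    """Minimal stand-in for an adaptive filter's parameters."""
    bulk_delay: int
    taps: List[float] = field(default_factory=list)


def attenuation_db(near_end: List[float], residual: List[float]) -> float:
    """Power ratio of the near-end input to the residual, in dB (assumed measure)."""
    eps = 1e-12  # avoids division by zero and log(0)
    power_in = sum(d * d for d in near_end)
    power_out = sum(e * e for e in residual)
    return 10.0 * math.log10((power_in + eps) / (power_out + eps))


def update_foreground(
    foreground: FilterState,
    background: FilterState,
    near_end: List[float],
    fg_residual: List[float],
    bg_residual: List[float],
    margin_db: float = 1.0,  # assumed hysteresis to avoid needless switching
) -> FilterState:
    """Return the foreground state to use for the next block."""
    fg_perf = attenuation_db(near_end, fg_residual)
    bg_perf = attenuation_db(near_end, bg_residual)
    if bg_perf > fg_perf + margin_db:
        # Assumption: adopt the background filter's delay and coefficients.
        return FilterState(background.bulk_delay, list(background.taps))
    return foreground
```

The margin is a design choice in this sketch: without it, two filters with nearly equal performance could trade places on every block.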
Abstract:
In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or a second encoding scheme based upon the detection or absence of a triggering characteristic in an interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of speech components of an input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to cases where the generally periodic component of the speech is not stationary or is less than completely periodic, and therefore requires more frequent updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.
Abstract:
A speech encoder analyzes and classifies each frame of speech as periodic-like or non-periodic-like speech and performs a different gain quantization process depending on whether the speech is periodic. If the speech is periodic, the improved speech encoder obtains the pitch gains from the unquantized weighted speech signal and performs a pre-vector quantization of the adaptive codebook gain GP for each subframe of the frame before subframe processing begins, and a closed-loop delayed decision vector quantization of the fixed codebook gain GC. If the frame of speech is non-periodic, the speech encoder may use any known method of gain quantization. Quantizing the gains of periodic speech in this manner reduces the number of bits required to represent the quantized gain information and, for periodic speech, makes it possible to use the quantized pitch gain for the current subframe to search the fixed codebook for the fixed codebook excitation vector for the current subframe. Alternatively, the new gain quantization process, which was used only for periodic signals, may be extended to non-periodic signals as well. This second strategy results in a slightly higher bit rate than that for periodic signals that use the new gain quantization strategy, but the bit rate is still lower than that of the prior art. Yet another alternative is to use the new gain quantization process for all speech signals without distinguishing between periodic and non-periodic signals.
Abstract:
An extended signal coding system accommodates substantially music-like signals within a signal while maintaining a high perceptual quality in a reproduced signal during discontinuous transmission (DTX) operation. The extended signal coding system contains internal circuitry that performs detection and classification of the speech signal, depending on numerous characteristics of the signal, to ensure the high perceptual quality in the reproduced signal. In certain embodiments of the invention, the signal is a speech signal that has a substantially music-like signal contained therein, and the extended signal coding system uses voice activity detection (VAD) correction/supervision circuitry to override any VAD decision that is used to determine which among a plurality of source coding modes is to be employed. This is particularly relevant for DTX operation. In certain embodiments of the invention, a signal coding circuitry maintains an improved perceptual quality in a coded signal having a substantially music-like component. This assurance of an improved perceptual quality is very desirable when a music-like signal is present in an un-coded signal.
Abstract:
There is provided a method of detecting an infinite echo return loss (ERL) in an echo cancellation system while in a finite ERL mode. The method comprises determining a running mean attenuation by the echo cancellation system, determining an echo-signal-to-near-end-noise ratio (ENR), defining an infinite ERL threshold (THinfinite) as a function of the ENR, and switching to an infinite ERL mode as a function of the running mean attenuation and the THinfinite. The running mean attenuation can be an echo return loss enhancement (ERLE); the higher the ENR, the higher the THinfinite, and the lower the ENR, the lower the THinfinite. The switching can further be a function of an energy distribution, where the switching switches to the infinite ERL mode based on a non-localized energy distribution over an echo path delay for a predetermined period of time.
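The relationship between the ENR, the threshold, and the mode switch can be sketched as below. The abstract states only that THinfinite rises and falls with the ENR and that switching depends on the running mean attenuation and THinfinite. The clamped linear mapping, the exponential running mean, the direction of the comparison (switching when attenuation falls below the threshold), and every constant are assumptions made for illustration; the energy-distribution condition is omitted.

```python
def update_running_mean(prev_mean_db: float, sample_db: float, alpha: float = 0.1) -> float:
    """Exponentially weighted running mean of the attenuation (assumed smoother)."""
    return (1.0 - alpha) * prev_mean_db + alpha * sample_db


def infinite_erl_threshold(
    enr_db: float,
    base_db: float = 3.0,   # assumed floor
    slope: float = 0.25,    # assumed dB of threshold per dB of ENR
    max_db: float = 12.0,   # assumed ceiling
) -> float:
    """THinfinite as a non-decreasing function of ENR (hypothetical mapping)."""
    return min(max_db, max(base_db, base_db + slope * enr_db))


def should_switch_to_infinite_erl(mean_attenuation_db: float, enr_db: float) -> bool:
    """Assumed rule: little attenuation relative to THinfinite suggests no echo path."""
    return mean_attenuation_db < infinite_erl_threshold(enr_db)
```

Under these assumptions, a canceller that achieves only 1 dB of mean attenuation at an ENR of 20 dB would switch modes, while one achieving 10 dB would stay in the finite ERL mode.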
Abstract:
There is provided a speech encoding system that receives a speech signal. The speech encoding system comprises a frame processor for processing a frame of the speech signal, where the frame processor includes a pitch gain generator that derives unquantized pitch gains, and a first vector quantizer that receives the unquantized pitch gains and generates quantized pitch gains. The speech encoding system also comprises a subframe processor that begins subframe processing after the pitch gain generator has derived the unquantized pitch gains and the first vector quantizer has generated the quantized pitch gains.