Abstract:
In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or a second encoding scheme based upon the detection or absence of a triggering characteristic in an interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater number of speech components of the input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to cases where the generally periodic component of the speech is not stationary or is less than completely periodic, and therefore requires more frequent updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.
Abstract:
There is provided a method for use by a speech encoder to encode an input speech signal. The method comprises receiving the input speech signal; determining whether the input speech signal includes an active speech signal or an inactive speech signal; low-pass filtering the inactive speech signal to generate a narrowband inactive speech signal; high-pass filtering the inactive speech signal to generate a high-band inactive speech signal; encoding the narrowband inactive speech signal using a narrowband inactive speech encoder to generate an encoded narrowband inactive speech; generating a low-to-high auxiliary signal by the narrowband inactive speech encoder based on the narrowband inactive speech signal; encoding the high-band inactive speech signal using a wideband inactive speech encoder to generate an encoded wideband inactive speech based on the low-to-high auxiliary signal from the narrowband inactive speech encoder; and transmitting the encoded narrowband inactive speech and the encoded wideband inactive speech.
Abstract:
A method of masking a residual echo signal by an echo canceller is provided. The method comprises receiving a far-end signal, adjusting filter coefficients of an adaptive filter in response to the far-end signal, generating an echo model signal based on the far-end signal using the adaptive filter, receiving a near-end signal, subtracting the echo model signal from the near-end signal to generate an output signal, defining a spectral mask based on the near-end signal, wherein the spectral mask is indicative of near-end spectral peaks and near-end spectral valleys, de-emphasizing the output signal in spectral regions of the near-end spectral peaks, and emphasizing the output signal in spectral regions of the near-end spectral valleys, wherein the de-emphasizing occurs during filter coefficient determination for the adaptive filter. A weighted filter may perform the de-emphasizing and emphasizing operations, where the weighted filter uses medium-term spectral characteristics of the near-end signal.
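The adaptive-filter steps in this abstract (build an echo model from the far-end signal, subtract it from the near-end signal, adjust the coefficients) can be illustrated with a minimal sketch. The abstract does not name an adaptation rule, so normalized LMS (NLMS) is swapped in here as an assumption, and the spectral mask and weighted filter are not modeled at all. The function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_cancel(
    far_end: Sequence[float],
    near_end: Sequence[float],
    taps: int = 4,      # assumed filter length
    mu: float = 0.5,    # assumed NLMS step size
    eps: float = 1e-8,  # guards against division by zero
) -> Tuple[List[float], List[float]]:
    """Return (output signal, final filter coefficients)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    output = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        # Echo model signal generated from the far-end signal.
        echo_model = sum(w * h for w, h in zip(weights, history))
        # Subtract the echo model from the near-end signal.
        e = d - echo_model
        # NLMS coefficient update (an assumption, not taken from the abstract).
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        output.append(e)
    return output, weights


# Synthetic check: the "echo" is the far-end signal delayed by one
# sample and scaled by 0.5, with no near-end talker present.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(500)]
near = [0.0] + [0.5 * x for x in far[:-1]]
residual, weights = nlms_echo_cancel(far, near)
print(max(abs(e) for e in residual[-50:]))
```

In this noiseless toy case the residual shrinks toward zero as the second coefficient approaches 0.5. The method described above additionally shapes the error with a near-end spectral mask during coefficient determination, which this sketch leaves out.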
Abstract:
There is provided a voice activity detection method for indicating an active voice mode and an inactive voice mode. The method comprises receiving a first portion of an input signal; determining that the first portion of the input signal includes an active voice signal; indicating the active voice mode in response to determining that the first portion of the input signal includes the active voice signal; receiving a second portion of the input signal immediately following the first portion of the input signal; determining that the second portion of the input signal includes an inactive voice signal; extending the indication of the active voice mode for a period of time after determining that the second portion of the input signal includes the inactive voice signal, wherein the period of time varies based on one or more conditions; and indicating the inactive voice mode after expiration of the period of time.
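The extension period described in this abstract is commonly called a hangover. The following is a minimal sketch of that idea, assuming frame-based processing and a caller-supplied function that returns the hangover length. The abstract does not say which conditions set the period, so `hangover_len` is a hypothetical stand-in, as is the function name `apply_hangover`.

```python
from typing import Callable, List, Sequence


def apply_hangover(
    raw_active: Sequence[bool],
    hangover_len: Callable[[int], int],
) -> List[bool]:
    """Extend the active-voice indication after speech ends.

    raw_active[i] is the per-frame decision before hangover.
    hangover_len(i) returns how many frames to keep indicating
    "active" after frame i (hypothetical; the abstract only says
    the period varies with one or more conditions).
    """
    indicated = []
    remaining = 0
    for i, active in enumerate(raw_active):
        if active:
            # Active frame: indicate active and reload the hangover counter.
            remaining = hangover_len(i)
            indicated.append(True)
        elif remaining > 0:
            # Inactive frame inside the hangover period: still indicate active.
            remaining -= 1
            indicated.append(True)
        else:
            # Hangover expired: indicate the inactive voice mode.
            indicated.append(False)
    return indicated


raw = [True, True, False, False, False, False]
print(apply_hangover(raw, lambda i: 2))
# → [True, True, True, True, False, False]
```

Passing a function rather than a constant leaves room for a period that depends on, for example, noise level or the length of the preceding speech burst. Those particular conditions are illustrative only.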
Abstract:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
Abstract:
The invention improves the encoding and decoding of speech by focusing the encoding on the perceptually important characteristics of speech. The system analyzes selected features of an input speech signal and first performs a common frame-based speech coding of the input speech signal. The system then performs speech coding based on either a first speech coding mode or a second speech coding mode. The selection of a mode is based on characteristics of the input speech signal. The first speech coding mode uses a first framing structure and the second speech coding mode uses a second framing structure.
Abstract:
There is provided a method of reducing the effect of noise-producing artifacts in silence areas of a speech signal for use by a speech decoding system. The method comprises obtaining a plurality of incoming samples of a speech subframe; summing an absolute value of an energy level for each of the plurality of incoming samples to generate a total input level (gain_in); smoothing the total input level to generate a smoothed level (Level_in_sm); determining that the speech subframe is in a silence area based on the total input level, the smoothed level and a spectral tilt parameter; defining a gain using k1*(Level_in_sm/1024)+(1-k1), where k1 is a function of the spectral tilt parameter; and modifying an energy level of the speech subframe using the gain.
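The gain formula in this abstract can be sketched as follows. Only the expression k1*(Level_in_sm/1024)+(1-k1) comes from the abstract. The first-order smoothing with coefficient `alpha`, the reading of the total input level as a sum of absolute sample values, and the function names are assumptions. The silence decision and the mapping from spectral tilt to k1 are not specified in the abstract, so both are passed in by the caller.

```python
from typing import List, Sequence, Tuple


def silence_gain(level_in_sm: float, k1: float) -> float:
    """Gain from the abstract: k1*(Level_in_sm/1024) + (1 - k1)."""
    return k1 * (level_in_sm / 1024.0) + (1.0 - k1)


def process_subframe(
    samples: Sequence[float],
    prev_level_sm: float,
    k1: float,            # function of spectral tilt; mapping not given in the abstract
    in_silence: bool,     # silence decision; rule not given in the abstract
    alpha: float = 0.75,  # assumed smoothing coefficient
) -> Tuple[List[float], float]:
    """Return (possibly attenuated samples, updated smoothed level)."""
    # Total input level: sum of absolute sample values (gain_in).
    gain_in = float(sum(abs(s) for s in samples))
    # Smoothed level (Level_in_sm); first-order smoothing is an assumption.
    level_in_sm = alpha * prev_level_sm + (1.0 - alpha) * gain_in
    if not in_silence:
        return [float(s) for s in samples], level_in_sm
    g = silence_gain(level_in_sm, k1)
    return [g * s for s in samples], level_in_sm


print(silence_gain(512.0, 0.5))   # → 0.75
print(silence_gain(1024.0, 0.5))  # → 1.0
```

With k1 between 0 and 1, the gain falls below 1 whenever the smoothed level is under 1024, so quiet subframes are attenuated more strongly as k1 grows, and k1 = 0 leaves the subframe unchanged.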