Abstract:
The invention improves the encoding and decoding of speech by focusing the encoding on the perceptually important characteristics of speech. The system analyzes selected features of an input speech signal and first performs common frame-based speech coding of that signal. The system then performs speech coding based on either a first speech coding mode or a second speech coding mode. The mode is selected based on characteristics of the input speech signal. The first speech coding mode uses a first framing structure and the second speech coding mode uses a second framing structure.
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. For each bit rate mode selected, a plurality of fixed or innovation subcodebooks is selected for use in generating innovation vectors. The speech coder distinguishes various voice signals as a function of their voice content. For example, a Voice Activity Detection (VAD) algorithm selects an appropriate coding scheme depending on whether the speech signal comprises active or inactive speech. The encoder may consider varying characteristics of the speech signal, including sharpness, a delay correlation, a zero-crossing rate, and a residual energy. In another embodiment of the present invention, code excited linear prediction is used for voice active signals whereas random excitation is used for voice inactive signals; the energy level and spectral content of the voice inactive signal may also be used for noise coding.
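The VAD-driven choice between CELP and random excitation can be illustrated with a minimal Python sketch. It is not the codec's actual algorithm: the function names, the energy threshold, and the decision rule (frame energy alone) are assumptions made for illustration. The zero-crossing rate is computed only as an example of one of the signal characteristics the abstract lists.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def select_coding_scheme(frame, energy_threshold=1e-3):
    """Hypothetical VAD decision: 'celp' for active frames,
    'random_excitation' for inactive ones. The threshold is an assumption."""
    if frame_energy(frame) > energy_threshold:
        return "celp"
    return "random_excitation"

# A loud sinusoidal frame versus a near-silent frame (160 samples each).
active = [0.5 * math.sin(2 * math.pi * 5 * i / 160) for i in range(160)]
inactive = [0.001] * 160
print(select_coding_scheme(active), select_coding_scheme(inactive))
```

A practical detector would likely combine several features and adapt its thresholds to the background noise level; the single fixed threshold here only keeps the sketch short.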
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. To achieve high quality in lower bit rate encoding modes, the speech encoder departs from the strict waveform matching criteria of regular CELP coders and strives to identify significant perceptual features of the input signal. The encoder generates a plurality of codevectors from a single, normalized codevector by shifting or other rearrangement. As a result, searching speeds are enhanced, and the physical size of a codebook built from such codevectors is greatly reduced.
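The idea of deriving many codevectors from one stored vector can be sketched in Python as follows. This is an illustration under stated assumptions, not the encoder's actual procedure: circular shifting is used as one possible "rearrangement", and the function names and the correlation-based search are hypothetical.

```python
import math

def normalize(vector):
    """Scale a vector to unit energy."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def shifted_codevectors(base, num_shifts):
    """Derive num_shifts codevectors from one base vector by circular shift."""
    n = len(base)
    result = []
    for k in range(num_shifts):
        s = k % n
        # Move the last s samples to the front (right rotation by s).
        result.append(base[n - s:] + base[:n - s])
    return result

def best_shift(target, base, num_shifts):
    """Return the shift whose codevector correlates best with the target."""
    candidates = shifted_codevectors(base, num_shifts)
    scores = [sum(t * c for t, c in zip(target, cv)) for cv in candidates]
    return scores.index(max(scores))

base = normalize([2.0, 0.0, 0.0, 0.0])
print(shifted_codevectors(base, 4))
print(best_shift([0.0, 0.0, 1.0, 0.0], base, 4))
```

Only the base vector needs to be stored; every other codevector is produced on demand, which is where the reduction in codebook size comes from.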
Abstract:
An extended signal coding system that accommodates substantially music-like signals within a signal while maintaining a high perceptual quality in a reproduced signal during discontinued transmission (DTX) operation. The extended signal coding system contains internal circuitry that detects and classifies the speech signal, depending on numerous characteristics of the signal, to ensure the high perceptual quality in the reproduced signal. In certain embodiments of the invention, the signal is a speech signal containing a substantially music-like signal, and the extended signal coding system uses voice activity detection (VAD) correction/supervision circuitry to override any VAD decision that is used to determine which of a plurality of source coding modes is to be employed. This is particularly relevant for discontinued transmission (DTX) operation. In certain embodiments of the invention, signal coding circuitry maintains an improved perceptual quality in a coded signal having a substantially music-like component. This assurance of improved perceptual quality is very desirable when a music-like signal is present in an un-coded signal.
Abstract:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
Abstract:
Silence description coding for multi-rate speech coding systems that employ discontinued transmission. Speech coding systems include multi-rate speech codecs having an encoder and a decoder. The silence description coding is performed in either the encoder or the decoder of the multi-rate speech codec. It may also be performed in a distributed manner, partially in the encoder and partially in the decoder. The silence description coding is performed on a speech signal having a substantially non-speech-like characteristic. Voice activity detection classifies the speech signal as being either substantially speech-like or substantially non-speech-like. The silence description coding is selected from a plurality of coding modes. In certain embodiments of the invention, the silence description coding is a source coding mode that operates at a bit rate that fits within a bit rate budget as determined by all of the available source coding modes within the plurality of coding modes. The silence description coding is also accompanied by signaling coding and channel coding of the speech signal. Error checking is performed using an unused portion of a bandwidth of the multi-rate speech codec's bit rate. This error checking involves majority voting in certain embodiments of the invention.
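Majority voting for error checking can be illustrated with a short Python sketch. The abstract does not say how the unused bandwidth is filled, so this example assumes that each protected bit is simply sent several times and the receiver takes the majority value; the function names and the copy count of three are hypothetical.

```python
def protect(bits, copies=3):
    """Repeat every bit 'copies' times (assumed use of the spare bandwidth)."""
    return [b for b in bits for _ in range(copies)]

def majority_decode(received, copies=3):
    """Recover each bit as the majority value within its group of copies."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded

sent = protect([1, 0, 1])
corrupted = list(sent)
corrupted[1] ^= 1  # one bit error in the first group
corrupted[4] ^= 1  # one bit error in the second group
print(majority_decode(corrupted))
```

With three copies per bit, a single error in any group is corrected; two errors in the same group would defeat the vote.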
Abstract:
A technique for suppressing non-stationary noise, such as wind noise, in an audio signal is described. In accordance with the technique, a series of frames of the audio signal is analyzed to detect whether the audio signal comprises non-stationary noise. If it is detected that the audio signal comprises non-stationary noise, a number of steps are performed. In accordance with these steps, a determination is made as to whether a frame of the audio signal comprises non-stationary noise or speech and non-stationary noise. If it is determined that the frame comprises non-stationary noise, a first filter is applied to the frame and if it is determined that the frame comprises speech and non-stationary noise, a second filter is applied to the frame.
Abstract:
Systems and methods are described for applying noise suppression to one or more audio signals to generate a noise-suppressed audio signal therefrom. In a single-channel implementation, an input signal is received that comprises a desired audio signal and an additive noise signal. Noise suppression is then applied to the input signal to generate a noise-suppressed signal in a manner that is controlled by at least a parameter that specifies a degree of balance between distortion of the desired audio signal and unnaturalness of a residual noise signal included in the noise-suppressed signal. In an alternative single-channel implementation, a plurality of sub-band signals obtained by applying a frequency conversion process to a time domain representation of an input signal is received. Noise suppression is then applied to each of the sub-band signals by passing each of the sub-band signals through a time direction filter. Multi-channel noise suppression variants are also described.
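The sub-band variant, in which each sub-band signal passes through a filter that runs along the time (frame) axis, can be sketched in Python as follows. The abstract does not specify the filter, so a one-pole recursive smoother stands in for it here; the function names and the coefficient are assumptions, and the frequency conversion that produces the sub-bands is taken as already done.

```python
def time_direction_filter(subband_frames, coeff=0.5):
    """One-pole recursive filter along the frame axis of a single sub-band.
    The filter type and coefficient are placeholders, not the patented design."""
    output = []
    state = 0.0
    for value in subband_frames:
        state = coeff * state + (1.0 - coeff) * value
        output.append(state)
    return output

def suppress(subbands, coeff=0.5):
    """Filter every sub-band independently along time."""
    return [time_direction_filter(band, coeff) for band in subbands]

# Two sub-bands, two frames each.
print(suppress([[2.0, 0.0], [0.0, 4.0]]))
```

Each sub-band is filtered independently, so different bands could in principle use different coefficients; a single shared coefficient is used here only for brevity.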