Abstract:
An exemplary decoder comprises a receiver that receives parameters of a speech signal on a frame-by-frame basis; control logic for decoding the parameters and for resynthesizing the speech signal, the control logic including a minimum spacing indicative of a minimum difference required between line spectral frequencies (LSFs) of consecutive frames; and frame recovery logic that, when a lost frame detector detects a lost frame, sets the minimum spacing for the lost frame to a first value which is greater than the minimum spacing for the previously received frame, and/or uses pitch lag parameters of a plurality of previously received frames to extrapolate a pitch lag parameter for the lost frame, and/or sets a gain parameter of a subframe of the lost frame in a first manner if the lost gain parameter is an adaptive codebook gain parameter and in a second manner if the lost gain parameter is a fixed codebook gain parameter.
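The pitch lag extrapolation and the widened minimum spacing described above can be illustrated with a minimal sketch. The abstract does not say how the extrapolation is performed or how much the spacing is increased, so the least-squares line fit, the function names, and the `widening_factor` default below are all assumptions made for illustration only.

```python
def extrapolate_pitch_lag(previous_lags):
    """Fit a least-squares line to past pitch lags and read off the next point.

    The linear fit is an assumed extrapolation rule; the abstract only says
    that lags of several previously received frames are used.
    """
    n = len(previous_lags)
    if n == 0:
        raise ValueError("need at least one previously received pitch lag")
    if n == 1:
        return float(previous_lags[0])
    mean_x = (n - 1) / 2.0
    mean_y = sum(previous_lags) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(previous_lags))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    # The lost frame sits at position n, one step past the last received frame.
    return mean_y + slope * (n - mean_x)


def lost_frame_min_spacing(previous_min_spacing, widening_factor=1.5):
    """Return a minimum LSF spacing greater than the previous frame's.

    widening_factor is a hypothetical tuning value, not taken from the source.
    """
    if previous_min_spacing <= 0 or widening_factor <= 1.0:
        raise ValueError("spacing must be positive and factor must exceed 1")
    return previous_min_spacing * widening_factor


if __name__ == "__main__":
    print(extrapolate_pitch_lag([50, 52, 54]))
    print(lost_frame_min_spacing(50.0))
```

A decoder following this sketch would call both helpers only when the lost frame detector fires, and would keep using the received parameters otherwise.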
Abstract:
A signal compression system includes a coder and a decoder. The coder includes an extract unit for extracting an input feature vector from an input signal, a coder memory unit for storing a pre-designed vector quantization (VQ) table for the coder such that the coder memory unit uses a set of primary indices to address entries within the pre-designed VQ table, a coder mapping unit for mapping indices from a set of secondary indices to the set of primary indices, and a search unit for searching for one index out of the set of secondary indices, wherein the index from the set of secondary indices corresponds to an entry in the coder memory unit, and the entry best represents the input feature vector according to some predetermined criteria. The decoder includes a decoder memory unit for storing the same pre-designed VQ table and set of primary indices as the coder memory unit, a decoder mapping unit, and a retrieval unit, wherein the entry indicated by the index best represents the input feature vector.
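The relationship between the secondary indices, the primary indices, and the shared VQ table can be sketched as follows. This is a simplified illustration, not the patented design: the squared-error distance stands in for the unspecified "predetermined criteria", and the table contents, the mapping, and all function names are hypothetical.

```python
def search_secondary_index(feature, vq_table, secondary_to_primary):
    """Coder side: search only the entries reachable through secondary indices.

    Returns the secondary index whose mapped table entry is closest to the
    feature vector under squared Euclidean distance (an assumed criterion).
    """
    best_index, best_dist = None, float("inf")
    for secondary, primary in enumerate(secondary_to_primary):
        entry = vq_table[primary]
        dist = sum((f - e) ** 2 for f, e in zip(feature, entry))
        if dist < best_dist:
            best_index, best_dist = secondary, dist
    return best_index


def retrieve_entry(secondary, vq_table, secondary_to_primary):
    """Decoder side: map the received secondary index back to a table entry."""
    return vq_table[secondary_to_primary[secondary]]


if __name__ == "__main__":
    # Hypothetical 4-entry table addressed by primary indices 0..3.
    table = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    # Hypothetical mapping: secondary index 0 -> primary 3, 1 -> primary 1.
    mapping = [3, 1]
    idx = search_secondary_index([0.9, 1.2], table, mapping)
    print(idx, retrieve_entry(idx, table, mapping))
```

Because the coder and decoder hold the same table and the same mapping, only the secondary index needs to be transmitted, and it can be shorter than a primary index when the mapping selects a subset of the table.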
Abstract:
There are provided methods and devices for generating excitation values for a speech signal. In one aspect, an example method comprises obtaining one or more characteristics of a first speech frame of the speech signal, deriving a first seed value based on the one or more characteristics of the first speech frame, providing the first seed value to a Gaussian time series generator, and using the Gaussian time series generator to generate excitation values for the first frame. The one or more characteristics may include spectrum information of the first frame, energy information of the first frame, or gain information of the first frame.
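A minimal sketch of seeding a Gaussian generator from frame characteristics is shown below. The abstract does not say how the seed is derived, so hashing the packed values with CRC-32 is purely an assumption, as are the function names and the unit-variance output.

```python
import random
import struct
import zlib


def derive_seed(spectrum, energy, gain):
    """Derive an integer seed from frame characteristics.

    Packing the values as 32-bit floats and taking a CRC-32 is an assumed
    derivation; any deterministic function of the characteristics would do.
    """
    values = list(spectrum) + [energy, gain]
    payload = struct.pack("<%df" % len(values), *values)
    return zlib.crc32(payload)


def generate_excitation(seed, length):
    """Generate `length` zero-mean, unit-variance Gaussian excitation values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


if __name__ == "__main__":
    seed = derive_seed([0.12, 0.34, 0.56], energy=1.5, gain=0.8)
    print(seed, generate_excitation(seed, 4))
```

The derivation is deterministic, so two parties holding identical frame characteristics would reproduce the same excitation sequence without exchanging the seed itself.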
Abstract:
There is provided a conference bridge or transcoder configured to intelligently handle multiple speech channels in the context of a packet network, wherein various speech channels may adhere to a variety of speech encoding standards. For example, the conference bridge establishes framing and alignment of multiple incoming speech channels associated with multiple participants, extracts parameters from the speech samples, mixes the parameters, and re-encodes the resulting speech samples for transmission to the participants. In one aspect, a speech processing method comprises decoding a first bitstream according to a first coding scheme to generate first speech samples and a first side information; generating second speech samples and a second side information using the first speech samples and the first side information, for use according to a second coding scheme; and creating a second bitstream, encoded based on the second coding scheme, using the second speech samples and the second side information.
Abstract:
In an exemplary conversion scheme, a frame of a first speech signal comprising a plurality of frames encoded at a plurality of first rates, including a first non-speech rate, is received. The rate of the received frame is determined, and if the received frame is encoded at the first non-speech rate, then the received frame is re-encoded at either a second or third non-speech rate to generate a frame of a second speech signal. Moreover, a system for converting a speech signal comprises a receiver for receiving a frame of a first speech signal and a processor capable of determining the encoding rate of the received frame and re-encoding the received frame at either a second or third non-speech rate if the received frame was originally encoded at a first non-speech rate.
Abstract:
There are provided speech coding methods and systems for estimating a plurality of speech parameters of a speech signal for coding the speech signal using one of a plurality of speech coding algorithms, wherein the plurality of speech parameters includes pitch information and is calculated using a plurality of thresholds. An example method includes estimating a background noise level in the speech signal to determine a signal-to-noise ratio (SNR) for the speech signal, adjusting one or more of the plurality of thresholds based on the SNR to generate one or more SNR-adjusted thresholds, analyzing the speech signal to extract the pitch information using the one or more SNR-adjusted thresholds, and repeating the estimating, the adjusting, and the analyzing to code the speech signal using one of the plurality of speech coding algorithms.
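The estimate, adjust, and analyze steps can be sketched as below. The abstract does not state which thresholds are adjusted or in which direction, so this example assumes a single voicing threshold on the normalized autocorrelation that is relaxed at low SNR; the SNR breakpoints, `max_relaxation`, and all function names are hypothetical.

```python
import math


def estimate_snr_db(signal_energy, noise_energy):
    """SNR in dB from a signal energy and an estimated background noise energy."""
    return 10.0 * math.log10(signal_energy / max(noise_energy, 1e-12))


def adjust_threshold(base, snr_db, low_db=10.0, high_db=30.0, max_relaxation=0.2):
    """Relax the threshold by up to max_relaxation as SNR falls to low_db.

    The direction and size of the adjustment are assumptions for illustration.
    """
    t = min(1.0, max(0.0, (snr_db - low_db) / (high_db - low_db)))
    return base * (1.0 - max_relaxation * (1.0 - t))


def extract_pitch_lag(frame, min_lag, max_lag, threshold):
    """Return the lag with the highest normalized autocorrelation, or None
    if that correlation does not reach the (SNR-adjusted) threshold."""
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[lag:], frame[:-lag]
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        corr = sum(x * y for x, y in zip(a, b)) / den if den > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag if best_corr >= threshold else None


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * n / 40) for n in range(240)]
    snr = estimate_snr_db(signal_energy=100.0, noise_energy=1.0)
    thr = adjust_threshold(0.5, snr)
    print(snr, thr, extract_pitch_lag(frame, 20, 60, thr))
```

In a coder these three calls would run once per frame, which corresponds to the "repeating" step in the abstract.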
Abstract:
A method of adjusting an echo canceller comprises obtaining a first cross-correlation between a far-end signal and an error signal, wherein the error signal is generated by subtracting an output signal of an adaptive filter from a local-end signal; determining whether the first cross-correlation is above a pre-determined threshold; relocating the adaptive filter by a few samples if the determining determines that the first cross-correlation is above a pre-determined threshold; calculating a first improvement indicator parameter, wherein the first improvement indicator parameter is calculated after the relocating the adaptive filter by the few samples; determining whether the first improvement indicator parameter indicates a performance improvement by the adaptive filter after the relocating the adaptive filter by the few samples; calculating a gain based on the local-end signal and the error signal if the determining does not determine the performance improvement; and multiplying the adaptive filter by the gain.
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters are generated for higher quality decoding and reproduction. For each bit rate mode selected, pluralities of fixed or innovation subcodebooks are selected for use in generating innovation vectors. The speech coder distinguishes various voice signals as a function of their voice content. For example, a Voice Activity Detection (VAD) algorithm selects an appropriate coding scheme depending on whether the speech signal comprises active or inactive speech. The encoder may consider varying characteristics of the speech signal including sharpness, a delay correlation, a zero-crossing rate, and a residual energy. In another embodiment of the present invention, code excited linear prediction is used for voice active signals whereas random excitation is used for voice inactive signals; the energy level and spectral content of the voice inactive signal may also be used for noise coding.
Abstract:
A speech communication system is provided that uses pitch information, pitch lags, pitch gains, energy and/or other speech characteristics about the outgoing speech and the unknown signal on a frame basis to determine if the unknown signal is an echo signal of the outgoing speech or if the unknown signal also contains speech from a second talker (double talk). Additionally, a plurality of frames of these characteristics of the outgoing speech signal and the unknown incoming signal may be buffered so that the analysis and comparison can be made more efficiently and quickly in the frame domain as opposed to a time domain. Multiple characteristics may be optionally weighted and then analyzed. The system and method may further determine a level of confidence, based on any criterion, in the determination that may then be used to adjust the gain of a filter.
Abstract:
A method of coding speech under background noise conditions or during noise-like speech periods is provided, wherein an analysis-by-synthesis method is used during active voice speech segments. However, when a background noise segment or noise-like speech segment is detected, an adaptive codebook (pitch prediction) contribution is used as a source of a pseudo-random sequence in order to provide a better representation of the background noise or the noise-like speech. An improved gain quantization scheme is also employed when a background noise segment is detected, wherein the energy of the total excitation with quantized gains is matched to the energy of the total excitation with unquantized gains.
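The energy-matching idea in the gain quantization step can be sketched as a codebook search. The abstract does not describe the actual search procedure, so picking the gain pair whose total-excitation energy is closest to the unquantized energy is an assumed reading; the codebook contents and function names are hypothetical.

```python
def total_excitation(adaptive_vec, fixed_vec, pitch_gain, fixed_gain):
    """Total excitation: gain-scaled adaptive plus gain-scaled fixed contribution."""
    return [pitch_gain * a + fixed_gain * c for a, c in zip(adaptive_vec, fixed_vec)]


def energy(vec):
    """Sum of squared samples."""
    return sum(x * x for x in vec)


def quantize_gains_by_energy(adaptive_vec, fixed_vec, gains, gain_codebook):
    """Return the index of the codebook gain pair whose total-excitation
    energy is closest to that obtained with the unquantized gains."""
    target = energy(total_excitation(adaptive_vec, fixed_vec, *gains))

    def mismatch(i):
        candidate = total_excitation(adaptive_vec, fixed_vec, *gain_codebook[i])
        return abs(energy(candidate) - target)

    return min(range(len(gain_codebook)), key=mismatch)


if __name__ == "__main__":
    adaptive = [1.0, 0.0]
    fixed = [0.0, 1.0]
    codebook = [(0.5, 0.5), (0.7, 0.7), (1.0, 1.0)]  # hypothetical entries
    print(quantize_gains_by_energy(adaptive, fixed, (0.6, 0.8), codebook))
```

A waveform-matching search would instead minimize the error between the target signal and the synthesized signal; matching energy is the criterion the abstract associates with background noise segments.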