Abstract:
An exemplary multi-channel speech processor comprises a controller capable of interfacing with a plurality of channels and at least one signal processing unit (SPU) coupled to the controller. The multi-channel speech processor has a maximum execution time within which it processes the plurality of channels, one channel at a time, by processing a single frame from each channel. The SPU encodes the single frame from each of the plurality of channels, one channel at a time, to generate encoded frames until the maximum execution time elapses or is about to elapse. For each channel whose frame is not encoded because the maximum execution time has elapsed or is about to elapse, the controller transmits a predetermined frame that causes a decoder receiving it to generate a frame erasure frame.
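The deadline-bounded encoding loop in this abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: `encode_all_channels`, the `worst_case_frame_time` margin used to decide that the budget is "about to elapse", and the `ERASURE_FRAME` payload are all hypothetical names introduced here.

```python
import time
from typing import Callable, Dict, List

# Hypothetical placeholder payload that the receiving decoder is assumed to
# treat as a frame erasure; the real bit pattern is codec-specific.
ERASURE_FRAME = b"\x00"


def encode_all_channels(
    frames: Dict[int, List[float]],
    encode: Callable[[List[float]], bytes],
    max_execution_time: float,
    worst_case_frame_time: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[int, bytes]:
    """Encode one frame per channel, one channel at a time, within a time budget.

    Channels reached after the budget has elapsed, or when too little time
    remains for a worst-case encode, receive ERASURE_FRAME instead.
    """
    start = clock()
    output: Dict[int, bytes] = {}
    for channel, frame in frames.items():
        elapsed = clock() - start
        # "About to elapse": a worst-case encode would overrun the budget.
        if elapsed + worst_case_frame_time > max_execution_time:
            output[channel] = ERASURE_FRAME
        else:
            output[channel] = encode(frame)
    return output
```

Injecting `clock` lets the timing be simulated deterministically: with a budget of 2.5 time units and a worst-case encode of 1.0, a clock that advances 1.0 per channel encodes the first two channels and sends the erasure payload for the rest.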
Abstract:
In a coding procedure, coding parameters are selected for coding a speech signal so as to achieve enhanced perceptual quality of the reproduced speech. At least one coding parameter value or preferential coding parameter value is selected to make the spectral response of the speech signal more uniform, compensating for spectral variations that a communications network associated with the signal processing system might otherwise impart to the speech signal.
Abstract:
A signal processing system is well suited for conditioning a speech signal prior to coding in order to achieve enhanced perceptual quality of the reproduced speech. The signal processing system may be incorporated into mobile or portable wireless communications devices, wireless infrastructure equipment, or both. It includes a filtering arrangement that filters an input speech signal to make its spectral response more uniform, compensating for spectral variations that a communications network associated with the signal processing system might otherwise impart to the speech signal.
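One common way to make a spectral response more uniform is adaptive first-order tilt compensation. The abstract does not specify the filtering arrangement, so the Python sketch below swaps in that technique purely as an illustration: it estimates the tilt from the normalized lag-1 autocorrelation and removes it with a first-order inverse filter. The function names and the `strength` parameter are hypothetical.

```python
from typing import List, Sequence


def first_order_tilt(x: Sequence[float]) -> float:
    """Normalized lag-1 autocorrelation r(1)/r(0).

    Values near +1 indicate energy concentrated at low frequencies (a
    downward spectral tilt); values near -1 indicate the opposite.
    """
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return 0.0
    r1 = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    return r1 / r0


def flatten_tilt(x: Sequence[float], strength: float = 1.0) -> List[float]:
    """Apply y[n] = x[n] - a * x[n-1] with a = strength * r(1)/r(0).

    This first-order inverse (whitening) filter removes most of the
    first-order spectral tilt, leaving a flatter spectral response.
    """
    a = strength * first_order_tilt(x)
    y: List[float] = []
    prev = 0.0
    for v in x:
        y.append(v - a * prev)
        prev = v
    return y
```

Applied to a long low-pass AR(1) signal with coefficient 0.9, the measured tilt drops from roughly 0.9 to near zero. A deployed system might instead use a fixed equalizer designed from a measured network response.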
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. For each selected bit rate mode, a plurality of fixed (innovation) subcodebooks is selected for use in generating innovation vectors. The speech coder distinguishes among voice signals according to their voice content. For example, a Voice Activity Detection (VAD) algorithm selects an appropriate coding scheme depending on whether the speech signal contains active or inactive speech. The encoder may consider varying characteristics of the speech signal, including sharpness, delay correlation, zero-crossing rate, and residual energy. In another embodiment of the present invention, code excited linear prediction is used for voice active signals, whereas random excitation is used for voice inactive signals; the energy level and spectral content of the voice inactive signal may also be used for noise coding.
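The VAD-driven choice between CELP and random excitation can be illustrated with a toy classifier. This sketch assumes a simple energy threshold for voice activity and uses the zero-crossing rate only to label active frames as voiced or unvoiced; the thresholds, labels, and function names are hypothetical and much simpler than a real codec's VAD, which combines several features such as the sharpness and delay correlation named above.

```python
from typing import Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of a frame."""
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def classify_frame(
    frame: Sequence[float],
    energy_threshold: float = 1e-3,
    zcr_threshold: float = 0.3,
) -> Tuple[str, str]:
    """Return (signal class, excitation scheme) for one frame.

    Toy energy-based VAD: low-energy frames are treated as voice inactive
    and coded with random excitation; active frames use CELP, and the
    zero-crossing rate only labels them voiced or unvoiced.
    """
    if frame_energy(frame) <= energy_threshold:
        return "inactive", "random_excitation"
    kind = "unvoiced" if zero_crossing_rate(frame) > zcr_threshold else "voiced"
    return kind, "CELP"
```

For example, an all-zero frame is classified as inactive, a 200 Hz tone sampled at 8 kHz as voiced, and a rapidly alternating signal as unvoiced; both active classes are routed to CELP.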
Abstract:
A speech communication system is provided that uses pitch information, pitch lags, pitch gains, energy, and/or other speech characteristics of the outgoing speech and of an unknown incoming signal, on a frame basis, to determine whether the unknown signal is an echo of the outgoing speech or also contains speech from a second talker (double talk). Additionally, these characteristics of the outgoing speech signal and the unknown incoming signal may be buffered over a plurality of frames so that the analysis and comparison can be performed more efficiently and quickly in the frame domain rather than in the time domain. Multiple characteristics may optionally be weighted and then analyzed. The system and method may further determine, based on any criterion, a level of confidence in this determination, which may then be used to adjust the gain of a filter.
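The weighted, frame-domain comparison and the confidence-to-gain mapping can be sketched as below. The weights, the pitch-lag tolerance, the "echo is quieter than the outgoing speech" rule, and the linear gain mapping are all illustrative assumptions, and the sketch assumes the two frame buffers are already aligned for the echo-path delay.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FrameFeatures:
    pitch_lag: float   # pitch lag in samples
    pitch_gain: float  # pitch gain, roughly 0..1
    energy: float      # frame energy


def echo_confidence(
    outgoing: Sequence[FrameFeatures],
    incoming: Sequence[FrameFeatures],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),
    lag_tolerance: float = 2.0,
) -> float:
    """Confidence in [0, 1] that the incoming frames are an echo of the outgoing ones."""
    w_lag, w_gain, w_energy = weights
    n = min(len(outgoing), len(incoming))
    if n == 0:
        return 0.0
    score = 0.0
    for out, inc in zip(outgoing, incoming):
        lag_match = 1.0 if abs(out.pitch_lag - inc.pitch_lag) <= lag_tolerance else 0.0
        gain_match = 1.0 - min(1.0, abs(out.pitch_gain - inc.pitch_gain))
        # An echo is normally attenuated relative to the outgoing speech.
        energy_match = 1.0 if inc.energy < out.energy else 0.0
        score += w_lag * lag_match + w_gain * gain_match + w_energy * energy_match
    return score / (n * sum(weights))


def suppression_gain(confidence: float, min_gain: float = 0.1) -> float:
    """High echo confidence gives a low gain; double talk keeps the gain near 1."""
    return 1.0 - (1.0 - min_gain) * confidence
```

When the incoming frames share the outgoing pitch lags and gains at lower energy, the confidence approaches 1 and the filter gain falls toward `min_gain`; unrelated pitch and higher energy (double talk) keep the confidence low and the gain high.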
Abstract:
A speech compression system is disclosed that encodes a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate against the perceptual quality of the reconstructed speech. The system comprises a full-rate codec, a half-rate codec, a quarter-rate codec, and an eighth-rate codec. The codecs are selectively activated based on a rate selection; in addition, the full-rate and half-rate codecs are selectively activated based on a type classification. Each codec encodes and decodes the speech signal at a different bit rate, emphasizing different aspects of the speech signal to enhance the overall quality of the synthesized speech.
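The interplay of rate selection, type classification, and average bit rate can be illustrated with a small mapping. Everything here is a hypothetical stand-in: the frame classes, the type numbers attached to the full-rate and half-rate codecs, the 20 ms frame length, and the bits-per-frame table are assumed values chosen only to show how the average rate follows from the per-frame choices.

```python
from typing import Optional, Sequence, Tuple

# Assumed bits per 20 ms frame for each codec (illustrative values only).
BITS_PER_FRAME = {"full": 171, "half": 80, "quarter": 40, "eighth": 16}
FRAMES_PER_SECOND = 50  # assumes 20 ms frames


def select_codec(frame_class: str, favor_low_rate: bool) -> Tuple[str, Optional[int]]:
    """Map a hypothetical frame class to (codec, type); only full/half carry a type."""
    if frame_class in ("silence", "background_noise"):
        return "eighth", None
    if frame_class == "unvoiced":
        return "quarter", None
    if frame_class == "stationary_voiced":
        # Hypothetical type 1: stationary voiced speech that tolerates a lower rate.
        return ("half" if favor_low_rate else "full"), 1
    # Onsets, transitions, and anything unclassified: hypothetical type 0 at full rate.
    return "full", 0


def average_bit_rate(codecs: Sequence[str]) -> float:
    """Average bit rate in bits per second over a sequence of coded frames."""
    total_bits = sum(BITS_PER_FRAME[c] for c in codecs)
    return total_bits * FRAMES_PER_SECOND / len(codecs)
```

Under these assumed values, one full-rate frame followed by one eighth-rate frame averages (171 + 16) × 50 / 2 = 4675 bits per second, which shows how routing silence to the eighth-rate codec lowers the average rate.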
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve classifying the input signal. For each selected bit rate mode, a plurality of fixed (innovation) subcodebooks is selected for use in generating innovation vectors. The fixed codebook contains pulse subcodebooks and noise-like subcodebooks. To assist in selecting one of the subcodebooks, an adaptive weighting approach is applied in the search procedure: residual classification and various parameters are used to generate a weighting function that favors one subcodebook over another. The pulse subcodebooks are favored for coding pulse-like residuals, while the noise-like subcodebooks are favored for coding noise-like residuals. The classification may involve identifying noise-like residuals, while the various parameters may include pitch correlation, signal-to-noise ratio, and average-to-peak ratio. Favoring is achieved by adjusting a weighting factor applied to the subcodebooks.
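The adaptive weighting of the subcodebook search can be sketched as a weighting factor applied to the noise-like subcodebooks' search criteria before the best subcodebook is chosen. The specific thresholds and multipliers below, and the assumption that a larger search criterion is better, are illustrative choices rather than values from the source; `noise_weight` and `choose_subcodebook` are hypothetical names.

```python
from typing import Dict, Set


def noise_weight(
    noise_like_residual: bool,
    pitch_correlation: float,
    snr_db: float,
    average_to_peak: float,
) -> float:
    """Hypothetical weighting factor for the noise-like subcodebooks.

    Values above 1 favor noise-like subcodebooks; values below 1 favor pulse ones.
    """
    w = 1.0
    if noise_like_residual:
        w *= 1.2
    if pitch_correlation > 0.7:   # strongly periodic, pulse-like residual
        w *= 0.8
    if snr_db < 10.0:             # noisy input
        w *= 1.1
    if average_to_peak > 0.5:     # flat envelope without dominant peaks
        w *= 1.1
    return w


def choose_subcodebook(criteria: Dict[str, float], noise_like: Set[str], weight: float) -> str:
    """Return the subcodebook with the largest (weighted) search criterion."""
    def weighted(name: str) -> float:
        return criteria[name] * weight if name in noise_like else criteria[name]
    return max(criteria, key=weighted)
```

With raw criteria of 0.50 for a pulse subcodebook and 0.45 for a noise-like one, a noise-like, low-SNR, flat residual lifts the noise-like score above 0.50, while a strongly periodic residual pushes it down so the pulse subcodebook wins.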
Abstract:
Silence description coding is provided for multi-rate speech coding systems that employ discontinuous transmission. Such speech coding systems include multi-rate speech codecs having an encoder and a decoder. The silence description coding is performed in either the encoder or the decoder of the multi-rate speech codec, or in a distributed manner, partially in the encoder and partially in the decoder. It is performed on a speech signal having a substantially non-speech-like characteristic; voice activity detection classifies the speech signal as either substantially speech-like or substantially non-speech-like. The silence description coding is selected from a plurality of coding modes. In certain embodiments of the invention, it is a source coding mode that operates at a bit rate fitting within a bit rate budget determined by all of the available source coding modes within the plurality of coding modes. The silence description coding is also accompanied by signaling coding and channel coding of the speech signal. Error checking is performed using an unused portion of the bandwidth provided by the multi-rate speech codec's bit rate, and in certain embodiments of the invention this error checking involves majority voting.
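Majority-vote error checking in unused bits can be illustrated by repeating the silence-description payload several times and voting per bit at the decoder. This is a minimal sketch assuming simple threefold repetition of the whole payload; the actual bit layout, the number of copies, and the names `repeat_bits` and `majority_decode` are hypothetical.

```python
from typing import List


def repeat_bits(bits: List[int], copies: int = 3) -> List[int]:
    """Encoder side: place `copies` copies of the payload into otherwise unused bits."""
    return bits * copies


def majority_decode(received: List[int], n_bits: int, copies: int = 3) -> List[int]:
    """Decoder side: recover each payload bit by majority vote across its copies."""
    if len(received) != n_bits * copies:
        raise ValueError("unexpected payload length")
    decoded: List[int] = []
    for i in range(n_bits):
        votes = sum(received[i + k * n_bits] for k in range(copies))
        decoded.append(1 if 2 * votes > copies else 0)
    return decoded
```

With three copies, any single corrupted copy of a given bit is outvoted, so one bit error per payload position is corrected.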
Abstract:
A system and method are disclosed for improving the quality of coded speech that coexists with background noise. For instance, the present invention receives a coded speech signal via a communication network, then decodes it and synthesizes the parameters it contains to produce a synthesized speech signal. The present invention determines the non-speech periods represented within the synthesized speech signal. These non-speech periods are then used to determine and code the LPC parameters needed for background noise synthesis. Because medium- or low-bit-rate LPC coding attenuates the background noise that coexists with speech during voice activity periods, the decoded signal exhibits audible, abrupt changes in the background noise level. To improve decoded speech quality, the present invention adds simulated background noise to the decoded noisy speech when synthesizing the noisy speech signal during voice activity periods. Because background noise is then continuously present during both speech and non-speech periods, the resulting output signal sounds more natural and realistic to the human ear.
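Background noise synthesis from LPC parameters is typically done by passing white noise through the all-pole filter 1/A(z). The sketch below assumes that convention, A(z) = 1 + a1·z^-1 + ... + ap·z^-p, and adds the synthesized noise only to voice-active frames. The per-frame activity flag, the noise gain, and the function names are hypothetical, and the filter memory is deliberately not carried across frames to keep the example short.

```python
import random
from typing import List, Sequence


def synthesize_noise(lpc: Sequence[float], gain: float, n: int, seed: int = 0) -> List[float]:
    """Pass white Gaussian noise through the all-pole filter 1/A(z).

    A(z) = 1 + a1*z^-1 + ... + ap*z^-p, with lpc = [a1, ..., ap].
    """
    rng = random.Random(seed)
    out: List[float] = []
    for i in range(n):
        excitation = gain * rng.gauss(0.0, 1.0)
        feedback = sum(a * out[i - 1 - k] for k, a in enumerate(lpc) if i - 1 - k >= 0)
        out.append(excitation - feedback)
    return out


def add_background_noise(
    frame: Sequence[float],
    voice_active: bool,
    lpc: Sequence[float],
    noise_gain: float,
    seed: int = 0,
) -> List[float]:
    """Add simulated background noise to a decoded frame during voice activity only.

    Non-speech frames are assumed to be synthesized from the noise LPC
    parameters already, so they pass through unchanged.
    """
    if not voice_active:
        return list(frame)
    noise = synthesize_noise(lpc, noise_gain, len(frame), seed)
    return [s + v for s, v in zip(frame, noise)]
```

For example, `lpc=[-0.9]` gives the stable low-pass recursion s[n] = e[n] + 0.9·s[n-1]. In practice the noise LPC parameters and gain would come from the non-speech periods identified in the synthesized signal, as the abstract describes.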
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. The encoder applies adaptive gain reduction to optimize the selection of appropriate gain contributions from the adaptive and fixed codebooks. Specifically, the encoder uses a first target signal to identify a contribution (a best code vector and a gain) from the adaptive codebook. A contribution from the fixed codebook is then selected. Next, the gain associated with the adaptive codebook contribution is reduced by a factor, and the fixed codebook contribution is searched a second time, permitting fine tuning of the overall contribution. The gain reduction factor is adapted according to both the encoding bit rate and the normalized correlation between the original target signal and the filtered signal from the adaptive codebook.
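The gain-reduction step can be sketched with the standard least-squares adaptive codebook gain g = <x, y> / <y, y>, where x is the target and y the filtered adaptive codebook vector, plus a reduction factor driven by the normalized correlation and the bit rate. The specific rule in `gain_reduction_factor` (its threshold, its maximum reduction, and reducing more at lower rates) is a hypothetical choice; the abstract states only that both quantities are considered.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def adaptive_codebook_gain(target: Sequence[float], filtered: Sequence[float]) -> float:
    """Least-squares gain g = <x, y> / <y, y>."""
    energy = dot(filtered, filtered)
    return dot(target, filtered) / energy if energy > 0.0 else 0.0


def normalized_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    denom = math.sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / denom if denom > 0.0 else 0.0


def gain_reduction_factor(bit_rate_bps: float, r: float,
                          high_rate_threshold: float = 8000.0,
                          max_reduction: float = 0.4) -> float:
    """Hypothetical rule: reduce more when correlation is low and at lower bit rates."""
    strength = max_reduction if bit_rate_bps < high_rate_threshold else max_reduction / 2
    r = min(max(r, 0.0), 1.0)
    return 1.0 - strength * (1.0 - r)


def second_target(target: Sequence[float], filtered: Sequence[float],
                  bit_rate_bps: float) -> List[float]:
    """Target for the second fixed-codebook search after reducing the adaptive gain."""
    g = adaptive_codebook_gain(target, filtered)
    g_reduced = g * gain_reduction_factor(bit_rate_bps, normalized_correlation(target, filtered))
    return [x - g_reduced * y for x, y in zip(target, filtered)]
```

When the filtered adaptive vector matches the target exactly, the correlation is 1, no reduction is applied, and the second target is zero; a weaker match lowers the adaptive gain and leaves more of the target for the second fixed-codebook search.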