摘要:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
摘要:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
摘要:
There is provided a method of decoding speech data generated from a speech signal. The method comprises receiving the speech data having at least one main pulse in a subframe of the speech data; generating a first predicted pulse, based on the at least one main pulse, on one side of the main pulse in the subframe of the speech data, wherein the first predicted pulse has a lower gain than the main pulse; generating a second predicted pulse, as a mirror image of the first predicted pulse on a reverse time scale, on the other side of the main pulse in the subframe of the speech data; reconstructing the speech signal using the at least one main pulse, the first predicted pulse and the second predicted pulse.
摘要:
There are provided methods and devices for generating excitation values for a speech signal. In one aspect, an example method comprises obtaining one or more characteristics of a first speech frame of the speech signal, deriving a first seed value based on the one or more characteristics of the first speech frame, providing the first seed value to a Gaussian time series generator; and using the Gaussian time series generator to generate an excitation values for the first frame. The one or more characteristics may include a spectrum information of the first frame, an energy information of the first frame, or a gain information of the first frame.
摘要:
Speech coding systems include multi-rate speech codecs having an encoder and a decoder. Silence description coding for multi-rate speech coding systems that employ discontinued transmission is performed in either the encoder or the decoder of the multi-rate speech codec. It may also be performed in a distributed manner wherein it is performed partially in the encoder and partially in the decoder. The silence description coding is performed on a speech signal having a substantially non-speech-like characteristic. Voice activity detection classifies the speech signal as being either substantially speech-like or substantially non-speech-like. The silence description coding is selected from a plurality of coding modes. In certain embodiments of the invention, the silence description coding is a source coding mode that operates at a bit rate that fits within a bit rate budget as determined by all of the available source coding modes within the plurality of coding modes. The silence description coding is also accompanied with signaling coding and channel coding of the speech signal. Error checking is performed using an unused portion of a bandwidth of the multi-rate speech codec's bit rate. This error checking involves majority voting in certain embodiments of the invention.
摘要:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The bitstream comprises a type component and a gain component. The type component is representative of a type classification of a frame of speech signal that is transmitted. The type component comprises a first type and second type. The gain component represents an adaptive codebook gain and a fixed codebook gain component comprises a fixed codebook gain component and an adaptive codebook gain component exclusively encoded as separate components of the bitstream as a function of the bit rate when the type classification is the second type.
摘要:
There is provided a conference bridge or transcoder configured to intelligently handle multiple speech channels in the contest of a packet network, wherein various speech channels may adhere to variety of speech encoding standards. For example, the conference bridge establishes framing and alignment of multiple incoming speech channels associated with multiple participants, extracts parameters from the speech samples, mixes the parameters, and re-encodes the resulting speech samples for transmission to the participants. In one aspect, a speech processing method comprises decoding a first bitstream according to a first coding scheme to generate first speech samples and a first side information; generating second speech samples and a second side information using the first speech samples and the first side information, for use according to a second coding scheme; and creating a second bitstream, encoded based on the second coding scheme, using the second speech samples and the second side information.
摘要:
A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
摘要:
An exemplary decoder comprises a receiver that receives parameters of a speech signal on a frame-by-frame basis, a control logic for decoding parameters and for resynthesizing the speech signal, the control logic including a minimum spacing indicative of a minimum difference required between LSFs of consecutive frames, a frame recovery logic that, when a lost frame detector detects a lost frame, sets the minimum spacing for the lost frame to a first value which is greater than the minimum spacing for the previously received frame, and/or uses pitch lag parameters of a plurality of previously received frames to extrapolate a pitch lag parameter for the lost frame, and/or sets gain parameter of a subframe of the lost frame in a first manner if the lost gain parameter is an adaptive codebook gain parameter and in a second manner if the lost gain parameter is a fixed codebook gain parameter.
摘要:
A signal compression system includes a coder and a decoder. The coder includes an extract unit for extracting an input feature vector from an input signal, a coder memory unit for storing a predesigned vector quantization (VQ) table for the coder such that the coder memory unit uses a set of primary indices to address entries within the pre-designed VQ table, a coder mapping unit for mapping indices from a set of secondary indices to the first set of indices, and a search unit for searching for one index out of the set of secondary indices, wherein the index from the set of secondary indices corresponds to an entry in the coder memory unit, and the entry best represents the input feature vector according to some predetermined criteria. On the decoder side, the decoder includes a decoder memory unit for storing the same pre-designed VQ table and set of primary indices as the coder memory unit, a decoder mapping unit, and a retrieval unit, wherein the entry indicated by the index best represents the input feature vector.