Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. To achieve high quality in lower bit rate encoding modes, the speech encoder departs from the strict waveform matching criteria of regular CELP coders and strives to identify significant perceptual features of the input signal. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve the classification of the input signal. For each bit rate mode selected, pluralities of fixed or innovation subcodebooks are selected for use in generating innovation vectors. The speech encoder also utilizes an adaptive weighting factor in the selection of a current pitch lag value from a plurality of pitch lag candidates. For example, if the speech encoder identifies an integer multiple timing relationship between any two pitch lag candidates, the pitch lag candidate with the smaller timing value is favored through adjustment of the weighting factor. Similarly, if a pitch lag candidate exhibits timing that corresponds to that of previous pitch lag values, the weighting factor is adjusted to favor that candidate.
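The pitch lag selection described above can be sketched as follows. This is a minimal illustration, not the encoder's actual procedure: the function name, the boost factors, the tolerance, and the use of a normalized correlation as the base score are all assumptions introduced here.

```python
def select_pitch_lag(candidates, previous_lags,
                     multiple_boost=1.15, continuity_boost=1.10, tolerance=2):
    """Pick a pitch lag from (lag, correlation) candidates.

    The boost factors and the tolerance (in samples) are illustrative
    assumptions; the abstract does not give numeric values.
    """
    best_lag, best_score = None, float("-inf")
    for lag, correlation in candidates:
        weight = 1.0
        # Favor the smaller lag when a larger candidate sits close to an
        # integer multiple (2x, 3x, ...) of it.
        for other_lag, _ in candidates:
            if other_lag > lag:
                multiple = round(other_lag / lag)
                if multiple >= 2 and abs(other_lag - multiple * lag) <= tolerance:
                    weight *= multiple_boost
                    break
        # Favor a candidate whose timing matches a previous pitch lag value.
        if any(abs(lag - previous) <= tolerance for previous in previous_lags):
            weight *= continuity_boost
        score = correlation * weight
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With candidates `[(40, 0.80), (80, 0.85)]` and no lag history, the weighting lets the lag of 40 win even though the lag of 80 has the higher raw correlation, which is the pitch-doubling case the abstract targets.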
Abstract:
In a coding procedure, a spectral content of a speech signal is estimated. A preferential coding algorithm or preferential value of at least one coding parameter is selected based on the estimated spectral content of the speech signal. The speech signal is coded in accordance with the selected coding algorithm or the selected coding parameter to control the operation of one or more of the following: a pre-processing filter, a post-processing filter, a coding control coefficient, a weighting filter, a synthesis filter, and a quantization table.
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve the classification of the input signal. For each bit rate mode selected, pluralities of fixed or innovation subcodebooks are selected for use in generating innovation vectors. The fixed codebook contains pulse subcodebooks and noise-like subcodebooks. To assist in selection of one of the subcodebooks, an adaptive weighting approach is applied in a searching procedure wherein residual classification and various parameters are used to generate a weighting function that is used to favor one subcodebook over another. The pulse subcodebooks are favored to code pulse-like residuals, while the noise-like subcodebooks are favored to code noise-like residuals. The classification may involve identification of noise-like residuals, while the various parameters may comprise pitch correlation, signal to noise ratio, and average to peak ratio. Favoring involves an adjustment to a weighting factor applied to the subcodebooks.
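A rough sketch of the subcodebook weighting idea follows. The function names, the 0.2 step sizes, and the convention that a larger criterion value is better are assumptions made for illustration; the signal to noise ratio mentioned in the abstract is omitted to keep the example short.

```python
def noise_subcodebook_weight(is_noise_like, pitch_correlation, average_to_peak):
    """Weighting factor applied to the noise-like subcodebook's criterion.

    All constants are hypothetical; inputs are clamped to [0, 1].
    """
    clamp = lambda value: max(0.0, min(1.0, value))
    weight = 1.0
    if is_noise_like:
        weight += 0.2  # classification says the residual is noise-like
    weight -= 0.2 * clamp(pitch_correlation)  # strong periodicity suggests pulses
    weight += 0.2 * clamp(average_to_peak)    # a flat residual suggests noise
    return weight


def choose_subcodebook(pulse_criterion, noise_criterion, weight):
    """Compare search criteria after weighting (larger is assumed better)."""
    return "noise" if weight * noise_criterion > pulse_criterion else "pulse"
```

For example, a residual classified as noise-like with low pitch correlation yields a weight above 1, so the noise-like subcodebook can win even with a slightly lower raw criterion value.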
Abstract:
Transcoding of speech in a packet network environment is provided. A decoder is configured to receive a first bit-stream encoded according to a first coding scheme. The decoder decodes the bit-stream according to the first coding scheme, generates a plurality of first speech samples, and extracts a plurality of first speech parameters, which may include spectral characteristics, energy, pitch and/or pitch gain. A converter then converts the plurality of first speech samples and the plurality of first speech parameters to a plurality of second speech samples and a plurality of second speech parameters for use according to a second coding scheme. The first and second coding schemes may be, for example, G.711, G.723.1, G.726 or G.729, and may be parametric or non-parametric. An encoder receives the plurality of second speech samples and the plurality of second speech parameters and generates a second bit-stream according to the second coding scheme.
Abstract:
A method for preparing a speech signal for encoding comprises determining whether the spectral content of an input speech signal is representative of a defined spectral characteristic (e.g., a defined characteristic slope). A frequency specific filter component of a weighting filter is controlled based on the determination of the spectral content of the speech signal and/or its location in the encoder. A core weighting filter component of the weighting filter may be maintained regardless of the spectral content of the speech signal.
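One way to picture the two-part weighting filter is sketched below. It assumes a conventional CELP-style core filter A(z/γ1)/A(z/γ2) and a first-order frequency specific component 1 − μz⁻¹ that is switched on only when the input shows a strong low-frequency tilt. The tilt measure, the threshold, and the values of γ1, γ2 and μ are assumptions chosen for illustration, not values from the source.

```python
def iir_filter(b, a, x):
    """Direct-form filter with a[0] == 1: y[n] = sum(b*x) - sum(a[1:]*y)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def bandwidth_expand(lpc, gamma):
    """Turn A(z) coefficients [1, a1, ..., ap] into those of A(z/gamma)."""
    return [coeff * gamma ** i for i, coeff in enumerate(lpc)]


def spectral_tilt(signal):
    """Normalized lag-1 autocorrelation; values near +1 mean a low-pass slope."""
    energy = sum(s * s for s in signal)
    if energy == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(signal[1:], signal[:-1])) / energy


def weight_speech(signal, lpc, tilt_threshold=0.5, mu=0.3,
                  gamma1=0.9, gamma2=0.6):
    # Core component: always applied, whatever the spectral content.
    out = iir_filter(bandwidth_expand(lpc, gamma1),
                     bandwidth_expand(lpc, gamma2), signal)
    # Frequency specific component: applied only for the defined slope.
    if spectral_tilt(signal) > tilt_threshold:
        out = iir_filter([1.0, -mu], [1.0], out)
    return out
```

Passing `lpc=[1.0]` makes the core component an identity filter, which isolates the effect of the conditional component when experimenting.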
Abstract:
A speech compression system with a fixed codebook structure and a search routine is proposed for speech coding. The system is capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The codebook structure uses a plurality of subcodebooks. Each subcodebook is designed to fit a specific group of speech signals. An improved method is used to calculate a criterion value, minimizing an error signal in a minimization loop as part of the coding system. An external signal sets a maximum bitstream rate for delivering encoded speech into a communications system. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. Each codec is selectively activated to encode and decode the speech signals at different bit rates to enhance overall quality of the synthesized speech at a limited average bit rate.
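The interaction between the externally imposed maximum rate and the four codecs can be sketched as below. The abstract does not say how a codec is chosen for a given frame, so the frame classes and their preferred rates here are purely hypothetical; only the four rate names and the external cap come from the text.

```python
# Rates in ascending order of bit rate.
RATES = ["eighth", "quarter", "half", "full"]

# Hypothetical mapping from a frame class to its preferred rate.
PREFERRED_RATE = {
    "silence": "eighth",
    "unvoiced": "quarter",
    "stationary_voiced": "half",
}


def select_codec(frame_class, max_rate):
    """Return the rate to activate, never exceeding the external cap."""
    preferred = PREFERRED_RATE.get(frame_class, "full")
    return RATES[min(RATES.index(preferred), RATES.index(max_rate))]
```

Spending fewer bits on frames that tolerate it is what would leave headroom for full-rate frames while keeping the average bit rate limited.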
Abstract:
A signal processing system is well suited for conditioning a speech signal prior to coding the speech signal to achieve enhanced perceptual quality of reproduced speech. The signal processing system may be incorporated into mobile or portable wireless communications devices, wireless infrastructure equipment, or both. The signal processing system includes a filtering arrangement for filtering an input speech signal to make a spectral response of the speech signal more uniform to compensate for spectral variations that might otherwise be imparted into the speech signal by a communications network associated with the signal processing system.
Abstract:
Voiced speech preprocessing employs waveform interpolation or a harmonic model circuit to smooth a transition region and simplify speech coding. At low bit rates, the speech is coded by a system that maintains a high perceptual quality in the transition region from a voiced (quasi-periodic) portion of the speech signal to an unvoiced (non-periodic) portion of the speech signal. Similarly, the transition region from an unvoiced portion to a voiced portion is conditioned to maintain a high perceptual quality at a low bandwidth. The transition region from one type of voiced region to another type of voiced region is also smoothed. The transition region is smoothed to create a quasi-periodic speech signal.
Abstract:
A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve the classification of the input signal. The speech encoder continuously warps a weighted speech signal in long term preprocessing. The continuous warping is applied to a linear pitch lag contour that enables fast searching through linear time weighting. Optimal searching is performed within a limited range that is defined at least in part by sharpness and speech classification. The speech encoder generates the linear pitch lag contour from previous and current pitch lag values. Such continuous warping may also be applied in an open loop approach to the residual signal.
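As a minimal sketch of generating a linear pitch lag contour from the previous and current pitch lag values, assuming plain per-sample linear interpolation across one frame (the function name and the choice to reach the current lag exactly on the last sample are assumptions):

```python
def linear_pitch_contour(previous_lag, current_lag, frame_length):
    """Interpolate the lag linearly from the previous to the current value.

    Returns one lag per sample; the last sample equals current_lag.
    """
    step = (current_lag - previous_lag) / frame_length
    return [previous_lag + step * (n + 1) for n in range(frame_length)]
```

A warping stage would then shift the weighted speech in time so that its pitch pulses follow this contour; that stage is not shown here.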
Abstract:
Pulses (P1, P2, P3) representing the excitation signal are commonly modeled as impulses. High frequency noise is added to each pulse to improve the perceived sound quality.
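The idea of shaping each impulse with high frequency noise can be sketched as follows. The abstract gives no detail on how the noise is generated or scaled, so everything here is an assumption: uniform white noise, a first difference as a crude high-pass filter, a short window around each pulse, and a gain proportional to the pulse amplitude.

```python
import random


def pulses_with_hf_noise(length, pulses, noise_gain=0.1, spread=2, seed=0):
    """Build an excitation from (position, amplitude) pulses plus HF noise.

    noise_gain, spread and the high-pass method are illustrative assumptions.
    """
    rng = random.Random(seed)
    excitation = [0.0] * length
    for position, amplitude in pulses:
        excitation[position] += amplitude
    for position, amplitude in pulses:
        lo = max(0, position - spread)
        hi = min(length, position + spread + 1)
        white = [rng.uniform(-1.0, 1.0) for _ in range(hi - lo + 1)]
        # First difference of white noise emphasizes high frequencies;
        # the 0.5 factor keeps each value within [-1, 1].
        high = [0.5 * (white[i + 1] - white[i]) for i in range(hi - lo)]
        for i, value in enumerate(high):
            excitation[lo + i] += noise_gain * abs(amplitude) * value
    return excitation
```

Samples outside the window around each pulse stay at zero, so the excitation keeps its sparse pulse structure while each pulse gains a small noisy skirt.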