Abstract:
A speech codec operating at low data rates uses an iterative method to jointly optimize pitch and gain parameter sets. A 26-bit spectrum filter coding scheme may be used, involving successive subtractions and quantizations. The codec may preferably use a decomposed multipulse excitation model, wherein the multipulse vectors used as the excitation signal are decomposed into position and amplitude codewords. Multipulse vectors are coded by comparing each vector to a reference multipulse vector and quantizing the resulting difference vector. An expanded multipulse excitation codebook and associated fast search method, optionally with a dynamically-weighted distortion measure, allow selection of the best excitation vector without memory or computational overload. In a dynamic bit allocation technique, the number of bits allocated to the pitch and excitation signals depends on whether the signals are "significant" or "insignificant". Silence/speech detection is based on an average signal energy over an interval and a minimum average energy over a predetermined number of intervals. Adaptive post-filter and automatic gain control schemes are also provided. Interpolation is used for spectrum filter smoothing, and an algorithm is provided for ensuring stability of the spectrum filter. Specially designed scalar quantizers are provided for the pitch gain and excitation gain.
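The silence/speech detection described above can be illustrated with a minimal sketch. The abstract does not state the decision rule, so the rule used here (an interval is speech when its average energy exceeds a fixed multiple of the minimum recent average energy) is an assumption, as are the names `detect_speech`, `history`, and `ratio` and their default values.

```python
from collections import deque


def average_energy(samples):
    """Mean squared amplitude of one interval."""
    return sum(x * x for x in samples) / len(samples)


def detect_speech(intervals, history=4, ratio=4.0):
    """Flag each interval as speech (True) or silence (False).

    Assumed rule, not taken from the abstract: speech is declared when the
    interval's average energy exceeds `ratio` times the minimum average
    energy seen over the last `history` intervals (current one included).
    """
    recent = deque(maxlen=history)
    flags = []
    for samples in intervals:
        energy = average_energy(samples)
        recent.append(energy)
        noise_floor = min(recent)  # minimum average energy over the window
        flags.append(energy > ratio * noise_floor)
    return flags


quiet = [0.1] * 8
loud = [1.0] * 8
print(detect_speech([quiet, quiet, loud, quiet]))
```

Tracking the minimum over a window lets the assumed noise floor follow slow changes in background level without a fixed absolute threshold.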
Abstract:
A linear predictive speech codec arrangement including: a spectrum synthesizer for providing reconstructed speech generation in response to excitation signals; a distortion analyzer for comparing the reconstructed speech with an original speech, and providing a distortion analysis signal in response to such comparison; and an excitation model circuit for providing excitation signals to the spectrum synthesizer, with the excitation model circuit receiving and utilizing the distortion analysis signal in an analysis-by-synthesis operation, for determining ones of excitation signals which provide an optimal reconstructed speech. The excitation model circuit can include: a voiced excitation generator and a Gaussian noise generator, both of which should optimally provide a plurality of available excitation signal models. The voiced excitation generator and Gaussian noise generator can be in the form of a codebook of a plurality of possible pulse trains and Gaussian sequences, respectively, or alternatively, the voiced excitation generator can be in the form of a first order pitch synthesizer. The optimal excitation signal and/or the pitch value and the pitch filter coefficient are determined using an analysis-by-synthesis technique.
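A first order pitch synthesizer of the kind mentioned above is commonly written as y(n) = x(n) + b·y(n − P), with pitch value P and pitch filter coefficient b. The sketch below assumes that form and shows a brute-force analysis-by-synthesis search over candidate values. It omits the spectrum synthesizer and any perceptual weighting, and the function names and candidate grids are hypothetical.

```python
def pitch_synthesize(excitation, lag, coeff):
    """First-order pitch synthesizer: y[n] = x[n] + coeff * y[n - lag]."""
    out = []
    for n, x in enumerate(excitation):
        past = out[n - lag] if n >= lag else 0.0
        out.append(x + coeff * past)
    return out


def search_pitch(excitation, target, lags, coeffs):
    """Try every (lag, coeff) pair and keep the one with least squared error.

    This is a plain exhaustive search used only for illustration.
    """
    best = None
    best_err = float("inf")
    for lag in lags:
        for coeff in coeffs:
            synth = pitch_synthesize(excitation, lag, coeff)
            err = sum((t - s) ** 2 for t, s in zip(target, synth))
            if err < best_err:
                best, best_err = (lag, coeff), err
    return best


excitation = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
target = pitch_synthesize(excitation, 3, 0.5)
print(search_pitch(excitation, target, range(2, 5), [0.25, 0.5, 0.75]))
```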
Abstract:
Protection of a digital multi-pulse speech coder from fading pattern bit errors common in a digital mobile radio channel is accomplished with error detection techniques which are simple to implement and require no error correcting codes. A synthetic regeneration algorithm is employed which uses only the perceptually significant bits in the transmitted frame. Separate parity checksums for line spectrum pair frequency data, pitch lag data and pulse amplitude data are added to each frame of speech coder bits in the transmitter. The bits are then transmitted through a mobile environment susceptible to fading that induces bursty error patterns in the stream. At the receiving station, the parity checksum bits and speech coder bits are used to determine if an error has occurred in a particular section of the bit stream. Detected errors are flagged and supplied to the speech decoder. The speech decoder uses the error flags to modify its output signal so as to minimize perceptual artifacts in the output speech. Separate checksums are developed for subsets of line spectrum pair (LSP) coefficients and related speech data, whereby a single subset may be error-detected and replaced, rather than an entire frame.
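The per-field error detection described above can be sketched as follows. The abstract does not give the checksum length, so a single even-parity bit per field is assumed here. The field names and the helpers `protect` and `check` are hypothetical.

```python
def parity(bits):
    """Even-parity bit over a sequence of 0/1 values."""
    return sum(bits) % 2


def protect(frame):
    """Transmitter side: attach a separate parity bit to each field."""
    return {name: (list(bits), parity(bits)) for name, bits in frame.items()}


def check(received):
    """Receiver side: return one error flag per field (True = error detected)."""
    return {name: parity(bits) != p for name, (bits, p) in received.items()}


frame = {
    "lsp": [1, 0, 1, 1],        # line spectrum pair frequency bits
    "pitch_lag": [0, 1, 1],     # pitch lag bits
    "pulse_amp": [1, 1, 0, 0],  # pulse amplitude bits
}
sent = protect(frame)
sent["pitch_lag"][0][1] ^= 1    # simulate a single channel bit error
print(check(sent))
```

Because each field carries its own flag, a decoder can replace or regenerate only the damaged field. A single parity bit misses any even number of bit errors within a field, which is one reason the real scheme may use longer checksums.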
Abstract:
A multi-pulse speech coding method and apparatus capable of encoding speech at a bit rate of 16 kbps or less. The method determines the location and amplitude of a pulse by searching through all of the samples of a criterion function, modifying all of the samples of the criterion function, and then repeating the pulse search. After the predetermined number of pulses have been determined, the method modifies the amplitude of the determined pulse, modifies the criterion function at the location where the pulses are set, and repeats such pulse amplitude modification. The method is, therefore, capable of modifying a pulse amplitude by using only a minimum amount of computation, as compared to the amount of computation required by a method of the kind which modifies pulse amplitude in a pulse search loop.
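The search loop described above (find a pulse, modify every sample of the criterion function, repeat) can be sketched as below. The abstract does not define the criterion function. This sketch assumes the usual textbook choice, a cross-correlation sequence updated with the autocorrelation of the synthesis filter's impulse response, and it does not cover the later amplitude modification pass. All names are hypothetical.

```python
def multipulse_search(criterion, autocorr, num_pulses):
    """Sequentially pick pulses from a criterion function.

    criterion: assumed cross-correlation between the target signal and the
               filter impulse response, one value per candidate location.
    autocorr:  assumed autocorrelation of the impulse response, lag 0 first.
    """
    crit = list(criterion)
    pulses = []
    for _ in range(num_pulses):
        # Search through all samples for the largest magnitude.
        loc = max(range(len(crit)), key=lambda n: abs(crit[n]))
        amp = crit[loc] / autocorr[0]
        pulses.append((loc, amp))
        # Modify all samples to remove the new pulse's contribution.
        for n in range(len(crit)):
            lag = abs(n - loc)
            if lag < len(autocorr):
                crit[n] -= amp * autocorr[lag]
    return pulses


print(multipulse_search([0.0, 3.0, 1.0, 0.0, -4.0, 0.0], [2.0, 1.0], 2))
```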
Abstract:
A low bit rate speech coding method and implementing apparatus in which a linear predictive coding (LPC) speech synthesizer receives an excitation sequence comprised of pulses having selected amplitudes at predetermined positions within a frame to minimize the weighted mean square error between the synthetic speech produced by the LPC synthesizer and the input speech. Pulse locations and the pulse amplitudes at the respective locations are determined by a sequential processing technique in which the amplitude and location of each pulse are determined in accordance with the previously determined amplitudes and locations of the pulses preceding the present pulse in the same frame; and specifically by determining the amplitude g_k and location l_k of a new pulse in a frame from selected pulses S at locations k-1 through k-S close to location l_k. The number S of preceding pulses used to determine the pulse at location l_k is selected such that the Sth pulse preceding the pulse at location l_k is close enough to affect the determination of the pulse at l_k, while pulses prior to the Sth pulse have no appreciable effect on the determination of the pulse at l_k. That is, each of the S pulses within a threshold distance T_th is judged to affect the determination of the pulse at l_k, while pulses preceding the pulse at l_k and outside of the range T_th are judged not to affect the determination of the pulse at l_k.
Abstract:
A method and implementing apparatus for low-bit rate speech band signal coding. An input signal in the speech band is represented by a pulse excitation sequence and a spectral parameter sequence over a frame of predetermined frame length using a selected one of a plurality of pulse determining processing modes. The selected pulse determining processing mode sequentially determines the amplitudes g_i and locations m_i of the pulses of the pulse excitation sequence on the basis of the amplitudes and locations of pulses in a previous frame. The selection process of determining which of the pulse determining processing modes is to be used involves analyzing the input signal to produce a judgment signal d signifying the input signal as a voiced or an unvoiced signal, and selecting the pulse determining processing mode in response to the judgment signal d. The pulse excitation sequence and spectral parameter sequence are coded for transmission to a suitable receiver. The judgment signal d may also be coded and transmitted to the receiver. The receiver reproduces the input signal from the received coded signal.
Abstract:
In an encoder operable in response to a discrete pattern signal divisible into a succession of segments to produce an output code sequence, a pitch parameter and a spectral parameter are extracted in a parameter calculator from each segment and from a spectral interval. In an excitation pulse producing circuit, each spectral interval is divided into a plurality of subframes, namely, pitch periods with reference to the pitch parameter to divide each segment. A minor group of excitation pulses is calculated from the segment at every subframe to form a major group of the excitation pulses in the spectral interval. The excitation pulses of the major group are reduced in number with reference to adjacent ones of the minor groups in each spectral interval and are modified into a succession of modified excitation pulses. The modified excitation pulses are combined with the spectral parameter into the output code sequence. In a decoder, the modified excitation pulses and the spectral parameter are extracted from the output code sequence. The pitch parameter is recovered by the use of the extracted and modified excitation pulses and is used to produce a reproduction of the discrete pattern signal. Alternatively, the pitch parameter may be sent from the encoder together with the spectral parameter and the modified excitation pulses as the output code sequence and extracted from the output code sequence in the decoder.
Abstract:
A multi-pulse excitation linear-predictive speech coder operates in accordance with an analysis-by-synthesis method for determining the excitation. The coder (10) comprises an LPC-analyzer (11), a multi-pulse excitation generator (13), means (12, 14) for forming an error signal representative of the difference between an original speech signal (s(n)) and a synthetic speech signal (ŝ(n)), a filter (15) for perceptually weighting the error signal and means (16) responsive to the weighted error signal (e(n)) for generating pulse parameters controlling the excitation generator (13) so as to minimize a predetermined measure of the weighted error signal. The LPC-parameters and the pulse parameters of the excitation signal (x(n)) are encoded for efficient storage or transmission. The bit capacity required for pulse position encoding of the excitation signal (x(n)) is considerably reduced by arranging the excitation generator (13) for an excitation signal (x(n)) which in each excitation interval (L) consists of a pulse pattern having a grid of a predetermined number (q) of equidistant pulses and by arranging the control means (16) for generating pulse parameters characterizing the grid position (k) relative to the beginning of the excitation interval (L) and the variable amplitudes (b_k(j), 1 ≤ j ≤ q) of the pulses of the grid.
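The grid-based excitation described above can be illustrated by building x(n) from a grid position k and q amplitudes. The sketch assumes that the interval length L is an integer multiple of q, so the pulse spacing is L/q, and that k is less than the spacing. The abstract states neither condition, and the function name is hypothetical.

```python
def regular_pulse_excitation(length, grid_pos, amplitudes):
    """Place q equidistant pulses in an interval of `length` samples.

    Assumes spacing = length // q and 0 <= grid_pos < spacing.
    Only grid_pos and the q amplitudes need to be coded; the individual
    pulse positions follow from the grid.
    """
    q = len(amplitudes)
    spacing = length // q
    if length % q != 0 or not 0 <= grid_pos < spacing:
        raise ValueError("length must be a multiple of q and grid_pos < spacing")
    x = [0.0] * length
    for j, b in enumerate(amplitudes):
        x[grid_pos + j * spacing] = b
    return x


print(regular_pulse_excitation(8, 1, [0.5, -1.0]))
```

Under these assumptions, a single grid position takes one of `spacing` values per interval, whereas free pulse placement would require a position code for every pulse.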
Abstract:
In an encoder for encoding a speech signal having a spectrum envelope into a plurality of excitation pulses, a spectrum emphasis unit emphasizes peak components of the spectrum envelope to produce an emphasized speech signal. As a result of a spectrum emphasis operation, the emphasized speech signal has an emphasized spectrum envelope which substantially comprises a plurality of line spectra. Responsive to the emphasized speech signal, a pulse producing unit produces a plurality of excitation pulses by the use of a pulse search method.
Abstract:
A vector excitation coder compresses vectors by using an optimum codebook designed off line, using an initial arbitrary codebook and a set of speech training vectors exploiting codevector sparsity (i.e., by making zero all but a selected number of samples of lowest amplitude in each of N codebook vectors). A fast-search method selects a number N_c of good excitation vectors from the codebook, where N_c is much smaller tha
ORIGIN OF INVENTION
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) under which the inventors were granted a request to retain title.