Abstract:
A speech encoding method and apparatus in which an input speech signal is divided into blocks or frames as encoding units and encoded unit by unit, whereby plosive and fricative consonants can be reproduced faithfully while the occurrence of foreign sounds at transient portions between voiced (V) and unvoiced (UV) portions is reduced, so that highly clear speech free of a “stuffed” feeling may be produced. The encoding apparatus includes a first encoding unit that finds linear predictive coding (LPC) residuals of the input speech signal and performs harmonic coding on them, and a second encoding unit that encodes the input speech signal by waveform coding. The first encoding unit and the second encoding unit are used for encoding a voiced (V) portion and an unvoiced (UV) portion of the input signal, respectively. Code excited linear prediction (CELP) encoding employing vector quantization by a closed-loop search of an optimum vector using an analysis-by-synthesis method is used for the second encoding unit. A corresponding decoding method and apparatus are also provided.
Abstract:
A method and apparatus for voiced/unvoiced decision for judging whether an input speech signal is voiced or unvoiced. The input parameters for performing the voiced/unvoiced (V/UV) decision are comprehensively judged in order to enable high-precision V/UV decision by a simplified algorithm. Parameters for the V/UV decision include the frame-averaged energy of the input speech signal lev, the normalized autocorrelation peak value r0r, the spectral similarity degree pos, the number of zero crossings nZero, and the pitch lag pch. If these parameters are denoted by x, they are converted by function calculation circuits using a sigmoid function g(x) represented by g(x) = A/(1 + exp(-(x - b)/a)), where A, a, and b are constants differing with each input parameter. Using the parameters converted by this sigmoid function g(x), the V/UV decision is made by a V/UV decision circuit.
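The sigmoid conversion described above can be sketched in Python as follows. This is only an illustration: the constants (A, a, b), the choice of three of the five parameters, and the rule that averages the converted values and compares them with a threshold are all assumptions, since the abstract states neither the constant values nor how the converted parameters are combined.

```python
import math


def sigmoid(x, A, a, b):
    """g(x) = A / (1 + exp(-(x - b) / a)), as given in the abstract."""
    return A / (1.0 + math.exp(-(x - b) / a))


# Hypothetical (A, a, b) per parameter; the abstract gives no values.
# A negative `a` flips the slope, so that many zero crossings map to a
# low (unvoiced-like) score. pos and pch would be handled the same way.
CONSTANTS = {
    "lev": (1.0, 200.0, 500.0),
    "r0r": (1.0, 0.1, 0.5),
    "nZero": (1.0, -20.0, 80.0),
}


def is_voiced(params, constants=CONSTANTS, threshold=0.5):
    """Assumed combination rule: mean of converted values vs. a threshold."""
    converted = [sigmoid(params[name], *constants[name]) for name in constants]
    return sum(converted) / len(converted) > threshold


voiced_frame = {"lev": 2000.0, "r0r": 0.9, "nZero": 20}
unvoiced_frame = {"lev": 100.0, "r0r": 0.2, "nZero": 150}
print(is_voiced(voiced_frame), is_voiced(unvoiced_frame))
```

Mapping every parameter onto the same 0 to A range is what lets a simple rule compare quantities with very different units and scales.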
Abstract:
A method and apparatus for reproducing speech signals at a controlled speed and for synthesizing speech includes a dividing unit that divides the input speech into time segments and an encoding unit that discriminates whether each of the speech segments is voiced or unvoiced. Based on the results of the discrimination, the encoding unit performs sinusoidal synthesis and encoding for voiced segments and vector quantization by closed-loop search for an optimum vector using an analysis-by-synthesis method for unvoiced segments in order to find encoded parameters. A period modification unit modifies the length of time associated with each signal segment and calculates a set of modified encoded parameters. In the speech synthesizing unit, encoded speech signal data is output from the encoding unit and pitch data and amplitude data specifying the spectral envelope are sent via a data conversion unit to a waveform synthesis unit, where the number of amplitude data points of the spectral envelope is changed without changing the shape of the spectral envelope, so that the pitch of the signal may be varied without changing its phoneme. A waveform synthesis unit synthesizes the speech waveform based on the converted spectral envelope data and pitch data.
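The idea of changing the number of spectral-envelope amplitude points without changing the envelope's shape can be illustrated with a small Python sketch. It assumes the amplitudes are sampled at the harmonics of the pitch frequency and uses linear interpolation; the function name, its parameters, and the interpolation method are hypothetical and are not taken from the abstract.

```python
def resample_envelope(amps, f0, new_f0, bandwidth):
    """Re-sample harmonic amplitudes at the harmonics of a new pitch.

    amps[k - 1] is assumed to be the envelope amplitude at frequency k * f0.
    The envelope shape is kept (linear interpolation between known points);
    only the number and spacing of the sample points change.
    """
    def envelope(freq):
        pos = freq / f0 - 1.0          # fractional index into amps
        if pos <= 0:
            return amps[0]             # hold the first value below f0
        if pos >= len(amps) - 1:
            return amps[-1]            # hold the last value above the top harmonic
        i = int(pos)
        frac = pos - i
        return amps[i] * (1.0 - frac) + amps[i + 1] * frac

    count = int(bandwidth // new_f0)   # harmonics that fit in the band
    return [envelope(k * new_f0) for k in range(1, count + 1)]


# Envelope known at 100, 200, 300, 400 Hz; pitch raised to 200 Hz, then lowered to 50 Hz.
print(resample_envelope([1, 2, 3, 4], 100.0, 200.0, 400.0))
print(resample_envelope([1, 2, 3, 4], 100.0, 50.0, 400.0))
```

Because the envelope (which carries the phoneme) is unchanged while the harmonic spacing follows the new pitch, the two can be varied independently.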
Abstract:
An encoding apparatus in which an input speech signal is divided into blocks and encoded in units of blocks. The encoding apparatus includes an encoding unit for performing CELP encoding having a noise codebook memory containing codebook vectors generated by clipping Gaussian noise and codebook vectors obtained by learning using the code vectors generated by clipping the Gaussian noise as initial values. The encoding apparatus enables optimum encoding for a variety of speech configurations.
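A minimal Python sketch of building the initial noise codebook is given below. The abstract does not say what kind of clipping is applied, so this example assumes simple amplitude limiting to a fixed range; a different clipping rule (for example, center clipping) would be an equally valid reading. The function name, the sizes, and the clip level are hypothetical, and the learning step that refines the vectors is not shown.

```python
import random


def clipped_gaussian_codebook(num_vectors, dim, clip, seed=0):
    """Return num_vectors code vectors of length dim.

    Each element is unit-variance Gaussian noise limited to [-clip, clip]
    (assumed interpretation of "clipping").
    """
    rng = random.Random(seed)  # fixed seed so the codebook is reproducible
    return [
        [max(-clip, min(clip, rng.gauss(0.0, 1.0))) for _ in range(dim)]
        for _ in range(num_vectors)
    ]


codebook = clipped_gaussian_codebook(num_vectors=4, dim=8, clip=1.0)
print(len(codebook), len(codebook[0]))
```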
Abstract:
A speech encoding method and apparatus for encoding an input speech signal on a block-by-block or frame-by-frame basis wherein short-term prediction residuals are found and then sinusoidal analytic encoding parameters are produced based on those short-term prediction residuals. Perceptually weighted vector quantization is performed for voiced blocks or frames by encoding their sinusoidal frequency or analytic harmonic magnitudes and, in the case of unvoiced blocks or frames, the time waveforms of the unvoiced blocks are encoded.
Abstract:
A speech decoding method and apparatus for decoding encoded speech signals and subsequently post-filtering the decoded signals, wherein the filter coefficient of a spectral shaping filter in a post-filter fed with an encoded and subsequently decoded speech signal is updated with a sub-frame period, while the gain of a gain adjustment circuit for correcting gain changes caused by the spectral shaping is updated with a frame period that is eight times as long as the sub-frame period. This achieves switching of the filter coefficient so as to be changed smoothly with a higher follow-up speed, while suppressing level changes otherwise caused by frequent gain switching. The result is improved characteristics of a post-filter used for spectral shaping of a decoded signal supplied from the signal decoder and more effective post-filter processing.
Abstract:
A pitch extraction method and apparatus whereby the pitch of a speech signal having various characteristics can be extracted accurately. The frame-based input speech signal, band-limited by an HPF 12 and an LPF 16, is sent to autocorrelation computing units 13, 17 where autocorrelation data is found. The pitch lag is computed and normalized in the pitch intensity/pitch lag computing units 14, 18. The pitch reliability of the input speech signals, limited by the HPF 12 and the LPF 16, is computed in evaluation parameter calculation units. A selection unit 20 selects one of the parameters obtained from the input speech signal, limited by the HPF 12 and the LPF 16, using the pitch lag and the evaluation parameter.
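The autocorrelation stage can be sketched in Python for a single band. This is a simplified illustration only: the HPF/LPF band limiting, the evaluation parameter, and the selection unit are omitted, and the function names, the lag search range, and the use of the normalized autocorrelation peak as the pitch intensity are assumptions.

```python
import math


def autocorrelation(frame, max_lag):
    """r[lag] = sum over i of frame[i] * frame[i + lag], for lag = 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def pitch_lag_and_intensity(frame, min_lag, max_lag):
    """Return (lag of the autocorrelation peak, peak normalized by r[0])."""
    r = autocorrelation(frame, max_lag)
    if r[0] == 0:
        return None, 0.0               # silent frame: no pitch
    best = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return best, r[best] / r[0]


# Test signal: a sine wave with a period of 50 samples.
frame = [math.sin(2 * math.pi * i / 50) for i in range(400)]
lag, intensity = pitch_lag_and_intensity(frame, min_lag=20, max_lag=120)
print(lag, round(intensity, 3))
```

In the two-band arrangement of the abstract, this computation would run once on the high-pass filtered signal and once on the low-pass filtered signal, and a later stage would choose between the two results.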
Abstract:
A method and apparatus for encoding an input signal, such as a broad-range speech signal, in which a number of decoding operations with different bit rates are enabled for assuring a high encoding bit rate and for minimizing deterioration of the reproduced sound even with a low bit rate. The signal encoding method includes a band-splitting step for splitting an input signal into a number of bands and a step of encoding signals of the bands in a different manner depending on signal characteristics of the bands. Specifically, a low-range side signal is taken out by a low-pass filter from an input signal entering a terminal and subjected to linear predictive coding (LPC) analysis by an LPC analysis quantization unit. After the LPC residuals are found as short-term prediction residuals by an LPC inverted filter, the pitch is found by a pitch analysis circuit. Then, pitch residuals are found by long-term prediction by a pitch inverted filter. The pitch residuals are transformed by a modified discrete cosine transform (MDCT) circuit and vector-quantized by a vector-quantization circuit. The resulting quantization indices are transmitted along with the pitch lag and the pitch gain. The line spectral pairs (LSPs) are also sent as parameters representing the LPC coefficients.
Abstract:
A signal decoding method and apparatus in which the speech signal reproducing speed is controlled without changing the phoneme or the pitch, in which the apparatus has a data number convertor for converting the number of orthogonal transform coefficients entering a transmission signal input terminal from N to M, an inverse orthogonal transform unit for inverse orthogonal-transforming the M number of the orthogonal transform coefficients obtained by the data number convertor, and a linear predictive coding synthesis filter for performing predictive synthesis based on the short-term prediction residuals obtained by the inverse orthogonal transform unit. For an input signal, short-term prediction residuals are found and are orthogonally transformed to form the orthogonal transform coefficients at a rate of N coefficients per transform unit. The frequency positions of the N transform coefficients may be rearranged to M values by M/N or by oversampling to change N to M. A portable radio terminal embodying the invention is described.
Abstract:
A bandwidth expanding method and apparatus in which frequency characteristics of high-frequency components of broad band signals can be adjusted to the liking of the user, overflow due to addition is prevented from occurring without power variations being perceived by a user, the number of broad band formants is reduced, and emphasis is attached to the rough structure of the spectrum, so that the produced broad band speech signals can be improved in quality. To this end, in a speech bandwidth expansion device, frequency characteristics of the frequency components not less than 3400 Hz are adjusted by preset alterable parameter values and summed to the original narrow band speech components. If overflow has occurred in a sample, the high-range gain of the sample is lowered to a level below the overflow level before proceeding to addition. Also, broad band autocorrelation γw is generated and inverse-transformed in an inverse parameter conversion unit to produce broad band linear prediction coefficient αW to synthesize the broad-band speech in a linear predictive coding synthesis unit.
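The overflow handling during the addition of the high-range components can be sketched in Python as follows. The 16-bit sample range, the per-sample gain reduction that lands the sum exactly on the limit, and all names are assumptions made for illustration; the abstract only says that the high-range gain of an overflowing sample is lowered before the addition.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def add_high_band(narrow, high, gain):
    """Add gain-scaled high-range samples to narrow-band samples.

    Narrow-band samples are assumed to lie within the 16-bit range already.
    If a sum would overflow, the gain for that sample alone is reduced so
    that the sum stays at the limit instead of wrapping around.
    """
    out = []
    for n, h in zip(narrow, high):
        g = gain
        s = n + g * h
        if s > INT16_MAX or s < INT16_MIN:
            limit = INT16_MAX if s > 0 else INT16_MIN
            g = (limit - n) / h        # h != 0 here, otherwise s == n and no overflow
            s = n + g * h
        out.append(int(round(s)))
    return out


print(add_high_band([30000, 100, -30000], [5000, 50, -5000], 1.0))
```

Reducing only the high-range contribution leaves the original narrow band components untouched, which is consistent with the stated aim of avoiding perceptible power variations.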