Abstract:
A speech signal is input to an excitation signal generating section, a prediction filter and a prediction parameter calculator. The prediction parameter calculator calculates a predetermined number of prediction parameters (LPC parameters or reflection coefficients) by an autocorrelation method or a covariance method, and supplies them to a prediction parameter coder. The codes of the prediction parameters are sent to a decoder and a multiplexer. The decoder sends the decoded values of these codes to the prediction filter and the excitation signal generating section. The prediction filter calculates a prediction residual signal, which is the difference between the input speech signal and its prediction formed with the decoded prediction parameters, and sends it to the excitation signal generating section. The excitation signal generating section calculates the pulse interval and amplitude for each of a predetermined number of subframes based on the input speech signal, the prediction residual signal and the quantized values of the prediction parameters, and sends them to the multiplexer. The multiplexer combines these codes with the codes of the prediction parameters, and sends the result as the output signal of the coding apparatus to a transmission path or the like.
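The parameter calculation and residual steps of this abstract can be sketched as follows. This is a minimal illustration, not the patented apparatus: it assumes the autocorrelation method solved with the Levinson-Durbin recursion, skips the coding and decoding of the parameters (the residual is formed with unquantized coefficients), and all function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no windowing, for brevity)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations of the autocorrelation method.

    Returns (a, refl): predictor coefficients a[0..order-1] such that
    x_hat[n] = sum_i a[i] * x[n - 1 - i], and the reflection coefficients.
    """
    a = [0.0] * (order + 1)          # a[0] is unused; a[1..order] are taps
    refl = []
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], refl


def prediction_residual(x, coeffs):
    """Residual = input minus its linear prediction from past samples."""
    residual = []
    for n in range(len(x)):
        predicted = sum(c * x[n - 1 - i]
                        for i, c in enumerate(coeffs) if n - 1 - i >= 0)
        residual.append(x[n] - predicted)
    return residual


# Toy frame: a decaying exponential, which a first-order predictor fits well.
frame = [0.9 ** n for n in range(200)]
lpc, reflection = levinson_durbin(autocorrelation(frame, 2), 2)
residual = prediction_residual(frame, lpc)
```

In a coder of the kind described, the coefficients would first be quantized and decoded, and the residual would be computed with the decoded values so that encoder and decoder stay in step.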
Abstract:
Adjusting the shape of the spectrum of a speech signal includes the steps of: using a first filter with a pole-zero transfer function A(z)/B(z) to apply spectral envelope emphasis to the speech signal, and a second filter, cascade-connected with the first filter, to compensate for the spectral tilt introduced by the first filter; independently deriving, from the pole-zero transfer function, two filter coefficients used in the second filter for compensating for the spectral tilt; and compensating for the spectral tilt corresponding to the pole-zero transfer function according to the derived filter coefficients.
Abstract:
A vector quantizing apparatus includes a first search section for obtaining an approximate vector X1 that approximates a desired vector R, a residual vector calculator for calculating a residual vector Rv from the desired vector R and the approximate vector X1, a weighting section for obtaining weighted vectors X2 to XN of code vectors x2 to xN, and a second search section for calculating an estimation value, which is the magnitude of the projection of the residual vector Rv onto the vector space spanned by the approximate vector X1 and the weighted vectors X2 to XN, and searching for the code vector that maximizes this estimation value.
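The second search can be illustrated with a small sketch. It assumes a single second stage (N = 2), already-weighted candidate vectors, and Gram-Schmidt orthogonalization to measure the projection; it maximizes the squared magnitude, which selects the same candidate as the magnitude itself. The function names are hypothetical and the abstract does not specify this procedure.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projection_energy(target, basis):
    """Squared norm of the projection of `target` onto span(basis)."""
    orthonormal = []
    energy = 0.0
    for vector in basis:
        w = list(vector)
        for e in orthonormal:            # remove components already covered
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                 # linearly dependent: adds nothing
            continue
        e = [wi / norm for wi in w]
        orthonormal.append(e)
        energy += dot(target, e) ** 2
    return energy


def search_second_stage(residual, x1, candidates):
    """Return (index, value) of the candidate maximizing the projection energy."""
    best_index, best_value = -1, -1.0
    for index, candidate in enumerate(candidates):
        value = projection_energy(residual, [x1, candidate])
        if value > best_value:
            best_index, best_value = index, value
    return best_index, best_value


desired = [1.0, 3.0, 4.0]                              # R
x1 = [1.0, 0.0, 0.0]                                   # first-stage result X1
residual = [r - x for r, x in zip(desired, x1)]        # Rv = R - X1
weighted_codebook = [[0.0, 1.0, 0.0], [0.0, 3.0, 4.0], [1.0, 0.0, 0.0]]
best_index, best_value = search_second_stage(residual, x1, weighted_codebook)
```

Here the second candidate is parallel to the residual, so it captures all of the residual energy and is selected.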
Abstract:
This invention provides a speech coding system for the pitch prediction known as "closed loop" or "adaptive codebook" search, in which the system detects the pitch period that minimizes the distortion between the input vector and the vector obtained by applying the filter computation to the drive signal (i.e., excitation signal) vector. To do so, the drive signal is arranged into a Toeplitz matrix, and the resulting Toeplitz property is used to execute the filter computation recursively. The vector quantization method that substantially makes up the speech coding system is characterized by this computation.
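A closed-loop pitch search with a recursive filter update can be sketched as below. This is an illustration under stated assumptions, not the patent's exact recursion: it uses the well-known shift relation between candidates at neighbouring lags (the structure that makes the matrix of shifted excitation vectors Toeplitz), restricts lags to at least one subframe length, uses a truncated impulse response `h` as the filter, and all names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def filtered_direct(candidate, h):
    """Zero-state filtering: y[n] = sum_{k<=n} h[k] * candidate[n - k]."""
    return [sum(h[k] * candidate[n - k] for k in range(n + 1))
            for n in range(len(candidate))]


def filtered_next(prev_filtered, first_sample, h):
    """Filtered candidate for lag L+1 from the one for lag L.

    The lag L+1 candidate is the lag L candidate shifted by one sample with
    `first_sample` prepended, so y'[n] = y[n-1] + first_sample * h[n].
    """
    return [first_sample * h[0]] + [
        prev_filtered[n - 1] + first_sample * h[n]
        for n in range(1, len(prev_filtered))
    ]


def search_pitch_lag(target, past, h, min_lag, max_lag):
    """Pick the lag maximizing (target . y)^2 / (y . y)."""
    n_len = len(target)
    assert n_len <= min_lag <= max_lag <= len(past) and len(h) >= n_len
    start = len(past) - min_lag
    y = filtered_direct(past[start:start + n_len], h)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        if lag > min_lag:
            y = filtered_next(y, past[len(past) - lag], h)
        energy = dot(y, y)
        score = dot(target, y) ** 2 / energy if energy > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data: a deterministic past excitation and a short impulse response.
past = [float((7 * i) % 11 - 5) for i in range(40)]
h = [1.0, 0.5, 0.25, 0.125, 0.0625]
true_lag = 13
target = filtered_direct(past[40 - true_lag:40 - true_lag + 5], h)
found_lag = search_pitch_lag(target, past, h, 5, 20)
```

Maximizing the normalized correlation is equivalent to minimizing the distortion once the optimal gain is applied, and the recursive update replaces a full convolution per lag with one multiply-add per sample.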
Abstract:
According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
Abstract:
A normalization-parameter generating unit generates normalization parameters by calculating the mean values and standard deviations of an initial prosody pattern and of a prosody pattern of a training sentence in a speech corpus. A prosody-pattern normalizing unit then normalizes the variance range or variance width of the initial prosody pattern in accordance with the normalization parameters. As a result, a prosody pattern that resembles human speech and has improved naturalness can be generated with a small amount of calculation.
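One plausible reading of this normalization is a mean and standard-deviation match, sketched below. The exact mapping is an assumption (the abstract only says the variance range or width is normalized according to the parameters), and the function names and sample values are hypothetical.

```python
import statistics


def normalization_parameters(initial_pattern, corpus_pattern):
    """Means and (population) standard deviations of both prosody patterns."""
    return {
        "init_mean": statistics.fmean(initial_pattern),
        "init_std": statistics.pstdev(initial_pattern),
        "corpus_mean": statistics.fmean(corpus_pattern),
        "corpus_std": statistics.pstdev(corpus_pattern),
    }


def normalize_pattern(initial_pattern, params):
    """Rescale the initial pattern so its mean and spread match the corpus."""
    if params["init_std"] == 0.0:
        # A flat pattern has no spread to rescale; only shift it.
        return [params["corpus_mean"]] * len(initial_pattern)
    scale = params["corpus_std"] / params["init_std"]
    return [params["corpus_mean"] + (value - params["init_mean"]) * scale
            for value in initial_pattern]


# Toy F0 contours in Hz (illustrative numbers only).
initial = [100.0, 110.0, 120.0]
corpus = [180.0, 200.0, 220.0]
params = normalization_parameters(initial, corpus)
normalized = normalize_pattern(initial, params)
```

Only two statistics per pattern are needed, which is consistent with the small amount of calculation the abstract claims.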
Abstract:
A feature extracting unit extracts a feature vector from input speech. A similarity calculating unit calculates, based on the feature vector, a degree of similarity for each of a plurality of noise environments. A compensation-vector calculating unit acquires a first compensation vector from a storing unit, calculates a second compensation vector based on the first compensation vector, and calculates a third compensation vector as a weighted sum of the second compensation vectors, using the degrees of similarity as weights. A compensating unit compensates the feature vector based on the third compensation vector.
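The weighting-and-summing step can be sketched as follows. The sketch assumes one compensation vector per noise environment, similarities normalized to sum to one, and compensation by simple addition; none of these details is stated in the abstract, and the function names are hypothetical.

```python
def combine_compensation(per_environment_vectors, similarities):
    """Weighted sum of per-environment compensation vectors."""
    total = sum(similarities)
    if total <= 0.0:
        raise ValueError("similarities must have a positive sum")
    weights = [s / total for s in similarities]      # assumed normalization
    dim = len(per_environment_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, per_environment_vectors))
            for d in range(dim)]


def compensate(feature, compensation):
    """Apply the combined compensation vector (assumed additive)."""
    return [f + c for f, c in zip(feature, compensation)]


feature = [1.0, 2.0]
vectors = [[1.0, 0.0], [0.0, 1.0]]    # one vector per noise environment
similarities = [3.0, 1.0]             # the first environment matches better
combined = combine_compensation(vectors, similarities)
compensated = compensate(feature, combined)
```

Soft weighting of this kind avoids a hard decision on a single noise environment when the input lies between several of them.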
Abstract:
A noise-environment storing unit stores a compensation vector for compensating a feature vector of speech. A feature-vector extracting unit extracts the feature vector of the speech in each of a plurality of frames. A noise-environment-series estimating unit estimates a noise-environment series based on a feature-vector series and a degree of similarity. A calculating unit obtains a compensation vector corresponding to each noise environment in the estimated noise-environment series, based on the compensation vector stored in the noise-environment storing unit. A compensating unit compensates the extracted feature vector of the speech based on the obtained compensation vector.
Abstract:
A speech encoding method, apparatus and program wherein: an input speech signal is divided into a plurality of frames each having a predetermined length; each of the frames is subdivided into a plurality of subframes; a predictive pitch period of a subframe in a to-be-encoded current frame is obtained by using the pitch periods of at least two frames among the current frame and the past and future frames with respect to the current frame; a pitch period of a subframe in the current frame is obtained by using the predictive pitch period; a relative pitch pattern codebook storing a plurality of relative pitch patterns, each representing fluctuations in the pitch periods of a plurality of subframes, is prepared; and a change in the pitch period over plural subframes is expressed with one relative pitch pattern selected from the relative pitch pattern codebook.
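The idea of coding subframe pitch fluctuations with one codebook index can be sketched as below. The sketch assumes the predictive pitch period is a linear interpolation between the past and future frame pitch periods, and that the pattern is chosen by least squared error; the abstract fixes neither choice, and the codebook contents and function names are hypothetical.

```python
def predict_subframe_pitch(prev_pitch, next_pitch, num_subframes):
    """Predictive pitch per subframe by linear interpolation (assumption)."""
    step = (next_pitch - prev_pitch) / num_subframes
    return [prev_pitch + step * (i + 1) for i in range(num_subframes)]


def select_relative_pattern(measured, predicted, codebook):
    """Index of the pattern whose offsets best explain the measured pitch."""
    best_index, best_error = 0, float("inf")
    for index, pattern in enumerate(codebook):
        error = sum((m - (p + d)) ** 2
                    for m, p, d in zip(measured, predicted, pattern))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def decode_subframe_pitch(predicted, codebook, index):
    """Decoder side: predicted pitch plus the selected relative pattern."""
    return [p + d for p, d in zip(predicted, codebook[index])]


# Toy relative pitch pattern codebook: offsets in samples per subframe.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
]
predicted = predict_subframe_pitch(40.0, 48.0, 4)
measured = [43.0, 44.0, 45.0, 48.0]
index = select_relative_pattern(measured, predicted, codebook)
decoded = decode_subframe_pitch(predicted, codebook, index)
```

Only the codebook index needs to be transmitted for the whole frame, since the decoder can rebuild the same predictive pitch periods from neighbouring frames.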