Abstract:
A degrouping method for an MPEG 1 decoder degroups three consecutive subband samples (X, Y and Z) that were compressed into one codeword (C) according to a step number (N). The method includes the steps of: determining whether the step number is 3; determining whether the step number is 5 if it is not 3; determining whether the step number is 9 if it is not 5; searching a first look-up table for the corresponding values of the subband samples, in the sequence Z, Y and X, if the step number is 3; searching a second look-up table in the same sequence if the step number is 5; and searching a third look-up table in the same sequence if the step number is 9. The first, second and third look-up tables hold the respective subband sample values corresponding to each codeword value. Because the method obtains the subband samples from look-up tables rather than with a divider, the number of required cycles is considerably reduced.
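A minimal sketch of table-based degrouping follows. It assumes the Layer II grouping convention in which the codeword equals N*N*Z + N*Y + X; the abstract does not state this formula. The names `build_table`, `TABLE_3`, `TABLE_5`, `TABLE_9` and `degroup` are hypothetical. For brevity, each table entry holds all three samples, so the sketch reads them in one access instead of searching for Z, Y and X in turn.

```python
def build_table(steps):
    """Precompute (X, Y, Z) for every codeword of a given step number.

    Assumption: codeword = steps*steps*Z + steps*Y + X.
    """
    table = []
    for codeword in range(steps ** 3):
        x = codeword % steps
        y = (codeword // steps) % steps
        z = codeword // (steps * steps)
        table.append((x, y, z))
    return table


# Built once, ahead of time. Division and modulo appear only here,
# never on the per-codeword decode path.
TABLE_3 = build_table(3)
TABLE_5 = build_table(5)
TABLE_9 = build_table(9)


def degroup(codeword, steps):
    """Return (X, Y, Z) for a grouped codeword using a table read."""
    # Same test order as the abstract: 3, then 5, then 9.
    if steps == 3:
        table = TABLE_3
    elif steps == 5:
        table = TABLE_5
    elif steps == 9:
        table = TABLE_9
    else:
        raise ValueError("grouping is defined only for 3, 5 or 9 steps")
    return table[codeword]
```

Under the stated assumption, `degroup(21, 3)` yields X = 0, Y = 1, Z = 2, since 21 = 9*2 + 3*1 + 0. The largest table (N = 9) has 729 entries.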
Abstract:
A vocoder generates speech from a plurality of stored speech parameters and computes the excitation signals used in the speech production model. The present invention generates a periodic excitation signal with flat frequency response and linear group delay. The present invention uses properties of the phase delay sequence being generated to calculate each of the parameters of the excitation signal in an efficient and optimized manner. Generation of the excitation signal requires computation of the expression: ##EQU1## The above expression uses the equation: ##EQU2## This equation defines the phase relationship between the signals using a linear group delay, where $\phi'_I(x)^*$ is the absolute phase offset from the first phase harmonic, $I$ is an index for the harmonic, $x$ is time, $P$ is the pitch period, and $k''$ is a constant. The present invention performs the following iterations to compute the above sequence: 1) $\phi'_I(x)^* = \phi'_{I-1}(x)^* + A_{I-1}(x)$; 2) $A_I(x) = A_{I-1}(x) - B$, where the $A_I$ values are the relative phase differences between consecutive harmonics, the $\phi'_I(x)^*$ values are the absolute phase offsets from the first phase harmonic, $B$ is the constant $2k''/P^2$, $x$ is the time, and $I$ is the iteration number. After the phase offset values have been computed, the cosines of the plurality of phase offset values are computed and summed to produce the excitation signal. The excitation signal is then used in a speech production model to generate speech.
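The two-step iteration and the cosine sum can be sketched as below. This is an illustration only: the expressions behind ##EQU1## and ##EQU2## are not available here, so the starting values `phi0` and `a0` are assumptions supplied by the caller, and the function names `phase_offsets` and `excitation_sample` are hypothetical.

```python
import math


def phase_offsets(num_harmonics, pitch_period, k2, a0, phi0=0.0):
    """Return the absolute phase offsets for harmonics 1..num_harmonics.

    k2 stands for the constant k'' in the abstract; a0 and phi0 are
    assumed starting values, not taken from the source.
    """
    b = 2.0 * k2 / pitch_period ** 2   # constant B = 2k''/P^2
    phi = phi0
    a = a0
    offsets = [phi]
    for _ in range(1, num_harmonics):
        phi = phi + a                  # step 1: add the previous phase difference
        a = a - b                      # step 2: update the phase difference
        offsets.append(phi)
    return offsets


def excitation_sample(num_harmonics, pitch_period, k2, a0, phi0=0.0):
    """Sum the cosines of the phase offsets to get one excitation value."""
    offsets = phase_offsets(num_harmonics, pitch_period, k2, a0, phi0)
    return sum(math.cos(phi) for phi in offsets)
```

Each harmonic costs one addition and one subtraction, which is the saving the iteration offers over evaluating a quadratic phase expression per harmonic.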
Abstract:
Recognition of speech with successive expansion of a reference vocabulary can be used for automatic telephone dialing by voice input. Neural and conventional recognition methods are performed in parallel so that, during training and configuration of the neural network, a conventional recognizer operating according to the dynamic programming principle has the newly added word patterns available as references for immediate use in recognition. Upon completion of the training and configuration, the neural network takes over the recognition of the now expanded vocabulary.
Abstract:
A method for recognizing speech elements (e.g., phones) in utterances includes the following steps. Based on acoustic frequency, at least two different acoustic representatives are isolated for each of the utterances. From each acoustic representative, tentative decision information on the speech element in the corresponding utterance is derived. A final decision on the speech element in the utterance is then generated, based on the tentative decision information derived from more than one of the acoustic representatives.
Abstract:
An electronic interpreting machine comprises: a vocal input device for vocal input of a language; a dictionary device incorporating an input language dictionary and an output language dictionary; a language setting device for setting the input language as the first language and the output language as the second language; a voice recognition device for recognizing and storing the first language; a translating device for translating the recognized first language into the selected second language; a voice information generating device for generating voice information representing the translated second language; and a voice output device for outputting the voice information. The maximum amount of language information that can be vocally input is set beforehand. The machine further provides an information volume computing device that computes, in real time, the ratio of the amount of information already input to the vocal input device to the maximum amount of information that can be input, and a computed information notifying device that reports the result of the computation.
Abstract:
A pitch estimating method includes the steps of (1) determining a set of pitch candidates to estimate a pitch of a digitized speech signal at each of a plurality of time instants, wherein series of these time instants define segments of the digitized speech signal; (2) constructing a pitch contour using a pitch candidate selected from each of the sets of pitch candidates determined in the first step; and (3) selecting a representative pitch estimate for the digitized speech signal segment from the set of pitch candidates comprising the pitch contour.
Abstract:
A sound analyzer analyzes an input speech signal to obtain feature vectors. A matrix quantizer performs a matrix quantization process between the feature vectors obtained by the sound analyzer and a phonetic segment dictionary prepared in phonetic segment units to obtain a phonetic segment similarity sequence. A PS-phoneme integrating section integrates the phonetic segment similarity sequence into a phonemic feature vector. An HMM recognizer checks the phonemic feature vector using an HMM prepared in certain units, thereby performing a recognition process.
Abstract:
In methods and apparatus for processing a speech signal comprising a plurality of successive signal intervals, each signal interval containing no speech sounds is classified as a noise interval, and LPC coefficients are calculated for each noise interval based on the samples of that noise interval and on the samples of a plurality of preceding signal intervals. When noise intervals encoded using LPC coefficients calculated in this way are reconstructed, the subjectively annoying "swishing" or "waterfall" effects encountered in conventional LPC speech processing systems are reduced or eliminated.
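The idea of widening the analysis window for noise intervals can be sketched as follows. The abstract does not say how the LPC coefficients are computed, so the autocorrelation method with the Levinson-Durbin recursion is swapped in here as a common choice. The function names and the `history` parameter (number of preceding intervals to include) are hypothetical.

```python
def autocorrelation(samples, order):
    """Autocorrelation values r[0..order] of a sample window."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag))
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[0..order], with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # silent window: leave the remaining coefficients at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a


def noise_lpc(intervals, index, history, order):
    """LPC coefficients for the noise interval at `index`.

    The window covers that interval plus up to `history` preceding
    intervals, as the abstract describes for noise intervals.
    """
    start = max(0, index - history)
    window = [s for interval in intervals[start:index + 1] for s in interval]
    return levinson_durbin(autocorrelation(window, order), order)
```

A longer window averages the noise statistics over more samples, so the coefficients vary less from one noise interval to the next.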
Abstract:
A dynamic programming (DP) matching system for speech recognition. During DP matching, the cumulative distance is compared with a threshold value at every sampling time point of a speech pattern, thereby restricting the number of DP paths in succeeding matching processes. A monitoring module monitors the number of DP paths remaining at each speech pattern sampling time point and alters the threshold value so as to decrease the number of DP paths. When the number of DP paths becomes excessively large, the threshold is increased to decrease the number of DP paths. The capacity of the DP data storing memory can thus be reduced while preventing the matching capability from being lowered.
Abstract:
In a speech synthesizing apparatus, importance degree information indicating the degree of importance of each text portion of input original text data is added to that text portion. The original text data carrying this importance degree information is then input. When a rapid reading process or a head searching process is carried out on the original text input, speech synthesis is carried out by controlling, in several stages, which text portions should be skipped or at which speed the text portions should be synthesized, in response to a speed instruction and the importance degree information input into the speech synthesizing apparatus.