Abstract:
An acoustic signal processing circuit extracts input speech pattern data and subsidiary feature data from an input speech signal. The input speech pattern data comprise frequency spectra, whereas the subsidiary feature data comprise phoneme and acoustic features. These data are then stored in a data buffer memory. A similarity computation circuit computes the similarity measures between the input speech pattern data stored in the data buffer memory and reference speech pattern data stored in a dictionary memory. When the largest similarity measure exceeds a first threshold value and the difference between the largest and the second largest similarity measures exceeds a second threshold value, a control circuit outputs the category data of the reference pattern that gives the largest similarity measure as the recognition result for the input speech. When recognition cannot be performed in this way, the categories of the reference speech patterns that give the largest to mth similarity measures are each compared with the subsidiary feature data. In this manner, subsidiary feature recognition of the input speech is performed by a subsidiary feature recognition section.
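The two-threshold acceptance rule described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names `decide` and `top_m_candidates`, the dictionary representation of similarity measures, and the numeric values are hypothetical and do not come from the abstract.

```python
def decide(similarities, first_threshold, second_threshold):
    """Return the winning category, or None when the rule does not accept.

    similarities: dict mapping category -> similarity measure (assumed layout).
    Accepts only if the largest measure exceeds first_threshold and its margin
    over the second largest measure exceeds second_threshold.
    """
    if len(similarities) < 2:
        raise ValueError("need at least two reference categories")
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_category, best = ranked[0]
    _, second = ranked[1]
    if best > first_threshold and (best - second) > second_threshold:
        return best_category
    return None


def top_m_candidates(similarities, m):
    """Categories giving the largest to m-th similarity measures.

    These are the candidates that would be handed to the subsidiary
    feature comparison when decide() returns None.
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [category for category, _ in ranked[:m]]


# Hypothetical similarity measures for three reference categories.
scores = {"one": 0.91, "nine": 0.88, "five": 0.40}
result = decide(scores, first_threshold=0.8, second_threshold=0.1)
fallback = top_m_candidates(scores, m=2) if result is None else []
```

In this example the margin between the two best categories is too small, so `decide` returns `None` and the top two candidates are passed on for subsidiary feature comparison.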
Abstract:
Phoneme feature parameters are extracted from input digital speech signals by means of LPC analysis. From these feature parameters, phonetic segments having phonetic meaning are obtained, together with their similarities to prescribed basic phonetic segments, and are passed through the nodes of transition networks provided for each word. At each node, the similarities Sj of predetermined segments among the corresponding phonetic segments are selectively scored, and the accumulated scores are used to recognize continuous word speech.
Abstract:
In a speech recognition system for recognizing speech uttered by non-specific speakers, the start and end points of a word or speech interval are determined by a novel preprocessor that searches the sound power level to obtain speech boundary candidates and determines the likelihoods of speech or word intervals on the basis of those candidates. Because likelihoods (probabilities) are determined for the speech interval candidates, the similarity between the feature parameters of a speech signal and the reference pattern set is calculated only for the higher-likelihood candidates, which improves both the accuracy and the speed of speech recognition. The rate of erroneous boundary decisions is about 0.5% when the two speech interval candidates with the first and second highest likelihoods are adopted.
Abstract:
A continuous speech signal is recognized using "rough" and "detail" parameters derived from prestored reference speech and from the current unknown speech. The detail parameters are 16 spectral coefficients, and the rough parameters are 2 or 4 spectral coefficients representing the signal. A word interval detector decides the segmentation on the basis of rough parameter similarity.
Abstract:
A phoneme information extracting apparatus includes correlation data generators for successively generating correlation data representing the correlation between the acoustic power spectrum data corresponding to input voice and power spectrum data of various reference phonemes, selection circuits for successively transferring these correlation data when they detect that three or more successive correlation data have values greater than a predetermined value, maximum data hold circuits for holding the maximum correlation data among the correlation data transferred from the respective selection circuits, and a phoneme determination circuit for determining the optimum phoneme by detecting one of the data hold circuits that is holding the maximum correlation data among the correlation data held in the data hold circuits.
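The selection, maximum-hold, and determination stages described above can be sketched in software. This is a minimal sketch, not the circuit itself: the function names, the per-frame list representation, and the threshold value are hypothetical, and it assumes that a whole run of above-threshold values is transferred once the run reaches three frames.

```python
def held_maximum(correlations, threshold, min_run=3):
    """Maximum correlation among values transferred by the selection stage.

    A run of consecutive values above `threshold` is transferred only if it
    contains at least `min_run` values (assumption: the whole run is passed
    on, including its first frames). Returns None if nothing is transferred.
    """
    held = None
    run = []
    for value in list(correlations) + [None]:  # None flushes the final run
        if value is not None and value > threshold:
            run.append(value)
            continue
        if len(run) >= min_run:
            run_max = max(run)
            held = run_max if held is None else max(held, run_max)
        run = []
    return held


def determine_phoneme(correlations_by_phoneme, threshold):
    """Pick the reference phoneme whose hold stage contains the largest value."""
    held = {}
    for phoneme, correlations in correlations_by_phoneme.items():
        value = held_maximum(correlations, threshold)
        if value is not None:
            held[phoneme] = value
    if not held:
        return None
    return max(held, key=held.get)


# Hypothetical per-frame correlation data for three reference phonemes.
frames = {
    "a": [0.2, 0.7, 0.8, 0.75, 0.1],
    "i": [0.95, 0.1, 0.9, 0.2, 0.1],   # high peaks, but never 3 in a row
    "u": [0.6, 0.65, 0.7, 0.3, 0.2],
}
chosen = determine_phoneme(frames, threshold=0.5)
```

Here "i" has the single highest correlation, but its peaks are isolated, so only "a" and "u" reach the hold stage and "a" is selected.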
Abstract:
Synthesized speech is generated by a software-implemented system with a programmed central processing unit. Phonetic parameters are generated from a series of phonetic symbols of an input text to be converted into synthesized speech, and prosodic parameters are also generated from prosodic information of the input text. The activity ratio of the central processing unit is determined, and the order of phonetic parameters or the arrangement of a synthesis unit or filter for speech synthesis is determined depending on the determined activity ratio of the central processing unit. Synthesized speech sounds are generated and filtered based on the phonetic and prosodic parameters according to the determined order of phonetic parameters or the determined arrangement of the filter.
Abstract:
A plurality of candidate phonetic segments extracted from the input speech signal are passed through transition networks prepared for the respective words. A score is obtained by weighting and averaging the long-term strategic scores, which take into consideration the statistical distribution of the similarities or distances of the phonetic segments, and the short-term strategic scores, which take into consideration the environment of the phonetic segments.
Abstract:
Speech pattern data representing speech of a plurality of speakers are stored in a pattern storage section in advance. Averaged pattern data obtained by averaging a plurality of speech pattern data of the first of the plurality of speakers are obtained. Data obtained by blurring and differentiating the averaged pattern data are stored in an orthogonalized dictionary as basic orthogonalized dictionary data of first and second axes, respectively. Blurred data and differentiated data obtained with respect to the second and subsequent of the plurality of speakers are selectively stored in the orthogonalized dictionary as additional dictionary data having new axes. Speech of the plurality of speakers is recognized by computing a similarity between the orthogonalized dictionary formed in this manner and input speech.
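One way to picture the dictionary construction is the sketch below. It rests on several assumptions that are not in the abstract: patterns are plain one-dimensional lists, blurring is a 3-point [1, 2, 1]/4 smoothing, differentiation is a central difference, orthogonalization is Gram-Schmidt, "selective" storage means dropping candidates whose residual is negligible, and similarity is the fraction of input energy captured by the dictionary axes. All function names are hypothetical.

```python
import math


def average(patterns):
    """Element-wise mean of several equal-length patterns."""
    return [sum(column) / len(patterns) for column in zip(*patterns)]


def blur(x):
    """3-point [1, 2, 1]/4 smoothing with edge replication (assumed filter)."""
    p = [x[0]] + list(x) + [x[-1]]
    return [(p[i] + 2 * p[i + 1] + p[i + 2]) / 4 for i in range(len(x))]


def differentiate(x):
    """Central difference with edge replication (assumed filter)."""
    p = [x[0]] + list(x) + [x[-1]]
    return [(p[i + 2] - p[i]) / 2 for i in range(len(x))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthonormalize(candidates, eps=1e-9):
    """Gram-Schmidt; a candidate adds a new axis only if its residual is non-negligible."""
    axes = []
    for vector in candidates:
        residual = list(vector)
        for axis in axes:
            c = dot(residual, axis)
            residual = [r - c * a for r, a in zip(residual, axis)]
        norm = math.sqrt(dot(residual, residual))
        if norm > eps:
            axes.append([r / norm for r in residual])
    return axes


def build_dictionary(speakers):
    """speakers: list of per-speaker pattern lists; the first speaker gives axes 1 and 2."""
    candidates = []
    for patterns in speakers:
        avg = average(patterns)
        candidates += [blur(avg), differentiate(avg)]
    return orthonormalize(candidates)


def similarity(x, axes):
    """Fraction of the input's energy lying in the span of the dictionary axes."""
    energy = dot(x, x)
    return sum(dot(x, a) ** 2 for a in axes) / energy if energy else 0.0


# Hypothetical data: two utterances from speaker 1, one from speaker 2.
speakers = [[[0, 1, 4, 1, 0], [0, 1, 2, 1, 0]], [[1, 3, 1, 0, 0]]]
dictionary = build_dictionary(speakers)
score = similarity([0, 1, 3, 1, 0], dictionary)
```

With this similarity definition the score always lies between 0 and 1, and a real system would hold one such dictionary per category and pick the category with the highest score.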
Abstract:
Provided are a speech search device and a speech search method that perform fuzzy search with very high search speed and excellent search performance. In addition to performing fuzzy search, the device calculates the distance between phoneme discrimination features included in the speech data to determine similarity with respect to the speech, using both a suffix array and dynamic programming. The object to be searched for is narrowed by dividing the search keyword on a phoneme basis and applying search thresholds to the resulting divided search keywords, and the search is repeated while the search thresholds are increased in order. Whether or not the keyword is divided is determined according to the length of the search keyword.
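The dynamic-programming part of such a search, repeated with increasing thresholds, can be sketched as below. This is a simplified illustration, not the described device: it omits the suffix array and the keyword division, it uses a standard approximate substring matching recurrence, and the phoneme distance `phoneme_distance` with its `SIMILAR_PAIRS` table is a hypothetical stand-in for a distance between phoneme discrimination features.

```python
# Hypothetical pairs of phonemes treated as "close" (differing only in voicing).
SIMILAR_PAIRS = {frozenset(("t", "d")), frozenset(("k", "g")), frozenset(("p", "b"))}


def phoneme_distance(a, b):
    """Placeholder distance: 0 if equal, 0.5 for a similar pair, 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in SIMILAR_PAIRS else 1.0


def best_match_cost(keyword, phonemes, distance=phoneme_distance):
    """Lowest cost of aligning `keyword` to any substring of `phonemes`.

    The first DP row is all zeros, so a match may start at any position;
    insertions and deletions cost 1 each.
    """
    previous = [0.0] * (len(phonemes) + 1)
    for i, k in enumerate(keyword, 1):
        current = [float(i)] + [0.0] * len(phonemes)
        for j, p in enumerate(phonemes, 1):
            current[j] = min(
                previous[j] + 1.0,                 # skip a keyword phoneme
                current[j - 1] + 1.0,              # skip a text phoneme
                previous[j - 1] + distance(k, p),  # match or substitute
            )
        previous = current
    return min(previous)


def search(keyword, documents, thresholds):
    """Try thresholds in increasing order; stop at the first one that yields hits."""
    for threshold in sorted(thresholds):
        hits = [name for name, phonemes in documents.items()
                if best_match_cost(keyword, phonemes) <= threshold]
        if hits:
            return threshold, hits
    return None, []


# Hypothetical phoneme transcriptions of two speech items.
documents = {"d1": ["s", "k", "a", "d", "o"], "d2": ["m", "i", "n", "u"]}
found = search(["k", "a", "t"], documents, thresholds=[0.0, 0.5, 1.0])
```

Starting with a strict threshold keeps the candidate set small when a near-exact match exists, and looser thresholds are tried only when nothing is found.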
Abstract:
A recognition system comprises a feature extractor for extracting a feature vector x from an input speech signal, and a recognizing section for defining continuous density Hidden Markov Models of predetermined categories k as transition network models each having parameters of transition probabilities p(k,i,j) that a state Si transits to a next state Sj and output probabilities g(k,s) that a feature vector x is output in transition from the state Si to one of the states Si and Sj, and recognizing the input signal on the basis of similarity between a sequence X of feature vectors extracted by the feature extractor and the continuous density HMMs. Particularly, the recognizing section includes a memory section for storing a set of orthogonal vectors φ_m(k,s) provided for the continuous density HMMs, and a modified CDHMM processor for obtaining each of the output probabilities g(k,s) for the continuous density HMMs in accordance with corresponding orthogonal vectors φ_m(k,s).