Abstract:
A continuous speech recognition system determines the similarity between input patterns and reference patterns over time such that similarities between previously spoken speech patterns and reference patterns are determined while speech continues to be spoken. Degrees of dissimilarity at arbitrary reference pattern word times are determined asymptotically and are recorded. The minimum degree of dissimilarity is determined and the corresponding word is categorized. Recognition decisions are ultimately made in reverse chronological order.
Abstract:
A similarity calculator for calculating a set of similarity measures $S(A(u, m), B^c)$ according to the technique of dynamic programming comprises an input pattern buffer for successively producing input pattern feature vectors of an input pattern $A$ to be pattern matched with reference patterns $B^c$, one input pattern feature vector $a_m$ at a time. The similarity measure set is for a set of fragmentary patterns $A(u, m)$ defined by a common end point $m$ and start points $u$ predetermined relative to the end point $m$. Scalar products $(a_m \cdot b_j^n)$ are calculated between the m-th input pattern feature vector and the reference pattern feature vectors $b_j^n$ of an n-th reference pattern $B^n$ and stored in a scalar product buffer. Recurrence values are calculated according to a recurrence formula for each end point $m$, rather than for each fragmentary pattern set, and for each reference pattern $B^n$ to provide a similarity measure subset $S(A(u, m), B^n)$. The recurrence value for each reference pattern feature vector $b_v$ is calculated from the scalar product $(a_m \cdot b_v)$ and the recurrence values calculated for the previous end point $(m-1)$ and for at least three consecutive reference pattern feature vectors preselected relative to $b_v$. Instead of the scalar product, any other measure representative of a similarity or a dissimilarity between an input pattern feature vector and a reference pattern feature vector may be used.
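The recurrence described in this abstract can be illustrated with a deliberately simplified sketch. It assumes a single whole input pattern rather than the full set of fragmentary patterns, uses the scalar product as the local similarity, and takes the three "consecutive reference pattern feature vectors" to be indices j, j-1, and j-2 at the previous input frame. The function name `dp_similarity` and the boundary conditions are hypothetical and are not taken from the patent.

```python
def dp_similarity(A, B):
    """Accumulated similarity between input vectors A and reference vectors B.

    Simplified illustration: g(m, j) = (a_m . b_j) + max of g(m-1, j),
    g(m-1, j-1), g(m-1, j-2). The path is assumed to start at (0, 0)
    and end at the last frame of both patterns.
    """
    NEG = float("-inf")
    J = len(B)
    prev = [NEG] * J
    for m, a in enumerate(A):
        cur = [NEG] * J
        for j, b in enumerate(B):
            # Local similarity: scalar product of the two feature vectors.
            s = sum(x * y for x, y in zip(a, b))
            if m == 0:
                # Assumed boundary condition: the path starts at b_0.
                best = 0.0 if j == 0 else NEG
            else:
                # Best of three consecutive recurrence values at frame m-1.
                best = max(prev[max(j - 2, 0): j + 1])
            if best != NEG:
                cur[j] = s + best
        prev = cur
    return prev[-1] if prev else NEG


pattern = [(1, 0), (0, 1)]
same = dp_similarity(pattern, pattern)
swapped = dp_similarity(pattern, pattern[::-1])
```

Here `same` is larger than `swapped`, because the matching frame order accumulates a larger sum of scalar products. Only the previous row of recurrence values is kept, which mirrors the abstract's point that the calculation proceeds once per end point m.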
Abstract:
A continuous speech recognition system utilizes a format memory (14) which specifies a sequence of word sets and a plurality of words, or reference patterns, which may be included in each word set. The input pattern sequence is divided into all possible partial patterns having start points p and end points q, and each of these partial patterns is compared with all reference patterns to derive elementary similarity measures. The elementary similarity measures for each combination of a partial pattern and a permitted word in a word set under the specified format are then examined to determine the optimum input pattern segmentation points and corresponding sequence of reference patterns which will yield a maximum similarity result. The maximum similarity is represented by $\max_{p(x),\, n(x) \in f_x} \sum_{x=1}^{K} S(p(x-1), p(x), n(x))$, where $S(p(x-1), p(x), n(x))$ indicates the degree of similarity between an input partial pattern having a start point $p(x-1)$ and an end point $p(x)$ and a reference word unit $n(x)$ within a word set $f_x$, and $K$ represents the number of word sets permitted according to the specified format.
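One way to picture the search over segmentation points and permitted words is a small dynamic program over word-set positions. The sketch below is an illustration only: it assumes the elementary similarities are already available in a dictionary keyed by (start, end, word), that the first segment starts at frame 0, and that the last segment ends at frame I. The function name `best_segmentation` and the data layout are hypothetical.

```python
def best_segmentation(S, word_sets, I):
    """Maximise the sum of elementary similarities under a word-set format.

    S         : dict mapping (p, q, n) -> similarity of frames p..q to word n
    word_sets : list of K sets; word_sets[x-1] holds the words allowed at position x
    I         : number of input frames
    Returns (best score, word sequence, segmentation points).
    """
    NEG = float("-inf")
    K = len(word_sets)
    T = [[NEG] * (I + 1) for _ in range(K + 1)]
    back = [[None] * (I + 1) for _ in range(K + 1)]
    T[0][0] = 0.0
    for x in range(1, K + 1):
        for q in range(1, I + 1):
            for p in range(q):
                if T[x - 1][p] == NEG:
                    continue
                for n in word_sets[x - 1]:
                    score = T[x - 1][p] + S.get((p, q, n), NEG)
                    if score > T[x][q]:
                        T[x][q] = score
                        back[x][q] = (p, n)
    if back[K][I] is None:
        return NEG, [], []
    # Trace back from the final frame to recover words and boundaries.
    words, bounds, q = [], [I], I
    for x in range(K, 0, -1):
        p, n = back[x][q]
        words.append(n)
        bounds.append(p)
        q = p
    return T[K][I], words[::-1], bounds[::-1]


S = {
    (0, 2, "a"): 0.9, (0, 2, "b"): 0.4, (0, 1, "a"): 0.5, (0, 3, "b"): 0.7,
    (2, 4, "c"): 0.8, (1, 4, "c"): 0.3, (3, 4, "c"): 0.1,
}
score, words, bounds = best_segmentation(S, [{"a", "b"}, {"c"}], 4)
```

With these made-up similarities, the best reading is word "a" over frames 0 to 2 followed by word "c" over frames 2 to 4.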
Abstract:
A continuous speech recognition system comprises a word number specifier for specifying, as the number of continuously spoken word or words, either a single integer or a set of different integers. The single integer may be manually or automatically adjusted. In compliance with the specified word number or numbers, the system carries out pattern matching between an input pattern representative of the spoken word or words and a predetermined number of reference patterns. The matching may be carried into effect by dynamic programming. The input pattern is recognized to be one of the reference patterns or to be a concatenation of some or all of the reference patterns, equal in number either to the single integer or to one of the different integers.
Abstract:
In a neural network, input neuron units of an input layer are grouped into first through J-th input layer frames, where J represents a predetermined natural number. Intermediate neuron units of an intermediate layer are grouped into first through J-th intermediate layer frames. An output layer comprises an output neuron unit. Each intermediate neuron unit of a j-th intermediate layer frame is connected to the input neuron units of j'-th input layer frames, where j is variable between 1 and J and j' represents at least two consecutive integers, one of which is equal to j and at least one other of which is less than j. Each output neuron unit is connected to the intermediate neuron units of the intermediate layer. For recognition of an input pattern represented by a time sequence of feature vectors, each consisting of K vector components, where K represents a predetermined positive integer, each input layer frame consists of K input neuron units. Each intermediate layer frame consists of M intermediate neuron units, where M represents a positive integer which is less than K. The vector components of each feature vector are supplied to the respective input neuron units of one of the input layer frames that is preferably selected from three consecutively numbered input layer frames. The neural network is readily trained to make a predetermined one of the output neuron units produce an output signal indicative of the input pattern and can be implemented by a microprocessor.
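The frame-wise connectivity between the input and intermediate layers can be pictured as a boolean mask. The following sketch is illustrative only: the function name `connection_mask`, the `span` parameter (how many consecutive input frames each intermediate frame sees), and the clipping at the first frame, where no earlier frame exists, are assumptions rather than details from the abstract.

```python
def connection_mask(J, K, M, span=2):
    """Build a (J*M) x (J*K) mask of allowed input-to-intermediate links.

    Row j*M + i is the i-th unit of intermediate frame j; column jp*K + k
    is the k-th unit of input frame jp. A link is allowed when jp lies in
    the last `span` frames ending at j (clipped at frame 0).
    """
    mask = []
    for j in range(J):
        lo = max(0, j - span + 1)
        row = [lo <= jp <= j for jp in range(J) for _ in range(K)]
        # All M units of the same intermediate frame share one pattern.
        mask.extend(list(row) for _ in range(M))
    return mask


mask = connection_mask(J=3, K=2, M=1, span=2)
```

For J = 3, K = 2, M = 1, the last intermediate frame is linked only to input frames 2 and 3, so the final row of `mask` is False for the first two columns and True for the remaining four.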
Abstract:
A connected word recognition system operable according to a DP algorithm and in compliance with a regular grammar, is put into operation in synchronism with successive specification of feature vectors of an input pattern. In an m-th period in which an m-th feature vector is specified, similarity measures are calculated (58, 59) between reference patterns representative of reference words and those fragmentary patterns of the input pattern, which start at several previous periods and end at the m-th period, for start and end states of the reference words. In the m-th period, an extremum of the similarity measures is found (66, 69, 86), together with a particular word and a particular pair of start and end states thereof, and stored (61-63). Moreover, a particular start period is selected (67, 86) and stored (64). A previous extremum found and stored (61) during the (m-1)-th period for the particular start state found in the (m-1)-th period, is used in the m-th period as a boundary condition in calculating each similarity measure. After all input pattern feature vectors are processed, a result of recognition is obtained (89) by referring to the stored extrema, particular words, particular start states, and particular start periods.
Abstract:
Operation of a continuous speech recognition system operable according to the dynamic programming technique is controlled by a state transition diagram in compliance with which word sequences to be recognized by the system with reference to a predetermined number of reference words $B^n$ are pronounced. The system comprises a state transition table accessed by the reference words $B^n$ to successively produce particular states $y$ in the diagram and previous states $z$ for each particular state $y$. In cooperation with a recurrence value table and an optimum parameter table, a matching unit determines a recurrence value $T_y(m)$ and an optimum parameter set $ZUN_y(m)$ according to: ##EQU1## where $u$ and $m$ represent a start and an end point of a fragmentary pattern $A(u, m)$ of an input pattern $A$ representative of a word sequence, and $D(u, m, n)$ represents a similarity measure between the fragmentary pattern $A(u, m)$ and a reference word $B^n$ assigned to a permutation of the previous and the particular states $z$ and $y$. By referring to the optimum parameter table and, as the case may be, to the recurrence value table, a decision unit recognizes the word sequence as a concatenation of optimum ones of the reference words $B^n$.
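A state-transition-constrained recurrence of this general kind might be sketched as follows. The exact formula is not reproduced in this abstract, so the sketch assumes a minimising form in which D is treated as a dissimilarity and T_y(m) is the best accumulated value over permitted (z, u, n). The function name `recognize`, the dictionary layout, and the use of explicit final states are hypothetical.

```python
def recognize(D, transitions, start, finals, I):
    """Recognise a word sequence constrained by a state transition diagram.

    D           : dict mapping (u, m, n) -> dissimilarity of frames u..m to word n
    transitions : dict mapping state y -> list of (previous state z, word n)
    start       : initial state; finals: acceptable end states; I: frame count
    Assumed recurrence: T[y][m] = min over (z, n), u < m of T[z][u] + D[u, m, n].
    """
    INF = float("inf")
    states = set(transitions) | {start}
    states |= {z for preds in transitions.values() for z, _ in preds}
    T = {y: [INF] * (I + 1) for y in states}
    ZUN = {y: [None] * (I + 1) for y in states}  # optimum (z, u, n) per cell
    T[start][0] = 0.0
    for m in range(1, I + 1):
        for y, preds in transitions.items():
            for z, n in preds:
                for u in range(m):
                    if T[z][u] == INF or (u, m, n) not in D:
                        continue
                    cand = T[z][u] + D[(u, m, n)]
                    if cand < T[y][m]:
                        T[y][m] = cand
                        ZUN[y][m] = (z, u, n)
    y = min(finals, key=lambda s: T[s][I])
    if T[y][I] == INF:
        return []
    # Decision step: follow the stored optimum parameters backwards.
    words, m = [], I
    while m > 0:
        z, u, n = ZUN[y][m]
        words.append(n)
        y, m = z, u
    return words[::-1]


transitions = {1: [(0, "one"), (0, "two")], 2: [(1, "three")]}
D = {(0, 2, "one"): 0.1, (0, 2, "two"): 0.5, (0, 1, "two"): 0.05,
     (2, 4, "three"): 0.2, (1, 4, "three"): 0.9}
result = recognize(D, transitions, start=0, finals=[2], I=4)
```

With these made-up values the decision step returns "one" followed by "three". If D were instead a similarity to be maximised, the comparison and the initial values would be flipped.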
Abstract:
A speech recognition system adaptable to noisy environments is disclosed. The system includes a recognition unit for recognizing input speech signals and a noise measuring unit for measuring the intensity of ambient noise. The system also includes a rejection unit responsive to a rejection standard controlled by the intensity of the measured noise for rejecting the recognition results given by the recognition unit when the rejection standard is exceeded.
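The idea of a noise-controlled rejection standard can be sketched in a few lines. The abstract does not say how the standard depends on the noise intensity, so the linear mapping, the parameter names `base` and `slope`, and the use of a dissimilarity score as the quantity compared against the standard are all assumptions made for illustration.

```python
def rejection_standard(noise_level, base=1.0, slope=0.5):
    """Hypothetical mapping from measured noise intensity to a threshold."""
    return base + slope * noise_level


def decide(word, dissimilarity, noise_level):
    """Return the recognised word, or None when the result is rejected.

    A result is rejected when its dissimilarity exceeds the rejection
    standard computed for the current ambient noise level.
    """
    if dissimilarity > rejection_standard(noise_level):
        return None
    return word


quiet = decide("yes", dissimilarity=1.2, noise_level=0.0)
noisy = decide("yes", dissimilarity=1.2, noise_level=1.0)
```

Under these assumed parameters the same score is rejected in a quiet room (standard 1.0) but accepted at noise level 1.0 (standard 1.5). Whether the standard should loosen or tighten with noise is a design choice the abstract leaves open.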
Abstract:
A pattern matching apparatus for comparing an input pattern of features with a reference pattern. Address information is stored in the reference pattern along with features, so that branching in a work memory which stores cumulative distances between the input and reference patterns may be effected. In this manner, memory requirements for storing reference patterns are reduced, and the number of required distance calculations also is reduced.
Abstract:
There is provided a voice recognition system comprising a standard pattern memory in which a voice pattern of a predetermined word is stored as a positive reference pattern and also voice patterns of words similar to but different from the first-mentioned word are stored as negative reference patterns, a pattern comparator for calculating dissimilarities of an input voice pattern with respect to the positive reference pattern and negative reference patterns, and a discriminator for providing a coincidence confirmation output signal when the dissimilarity with respect to the positive reference pattern is less than a predetermined threshold value and less than the dissimilarities with respect to the negative reference patterns while otherwise rejecting the result of recognition.
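The discriminator rule stated in this abstract reduces to a two-part comparison, shown here as a minimal sketch. The function name `confirm` and the representation of dissimilarities as plain numbers are assumptions.

```python
def confirm(pos_dissim, neg_dissims, threshold):
    """Return True only when the input matches the positive reference.

    The dissimilarity to the positive reference must be below the
    threshold and below the dissimilarity to every negative reference.
    Any other case is treated as a rejection.
    """
    return pos_dissim < threshold and all(pos_dissim < d for d in neg_dissims)


accepted = confirm(0.2, [0.5, 0.9], threshold=0.3)
confusable = confirm(0.2, [0.1], threshold=0.3)
too_far = confirm(0.4, [0.5], threshold=0.3)
```

The second call is rejected because a negative reference (a similar but different word) is closer than the positive one. The third is rejected because the positive match is not close enough in absolute terms.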