摘要:
In one aspect, the invention provides word recognition systems that operate to recognize an unrecognized or ambiguous word that occurs within a passage of words. The system can offer several words as choice words for inserting into the passage to replace the unrecognized word. The system can select the best choice word by using the choice word to extract from a reference source, sample passages of text that relate to the choice word. For example, the system can select the dictionary passage that defines the choice word. The system then compares the selected passage to the current passage, and generates a score that indicates the likelihood that the choice word would occur within that passage of text. The system can select the choice word with the best score to substitute into the passage. The passage of words being analyzed can be any word sequence including an utterance, a portion of handwritten text, a portion of typewritten text or other such sequence of words, numbers and characters. Alternative embodiments of the present invention are disclosed which function to retrieve documents from a library as a function of context.
摘要:
A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by cluster spelling representing the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
摘要:
A speech recognition system is disclosed which employs a network of elementary local decision modules for matching an observed time-varying speech pattern against all possible time warpings of the stored prototype patterns. For each elementary speech segment, an elementary recognizer provides a score indicating the degree of correlation of the input speech segment with stored spectral patterns. Each local decision module receives the results of the elementary recognizer and, at the same time, receives an input from selected ones of the other local decision modules. Each local decision module specializes in a particular node in the network wherein each node matches the probability of how well the input segment of speech matches the particular sound segments in the sounds of the words spoken. Each local decision module takes the prior decisions of all preceding sound segments which are input from the other local decision modules and makes a selection of the locally optimum time warping to be permitted. By this selection technique, each speech segment is stretched or compressed by an arbitrary, nonlinear function based on the control of the interconnections of the other local decision modules to a particular local decision module. Each local decision module includes an accumulator memory which stores the logarithmic probabilities of the current observation which is conditional upon the internal event specified by a word to be matched or identifier of the particular pattern that corresponds to the subject node for that particular pattern. For each observation, these probabilities are computed and loaded into the accumulator memory of all the modules and, the result of the locally optimum time warping representing the accumulated score or network path to a node for the word with the highest probability is chosen.
摘要:
A computer-implemented pattern recognition method, system and program product, the method comprising in one embodiment: creating electronically a linkage between a plurality of models within a classifier module within a pattern recognition system such that any one of said plurality of models may be selected as an active model in a recognition process; creating electronically a null hypothesis between at least one model of said plurality of linked models and at least a second model among said plurality of linked models; accumulating electronically evidence to accept or reject said null hypothesis until sufficient evidence is accumulated to reject said null hypothesis in favor of one of said plurality of linked models or until a stopping criterion is met; and transmitting at least a portion of the electronically accumulated evidence or a summary thereof to accept or reject said null hypothesis to a pattern classifier module.
摘要:
Smoothed frame labeling associates phonetic frame labels with a given speech frame as a function of (a) the closeness with which the given frame compares to each of a plurality of acoustic models, (b) which frame labels correspond with a neighboring frame, and (c) transition probabilities which indicate, for the frame labels associated with the neighboring frame, which frame labels are probably associated with the given frame. The smoothed frame labeling is used to divide the speech into segments of frames having the same class of labels. The invention represents words as a collection of known diphone models, each of which models the sound before and after a boundary between segments derived by the smoothed frame labeling. At recognition time, the speech is divided into segments by smoothed frame labeling; diphone models are derived for each boundary between the resulting segments; and the resulting diphone models are compared against the known diphone models to determine which of the known diphone models match the segment boundaries in the speech. Then a combined-displaced-evidence method is used to determine which words occur in the speech. This method detects which acoustic patterns, in the form of the known diphone models, match various portions of the speech. In response to each such match, it associates with the speech an evidence score for each vocabulary word in which that pattern is known to occur. It displaces each such score from the location of its associated matched pattern by the known distance between that pattern and the beginning of the score's word. Then all the evidence scores for a word located in a given portion of the speech are combined to produce a score which indicates the probability of that word starting in that portion of the speech. This score is combined with a score produced by comparing a histogram from a portion of the speech against a histogram of each word. The resulting combined score determines whether a given word should undergo a more detailed comparison against the speech to be recognized.
摘要:
A computer-implemented pattern recognition method, system and program product, the method comprising in one embodiment: creating electronically a linkage between a plurality of models within a classifier module within a pattern recognition system such that any one of said plurality of models may be selected as an active model in a recognition process; creating electronically a null hypothesis between at least one model of said plurality of linked models and at least a second model among said plurality of linked models; accumulating electronically evidence to accept or reject said null hypothesis until sufficient evidence is accumulated to reject said null hypothesis in favor of one of said plurality of linked models or until a stopping criterion is met; and transmitting at least a portion of the electronically accumulated evidence or a summary thereof to accept or reject said null hypothesis to a pattern classifier module.
摘要:
A speech recognition method, system and program product, the method in one embodiment comprising: obtaining input speech data; initiating a first speech recognition search process with at least one hypothesis; initiating a second speech recognition search process with a plurality of hypotheses; obtaining partial results from the second speech recognition search process, where the partial results include an evaluation of at least one hypothesis that the first speech recognition search process has not evaluated at this point in time; and utilizing the partial results to alter the first speech recognition search process.
摘要:
A tutorial instructs how to use a word recognition system, such as one for speech recognition. It specifies a set of allowed response words for each of a plurality of states. It sends messages on how to use the recognizer in certain states, and, in others, presents exercises in which the user is to enter signals representing expected words. It scores each such signal against word models to select which response word corresponds to it, and then advances to a state associated with that selected response. This scoring is performed against a large vocabulary even though only a small number of responses are allowed, and the signal is rejected if too many non-allowed words score better than any allowed word. The system comes with multiple sets of standard signal models; it scores each against a given user's signals, selects the set which scores best, and then performs adaptive and batch training upon that set. Preferably, the tutorial prompts users to enter the words used for training in an environment similar to that of the actual recognizer the tutorial is training them to use. The system will normally simulate the recognition of the prompted word, but will sometimes it will simulate an error. When it does, notifies the user if he fails to correct the error. The recognizer associated with the tutorial allows users to perform adaptive training either on all words, or only on those whose recognition has been corrected or confirmed. The recognizer also uses a context language model which indicates the probability that a given word will be used in the context of other words which precede it in a grouping of text.
摘要:
A speech recognition method and apparatus employ a speech processing circuitry for repetitively deriving from a speech input, at a frame repetition rate, a plurality of acoustic parameters. The acoustic parameters represent the speech input signal for a frame time. A plurality of template matching and cost processing circuitries are connected to a system bus, along with the speech processing circuitry, for determining, or identifying, the speech units in the input speech, by comparing the acoustic parameters with stored template patterns. The apparatus can be expanded by adding more template matching and cost processing circuitry to the bus thereby increasing the speech recognition capacity of the apparatus. Template pattern generation is advantageously aided by using a "joker" word to specify the time boundaries of utterances spoken in isolation, by finding the beginning and ending of an utterance surrounded by silence.
摘要:
A word recognition method and system includes obtaining a first portion of a sentence and a second portion of the sentence. The first portion of the sentence is used to obtain a pointer to respective list of second portions of sentences that are complementary to the first portion of the sentence. A match is determined to one of the second portions of sentences from the list based on information obtained from the second portion of the sentence. An error correction capability for dealing with one word errors in the sentence for checking against the database is also attainable.