Abstract:
Correcting incorrect text associated with recognition errors in computer-implemented speech recognition includes receiving a selection of a word from a recognized utterance. The selection indicates a bound of a portion of the recognized utterance to be corrected. A first recognition correction is produced based on a comparison between a first alternative transcript and the recognized utterance. A second recognition correction is produced based on a comparison between a second alternative transcript and the recognized utterance. The duration of the first recognition correction differs from the duration of the second recognition correction. The portion of the recognized utterance that is replaced with one of the two recognition corrections includes, at one bound, the word indicated by the selection and extends for the duration of the recognition correction that replaces it.
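As a rough illustration of the comparison step, the following Python sketch aligns each alternative transcript with the recognized utterance word by word using the standard-library `difflib.SequenceMatcher`. It keeps the mismatched region whose left bound is the selected word. This is a sketch under assumptions, not the described system. The helpers `recognition_correction` and `apply_correction` are hypothetical names. The selected word is treated as the left bound. Duration is approximated by the number of recognized words a correction spans rather than by audio time.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical representation: (start, end, replacement) means that
# recognized[start:end] is to be replaced by the words in `replacement`.
Correction = Tuple[int, int, List[str]]


def recognition_correction(recognized: List[str], alternative: List[str],
                           selected: int) -> Optional[Correction]:
    """Compare one alternative transcript with the recognized utterance and
    return the differing span whose left bound is the selected word."""
    matcher = SequenceMatcher(a=recognized, b=alternative, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only a non-matching region that starts at the selection qualifies.
        if tag != "equal" and i1 == selected:
            return i1, i2, alternative[j1:j2]
    return None


def apply_correction(recognized: List[str], correction: Correction) -> List[str]:
    """Replace the portion that starts at the selected word and extends for
    the length (here: word count) of the chosen correction."""
    start, end, replacement = correction
    return recognized[:start] + replacement + recognized[end:]


recognized = "the cat sat on the mat".split()
selected = 1  # the user selects "cat"
first = recognition_correction(recognized, "the cap sat on the mat".split(), selected)
second = recognition_correction(recognized, "the cats at on the mat".split(), selected)
print(first)   # (1, 2, ['cap'])
print(second)  # (1, 3, ['cats', 'at'])
print(" ".join(apply_correction(recognized, second)))  # the cats at on the mat
```

Here the two alternatives yield corrections that span one word and two words. In both cases the replaced portion starts at "cat", but its length depends on which correction is chosen.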
Abstract:
New techniques and systems may be implemented to improve error correction in speech recognition. These techniques and systems may be used to correct errors in speech recognition systems in a standard desktop environment, in a mobile environment, or in any other type of environment that can receive and/or present recognized speech.
Abstract:
A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Each word is then represented by a cluster spelling that lists the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from the speech to be recognized against the sequences of acoustic models associated with the clusters in the cluster spellings of individual words. The invention also provides a method for deriving a word representation. The method receives a first set of frame sequences for a word. It uses dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word, independently of any previously derived acoustic model particular to that word. It then uses dynamic programming to time-align each frame sequence in a second set for the word into a succession of new sub-sequences corresponding to the initial sequence of models. Finally, it uses these new sub-sequences to calculate new probabilistic sub-models.
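The clustering, cluster-spelling, and re-estimation steps can be sketched in Python under several simplifying assumptions that the abstract does not state:

- Each acoustic sub-model is a diagonal Gaussian.
- The symmetric Kullback-Leibler divergence (the sum of both directions) serves as the distance.
- Clusters are formed by a single-pass leader algorithm with a hypothetical `threshold`.
- Time alignment is a Viterbi-style left-to-right dynamic program in which every sub-model covers at least one frame.

All function names are illustrative. The sketch starts from an already-given initial sequence of sub-models and does not show how that initial sequence is derived.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: a sub-model is a diagonal Gaussian given as (means, variances).
Model = Tuple[Tuple[float, ...], Tuple[float, ...]]


def symmetric_kl(p: Model, q: Model) -> float:
    """KL(p||q) + KL(q||p) for diagonal Gaussians; the log-variance terms cancel."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum((a + (x - y) ** 2) / b + (b + (x - y) ** 2) / a - 2.0
                     for x, a, y, b in zip(mp, vp, mq, vq))


def cluster_spellings(words: Dict[str, List[Model]], threshold: float):
    """Leader clustering across words: a sub-model joins the nearest existing
    cluster if it is within `threshold`, otherwise it starts a new cluster.
    Returns the cluster models and each word's cluster spelling."""
    leaders: List[Model] = []
    spellings: Dict[str, List[int]] = {}
    for word, submodels in words.items():
        spelling = []
        for m in submodels:
            dists = [symmetric_kl(m, c) for c in leaders]
            if dists and min(dists) <= threshold:
                spelling.append(dists.index(min(dists)))
            else:
                leaders.append(m)
                spelling.append(len(leaders) - 1)
        spellings[word] = spelling
    return leaders, spellings


def log_gauss(x: Sequence[float], m: Model) -> float:
    """Log-likelihood of one frame under a diagonal Gaussian."""
    mean, var = m
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))


def align(frames: List[Sequence[float]], models: List[Model]) -> List[int]:
    """Dynamic-programming time alignment: each frame either stays in the
    current sub-model or advances to the next; returns a model index per frame."""
    T, N = len(frames), len(models)
    assert T >= N, "need at least one frame per sub-model"
    score = [[-math.inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_gauss(frames[0], models[0])
    for t in range(1, T):
        for n in range(N):
            # Best predecessor: advance from n-1 or stay in n.
            prev = n - 1 if n > 0 and score[t - 1][n - 1] > score[t - 1][n] else n
            if score[t - 1][prev] > -math.inf:
                score[t][n] = score[t - 1][prev] + log_gauss(frames[t], models[n])
                back[t][n] = prev
    path = [N - 1]  # the last frame must belong to the last sub-model
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def reestimate(utterances: List[List[Sequence[float]]], models: List[Model],
               floor: float = 1e-3) -> List[Model]:
    """Align every utterance, pool the frames assigned to each sub-model, and
    refit its mean and (floored) variance from those sub-sequences."""
    pools: List[List[Sequence[float]]] = [[] for _ in models]
    for frames in utterances:
        for frame, n in zip(frames, align(frames, models)):
            pools[n].append(frame)
    new_models = []
    for seg in pools:
        dim = len(seg[0])
        mean = tuple(sum(f[d] for f in seg) / len(seg) for d in range(dim))
        var = tuple(max(floor, sum((f[d] - mean[d]) ** 2 for f in seg) / len(seg))
                    for d in range(dim))
        new_models.append((mean, var))
    return new_models


words = {
    "cat": [((0.0,), (1.0,)), ((5.0,), (1.0,))],
    "cap": [((0.1,), (1.0,)), ((9.0,), (1.0,))],
}
leaders, spellings = cluster_spellings(words, threshold=1.0)
print(spellings)  # {'cat': [0, 1], 'cap': [0, 2]}

# Recognition-side view: a word is modelled by the clusters in its spelling.
cat_models = [leaders[k] for k in spellings["cat"]]
frames = [(0.2,), (-0.1,), (4.8,), (5.1,), (5.3,)]
print(align(frames, cat_models))  # [0, 0, 1, 1, 1]
new_models = reestimate([frames], cat_models)
```

In this example, the first sub-models of "cat" and "cap" fall into the same cluster, so both spellings begin with cluster 0. The frames of one utterance are then aligned to the cluster models in the spelling of "cat" and used to refit those models. A real system would pool many utterances and would likely use a more careful clustering procedure.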