Abstract:
A system is disclosed for recognizing a pattern in a collection of data given a context of one or more other patterns previously identified. Preferably the system is a speech recognition system, the patterns are words and the collection of data is a sequence of acoustic frames. During the processing of each of a plurality of frames, for each word in an active vocabulary, the system updates a likelihood score representing a probability of a match between the word and the frame, combines a language model score based on one or more previously recognized words with that likelihood score, and prunes the word from the active vocabulary if the combined score is below a threshold. A rapid match is made between the frames and each word of an initial vocabulary to determine which words should originally be placed in the active vocabulary. Preferably the system enables an operator to confirm the system's best guess as to the spoken word merely by speaking another word, to indicate that an alternate guess by the system is correct by typing a key associated with that guess, and to indicate that neither the best guess nor any of the alternate guesses were correct by typing yet another key. The system includes other features, including ones for determining where among the frames to look for the start of speech, and a special hardware processor for computing likelihood scores.
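The per-frame update, combine and prune loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: scores are log-probabilities, `frame_loglik` and `lm_logprob` are hypothetical callbacks (the language-model callback is assumed to already be conditioned on the previously recognized words), and the threshold is assumed to be the best combined score minus a fixed `beam`, since the abstract does not say how the threshold is set.

```python
from typing import Callable, Dict, Iterable, Sequence


def run_active_vocabulary(
    frames: Sequence[object],
    initial_vocab: Iterable[str],
    frame_loglik: Callable[[str, object], float],  # hypothetical acoustic scorer
    lm_logprob: Callable[[str], float],  # hypothetical LM score given prior words
    beam: float,
) -> Dict[str, float]:
    """Return surviving words with their accumulated acoustic log-likelihoods."""
    # The initial vocabulary is assumed to come from a separate rapid match.
    active = {word: 0.0 for word in initial_vocab}
    for frame in frames:
        # Update each active word's likelihood score with the new frame.
        updated = {w: s + frame_loglik(w, frame) for w, s in active.items()}
        if not updated:
            break
        # Combine the acoustic score with the language-model score.
        combined = {w: s + lm_logprob(w) for w, s in updated.items()}
        # Assumption: threshold is relative to the best combined score.
        threshold = max(combined.values()) - beam
        # Prune words whose combined score falls below the threshold.
        active = {w: updated[w] for w in updated if combined[w] >= threshold}
    return active
```

For example, with three frames where only "yes" matches well (log-likelihood 0.0 per frame versus -1.0 for other words) and a beam of 1.5, "no" and "maybe" survive the first frame but are pruned on the second.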
Abstract:
A first speech recognition method receives an acoustic description of an utterance to be recognized and scores a portion of that description against each of a plurality of cluster models representing similar sounds from different words. The resulting score for each cluster is used to calculate a word score for each word represented by that cluster. Preferably these word scores are used to prefilter vocabulary words, and the description of the utterance includes a succession of acoustic descriptions which are compared by linear time alignment against a succession of acoustic models. A second speech recognition method is also provided which matches an acoustic model with each of a succession of acoustic descriptions of an utterance to be recognized. Each of these models has a probability score for each vocabulary word. The probability scores for each word associated with the matching acoustic models are combined to form a total score for that word. The preferred speech recognition method calculates two separate word scores for each currently active vocabulary word from a common succession of sounds. Preferably the first score is calculated by a time alignment method, while the second score is calculated by a time-independent method. Preferably this calculation of two separate word scores is used in one of the multiple word-selecting phases of a recognition process, such as in the prefiltering phase.
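The second, time-independent method can be illustrated with a short sketch: each frame is matched to its closest acoustic model, and the log-probabilities that model assigns to every vocabulary word are summed across frames. The model structure (a centroid plus a word-to-probability table), the squared-Euclidean matching rule, and the use of log-probability sums as the combination step are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model structure: (centroid vector, {word: probability}).
Model = Tuple[Tuple[float, ...], Dict[str, float]]


def time_independent_word_scores(
    frames: Sequence[Tuple[float, ...]],
    models: List[Model],
    vocabulary: Sequence[str],
) -> Dict[str, float]:
    """Sum, per word, the log-probabilities of the models matched to each frame."""
    totals = {word: 0.0 for word in vocabulary}
    for frame in frames:
        # Match the frame to the closest model (squared Euclidean distance).
        _, word_probs = min(
            models,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(frame, m[0])),
        )
        # Frame order is ignored: only which models were matched matters.
        for word in vocabulary:
            totals[word] += math.log(word_probs[word])
    return totals
```

Because frame order never enters the sum, this score complements a time-aligned score computed over the same frames, matching the abstract's idea of two separate word scores.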
Abstract:
A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
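The two kinds of difference parameter can be sketched on a frame matrix `frames[t][b]` holding the energy of band `b` in frame `t`. The abstract fixes only which quantities are subtracted. The choice of the "nearby" frame (`offset` frames later, clamped at the last frame), the use of the next-higher band for slope parameters, and the sign convention (given frame minus nearby frame) are assumptions of this sketch.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]  # frames[t][b]: energy of band b in frame t


def energy_differences(frames: Frames, offset: int = 1) -> List[List[float]]:
    """Same band, nearby frame: frames[t][b] - frames[t + offset][b]."""
    n = len(frames)
    out = []
    for t in range(n):
        u = min(t + offset, n - 1)  # assumption: clamp at the last frame
        out.append([frames[t][b] - frames[u][b] for b in range(len(frames[t]))])
    return out


def slope_parameters(frames: Frames, offset: int = 1) -> List[List[float]]:
    """Different band, nearby frame: frames[t][b] - frames[t + offset][b + 1]."""
    n = len(frames)
    out = []
    for t in range(n):
        u = min(t + offset, n - 1)  # assumption: clamp at the last frame
        # Assumption: compare against the next-higher band, so B bands yield B - 1 slopes.
        out.append([frames[t][b] - frames[u][b + 1] for b in range(len(frames[t]) - 1)])
    return out
```

Each frame's difference parameters could then be appended to its spectral parameters before comparison against the speech-unit models.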
Abstract:
A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Each word is then represented by a cluster spelling that lists the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
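The clustering step and the resulting cluster spellings can be sketched as below. Assumptions: each sub-model is a diagonal Gaussian `(means, variances)`, the Kullback-Leibler information is symmetrized (KL(p||q) + KL(q||p)), and a simple greedy scheme assigns each sub-model to the nearest existing cluster representative within `threshold`, otherwise opening a new cluster. The patent does not specify this particular clustering procedure.

```python
from typing import Dict, List, Sequence, Tuple

SubModel = Tuple[Sequence[float], Sequence[float]]  # (means, variances), diagonal


def symmetric_kl(p: SubModel, q: SubModel) -> float:
    """KL(p||q) + KL(q||p) for diagonal Gaussians, summed over dimensions."""
    (m1, v1), (m2, v2) = p, q
    total = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        d2 = (a - b) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0 + d2 * (1.0 / va + 1.0 / vb))
    return total


def cluster_spellings(
    word_models: Dict[str, List[SubModel]], threshold: float
) -> Tuple[Dict[str, List[int]], List[SubModel]]:
    """Greedily cluster sub-models across words; spell each word by cluster ids."""
    representatives: List[SubModel] = []  # first member of each cluster
    spellings: Dict[str, List[int]] = {}
    for word, subs in word_models.items():
        spelling = []
        for sub in subs:
            best_cid, best_d = None, threshold
            for cid, rep in enumerate(representatives):
                d = symmetric_kl(sub, rep)
                if d <= best_d:
                    best_cid, best_d = cid, d
            if best_cid is None:  # nothing close enough: start a new cluster
                representatives.append(sub)
                best_cid = len(representatives) - 1
            spelling.append(best_cid)
        spellings[word] = spelling
    return spellings, representatives
```

With `{"cat": [((0.0,), (1.0,)), ((5.0,), (1.0,))], "cap": [((0.1,), (1.0,)), ((9.0,), (1.0,))]}` and a threshold of 1.0, the first sub-models of both words share cluster 0, giving the spellings `cat -> [0, 1]` and `cap -> [0, 2]`.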
Abstract:
A continuous speech prefiltering system for use in continuous speech recognition computer systems. The speech to be recognized is converted from utterances to frame data sets, which frame data sets are smoothed to generate a smooth frame model over a predetermined number of frames. A resident vocabulary is stored within the computer as clusters of word models which are acoustically similar over a succession of frame periods. A cluster score is generated by the system, which score includes the likelihood of the smooth frames evaluated using a probability model for the cluster against which the smooth frame model is being compared. Cluster sets having cluster scores below a predetermined acoustic threshold are removed from further consideration. The remaining cluster sets are unpacked for determination of a word score for each unpacked word. These word scores are used to identify those words which are above a second predetermined threshold to define a word list which is sent to a recognizer for a more lengthy word match. A controller enables the system to initialize times corresponding to the frame start time for each frame data set, defining a sliding window.
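The prefiltering pipeline described above (smooth the frames, score each cluster, drop clusters below an acoustic threshold, unpack the survivors and keep words above a second threshold) can be sketched as follows. The smoothing rule (averaging non-overlapping blocks of `window` frames), the cluster model (one diagonal Gaussian per smooth frame), and the per-word `log_prior` added to the cluster score when unpacking are all hypothetical choices for illustration only.

```python
import math
from typing import Dict, List, Sequence


def smooth_frames(frames: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Assumption: average non-overlapping blocks of `window` frames."""
    smooth = []
    for start in range(0, len(frames) - window + 1, window):
        block = frames[start:start + window]
        smooth.append([sum(col) / window for col in zip(*block)])
    return smooth


def gaussian_loglik(x, means, variances) -> float:
    """Log-likelihood of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))


def prefilter(frames, clusters: List[Dict], window: int,
              cluster_threshold: float, word_threshold: float) -> List[str]:
    """clusters: [{"models": [(means, variances), ...], "words": {word: log_prior}}]."""
    smooth = smooth_frames(frames, window)
    word_list = []
    for cluster in clusters:
        # Cluster score: likelihood of the smooth frames under the cluster model.
        score = sum(gaussian_loglik(x, m, v)
                    for x, (m, v) in zip(smooth, cluster["models"]))
        if score < cluster_threshold:
            continue  # remove the whole cluster from further consideration
        # Unpack the surviving cluster into word scores (hypothetical log prior).
        for word, log_prior in cluster["words"].items():
            if score + log_prior >= word_threshold:
                word_list.append(word)
    return word_list  # passed on to the recognizer for a more lengthy match
```

The design benefit is that a single cluster score is shared by every word in the cluster, so poorly matching groups of words are discarded without scoring each word individually.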
Abstract:
The system of the invention relates to continuous speech pre-filtering systems for use in discrete and continuous speech recognition computer systems. The speech to be recognized is converted from utterances to frame data sets, which frame data sets are smoothed to generate a smooth frame model over a predetermined number of frames. A resident vocabulary is stored within the computer as clusters of word models which are acoustically similar over a succession of frame periods. A cluster score is generated by the system, which score includes the likelihood of the smooth frames evaluated using a probability model for the cluster against which the smooth frame model is being compared. Cluster sets having cluster scores below a predetermined acoustic threshold are removed from further consideration. The remaining cluster sets are unpacked for determination of a word score for each unpacked word. These word scores are used to identify those words which are above a second predetermined threshold to define a word list which is sent to a recognizer for a more lengthy word match. Control means enable the system to initialize times corresponding to the frame start time for each frame data set, defining a sliding window.
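The sliding window defined by successive frame start times can be sketched as a loop that treats every frame as a candidate word start and hands the following `window_len` frames to a prefilter. Here `score_window` is a hypothetical callback standing in for the cluster-based prefilter; one start per frame and a fixed window length are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

F = TypeVar("F")


def prefilter_by_start_time(
    frames: Sequence[F],
    window_len: int,
    score_window: Callable[[Sequence[F]], List[str]],  # hypothetical prefilter
) -> Dict[int, List[str]]:
    """Run the prefilter on a window that starts at every frame start time."""
    candidates: Dict[int, List[str]] = {}
    for start in range(len(frames) - window_len + 1):
        # Each frame start time defines a new position of the sliding window.
        candidates[start] = score_window(frames[start:start + window_len])
    return candidates
```

Keying the results by start time lets the downstream recognizer look up the candidate word list for whichever frame it hypothesizes as the start of the next word.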
Abstract:
A process for producing any desired Ba/Ti mixture to be formulated as an amorphous solid which crystallizes at very low temperatures to yield a desired phase or phases is disclosed. The process yields products free of undesirable impurities and allows macroscopic production of certain phases in the baria-titania system, having exceptional high-frequency dielectric properties, that were previously unattainable through high-temperature solid-state production techniques.