Abstract:
Disclosed is a method for entering vocal orders, whereby each command produces a data output for the user, thereby acknowledging or rejecting the command sequence.
Abstract:
Disclosed are a method and an apparatus for adapting, particularly reducing, the size of a language model, which comprises word n-grams, in a speech recognition system. The invention provides a mechanism to discard those n-grams for which the acoustic part of the system requires less support from the language model to recognize correctly. The proposed method is suitable for identifying, at system build time, the trigrams in a language model that can be discarded. Also provided is an automatic classification scheme for words which allows the language model to be compressed while retaining accuracy. Moreover, it allows efficient use of sparsely available text corpora because even singleton trigrams are used when they are helpful. No additional software tools need to be developed because the main tool, the fast match scoring, is a module readily available in the known recognizers themselves. The method is further improved by classifying words according to the common text in which they occur, insofar as they are acoustically distinct from each other. The invention opens the possibility of making speech recognition available on low-cost personal computers (PCs), and even on portable computers such as laptops.
Abstract:
A speech recogniser in which the recognition vocabulary is generated from a user's own speech by forming phonemic transcriptions of the user's utterances and using these transcriptions for future recognition purposes. The phonemic transcriptions are generated using a loosely constrained network, preferably one constrained only by noise. The resulting transcriptions therefore bear close resemblance to the user's input speech but require significantly less storage than known speaker dependent word representations.
Abstract:
Boundaries of spoken sound in continuous speech are identified by classifying delimitative sounds to provide improved performance in a speech recognition system (200). Delimitative sounds, those portions of continuous speech that occur between spoken sounds, are recognized by the same method used to recognize spoken sounds. Recognition of delimitative sounds is accomplished by training a learning machine (176) to act as a classifier which implements a discriminant function based on a polynomial expansion.
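As an illustration only, a discriminant function based on a polynomial expansion might be sketched as follows. The expansion order, the class labels, and the weight values are assumptions made for this example; the abstract does not specify them, and the training of the learning machine is not shown.

```python
from itertools import combinations_with_replacement


def polynomial_expansion(x, order=2):
    """Expand a feature vector into all monomials up to `order` (assumed order 2)."""
    terms = [1.0]  # constant term
    for degree in range(1, order + 1):
        for combo in combinations_with_replacement(range(len(x)), degree):
            product = 1.0
            for i in combo:
                product *= x[i]
            terms.append(product)
    return terms


def classify(x, weights, order=2):
    """Pick the label whose linear discriminant over the expanded terms is largest.

    `weights` maps a label to one coefficient per expanded term; in a real
    system these would come from training, here they are hypothetical.
    """
    phi = polynomial_expansion(x, order)
    scores = {
        label: sum(w * t for w, t in zip(coeffs, phi))
        for label, coeffs in weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical weights for two classes over a 2-dimensional feature vector.
example_weights = {
    "spoken": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "delimitative": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}
```

With these made-up weights, a frame is labelled "spoken" when its first feature exceeds 1 and "delimitative" otherwise; the same classifier structure serves both kinds of sound, as the abstract describes.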
Abstract:
A path link passing speech recognition system and method for recognising input connected speech, the recognition system having a plurality of vocabulary nodes (24) associated with word representation models, at least one of the vocabulary nodes (24) of the network being able to process more than one path link simultaneously, so allowing for more than one recognition result.
Abstract:
Apparatus for speaker recognition which comprises means (210, 220, 230) for generating, in response to a speech signal, a plurality of feature data comprising a series of coefficient sets, each set comprising a plurality of coefficients indicating the short term spectral amplitude in a plurality of frequency bands, and means (260) for comparing said feature data with predetermined speaker reference data, and for indicating recognition of a corresponding speaker in dependence upon said comparison; characterised in that said frequency bands are unevenly spaced along the frequency axis, and by means (250) for deriving a long term average spectral magnitude of at least one of said coefficients; and for normalising the or each of said at least one coefficient by said long term average.
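A minimal sketch of the normalisation step, under stated assumptions: the coefficient sets are given as plain lists of spectral magnitudes, the long term average is taken over all supplied frames, and normalising means dividing by that average (subtraction would be the analogue for log-domain coefficients). The function name and parameters are hypothetical.

```python
def normalise_by_long_term_average(frames, indices=None):
    """Divide selected coefficients by their average over all frames.

    frames  -- list of coefficient sets, one list of floats per frame
    indices -- which coefficients to normalise (default: all of them)
    """
    if not frames:
        return []
    if indices is None:
        indices = range(len(frames[0]))
    indices = list(indices)

    # Long term average of each selected coefficient across the utterance.
    averages = {i: sum(f[i] for f in frames) / len(frames) for i in indices}

    normalised = []
    for frame in frames:
        row = list(frame)
        for i in indices:
            if averages[i] != 0:  # leave the value unchanged if the average is zero
                row[i] = frame[i] / averages[i]
        normalised.append(row)
    return normalised
```

Passing `indices=[1]`, for example, normalises only the second coefficient, which corresponds to the "at least one of said coefficients" wording.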
Abstract:
The present invention relates to a speech synthesis and recognition system that reduces the memory capacity needed to store standard speech information and improves both the synthesized speech quality and the rate of speech recognition. The system has a memory storing demiphoneme data, obtained by bisecting each phoneme at its center. It produces a synthesized speech signal by decoding the demiphoneme data stored in the memory and concatenating the decoded demiphoneme data, and it generates character string data for the word, phrase, or clause corresponding to a speech signal by comparing the demiphoneme data stored in the memory with that speech signal.
Abstract:
Speaker verification is important in such applications as financial transactions which are to be carried out automatically by telephone. False acceptances of a speaker cause serious problems, but so do frequent false rejections in view of the annoyance caused. Some of the problems of speaker verification are reduced in the invention by forming Hidden Markov Models (HMMs) for each of a number of words using features of utterances of these words from a large number of speakers. These models are known as world models. In addition, for every person whose speech is to be recognised, one HMM is formed for each of the words as uttered by that person. These models are known as personal models. In verification, a person is prompted to repeat a string of isolated or connected words (15) and features from each of these words are extracted (16). Next, the probabilities that these features could have been generated by the world models for these words and by the personal models of that person are calculated, respectively (17 and 18), and these probabilities are compared (19) for each word. A decision (23) on verification is based on a poll (22) of these comparisons.
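The per-word comparison and poll can be sketched roughly as below. This assumes the HMM scoring has already produced one log-likelihood per word from the personal model and one from the world model; the function name, the use of log-likelihoods, and the majority threshold are assumptions for illustration, since the abstract does not state how the poll is decided.

```python
def verify_speaker(personal_scores, world_scores, min_fraction=0.5):
    """Accept the speaker if enough words score better under the personal model.

    personal_scores -- per-word log-likelihoods under the claimed speaker's models
    world_scores    -- per-word log-likelihoods under the world models
    min_fraction    -- fraction of words that must favour the personal model
                       (assumed value; a strict majority by default)
    """
    if not personal_scores or len(personal_scores) != len(world_scores):
        raise ValueError("need one personal and one world score per word")

    # One vote per word: does the personal model explain it better?
    votes = [p > w for p, w in zip(personal_scores, world_scores)]

    # Poll of the per-word comparisons.
    return sum(votes) > min_fraction * len(votes)
```

Raising `min_fraction` trades more false rejections for fewer false acceptances, which is the balance the abstract identifies as the core difficulty.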
Abstract:
The present invention describes a method for recognizing alphanumeric strings spoken over a telephone network wherein individual character recognition need not be uniformly high in order to achieve high string recognition accuracy. Preferably, the method uses a processing system (fig. 2) having a digital processor (30), an interface (42) to the telephone network, and a database (45) for storing a predetermined set of reference alphanumeric strings. In operation, the system prompts the caller to speak each character of a string, beginning with a first character and ending with a last character. Each character is then recognized using a speaker-independent voice recognition algorithm. The method (fig. 6) calculates recognition distances between each spoken input character and the corresponding letter or digit in the same position within each reference alphanumeric string (108). After each character is spoken, captured and analyzed (105, 106), each reference string distance is incremented and the process is continued, accumulating distances for each reference string, until the last character is spoken. The reference string with the lowest cumulative distance (112) is then declared to be the recognized string.
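The accumulation of distances can be illustrated with a short sketch. It assumes the recognizer supplies, for each spoken position, a mapping from candidate characters to recognition distances; that input format, the function name, and the penalty for characters without a distance are hypothetical choices made for this example.

```python
def recognise_string(char_distances, reference_strings, missing=float("inf")):
    """Return the reference string with the lowest cumulative distance.

    char_distances    -- one dict per spoken character, mapping a candidate
                         letter or digit to its recognition distance
    reference_strings -- the predetermined set of reference strings
    missing           -- distance charged when a reference has no character
                         at this position or no distance is available
    """
    totals = {ref: 0.0 for ref in reference_strings}

    # After each spoken character, increment every reference string's distance.
    for position, distances in enumerate(char_distances):
        for ref in totals:
            if position < len(ref):
                totals[ref] += distances.get(ref[position], missing)
            else:
                totals[ref] += missing

    best = min(totals, key=totals.get)
    return best, totals


# Made-up distances for a three-character utterance.
spoken = [
    {"A": 0.1, "B": 0.9},
    {"1": 0.6, "7": 0.5},  # the second character alone is slightly closer to "7"
    {"C": 0.2, "D": 0.8},
]
best, totals = recognise_string(spoken, ["A1C", "B7D", "A7D"])
```

In this made-up data the second character is individually closer to "7", yet "A1C" still has the lowest cumulative distance, which is the sense in which per-character accuracy need not be uniformly high.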