摘要:
Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.
摘要:
A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
摘要:
A phoneme and a viseme of a person may be modeled using a coupled hidden Markov model. The coupled hidden Markov model and a second model may be compared to identify the person.
摘要:
According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.
摘要:
According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.