摘要:
Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.
摘要:
A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
摘要:
A phoneme and a viseme of a person may be modeled using a coupled hidden Markov model. The coupled hidden Markov model and a second model may be compared to identify the person.
摘要:
An automatic speech recognition system comprising a speech decoder to resolve phone and word level information, a vector generator to generate information vectors on which a confidence measure is based by a neural network classifier (ANN). An error signal is designed which is not subject to false saturation or over specialization. The error signal is integrated into an error function which is back propagated through the ANN.
摘要:
An automatic speech recognition system comprising a speech decoder to resolve phone and world level information, a vector generator to generate information vectors on which a confidence measure is based by a neural network classifier (ANN). An error signal is designed which is not subject to false saturation or over specialization. The error signal is integrated into an error function which is back propagated through the ANN.
摘要:
An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
摘要:
An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
摘要:
An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
摘要:
An automatic speech recognition system for continuous speech recognition of vocabulary words for an autoattendent system proving hand-free telephone calling and utilizing a vocabulary comprising numbers or names of people to be called using known techniques for automatic speech recognition models of word sequencing resulting in high confidence levels of recognition.