摘要:
The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
摘要:
The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
摘要:
The present invention relates to a method and apparatus for adding new vocabulary to interactive translation and dialog systems. In one embodiment, a method for adding a new word to a vocabulary of an interactive dialog includes receiving an input signal that includes at least one word not currently in the vocabulary, inserting the word into a dynamic component of a search graph associated with the vocabulary, and compiling the dynamic component independently of a permanent component of the search graph to produce a new sub-grammar, where the permanent component comprises a plurality of words that are permanently part of the search graph.
摘要:
In one embodiment, the present invention is a method and apparatus for adapting original musical tracks for karaoke use. In one embodiment, an original musical track is separated into vocal elements and non-vocal elements. The vocal elements are aligned with corresponding text transcriptions (e.g., text-based lyrics), and the aligned text-based lyrics are then displayed to a user while the non-vocal elements are simultaneously played in a manner that is synchronous with the display of the lyrics.
摘要:
The present invention is a method and apparatus for reading education. In one embodiment, a method for recognizing an utterance spoken by a reader, includes receiving text to be read by the reader, generating a grammar for speech recognition, in accordance with the text, receiving the utterance, interpreting the utterance in accordance with the grammar, and outputting feedback indicative of reader performance.