Abstract:
A web-based voice dialog interface for use in communicating dialog information between a user at a client machine and one or more servers coupled to the client machine via the Internet or other computer network. The interface in an illustrative embodiment includes a web page interpreter for receiving information relating to one or more web pages. The web page interpreter generates a rendering of at least a portion of the information for presentation to a user in an audibly-perceptible format. A grammar processing device utilizes interpreted web page information received from the web page interpreter to generate syntax information and semantic information. A speech recognizer processes received user speech in accordance with the syntax information, and a natural language interpreter processes the resulting recognized speech in accordance with the semantic information to generate output for delivery to a web server in conjunction with a voice dialog which includes the user speech and the rendering of the web page(s). The output may be processed by a common gateway interface (CGI) formatter prior to delivery to a CGI associated with the web server.
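The final step of this pipeline, formatting the interpreter output for a CGI, might look roughly like the sketch below. It assumes the natural language interpreter emits a flat mapping of slot names to values and that the CGI accepts them as a URL query string; neither detail is stated in the abstract. The function name `format_cgi_request`, the example URL, and the slot names are hypothetical.

```python
from urllib.parse import urlencode


def format_cgi_request(cgi_url, semantic_output):
    """Append interpreted slot/value pairs to a CGI URL as a query string.

    cgi_url: address of the CGI program (hypothetical example below)
    semantic_output: dict of slot name -> value, assumed to be what the
        natural language interpreter produces
    """
    return cgi_url + "?" + urlencode(semantic_output)


# Hypothetical result of interpreting "a large pizza with green pepper"
request = format_cgi_request(
    "http://example.com/cgi-bin/order",
    {"size": "large", "topping": "green pepper"},
)
print(request)
```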
Abstract:
A method of recognizing speech input selectively creates and maintains grammar representations of the speech input in essentially real time. Speech input frames are received by a speech recognition system. Grammar representations are created for each speech frame and a probability score is derived for the representations indicating the probability of the accuracy of the representations to the speech input. Representations having a probability score below a predetermined threshold are not maintained. Those grammar representations having probability scores above the predetermined threshold are maintained. As more speech frames are received by the system, additional grammar representations are created and the probability scores are updated. When the entire speech input has been received, the chain of grammar representations having the highest probability score is identified as the speech input.
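The pruning scheme described above can be illustrated with a minimal sketch. It assumes that the probability score of a chain is the product of per-frame probabilities (kept here as a sum of logarithms) and that the threshold is applied to that cumulative score; the abstract does not say how scores are combined or what happens to a score exactly equal to the threshold. The names `recognize`, `candidates`, and `score_fn` are hypothetical.

```python
import math


def recognize(frames, candidates, score_fn, log_threshold):
    """Grow chains of grammar representations frame by frame, pruning weak ones.

    frames: sequence of speech frames (opaque objects here)
    candidates: possible representations for any frame (assumption)
    score_fn(frame, candidate): probability in [0, 1] (assumption)
    log_threshold: chains whose cumulative log-probability falls below
        this value are not maintained
    """
    chains = [((), 0.0)]  # (chain so far, cumulative log-probability)
    for frame in frames:
        extended = []
        for chain, logp in chains:
            for cand in candidates:
                p = score_fn(frame, cand)
                if p <= 0.0:
                    continue  # impossible extension, nothing to keep
                new_logp = logp + math.log(p)
                if new_logp >= log_threshold:  # maintain only strong chains
                    extended.append((chain + (cand,), new_logp))
        chains = extended
        if not chains:
            return None  # every hypothesis was pruned
    # Entire input received: report the highest-scoring surviving chain.
    return max(chains, key=lambda item: item[1])[0]


# Toy scores for two frames and two candidate representations.
toy_scores = {
    ("f1", "x"): 0.9, ("f1", "y"): 0.1,
    ("f2", "x"): 0.2, ("f2", "y"): 0.8,
}
best = recognize(["f1", "f2"], ["x", "y"],
                 lambda f, c: toy_scores[(f, c)], math.log(0.05))
print(best)
```

In the toy run, the chain `("y", "x")` has cumulative probability 0.02 and is dropped after the second frame, while the other three chains survive until the final comparison.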
Abstract:
An adaptive endpointer system and method are used in speech recognition applications, such as telephone-based Internet browsers, to determine barge-in events during the processing of speech. The endpointer system includes a signal energy level estimator for estimating signal levels in speech data; a noise energy level estimator for estimating noise levels in the speech data; and a barge-in detector for increasing a threshold used in comparing the signal levels and the noise levels to detect the barge-in event in the speech data corresponding to a speech prompt during speech recognition.
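One way to picture the three components is the sketch below. It is an illustration under stated assumptions, not the patented method: signal level is taken as mean squared amplitude per frame, noise level is an exponential moving average updated on non-speech frames, the first frame is assumed to contain only noise, and the barge-in threshold is a signal-to-noise ratio that is raised while a prompt is playing. The class name `Endpointer` and all parameter values are hypothetical.

```python
class Endpointer:
    """Energy-based speech detector with a stricter threshold during prompts."""

    def __init__(self, base_ratio=2.0, prompt_ratio=4.0, noise_alpha=0.9):
        self.base_ratio = base_ratio      # signal/noise ratio needed normally
        self.prompt_ratio = prompt_ratio  # raised ratio while a prompt plays
        self.noise_alpha = noise_alpha    # smoothing factor for the noise estimate
        self.noise = None                 # running noise energy estimate

    @staticmethod
    def frame_energy(samples):
        """Signal level estimate: mean squared amplitude of a non-empty frame."""
        return sum(s * s for s in samples) / len(samples)

    def process(self, samples, prompt_playing):
        """Return True if this frame is judged to contain user speech."""
        energy = self.frame_energy(samples)
        if self.noise is None:
            self.noise = energy  # assumption: the first frame is noise only
            return False
        ratio = self.prompt_ratio if prompt_playing else self.base_ratio
        is_speech = energy > ratio * self.noise
        if not is_speech:
            # Adapt the noise estimate only on frames judged to be non-speech.
            self.noise = (self.noise_alpha * self.noise
                          + (1.0 - self.noise_alpha) * energy)
        return is_speech


# The same slightly louder frame triggers detection only when no prompt plays.
quiet, louder = [0.1] * 4, [0.15] * 4
idle = Endpointer()
idle.process(quiet, prompt_playing=False)
during_prompt = Endpointer()
during_prompt.process(quiet, prompt_playing=True)
print(idle.process(louder, prompt_playing=False),
      during_prompt.process(louder, prompt_playing=True))
```

Raising the ratio during prompt playback is meant to keep prompt echo from being mistaken for the user interrupting, at the cost of requiring the user to speak somewhat louder to barge in.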
Abstract:
A method of recognizing speech input selectively creates and maintains grammar representations of the speech input in essentially real time. Speech input frames are received by a speech recognition system. Grammar representations are created for each speech frame and a probability score is derived for the representations indicating the probability of the accuracy of the representations to the speech input. Representations having a probability score below a predetermined threshold are not maintained. Those grammar representations having probability scores above the predetermined threshold are maintained. As more speech frames are received by the system, additional grammar representations are created and the probability scores are updated. When the entire speech input has been received, the chain of grammar representations having the highest probability score is identified as the speech input.
Abstract:
Methods and systems for performing handwriting recognition which include, in part, application of stochastic modeling techniques in conjunction with language modeling. Handwriting recognition is performed on a received data set, which is representative of a handwriting sample comprised of one or more symbols. Recognition is performed by selectively segmenting the data set into one or more strokes utilizing an evolution grammar for identifying each one of the strokes among one or more alternatives. Each one of the strokes represents a segment of the handwriting sample. The identified strokes are evaluated as a stroke sequence, representative of one or more of the handwriting sample's symbols, to identify the handwriting sample.
Abstract:
A method of recognizing speech input selectively creates and maintains grammar representations of the speech input in essentially real time. Speech input frames are received by a speech recognition system. Grammar representations are created for each speech frame and a probability score is derived for the representations indicating the probability of the accuracy of the representations to the speech input. Representations having a probability score below a predetermined threshold are not maintained. Those grammar representations having probability scores above the predetermined threshold are maintained. As more speech frames are received by the system, additional grammar representations are created and the probability scores are updated. When the entire speech input has been received, the chain of grammar representations having the highest probability score is identified as the speech input.