Abstract:
Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases, each having one or more corresponding actions. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to the list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases.
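A minimal sketch of the lookup-and-learn loop this abstract describes, assuming a simple prefix match for carrier phrases; the phrase table, action names, and the record_user_choice helper are hypothetical illustrations, not taken from the patent.

```python
# Sketch of carrier-phrase matching with a search fallback and usage learning.
# The phrase table and action names below are illustrative assumptions.

CARRIER_PHRASES = {
    "call": ["open_dialer"],
    "navigate to": ["open_maps"],
    "play": ["open_music_player", "open_video_player"],
}

def handle_spoken_command(transcript: str) -> list[str]:
    """Return candidate actions for a transcript, or a search fallback."""
    text = transcript.lower().strip()
    for phrase, actions in CARRIER_PHRASES.items():
        if text.startswith(phrase):
            return actions                 # known carrier phrase: offer its action(s)
    return [f"web_search:{text}"]          # unknown phrase: fall back to search

def record_user_choice(transcript: str, chosen_action: str) -> None:
    """Learn from the user's selection by updating the carrier-phrase table."""
    # A real system would extract the carrier portion more carefully; this
    # sketch simply maps the first two words of the transcript to the action.
    carrier = " ".join(transcript.lower().split()[:2])
    CARRIER_PHRASES.setdefault(carrier, [])
    if chosen_action not in CARRIER_PHRASES[carrier]:
        CARRIER_PHRASES[carrier].append(chosen_action)
```

Under these assumptions, handle_spoken_command("play some jazz") offers both player actions, while an unmatched command falls through to search; whichever result the user then picks feeds back into the table via record_user_choice.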
Abstract:
A novel system for automatic reading tutoring provides effective error detection and reduced false alarms, combined with a low processing burden and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or "on-the-fly" based on the currently displayed text (e.g., the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
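A toy illustration of the two-model comparison: each read word is scored against a text-specific target model and a shared garbage model, and feedback follows whichever scores higher. A real tutor scores acoustics with an ASR decoder over context-free grammars, so the per-word probabilities below are placeholder assumptions.

```python
import math

# Toy comparison of a domain-specific target model against a shared
# general-domain garbage model. All probabilities are illustrative.

def target_score(word: str, expected: str) -> float:
    """High log-probability when the read word matches the displayed text."""
    return math.log(0.9) if word == expected else math.log(0.05)

def garbage_score(word: str) -> float:
    """Flat, text-independent score shared across all displayed stories."""
    return math.log(0.2)

def tutor_feedback(read_words: list[str], story_words: list[str]) -> list[str]:
    feedback = []
    for read, expected in zip(read_words, story_words):
        if target_score(read, expected) >= garbage_score(read):
            feedback.append(f"ok: {expected}")
        else:
            feedback.append(f"try again: {expected}")   # garbage model won
    return feedback

print(tutor_feedback(["the", "cat", "sat"], ["the", "cat", "sang"]))
```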
Abstract:
A global speech user interface (GSUI) (100) comprises an input system (110) to receive a user's spoken command, a feedback system along with a set of feedback overlays to give the user information on the progress of his spoken requests, a set of visual cues on the television screen (140) to help the user understand what he can say, a help system, and a model for navigation among applications. The interface is extensible to make it easy to add new applications.
Abstract:
A correction device (4) for a speech recognition device (2) is provided which makes the replacement of incorrectly recognized words (FETI) in the recognized text (ETI) especially simple. The correction device (4) exploits the observation that the phoneme sequences of incorrectly recognized words and of the spoken words actually to be recognized are very similar, and automatically marks those words in the recognized text (ETI) whose phoneme sequence is similar to that of a correction word (KWI) entered manually by the user.
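A rough sketch of marking phonemically similar words, assuming edit-distance-style string similarity stands in for the phoneme comparison; the phonemize placeholder and the 0.5 threshold are illustrative assumptions, not the patent's actual method.

```python
from difflib import SequenceMatcher

# Sketch of flagging candidate replacement positions by phoneme similarity.

def phonemize(word: str) -> str:
    """Placeholder G2P: strip vowels to approximate a consonant skeleton."""
    return "".join(c for c in word.lower() if c not in "aeiou")

def mark_similar_words(recognized_text: list[str], correction_word: str,
                       threshold: float = 0.5) -> list[int]:
    """Return indices of recognized words phonemically close to the correction."""
    target = phonemize(correction_word)
    marked = []
    for i, word in enumerate(recognized_text):
        ratio = SequenceMatcher(None, phonemize(word), target).ratio()
        if ratio >= threshold:
            marked.append(i)
    return marked

text = ["please", "recognize", "beach", "tomorrow"]
print([text[i] for i in mark_similar_words(text, "speech")])   # ['beach']
```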
Abstract:
The invention relates to a system and a method for operating and monitoring, in particular, an automation system and/or a production machine or machine tool, whereby the visual field (9) of a user (1) is recorded on at least one display means (2), speech information (8) from the user (1) is at least intermittently evaluated, and a visual feedback signal is generated indicating the processing status of the recognized speech information (8). Improved speech interaction is thus obtained, in particular in the field of augmented-reality applications and in complex technical plants.
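One way the status-dependent visual feedback might be sketched, assuming a simple four-state indicator; the states and messages below are invented for illustration.

```python
from enum import Enum, auto

# Minimal sketch of status-driven visual feedback for spoken commands in a
# plant-monitoring UI. States and cues are illustrative assumptions.

class SpeechStatus(Enum):
    LISTENING = auto()
    PROCESSING = auto()
    RECOGNIZED = auto()
    REJECTED = auto()

FEEDBACK = {
    SpeechStatus.LISTENING:  "mic icon: listening...",
    SpeechStatus.PROCESSING: "spinner: evaluating command...",
    SpeechStatus.RECOGNIZED: "green check: command accepted",
    SpeechStatus.REJECTED:   "red cross: command not understood",
}

def render_feedback(status: SpeechStatus) -> str:
    """Return the visual cue overlaid on the user's field of view."""
    return FEEDBACK[status]
```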
Abstract:
Voice input is received from a user. An ASR system generates in memory a set of words it has identified in the voice input, and updates the set each time it identifies a new word in the voice input. A condition indicative of speech inactivity in the voice input is detected. In response to the detection of the speech inactivity condition, a response for outputting to the user is generated based on the set of identified words. The generated response is outputted to the user after an interval of time, commencing with the detection of the speech inactivity condition, has ended, and only if the ASR system has identified no further words in the voice input during that interval.
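A compact sketch of that delayed-output rule, assuming the ASR system reports words and silence through callbacks; the ResponseGate class and its 0.5 s hold interval are hypothetical names and values.

```python
import time

# Sketch of the delayed-response rule: after silence is detected, wait out an
# interval and emit the response only if no further words arrived meanwhile.

class ResponseGate:
    def __init__(self, hold_seconds: float = 0.5):
        self.hold_seconds = hold_seconds
        self.words: list[str] = []
        self.silence_started: float | None = None

    def on_word(self, word: str) -> None:
        self.words.append(word)        # a new word cancels any pending response
        self.silence_started = None

    def on_silence(self) -> None:
        if self.silence_started is None:
            self.silence_started = time.monotonic()

    def poll(self) -> str | None:
        """Return the response once the hold interval elapses word-free."""
        if self.silence_started is None:
            return None
        if time.monotonic() - self.silence_started >= self.hold_seconds:
            return f"response to: {' '.join(self.words)}"
        return None
```

The design point is that the response is prepared as soon as silence is detected but held back, so a resumed utterance cheaply discards it rather than interrupting the user.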
Abstract:
Method and apparatus for providing visual feedback on an electronic device in a client/server speech recognition system comprising the electronic device and a network device remotely located from the electronic device. The method comprises processing, by an embedded speech recognizer of the electronic device, at least a portion of input audio comprising speech to produce local recognized speech, sending at least a portion of the input audio to the network device for remote speech recognition, and displaying, on a user interface of the electronic device, visual feedback based on at least a portion of the local recognized speech prior to receiving streaming recognition results from the network device.
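A minimal sketch of the local-first display behavior, assuming both recognizers can be modeled as coroutines; the recognizer stubs, their delays, and their outputs are placeholders, not a real client/server API.

```python
import asyncio

# Sketch: show the embedded (on-device) result immediately while the slower
# network recognizer is still working, then refine the display.

async def embedded_recognizer(audio: bytes) -> str:
    await asyncio.sleep(0.1)                 # fast, on-device
    return "call john"

async def network_recognizer(audio: bytes) -> str:
    await asyncio.sleep(1.0)                 # slower, higher accuracy
    return "call John Smith on mobile"

async def recognize_with_feedback(audio: bytes) -> str:
    remote = asyncio.create_task(network_recognizer(audio))
    local = await embedded_recognizer(audio)
    print(f"[display] {local}")              # immediate visual feedback
    final = await remote                     # refine once the server replies
    print(f"[display] {final}")
    return final

asyncio.run(recognize_with_feedback(b"..."))
```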
Abstract:
Examples of the present disclosure describe the generation of a multi-arc confusion network to improve, for example, the ability to return alternatives to generated output. A confusion network comprising token representations of lexicalized hypotheses and normalized hypotheses is generated. Each arc of the confusion network represents a token of a lexicalized hypothesis or a normalized hypothesis. The confusion network is transformed into a multi-arc confusion network, wherein the transforming comprises realigning at least one token of the confusion network to span multiple arcs of the confusion network. Other examples are also described.
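A rough sketch of such a network, under the assumption that spanning multiple arcs can be modeled as one arc covering several consecutive slot positions; the Arc structure, scores, and the "twenty five" vs. "25" example are illustrative assumptions.

```python
from dataclasses import dataclass

# Sketch of a confusion network whose arcs can span multiple slots, so a
# normalized token ("25") aligns against two lexicalized tokens
# ("twenty", "five").

@dataclass
class Arc:
    token: str
    start: int      # first slot index covered
    end: int        # last slot index covered (inclusive)
    score: float

def build_multi_arc_network() -> list[Arc]:
    # Lexicalized hypothesis: one arc per slot.
    arcs = [
        Arc("twenty", 0, 0, 0.6),
        Arc("five", 1, 1, 0.6),
    ]
    # Normalized hypothesis realigned to span both slots with a single arc.
    arcs.append(Arc("25", 0, 1, 0.4))
    return arcs

def alternatives_for_span(arcs: list[Arc], start: int, end: int) -> list[str]:
    """List tokens whose arcs cover exactly the requested slot span."""
    return [a.token for a in arcs if a.start == start and a.end == end]

net = build_multi_arc_network()
print(alternatives_for_span(net, 0, 1))   # ['25']
```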