Abstract:
The disclosed embodiments include a computer-implemented method to control the presentation of an audio-video stream. The method includes obtaining an audio-video stream and associating the audio-video stream with events. The events include an interpretation of content of the audio-video stream. The method further includes obtaining a natural language command, generating a control signal based on the natural language command by referencing a particular event, and using the control signal to control presentation of the audio-video stream relative to the particular event.
Abstract:
A computer-implemented method includes generating an empirically derived acoustic confusability measure by processing example utterances and iterating from an initial estimate of the acoustic confusability measure to improve the measure. The method can further include using the acoustic confusability measure to selectively limit the phrases made recognizable by a speech recognition application.
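One way to read the iteration described above is as alternating between aligning example (intended, decoded) utterance pairs under the current confusability estimate and re-estimating confusion probabilities from those alignments. The sketch below is an assumption-laden toy, not the patented procedure: the symbol alphabet, the edit-distance alignment, and the smoothing are all illustrative choices.

```python
# Hypothetical sketch: start from a uniform initial estimate, align each
# example pair under the current estimate, then re-estimate confusion
# probabilities from the aligned substitutions. All details are illustrative.

import math
from collections import Counter, defaultdict

def align(intended, decoded, cost):
    """Edit-distance alignment under the current substitution costs;
    returns the substitution pairs on a best path."""
    n, m = len(intended), len(decoded)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * 1.0
    for j in range(1, m + 1):
        D[0][j] = j * 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1.0,   # deletion
                          D[i][j - 1] + 1.0,   # insertion
                          D[i - 1][j - 1] + cost(intended[i - 1], decoded[j - 1]))
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back the substitution pairs
        if D[i][j] == D[i - 1][j - 1] + cost(intended[i - 1], decoded[j - 1]):
            pairs.append((intended[i - 1], decoded[j - 1]))
            i, j = i - 1, j - 1
        elif D[i][j] == D[i - 1][j] + 1.0:
            i -= 1
        else:
            j -= 1
    return pairs

def estimate_confusability(examples, symbols, iterations=3):
    """Iterate from a uniform initial estimate toward empirical confusion
    probabilities over (intended, decoded) example utterance pairs."""
    prob = {a: {b: 1.0 / len(symbols) for b in symbols} for a in symbols}
    for _ in range(iterations):
        cost = lambda a, b: -math.log(max(prob[a][b], 1e-9))
        counts = defaultdict(Counter)
        for intended, decoded in examples:
            for a, b in align(intended, decoded, cost):
                counts[a][b] += 1
        # re-estimate with light smoothing so unseen pairs keep nonzero mass
        prob = {a: {b: (counts[a][b] + 1e-6) /
                       (sum(counts[a].values()) + 1e-6 * len(symbols))
                    for b in symbols} for a in symbols}
    return prob
```

The abstract's second step could then use the resulting measure to drop any phrase whose confusability with an already-enabled phrase exceeds a threshold, limiting what the recognizer must distinguish.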
Abstract:
A system (100) for enabling a user to select media content in an entertainment environment, comprising a remote control device (110) having a set of user-activated keys and a speech activation circuit adapted to enable a speech signal; a speech engine (160) comprising a speech recognizer (170); an application wrapper (180) configured to recognize substantive meaning in the speech signal; and a media content controller (190) configured to select media content. Every function that can be executed by activation of the user-activated keys can also be executed by the speech engine (160) in response to the recognized substantive meaning.
Abstract:
A global speech user interface (GSUI) comprises an input system to receive a user's spoken command, a feedback system along with a set of feedback overlays to give the user information on the progress of his spoken requests, a set of visual cues on the television screen to help the user understand what he can say, a help system, and a model for navigation among applications. The interface is extensible to make it easy to add new applications.
Abstract:
A method and apparatus identify names, personalities, titles, and topics that are present in a repository, as well as those that are not, using information from external data sources (notably the text used in non-speech, text-based searches) to expand the search terms. The expansion takes two forms: (1) finding plausible linguistic variants of existing search terms that are already comprehended in the repository, but that are present under slightly different names; and (2) expanding the existing search term list with items that should be there by virtue of their currency in popular culture, but which for whatever reason have not yet been reflected with content items in the repository.
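The two expansion forms can be sketched as below. This is a hedged illustration only: the variant rules (word reordering, surname extraction, ampersand expansion) and the sample data are assumptions standing in for whatever linguistic machinery and external sources a real system would use.

```python
# Hypothetical sketch of the two expansion forms: (1) plausible linguistic
# variants of terms already in the repository, and (2) externally sourced
# terms (e.g. from text-search logs) not yet in the repository.

def linguistic_variants(term: str) -> set[str]:
    """Generate simple surface variants of a (possibly multi-word) term."""
    words = term.split()
    variants = {term, term.replace("&", "and")}
    if len(words) == 2:                 # "John Smith" <-> "Smith John"
        variants.add(" ".join(reversed(words)))
    variants.add(words[-1])             # last-word-only form, e.g. surname
    return variants

def expand_terms(repository_terms, external_terms):
    expanded = set(repository_terms)
    # form (1): variants of terms already comprehended in the repository
    for term in repository_terms:
        expanded |= linguistic_variants(term)
    # form (2): popular-culture terms drawn from external data sources
    expanded |= set(external_terms)
    return expanded
```

For example, `expand_terms({"John Smith"}, {"Brand New Act"})` would add both the reordered variant "Smith John" and the externally sourced "Brand New Act" to the search term list.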
Abstract:
Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain, or consist entirely of, words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question, i.e., words that are not part of the proper name entities, may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.
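The output contract described above, a transcript paired with a symbolic meaning, can be illustrated with a toy that splits an utterance into regular-vocabulary words and a name slot matched against an entity catalog. Everything here is an assumption for illustration: the tiny vocabulary, the contact catalog, and the use of fuzzy string matching as a stand-in for acoustic matching of out-of-vocabulary names.

```python
# Hypothetical sketch: regular words are recognized against the normal
# vocabulary, while the remaining span is matched (approximately) against a
# user-specific proper-name catalog that may be outside that vocabulary.
# Output is both a transcript and a symbolic meaning. Data is illustrative.

import difflib

VOCAB = {"call", "please"}
CONTACTS = {"Niamh O'Shaughnessy": "contact:0417"}  # OOV for the main vocabulary

def understand(utterance: str):
    words = utterance.split()
    # regular words decode at normal accuracy; the rest is the name slot
    prefix = [w for w in words if w.lower() in VOCAB]
    slot = " ".join(w for w in words if w.lower() not in VOCAB)
    # match the slot against the catalog, tolerating small recognition errors
    match = difflib.get_close_matches(slot, CONTACTS, n=1, cutoff=0.6)
    if prefix and match:
        transcript = " ".join(prefix) + " " + match[0]
        return transcript, {"intent": prefix[0].lower(),
                            "entity": CONTACTS[match[0]]}
    return utterance, {}  # no entity recognized; transcript only
```

Here `understand("call Niamh OShaughnessy")` recovers both the corrected transcript and a symbolic entity reference that a downstream system could act on directly, mirroring the abstract's dual output.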