Abstract:
A computing device may receive an incoming communication and, in response, generate a notification that indicates that the incoming communication can be accessed using a particular application on the computing device. The computing device may further provide an audio signal indicative of the notification and automatically activate a listening mode. The computing device may receive a voice input during the listening mode, and an input text may be obtained based on speech recognition performed upon the voice input. A command may be detected in the input text. In response to the command, the computing device may generate an output text that is based on at least the notification and provide a voice output that is generated from the output text via speech synthesis. The voice output identifies at least the particular application.
Abstract:
A spoken utterance includes at least a first level of a multi-level command format, in which the first level identifies an application. The spoken utterance may also include a second level of the multi-level command format, in which the second level identifies an action. In response to receiving the spoken utterance at a computing device, a representation of the application identified by the first level is displayed on a display of the computing device. If the spoken utterance includes the second level of the multi-level command format, the action identified by the second level is initiated. If the spoken utterance does not include the second level of the multi-level command format, the computing device waits for a predetermined period of time and provides at least one of an audible or visual action prompt if the second level is not received within the predetermined period of time.
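The two-level flow described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the application and action vocabularies, the `wait_for_second_level` callback, and the three-second default timeout are all hypothetical, since the abstract names none of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical vocabulary; the abstract does not name applications or actions.
ACTIONS = {
    "email": {"compose", "read"},
    "maps": {"navigate"},
    "music": {"play", "pause"},
}


@dataclass
class ParsedCommand:
    application: Optional[str]  # first level of the command format
    action: Optional[str]       # second level, if it was spoken


def parse_utterance(text: str) -> ParsedCommand:
    """Split recognized text into the first and (optional) second level."""
    words = text.lower().split()
    if not words or words[0] not in ACTIONS:
        return ParsedCommand(None, None)
    app = words[0]
    action = words[1] if len(words) > 1 and words[1] in ACTIONS[app] else None
    return ParsedCommand(app, action)


def handle_utterance(
    text: str,
    wait_for_second_level: Callable[[str, float], Optional[str]],
    timeout_s: float = 3.0,
) -> List[str]:
    """Return the steps the device would take, as plain strings.

    wait_for_second_level is an assumed callback that blocks for up to
    timeout_s seconds and returns an action name, or None on timeout.
    """
    parsed = parse_utterance(text)
    if parsed.application is None:
        return ["ignore"]
    steps = [f"display:{parsed.application}"]
    action = parsed.action
    if action is None:
        action = wait_for_second_level(parsed.application, timeout_s)
    if action is None:
        # Second level never arrived: give an audible or visual action prompt.
        steps.append(f"prompt:{parsed.application}")
    else:
        steps.append(f"initiate:{parsed.application}.{action}")
    return steps
```

Passing the waiting behavior in as a callback keeps the sketch independent of any particular audio or timer API.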
Abstract:
The present application describes systems, articles of manufacture, and methods for continuous speech recognition for mobile computing devices. One embodiment includes determining whether a mobile computing device is receiving operating power from an external power source or a battery power source, and activating a trigger word detection subroutine in response to determining that the mobile computing device is receiving power from the external power source. In some embodiments, the trigger word detection subroutine operates continually while the mobile computing device is receiving power from the external power source. The trigger word detection subroutine includes determining whether a plurality of spoken words received via a microphone includes one or more trigger words, and in response to determining that the plurality of spoken words includes at least one trigger word, launching an application corresponding to the at least one trigger word included in the plurality of spoken words.
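As a rough sketch of the power-gated trigger word detection described above: the trigger-word table and application names below are hypothetical, and simply skipping detection on battery power is an assumption, since the abstract only says that detection is activated on external power.

```python
from enum import Enum
from typing import Dict, Iterable, List


class PowerSource(Enum):
    EXTERNAL = "external"
    BATTERY = "battery"


# Hypothetical mapping from trigger word to the application it launches.
TRIGGER_WORDS: Dict[str, str] = {
    "navigate": "maps_app",
    "call": "phone_app",
}


def detect_triggers(spoken_words: Iterable[str]) -> List[str]:
    """Return the applications to launch, in the order their triggers occur."""
    launched: List[str] = []
    for word in spoken_words:
        app = TRIGGER_WORDS.get(word.lower())
        if app is not None and app not in launched:
            launched.append(app)
    return launched


def process_audio(power: PowerSource, spoken_words: Iterable[str]) -> List[str]:
    """Run trigger word detection only while on external power."""
    if power is not PowerSource.EXTERNAL:
        return []  # assumption: no continuous detection on battery
    return detect_triggers(spoken_words)
```

In a real device the word stream would come from a recognizer fed by the microphone; here it is a plain iterable of strings so the gating logic can be read on its own.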
Abstract:
Disclosed are systems, methods, and devices for providing a layered user interface for one or more applications. A user-interface layer for a voice user interface is generated. The user-interface layer can be based on a markup-language-structured user-interface description for an application configured to execute on a computing device. The user-interface layer can include a command display of one or more voice-accessible commands for the application. The computing device can display at least the user-interface layer of the voice user interface. The computing device can receive an input utterance, obtain input text based upon speech recognition performed upon the input utterance, and determine that the input text corresponds to a voice-accessible command displayed as part of the command display. The computing device can execute the application to perform the command.
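One way to picture deriving voice-accessible commands from a markup-structured description is sketched below. The XML schema (`screen`, `button`, `label`, `id`) is invented for illustration; the abstract does not specify a markup language or element names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical user-interface description for an application.
UI_DESCRIPTION = """
<screen>
  <button id="send" label="Send" />
  <button id="discard" label="Discard" />
  <textfield id="body" />
</screen>
"""


def build_command_display(markup: str) -> Dict[str, str]:
    """Map each spoken command (a button label) to the control it activates."""
    root = ET.fromstring(markup.strip())
    return {
        el.attrib["label"].lower(): el.attrib["id"]
        for el in root.iter("button")
        if "label" in el.attrib
    }


def match_command(input_text: str, commands: Dict[str, str]) -> Optional[str]:
    """Return the control id for recognized text, or None if nothing matches."""
    return commands.get(input_text.strip().lower())
```

The keys of the returned dictionary are what a command display would show to the user; matching is exact here, whereas a practical system would likely tolerate recognition variants.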
Abstract:
A computing device is configured to initiate actions in response to speech input that includes a name or other indication of an entity, in a first spoken utterance, followed by a user choosing an application related to the entity, in a second spoken utterance. The computing device receives the first spoken utterance, identifies an entity based on the first spoken utterance, and indicates a plurality of available applications related to the identified entity. The computing device then receives the second spoken utterance and identifies a selection of at least one of the available applications based on the second spoken utterance. The computing device then invokes the at least one selected application.
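The entity-then-application exchange can be illustrated with a small sketch. The entity table and application names are hypothetical, and substring and word matching stand in for whatever entity identification the device actually performs.

```python
from typing import Dict, List, Optional

# Hypothetical table of entities and the applications related to each.
ENTITY_APPS: Dict[str, List[str]] = {
    "alice": ["phone", "email", "messaging"],
    "pizza palace": ["maps", "phone"],
}


def identify_entity(first_utterance: str) -> Optional[str]:
    """Find a known entity named in the first utterance (simple substring match)."""
    text = first_utterance.lower()
    for name in ENTITY_APPS:
        if name in text:
            return name
    return None


def available_applications(entity: str) -> List[str]:
    """Applications the device would indicate for the identified entity."""
    return ENTITY_APPS.get(entity, [])


def select_applications(entity: str, second_utterance: str) -> List[str]:
    """Pick the indicated applications that the second utterance mentions."""
    words = set(second_utterance.lower().split())
    return [app for app in available_applications(entity) if app in words]
```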
Abstract:
A computing device is able to use an embedded speech recognizer and a network speech recognizer for speech recognition. In response to detecting speech in captured audio, the computing device may forward the captured audio to its embedded speech recognizer and to a speech client for the network speech recognizer. The embedded speech recognizer provides an embedded-recognizer result for the captured audio. If a network-recognition criterion is met, the speech client forwards the captured audio to the network speech recognizer and receives a network-recognizer result for the captured audio from the network speech recognizer. A speech recognition result for the captured audio is forwarded to at least one application, wherein the speech recognition result is based on at least one of the embedded-recognizer result and the network-recognizer result.
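A minimal sketch of the arbitration between the two recognizers follows. The recognizers are modeled as plain callables, the criterion as a zero-argument predicate, and preferring the network result whenever it exists is an assumed policy; the abstract only requires that the final result be based on at least one of the two.

```python
from typing import Callable, Optional

Recognizer = Callable[[bytes], str]


def recognize(
    audio: bytes,
    embedded: Recognizer,
    network: Recognizer,
    network_criterion_met: Callable[[], bool],
) -> str:
    """Return the speech recognition result to forward to the application."""
    # The embedded recognizer always produces a result for the captured audio.
    embedded_result = embedded(audio)

    # The speech client forwards audio to the network only if the criterion holds.
    network_result: Optional[str] = None
    if network_criterion_met():
        network_result = network(audio)

    # Assumed policy: prefer the network result when one was obtained.
    return network_result if network_result is not None else embedded_result
```

A criterion might be, for example, whether a network connection is available; that choice is left to the caller here.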
Abstract:
In general, this disclosure describes techniques to direct textual characters converted from vocal input into selected graphical user interface input fields. Vocal input may be received. Textual characters may be identified based on the vocal input. A first portion of the textual characters corresponding to a first portion of the vocal input may be graphically inputted into a first input field of a GUI. While receiving the vocal input, a selection of a second input field in the GUI may be accepted after the first portion of the vocal input has been received. After accepting the selection of the second input field, a second portion of the textual characters corresponding to a second portion of the vocal input received after the selection of the second input field may be inputted into the second input field.
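The routing of dictated text to whichever field is currently selected can be sketched as a single pass over an event stream. The event encoding (`("text", word)` and `("select", field_name)` tuples) and the field names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str]  # ("text", word) or ("select", field_name)


def route_dictation(events: Iterable[Event], first_field: str) -> Dict[str, str]:
    """Place each recognized word into the field that is active when it arrives."""
    fields: Dict[str, List[str]] = {first_field: []}
    active = first_field
    for kind, value in events:
        if kind == "select":
            # A field selection made while dictation continues.
            active = value
            fields.setdefault(active, [])
        elif kind == "text":
            fields[active].append(value)
    return {name: " ".join(words) for name, words in fields.items()}
```

For example, dictating two words, selecting a `body` field, and dictating two more leaves the first two words in the initial field and the rest in `body`.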
Abstract:
This specification describes technologies relating to recognition of text in various media. In general, one aspect of the subject matter described in this specification can be embodied in methods that include receiving an input signal including data representing one or more words and passing the input signal to a text recognition system that generates a recognized text string based on the input signal. The methods may further include receiving the recognized text string from the text recognition system. The methods may further include presenting the recognized text string to a user and receiving a corrected text string based on input from the user. The methods may further include checking if an edit distance between the corrected text string and the recognized text string is below a threshold. If the edit distance is below the threshold, the corrected text string may be passed to the text recognition system for training purposes.
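The threshold check on corrections can be sketched as below. Levenshtein distance is used as one common choice of edit distance, and the default threshold of 3 is arbitrary; the abstract specifies neither.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def should_train(recognized: str, corrected: str, threshold: int = 3) -> bool:
    """True if the correction is close enough to pass back for training."""
    return edit_distance(corrected, recognized) < threshold
```

A small distance suggests the user fixed a recognition error, while a large one suggests the user rewrote the text, which would make the pair a poor training example.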