Abstract:
A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user. The speech recognition platform, in combination with the device, may therefore facilitate efficient interactions between the user and a voice-controlled device.
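The two-step identification described above (domain first, then intent, with context information biasing the choice) can be illustrated with a minimal sketch. Everything named here is hypothetical: the keyword tables, the `Context` class, and the 0.5 context bonus are assumptions made for illustration, and keyword matching stands in for whatever models the platform may actually use.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists per domain (assumption, not from the abstract).
DOMAIN_KEYWORDS = {
    "music": {"play", "song", "album"},
    "shopping": {"buy", "order", "purchase"},
    "reminders": {"remind", "reminder"},
}

# Hypothetical mapping from a domain to the intent acted upon.
INTENTS = {
    "music": "stream_audio",
    "shopping": "purchase_item",
    "reminders": "set_reminder",
}


@dataclass
class Context:
    """Context associated with the speech or the user (illustrative only)."""
    last_domain: Optional[str] = None  # domain of the previous command, if any


def identify_domain(asr_text: str, context: Context) -> Optional[str]:
    """Score each domain by keyword overlap, then bias toward the context."""
    words = set(asr_text.lower().split())
    scores = {d: float(len(words & kw)) for d, kw in DOMAIN_KEYWORDS.items()}
    if context.last_domain in scores:
        scores[context.last_domain] += 0.5  # assumed context bonus
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def identify_intent(asr_text: str, context: Context) -> Optional[str]:
    """Return the intent for the identified domain, or None if no domain fits."""
    domain = identify_domain(asr_text, context)
    return INTENTS.get(domain) if domain else None
```

In this sketch an ambiguous utterance such as "next one" resolves to a domain only when the context supplies one, which is the role the abstract assigns to context information.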
Abstract:
A system that is capable of controlling multiple entertainment systems and/or speakers using voice commands. The system receives voice commands and may determine audio sources and speakers indicated by the voice commands. The system may generate audio data from the audio sources and may send the audio data to the speakers using multiple interfaces. For example, the system may send the audio data directly to the speakers using a network address, may send the audio data to the speakers via a voice-enabled device or may send the audio data to the speakers via a speaker controller. The system may generate output zones including multiple speakers and may associate input devices with speakers within the output zones. For example, the system may receive a voice command from an input device in an output zone and may reduce output audio generated by speakers in the output zone.
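The output-zone behaviour in the final example above can be sketched as follows. The `Speaker` and `OutputZone` classes, the `duck_zone_for_command` function, and the 0.2 attenuation factor are all hypothetical names and values chosen for illustration; the abstract does not specify how zones are represented or by how much the audio is reduced.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Speaker:
    name: str
    volume: float = 1.0  # 1.0 = full output level (assumed scale)


@dataclass
class OutputZone:
    name: str
    speakers: List[Speaker] = field(default_factory=list)
    input_devices: Set[str] = field(default_factory=set)


def duck_zone_for_command(zones: List[OutputZone], input_device: str,
                          factor: float = 0.2) -> List[str]:
    """Reduce the volume of every speaker in each zone associated with the
    input device that received the voice command. Returns affected names."""
    affected = []
    for zone in zones:
        if input_device in zone.input_devices:
            for speaker in zone.speakers:
                speaker.volume *= factor
                affected.append(speaker.name)
    return affected


# Usage: a command heard by the kitchen microphone only affects the kitchen zone.
kitchen = OutputZone("kitchen", [Speaker("k1"), Speaker("k2")], {"kitchen-mic"})
living = OutputZone("living", [Speaker("l1")], {"living-mic"})
ducked = duck_zone_for_command([kitchen, living], "kitchen-mic")
```

Speakers outside the zone of the receiving input device keep their volume, so playback elsewhere is not disturbed.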
Abstract:
An interactive system may be implemented in part by an audio device located within a user environment, which may accept speech commands from a user and may also interact with the user by means of generated speech. In order to improve performance of the interactive system, a user may use a separate device, such as a personal computer or mobile device, to access a graphical user interface that lists details of historical speech interactions. The graphical user interface may be configured to allow the user to provide feedback and/or corrections regarding the details of specific interactions.
Abstract:
Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
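A minimal sketch of the alteration step described above, under stated assumptions: detection of the user's speech is stood in for by a simple text match on a hypothetical wake word ("computer"), whereas the abstract describes analysing the captured audio signal itself. The `Player` class, the `Alteration` options, and the 0.25 attenuation level are likewise illustrative and not taken from the source.

```python
from enum import Enum


class Alteration(Enum):
    ATTENUATE = "attenuate"
    PAUSE = "pause"
    MONO = "mono"


class Player:
    """Hypothetical stand-in for the device's audio output."""

    def __init__(self) -> None:
        self.volume = 1.0
        self.paused = False
        self.channels = 2  # stereo

    def alter(self, how: Alteration, attenuation: float = 0.25) -> None:
        if how is Alteration.ATTENUATE:
            self.volume *= attenuation
        elif how is Alteration.PAUSE:
            self.paused = True
        elif how is Alteration.MONO:
            self.channels = 1


def on_transcript(fragment: str, player: Player,
                  wake_word: str = "computer",
                  how: Alteration = Alteration.ATTENUATE) -> bool:
    """Alter the output if the fragment contains the wake word, so that the
    user's subsequent command is easier to recognise. Returns True if altered."""
    if wake_word in fragment.lower().split():
        player.alter(how)
        return True
    return False
```

The alteration happens before the command itself is recognised, which is why the trigger is the preceding speech rather than the command.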
Abstract:
A system capable of connecting a device to a Public Switched Telephone Network (PSTN) using an adapter. The adapter may connect to the PSTN and send audio data between the PSTN and a server via a data network, enabling the device to communicate over the PSTN. In addition, the adapter enables a telephone connected to a home telephone circuit to perform voice commands, by sending audio data from the telephone to the server and the server determining voice commands included in the audio data. In addition, the system may enable additional functionality for the home telephone circuit, such as three-way calling, avoiding charge calls, detecting outgoing alarm signals and triggering an alarm response, monitoring call statistics of telephone calls, and sending intercom signals to telephones connected to the home telephone circuit.
Abstract:
The Design consists of the features of shape, ornamentation and configuration of the combined voice controlled visual display and automation device shown in the drawings. Features shown by stippled lines are part of the design article, but not part of the design. Figure 1 is a first perspective view from below of the combined voice controlled visual display and automation device according to the design; Figure 2 is a second perspective view from above; Figure 3 is a front view; Figure 4 is a back view; Figure 5 is a left-side view; Figure 6 is a right-side view; Figure 7 is a top view; and Figure 8 is a bottom view.
Abstract:
Techniques for using both speaker-identification information and other characteristics associated with received voice commands to determine how and whether to respond to the received voice commands. A user may interact with a device through speech by providing voice commands. After beginning an interaction with the user, the device may detect subsequent speech, which may originate from the user, from another user, or from another source. The device may then use speaker-identification information and other characteristics associated with the speech to attempt to determine whether or not the user interacting with the device uttered the speech. The device may then interpret the speech as a valid voice command and may perform a corresponding operation in response to determining that the user did indeed utter the speech. If the device determines that the user did not utter the speech, however, then the device may refrain from taking action on the speech.
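The decision described above can be sketched as a gate that combines a speaker-identification score with one other characteristic of the speech. This is an illustration under assumptions: voiceprints are represented as plain numeric vectors compared by cosine similarity, the "other characteristic" is taken to be the time elapsed since the user's last utterance, and the 0.8 and 30-second thresholds are arbitrary. None of these choices come from the abstract.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_act(session_voiceprint: Sequence[float],
               utterance_voiceprint: Sequence[float],
               seconds_since_last: float,
               threshold: float = 0.8,
               max_gap: float = 30.0) -> bool:
    """Treat the speech as a valid command only if it appears to come from
    the user who began the interaction and arrives soon enough after it."""
    same_speaker = cosine_similarity(session_voiceprint,
                                     utterance_voiceprint) >= threshold
    recent = seconds_since_last <= max_gap
    return same_speaker and recent
```

When `should_act` returns False, the device in this sketch would simply refrain from acting on the speech, matching the last sentence of the abstract.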