Abstract:
A computing system has multiple endpoint computing devices in local environments to receive verbal requests from various users and a central or remote system to process the requests. The remote system generates responses and uses a variety of techniques to determine where and when to return responses audibly to the users. For each request, the remote system identifies who is making the request, determines when to provide the response to the user, ascertains where the user is when it is time to deliver the response, discovers which of the endpoint devices are available to deliver the response, and evaluates which of the available devices is best suited to deliver the response. The system then delivers the response to the best endpoint device for audible emission or other form of presentation to the user.
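The final two steps above (finding available devices and choosing the best one) can be illustrated with a minimal sketch. It is not the claimed method: the `Endpoint` fields, the `pick_endpoint` helper, and the "closest available device in the user's room" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    room: str
    available: bool
    distance_m: float  # hypothetical: estimated distance from the user


def pick_endpoint(endpoints: List[Endpoint], user_room: str) -> Optional[Endpoint]:
    """Return the closest available endpoint in the user's room, or None."""
    # Keep only devices that are free and located where the user currently is.
    candidates = [e for e in endpoints if e.available and e.room == user_room]
    if not candidates:
        return None
    # Assumed ranking rule: the nearest device is the "best" one.
    return min(candidates, key=lambda e: e.distance_m)


endpoints = [
    Endpoint("kitchen-speaker", "kitchen", True, 2.0),
    Endpoint("kitchen-tablet", "kitchen", False, 0.5),
    Endpoint("bedroom-speaker", "bedroom", True, 1.0),
]
best = pick_endpoint(endpoints, "kitchen")
```

Here the tablet is nearer but unavailable, so the kitchen speaker is selected. A real system would likely weigh further signals, such as device capabilities or who else is present.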
Abstract:
A speech recognition platform is configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user, or launching an application for the user. The speech recognition platform, in combination with the device, may therefore facilitate efficient interactions between the user and a voice-controlled device.
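The domain, intent, and action sequence above can be sketched as a small dispatch pipeline. This is an illustrative assumption only: the keyword rules, the `last_domain` context key, and the function names are hypothetical, and a real platform would use trained models rather than string matching.

```python
def identify_domain(asr_text, context):
    """Pick a domain from the ASR text, falling back to context (hypothetical rules)."""
    text = asr_text.lower()
    if "play" in text:
        return "music"
    if "remind" in text:
        return "reminders"
    # No keyword matched: reuse the domain of the previous command, if any.
    return context.get("last_domain", "unknown")


def identify_intent(domain):
    """Map a domain to a single intent (a deliberate simplification)."""
    intents = {"music": "stream_audio", "reminders": "set_reminder"}
    return intents.get(domain, "none")


# Hypothetical actions; each returns a description instead of doing real work.
ACTIONS = {
    "stream_audio": lambda text: f"streaming audio for: {text}",
    "set_reminder": lambda text: f"reminder set for: {text}",
}


def handle(asr_text, context):
    """Run domain -> intent -> action; return None when no action applies."""
    domain = identify_domain(asr_text, context)
    intent = identify_intent(domain)
    action = ACTIONS.get(intent)
    return action(asr_text) if action else None
```

For example, `handle("the next one", {"last_domain": "music"})` resolves to the music domain through context alone, which is the role the abstract assigns to context information.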
Abstract:
Techniques are described for using both speaker-identification information and other characteristics associated with received voice commands to determine how and whether to respond to the received voice commands. A user may interact with a device through speech by providing voice commands. After beginning an interaction with the user, the device may detect subsequent speech, which may originate from the user, from another user, or from another source. The device may then use speaker-identification information and other characteristics associated with the speech to attempt to determine whether the user interacting with the device uttered the speech. If the device determines that the user did indeed utter the speech, the device may interpret the speech as a valid voice command and perform a corresponding operation. If the device determines that the user did not utter the speech, however, the device may refrain from taking action on the speech.
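One way to picture the decision above is as a score combined with a threshold. The sketch below is an assumption, not the described technique: `speaker_score` stands for a hypothetical speaker-identification similarity in the range 0 to 1, `same_direction` stands in for the "other characteristics", and the threshold and bonus values are arbitrary.

```python
def should_act(speaker_score, same_direction, threshold=0.7, bonus=0.1):
    """Decide whether follow-up speech is treated as a valid command.

    speaker_score: hypothetical similarity (0..1) to the user who began
        the interaction.
    same_direction: hypothetical extra cue, e.g. the speech arrived from
        the same direction as the original command.
    """
    # The extra cue nudges a borderline speaker match upward.
    score = speaker_score + (bonus if same_direction else 0.0)
    return score >= threshold
```

With these assumed values, a borderline match of 0.65 is accepted only when the extra cue supports it, while a weak match of 0.3 is ignored either way.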
Abstract:
Techniques are described for identifying a location of a voice-controlled device within an environment. After the location of the device is identified, the device may receive a voice command from a user within the environment and may determine a response to the command based in part on the location, may determine how to output a response based in part on the location, or may determine how to interact with the user based in part on the location.
Abstract:
A video display hub is mounted in a common household area such as a kitchen or family room. During times that have been designated as being available for communications, devices in first and second households exchange and display blurred video, allowing users in each household to see vague shapes and movements of the other household. Upon noticing activity, a user in the first household may initiate a video conversation, causing the video from the first household to be unblurred and causing unobscured voice to be transmitted to the second household. A user in the second household may respond by allowing the video conversation to be fully enabled, allowing the video from the second household to be unblurred and unobscured voice to be transmitted back to the first household.
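The two-step handshake above can be modeled as a small state machine. This is a sketch under assumptions: the `Mode` and `Household` names and the `initiate` and `accept` functions are hypothetical, and only the outgoing video state is tracked (audio is assumed to follow the same state).

```python
from enum import Enum


class Mode(Enum):
    BLURRED = "blurred"
    CLEAR = "clear"


class Household:
    def __init__(self, name):
        self.name = name
        # During designated available times, each side starts by sending blurred video.
        self.outgoing = Mode.BLURRED


def initiate(caller):
    """The initiating household unblurs its own outgoing video."""
    caller.outgoing = Mode.CLEAR


def accept(callee, caller):
    """The other household unblurs only if the caller has already initiated."""
    if caller.outgoing is Mode.CLEAR:
        callee.outgoing = Mode.CLEAR


first = Household("first")
second = Household("second")
initiate(first)            # first household is now clear, second still blurred
state_after_initiate = (first.outgoing, second.outgoing)
accept(second, first)      # conversation is fully enabled in both directions
```

Keeping each side's outgoing state separate mirrors the abstract: one household can reveal itself without forcing the other to do the same.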
Abstract:
A video display hub is mounted in a common household area such as a kitchen or family room. The display hub is configured to display various types of information for users in the area, such as weather, traffic updates, schedules, notes, messages, lists, and news. When the user is at a distance from the display hub, information is presented at a relatively low density, with a low level of granularity and detail in conjunction with large fonts, graphics, and icons. When the user is close to the display hub, information is presented at a relatively high density, with a high level of granularity and detail in conjunction with small fonts, graphics, and icons.
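The distance-dependent presentation above can be sketched as a simple threshold rule. The abstract gives no numbers, so the 1.5 m cutoff, the font sizes, and the item counts below are assumptions chosen only to illustrate the idea.

```python
def layout_for_distance(distance_m, near_threshold_m=1.5):
    """Return hypothetical layout settings for a user at the given distance."""
    if distance_m <= near_threshold_m:
        # Close to the hub: high density, fine detail, small fonts.
        return {"density": "high", "font_pt": 14, "items": 12}
    # Far from the hub: low density, coarse detail, large fonts.
    return {"density": "low", "font_pt": 48, "items": 3}
```

A real display would more plausibly scale these values continuously with distance instead of switching between two fixed layouts.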