Methods and systems for attending to a presenting user

    公开(公告)号:US11086597B2

    公开(公告)日:2021-08-10

    申请号:US16758144

    申请日:2018-08-14

    Applicant: GOOGLE LLC

    Abstract: The various implementations described herein include methods, devices, and systems for attending to a presenting user. In one aspect, a method is performed at an electronic device that includes an image sensor, microphones, a display, processor(s), and memory. The device (1) obtains audio signals by concurrently receiving audio data at each microphone; (2) determines based on the obtained audio signals that a person is speaking in a vicinity of the device; (3) obtains video data from the image sensor; (4) determines via the video data that the person is not within a field of view of the image sensor; (5) reorients the electronic device based on differences in the received audio data; (6) after reorienting the electronic device, obtains second video data from the image sensor and determines that the person is within the field of view; and (7) attends to the person by directing the display toward the person.

    Selective detection of visual cues for automated assistants

    公开(公告)号:US11023051B2

    公开(公告)日:2021-06-01

    申请号:US16617592

    申请日:2018-05-04

    Applicant: Google LLC

    Abstract: Techniques are described herein for reducing false positives in vision sensor-equipped assistant devices. In various implementations, initial image frame(s) may be obtained from vision sensor(s) of an assistant device and analyzed to classify a particular region of the initial image frames as being likely to contain visual noise. Subsequent image frame(s) obtained from the vision sensor(s) may then be analyzed to detect actionable user-provided visual cue(s), in a manner that reduces or eliminates false positives. In some implementations, no analysis may be performed on the particular region of the subsequent image frame(s). Additionally or alternatively, in some implementations, a first candidate visual cue detected within the particular region may be weighted less heavily than a second candidate visual cue detected elsewhere in the one or more subsequent image frames. An automated assistant may then take responsive action based on the detected actionable visual cue(s).

    INVOKING AUTOMATED ASSISTANT FUNCTION(S) BASED ON DETECTED GESTURE AND GAZE

    公开(公告)号:US20200341546A1

    公开(公告)日:2020-10-29

    申请号:US16606529

    申请日:2018-05-04

    Applicant: Google LLC

    Abstract: Invoking one or more previously dormant functions of an automated assistant in response to detecting, based on processing of vision data from one or more vision components: (1) a particular gesture (e.g., of one or more “invocation gestures”) of a user; and/or (2) detecting that a gaze of the user is directed at an assistant device that provides an automated assistant interface (graphical and/or audible) of the automated assistant. For example, the previously dormant function(s) can be invoked in response to detecting the particular gesture, detecting that the gaze of the user is directed at an assistant device for at least a threshold amount of time, and optionally that the particular gesture and the directed gaze of the user co-occur or occur within a threshold temporal proximity of one another.

    Server-provided visual output at a voice interface device

    公开(公告)号:US10339769B2

    公开(公告)日:2019-07-02

    申请号:US15815646

    申请日:2017-11-16

    Applicant: GOOGLE LLC

    Abstract: A method at an electronic device with an array of indicator lights includes: obtaining first visual output instructions stored at the electronic device, where the first visual output instructions control operation of the array of indicator lights based on operating state of the electronic device; receiving a voice input; obtaining from a remote system a response to the voice input and second visual output instructions, where the second visual output instructions are provided by the remote system along with the response in accordance with a determination that the voice input satisfies one or more criteria; executing the response; and displaying visual output on the array of indicator lights in accordance with the second visual output instructions, where otherwise in absence of the second visual output instructions the electronic device displays visual output on the array of indicator lights in accordance with the first visual output instructions.

    Server-Provided Visual Output at a Voice Interface Device

    公开(公告)号:US20180144590A1

    公开(公告)日:2018-05-24

    申请号:US15815646

    申请日:2017-11-16

    Applicant: GOOGLE LLC

    Abstract: A method at an electronic device with an array of indicator lights includes: obtaining first visual output instructions stored at the electronic device, where the first visual output instructions control operation of the array of indicator lights based on operating state of the electronic device; receiving a voice input; obtaining from a remote system a response to the voice input and second visual output instructions, where the second visual output instructions are provided by the remote system along with the response in accordance with a determination that the voice input satisfies one or more criteria; executing the response; and displaying visual output on the array of indicator lights in accordance with the second visual output instructions, where otherwise in absence of the second visual output instructions the electronic device displays visual output on the array of indicator lights in accordance with the first visual output instructions.

    DETECTION AND/OR ENROLLMENT OF HOT COMMANDS TO TRIGGER RESPONSIVE ACTION BY AUTOMATED ASSISTANT

    公开(公告)号:US20250140240A1

    公开(公告)日:2025-05-01

    申请号:US19008048

    申请日:2025-01-02

    Applicant: GOOGLE LLC

    Abstract: Techniques are described herein for detecting and/or enrolling (or commissioning) new “hot commands” that are useable to cause an automated assistant to perform responsive action(s) without having to be first explicitly invoked. In various implementations, an automated assistant may be transitioned from a limited listening state into a full speech recognition state in response to a trigger event. While in the full speech recognition state, the automated assistant may receive and perform speech recognition processing on a spoken command from a user to generate a textual command. The textual command may be determined to satisfy a frequency threshold in a corpus of textual commands. Consequently, data indicative of the textual command may be enrolled as a hot command. Subsequent utterance of another textual command that is semantically consistent with the textual command may trigger performance of a responsive action by the automated assistant, without requiring explicit invocation.

    Methods and systems for attending to a presenting user

    公开(公告)号:US11789697B2

    公开(公告)日:2023-10-17

    申请号:US17370656

    申请日:2021-07-08

    Applicant: GOOGLE LLC

    Abstract: The various implementations described herein include methods, devices, and systems for attending to a presenting user. In one aspect, a method is performed at an electronic device that includes an image sensor, microphones, a display, processor(s), and memory. The device (1) obtains audio signals by concurrently receiving audio data at each microphone; (2) determines based on the obtained audio signals that a person is speaking in a vicinity of the device; (3) obtains video data from the image sensor; (4) determines via the video data that the person is not within a field of view of the image sensor; (5) reorients the electronic device based on differences in the received audio data; (6) after reorienting the electronic device, obtains second video data from the image sensor and determines that the person is within the field of view; and (7) attends to the person by directing the display toward the person.

Patent Agency Ranking