-
Publication No.: US20230059469A1
Publication Date: 2023-02-23
Application No.: US17982834
Filing Date: 2022-11-08
Applicant: GOOGLE LLC
Inventor: Matthew Sharifi, Victor Carbune
Abstract: Implementations can receive audio data corresponding to a spoken utterance of a user, process the audio data to generate a plurality of speech hypotheses, determine an action to be performed by an automated assistant based on the speech hypotheses, and cause the computing device to render an indication of the action. In response to the computing device rendering the indication, implementations can receive additional audio data corresponding to an additional spoken utterance of the user, process the additional audio data to determine that a portion of the spoken utterance is similar to an additional portion of the additional spoken utterance, supplant the action with an alternate action, and cause the automated assistant to initiate performance of the alternate action. Some implementations can determine whether to render the indication of the action based on a confidence level associated with the action.
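The correction flow in this abstract can be illustrated with a minimal sketch. It assumes that hypotheses are plain transcripts with confidence scores and that text similarity from `difflib` stands in for the acoustic or phonetic comparison a real system would use. The names `pick_action` and `supplant_action` and the `min_similarity` threshold are hypothetical and not taken from the patent.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Hypothesis = Tuple[str, float]  # (transcript, confidence)


def pick_action(hypotheses: List[Hypothesis]) -> str:
    """Select the highest-confidence hypothesis as the initial action."""
    return max(hypotheses, key=lambda h: h[1])[0]


def supplant_action(current: str, hypotheses: List[Hypothesis], follow_up: str,
                    min_similarity: float = 0.6) -> Optional[str]:
    """Return an alternate hypothesis if the follow-up looks like a repeat."""
    def sim(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    if sim(current, follow_up) < min_similarity:
        return None  # follow-up is unrelated; keep the current action
    alternates = [h for h in hypotheses if h[0] != current]
    if not alternates:
        return None
    # Prefer the alternate that best matches what the user just repeated.
    return max(alternates, key=lambda h: sim(h[0], follow_up))[0]


hypotheses = [("call Don", 0.55), ("call Dawn", 0.45)]
action = pick_action(hypotheses)          # rendered to the user first
alternate = supplant_action(action, hypotheses, "call Dawn")
print(action, "->", alternate)
```

In this sketch the follow-up is treated as a repeat only when it resembles the action that was just rendered, which mirrors the abstract's similarity check between portions of the two utterances.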
-
Publication No.: US20230055608A1
Publication Date: 2023-02-23
Application No.: US17982863
Filing Date: 2022-11-08
Applicant: GOOGLE LLC
Inventor: Matthew Sharifi, Victor Carbune
Abstract: Some implementations relate to performing speech biasing, NLU biasing, and/or other biasing based on historical assistant interaction(s). It can be determined, for one or more given historical interactions of a given user, whether to affect future biasing for (1) the given user account, (2) additional user account(s), and/or (3) the shared assistant device as a whole. Some implementations disclosed herein additionally and/or alternatively relate to: determining, based on utterance(s) of a given user to a shared assistant device, an association of first data and second data; storing the association as accessible to a given user account of the given user; and determining whether to store the association as also accessible by additional user account(s) and/or the shared assistant device.
-
Publication No.: US11557293B2
Publication Date: 2023-01-17
Application No.: US17321994
Filing Date: 2021-05-17
Applicant: GOOGLE LLC
Inventor: Victor Carbune, Matthew Sharifi, Ondrej Skopek, Justin Lu, Daniel Valcarce, Kevin Kilgour, Mohamad Hassan Rom, Nicolo D'Ercole, Michael Golikov
Abstract: Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
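The preamble and postamble check can be sketched in a few lines. This is only a text-level illustration: it assumes the ASR output is already available as a transcript, and it uses a hypothetical rule that the command is intended only when the words around the warm word are fillers. The `FILLERS` set and the function names are assumptions for illustration.

```python
from typing import List, Tuple

FILLERS = {"please", "ok", "okay", "hey"}  # assumed filler words


def split_around(tokens: List[str], warm_word: str) -> Tuple[List[str], List[str]]:
    """Split tokens into the preamble and postamble around the warm word."""
    i = tokens.index(warm_word)
    return tokens[:i], tokens[i + 1:]


def command_intended(transcript: str, warm_word: str) -> bool:
    """Treat the warm word as a command only if nothing else meaningful surrounds it."""
    tokens = transcript.lower().split()
    if warm_word not in tokens:
        return False
    preamble, postamble = split_around(tokens, warm_word)
    extra = [t for t in preamble + postamble if t not in FILLERS]
    return not extra


print(command_intended("please stop", "stop"))
print(command_intended("I will stop by the store later", "stop"))
```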
-
Publication No.: US11514109B2
Publication Date: 2022-11-29
Application No.: US17083613
Filing Date: 2020-10-29
Applicant: Google LLC
Inventor: Matthew Sharifi, Victor Carbune
IPC: G06F15/16, G06F16/9032, G16Y10/80, G16Y40/35, G10L15/30
Abstract: Implementations can identify a given assistant device from among a plurality of assistant devices in an ecosystem, obtain device-specific signal(s) that are generated by the given assistant device, process the device-specific signal(s) to generate candidate semantic label(s) for the given assistant device, select a given semantic label for the given assistant device from among the candidate semantic label(s), and assign, in a device topology representation of the ecosystem, the given semantic label to the given assistant device. Implementations can optionally receive a spoken utterance that includes a query or command at the assistant device(s), determine that a semantic property of the query or command matches the given semantic label assigned to the given assistant device, and cause the given assistant device to satisfy the query or command.
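As a rough illustration of the labeling steps, the sketch below scores candidate labels from device-specific signals and records the best one in a topology mapping. It assumes the signals are past query strings and that a fixed keyword table stands in for whatever models a real system would use. `LABEL_KEYWORDS`, `candidate_labels`, and `assign_label` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumed keyword-to-label table; a real system would likely use learned models.
LABEL_KEYWORDS: Dict[str, List[str]] = {
    "kitchen": ["recipe", "timer", "oven"],
    "bedroom": ["alarm", "sleep", "lullaby"],
}


def candidate_labels(signals: List[str]) -> Counter:
    """Score each candidate label by how many keyword hits the signals contain."""
    scores: Counter = Counter()
    for signal in signals:
        text = signal.lower()
        for label, keywords in LABEL_KEYWORDS.items():
            scores[label] += sum(1 for kw in keywords if kw in text)
    return scores


def assign_label(topology: Dict[str, str], device_id: str,
                 signals: List[str]) -> Optional[str]:
    """Pick the best-scoring label and store it in the device topology."""
    scores = candidate_labels(signals)
    if not scores or max(scores.values()) == 0:
        return None  # no evidence for any label
    label = scores.most_common(1)[0][0]
    topology[device_id] = label
    return label


topology: Dict[str, str] = {}
assign_label(topology, "speaker-1",
             ["set a timer for 10 minutes", "show me a pasta recipe"])
print(topology)
```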
-
Publication No.: US20220355814A1
Publication Date: 2022-11-10
Application No.: US17273673
Filing Date: 2020-11-18
Applicant: GOOGLE LLC
Inventor: Matthew Sharifi, Victor Carbune
Abstract: To identify driving event sounds during navigation, a client device in a vehicle provides a set of navigation directions for traversing from a starting location to a destination location along a route. During navigation to the destination location, the client device identifies audio that includes a driving event sound from within the vehicle or an area surrounding the vehicle. In response to determining that the audio includes the driving event sound, the client device determines whether the driving event sound is artificial. In response to determining that the driving event sound is artificial, the client device presents a notification to the driver indicating that the driving event sound is artificial or masks the driving event sound to prevent the driver from hearing the driving event sound.
-
Publication No.: US11462219B2
Publication Date: 2022-10-04
Application No.: US17086296
Filing Date: 2020-10-30
Applicant: Google LLC
Inventor: Matthew Sharifi, Victor Carbune
IPC: G10L15/00, G10L15/22, G10L15/02, G10L21/0208, G10L25/78, G10L25/87, G10L21/0272
Abstract: A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing.
-
Publication No.: US20220262345A1
Publication Date: 2022-08-18
Application No.: US17662021
Filing Date: 2022-05-04
Applicant: Google LLC
Inventor: Matthew Sharifi, Kevin Kilgour, Dominik Roblek, James Lin
Abstract: A method of training a custom hotword model includes receiving a first set of training audio samples. The method also includes generating, using a speech embedding model configured to receive the first set of training audio samples as input, a corresponding hotword embedding representative of a custom hotword for each training audio sample of the first set of training audio samples. The speech embedding model is pre-trained on a different set of training audio samples with a greater number of training audio samples than the first set of training audio samples. The method further includes training the custom hotword model to detect a presence of the custom hotword in audio data. The custom hotword model is configured to receive, as input, each corresponding hotword embedding and to classify, as output, each corresponding hotword embedding as corresponding to the custom hotword.
-
Publication No.: US11415427B2
Publication Date: 2022-08-16
Application No.: US16852982
Filing Date: 2020-04-20
Applicant: Google LLC
Inventor: Matthew Sharifi, Jakob Foerster
Abstract: Systems and methods for generating return journey notifications include obtaining a request for navigational directions to a target destination. An outbound journey route from an initial location to the target destination can be determined, wherein the outbound journey route includes an estimated outbound journey time. A return journey route from the target destination to a return destination can be determined, wherein the return journey route includes an estimated return journey time. The outbound journey route and/or return journey route can be determined at least in part from one or more of current traffic conditions or historical traffic conditions. One or more notifications regarding the return journey route can be generated when comparing the estimated outbound journey time to the estimated return journey time results in a determination that one or more predetermined criteria are met.
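The comparison step at the end of this abstract can be sketched as follows. The abstract does not say what the predetermined criteria are, so the ratio test below (return trip at least 25% slower than the outbound trip) is purely an assumed example, and `Journey` and `should_notify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Journey:
    origin: str
    destination: str
    estimated_minutes: float  # e.g. derived from current or historical traffic


def should_notify(outbound: Journey, ret: Journey,
                  ratio_threshold: float = 1.25) -> bool:
    """Return True when the return trip is estimated to be notably slower.

    The ratio criterion is an assumed example of a predetermined criterion.
    """
    if outbound.estimated_minutes <= 0:
        return False
    return ret.estimated_minutes / outbound.estimated_minutes >= ratio_threshold


outbound = Journey("home", "stadium", 20.0)
return_trip = Journey("stadium", "home", 35.0)
if should_notify(outbound, return_trip):
    print("Return journey is expected to take longer than the outbound journey.")
```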
-
Publication No.: US11412065B2
Publication Date: 2022-08-09
Application No.: US16436840
Filing Date: 2019-06-10
Applicant: Google LLC
Inventor: Jakob Foerster, Matthew Sharifi
IPC: H04L67/306, H04L51/10, H04L51/52, H04L65/612, H04L65/60, H04L67/10, H04N21/25, H04N21/258, H04N21/262, H04N21/4788, H04L51/222
Abstract: A first user device executing a messaging application receives a first message from a second user device. The first user device that is associated with a user determines whether the first message includes a first reference to a first media item. Responsive to determining that the first message includes the first reference to the first media item, media playlist information identifying the first media item is generated. The media playlist information identifying the first media item is sent to a content sharing platform. The first media item is to be added to a media playlist maintained by the content sharing platform.
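A minimal sketch of the detection step, assuming the reference to a media item is a URL in the message text. The domain `video.example.com`, the URL shape, and the function name `playlist_info_for` are hypothetical. The dictionary it returns stands in for the media playlist information that would be sent to the content sharing platform.

```python
import re
from typing import Dict, Optional

# Assumed URL shape for a hypothetical content sharing platform.
MEDIA_URL = re.compile(r"https?://video\.example\.com/watch\?v=([\w-]+)")


def playlist_info_for(message: str, playlist_id: str) -> Optional[Dict[str, str]]:
    """Build playlist information if the message references a media item."""
    match = MEDIA_URL.search(message)
    if match is None:
        return None  # no media reference, nothing to send
    return {"playlist_id": playlist_id, "media_id": match.group(1)}


info = playlist_info_for(
    "check this out https://video.example.com/watch?v=abc-123", "from-friends")
print(info)
```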
-
Publication No.: US20220189465A1
Publication Date: 2022-06-16
Application No.: US17117799
Filing Date: 2020-12-10
Applicant: Google LLC
Inventor: Matthew Sharifi, Victor Carbune
Abstract: A method includes receiving audio data corresponding to an utterance spoken by a user that includes a command for a digital assistant to perform a long-standing operation, activating a set of one or more warm words associated with a respective action for controlling the long-standing operation, and associating the activated set of one or more warm words with only the user. While the digital assistant is performing the long-standing operation, the method includes receiving additional audio data corresponding to an additional utterance, identifying one of the warm words from the activated set of warm words, and performing speaker verification on the additional audio data. The method further includes performing the respective action associated with the identified one of the warm words for controlling the long-standing operation when the additional utterance was spoken by the same user that is associated with the activated set of one or more warm words.
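The speaker-bound warm word check can be sketched as follows. It assumes speaker verification is done by comparing fixed-length speaker embeddings with cosine similarity against a threshold. The class `WarmWordSession`, the threshold of 0.8, and the two-dimensional embeddings are illustrative assumptions only.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class WarmWordSession:
    """Warm words activated for one long-standing operation and one speaker."""

    def __init__(self, actions: Dict[str, str], owner_embedding: List[float],
                 threshold: float = 0.8) -> None:
        self.actions = actions              # warm word -> action name
        self.owner_embedding = owner_embedding
        self.threshold = threshold          # assumed verification threshold

    def handle(self, transcript: str,
               speaker_embedding: List[float]) -> Optional[str]:
        word = transcript.strip().lower()
        if word not in self.actions:
            return None                     # not an activated warm word
        similarity = cosine_similarity(speaker_embedding, self.owner_embedding)
        if similarity < self.threshold:
            return None                     # spoken by someone else: ignore
        return self.actions[word]


session = WarmWordSession({"pause": "pause_playback", "next": "skip_track"},
                          owner_embedding=[1.0, 0.0])
print(session.handle("Pause", [0.9, 0.1]))   # embedding close to the owner's
print(session.handle("pause", [0.0, 1.0]))   # embedding far from the owner's
```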
-