-
111.
Publication No.: US20200335083A1
Publication date: 2020-10-22
Application No.: US16959037
Filing date: 2019-11-27
Applicant: Google LLC
Inventor: Li Wan, Yang Yu, Prashant Sridhar, Ignacio Lopez Moreno, Quan Wang
IPC: G10L15/00
Abstract: Methods and systems for training and/or using a language selection model for use in determining a particular language of a spoken utterance captured in audio data. Features of the audio data can be processed using the trained language selection model to generate a predicted probability for each of N different languages, and a particular language selected based on the generated probabilities. Speech recognition results for the particular language can be utilized responsive to selecting the particular language of the spoken utterance. Many implementations are directed to training the language selection model utilizing tuple losses in lieu of traditional cross-entropy losses. Training the language selection model utilizing the tuple losses can result in more efficient training and/or can result in a more accurate and/or robust model—thereby mitigating erroneous language selections for spoken utterances.
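The selection step in this abstract (a probability for each of N languages, then a choice based on those probabilities) can be sketched as follows. This is a minimal illustration, not the patented method: the function name `select_language`, the use of a softmax over raw scores, and the example language codes are all assumptions, and the tuple-loss training itself is not shown.

```python
import math


def select_language(scores, languages):
    """Turn per-language scores into probabilities and pick the most likely one.

    `scores` stands in for the raw outputs of a trained language selection
    model (hypothetical here); a softmax maps them to probabilities.
    """
    if not scores or len(scores) != len(languages):
        raise ValueError("scores and languages must be non-empty and equal length")
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Select the language with the highest predicted probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return languages[best], probs


if __name__ == "__main__":
    language, probabilities = select_language(
        [0.1, 2.0, -1.0], ["en-US", "es-ES", "fr-FR"]
    )
    print(language, probabilities)
```

A downstream system would then use the speech recognition results for the selected language, as the abstract describes.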
-
Publication No.: US10586542B2
Publication date: 2020-03-10
Application No.: US15966667
Filing date: 2018-04-30
Applicant: Google LLC
Inventor: Georg Heigold, Samuel Bengio, Ignacio Lopez Moreno
Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
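The enroll-then-verify flow in this abstract can be illustrated with a small sketch. The abstract does not say how speaker representations are compared, so the cosine-similarity scoring, the averaged enrollment profile, the 0.7 threshold, and the names `enroll` and `verify` are all assumptions made for illustration; the neural network that would produce the representations is replaced by hand-written vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Average several enrollment representations into one speaker profile."""
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def verify(utterance_embedding, profile, threshold=0.7):
    """Accept the speaker if the similarity reaches the (assumed) threshold."""
    return cosine_similarity(utterance_embedding, profile) >= threshold


if __name__ == "__main__":
    profile = enroll([[1.0, 0.0], [0.8, 0.2]])
    print(verify([1.0, 0.1], profile))  # close to the enrolled speaker
    print(verify([0.0, 1.0], profile))  # far from the enrolled speaker
```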
-
Publication No.: US20190318724A1
Publication date: 2019-10-17
Application No.: US15973466
Filing date: 2018-05-07
Applicant: Google LLC
Inventor: Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
Abstract: The present disclosure relates generally to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. The system can enable multilingual interaction with the automated assistant, without requiring a user to explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
-
Publication No.: US10140991B2
Publication date: 2018-11-27
Application No.: US15624760
Filing date: 2017-06-16
Applicant: Google LLC
Inventor: Matthew Sharifi, Ignacio Lopez Moreno, Ludwig Schmidt
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, data identifying a media item including speech of a speaker is received. Based on the received data, one or more other media items that include speech of the speaker are identified. One or more search results are generated that each reference a respective media item of the one or more other media items that include speech of the speaker. The one or more search results are provided for display.
-
Publication No.: US20180308472A1
Publication date: 2018-10-25
Application No.: US15956493
Filing date: 2018-04-18
Applicant: Google LLC
Inventor: Ignacio Lopez Moreno, Diego Melendo Casado
CPC classification number: G10L17/06, G06F17/30764, G06F21/32, G06K9/00362, G10L15/07, G10L15/08, G10L15/22, G10L15/265, G10L17/005
Abstract: In some implementations, an utterance is determined to include a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword. In response to determining that an utterance includes a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword, at least a portion of the utterance is stored as a new sample. A second set of samples of the particular user speaking the utterance is obtained, where the second set of samples includes the new sample and less than all the samples in the first set of samples. A second utterance is determined to include the particular user speaking the hotword based at least on the second set of samples of the user speaking the hotword.
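The sample-set update in this abstract (the second set contains the new sample and fewer than all of the first set's samples) can be sketched as a bounded collection. The abstract does not say which earlier samples are dropped; evicting the oldest one is an assumption here, as are the function name `update_samples` and the `max_samples` parameter.

```python
from collections import deque


def update_samples(samples, new_sample, max_samples):
    """Return a new sample list that includes `new_sample`.

    Keeps at most `max_samples` entries. When the first set is already full,
    the oldest sample is evicted (an assumed policy), so the result holds the
    new sample plus fewer than all of the original samples.
    """
    updated = deque(samples, maxlen=max_samples)
    updated.append(new_sample)  # a full deque drops its leftmost item
    return list(updated)


if __name__ == "__main__":
    first_set = ["sample_1", "sample_2", "sample_3"]
    second_set = update_samples(first_set, "sample_4", max_samples=3)
    print(second_set)
```

With a first set that is already at capacity, the second set here would be the two most recent earlier samples followed by the new one. If the first set is below capacity, nothing is evicted, which falls outside the case the abstract describes.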
-