Identification and utilization of misrecognitions in automatic speech recognition

    Publication Number: US12165628B2

    Publication Date: 2024-12-10

    Application Number: US17251284

    Filing Date: 2020-07-08

    Applicant: Google LLC

    Abstract: Techniques are disclosed that enable determining and/or utilizing a misrecognition of a spoken utterance, where the misrecognition is generated using an automatic speech recognition (ASR) model. Various implementations include determining a misrecognition based on the spoken utterance and a previous utterance spoken prior to the spoken utterance. Additionally or alternatively, implementations include personalizing an ASR engine for a user based on the spoken utterance and the previous utterance spoken prior to the spoken utterance (e.g., based on audio data capturing the previous utterance and a text representation of the spoken utterance).
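    One way to read the abstract above is as a pairing heuristic: when a user quickly repeats a near-identical utterance, the earlier hypothesis was likely a misrecognition, and the (previous audio, corrected text) pair can personalize the ASR engine. The sketch below is an illustrative interpretation only, not the patented method; the function name, thresholds, and similarity measure are all hypothetical.

```python
import difflib

def likely_misrecognition(prev_text, curr_text, gap_seconds,
                          max_gap=10.0, min_similarity=0.6):
    """Heuristic (hypothetical): a quick, near-identical rephrasing
    suggests the previous ASR hypothesis was a misrecognition."""
    if gap_seconds > max_gap:
        return False
    ratio = difflib.SequenceMatcher(None, prev_text.lower(),
                                    curr_text.lower()).ratio()
    # Similar but not identical: the user appears to be correcting the
    # transcript rather than issuing a new request.
    return min_similarity <= ratio < 1.0

# A detected pair (audio of the previous utterance, text of the
# corrected utterance) could then be stored for personalization.
print(likely_misrecognition("call jon", "call john", 3.0))  # True
```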

    Neural networks for speaker verification

    Publication Number: US11961525B2

    Publication Date: 2024-04-16

    Application Number: US17444384

    Filing Date: 2021-08-03

    Applicant: Google LLC

    CPC classification number: G10L17/18 G10L17/02 G10L17/04

    Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
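    The verification step in the abstract above reduces to comparing an on-device speaker representation for a new utterance against the representation stored at enrollment. A minimal scoring sketch, assuming cosine similarity and a fixed acceptance threshold (both hypothetical; the patent's neural network that produces the embeddings is not modeled here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(enrollment_embedding, utterance_embedding, threshold=0.8):
    """Accept the speaker if the embeddings are close enough.
    The threshold value is illustrative only."""
    return cosine_similarity(enrollment_embedding, utterance_embedding) >= threshold
```

    In training, each sample pairs a first utterance with one or more second utterances and carries a matching/non-matching label, so a loss can push matching pairs toward high similarity and non-matching pairs toward low similarity.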

    ATTENTIVE SCORING FUNCTION FOR SPEAKER IDENTIFICATION

    Publication Number: US20240029742A1

    Publication Date: 2024-01-25

    Application Number: US18479615

    Filing Date: 2023-10-02

    Applicant: Google LLC

    CPC classification number: G10L17/06 G06F16/245 G06N3/08 G10L17/04 G10L17/18

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, where the evaluation ad-vector includes ne style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
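    A loose sketch of how such a scoring function might be shaped: each ad-vector is a list of style classes, each a (value vector, routing vector) pair; routing vectors produce attention weights over style classes, and the weights combine per-class value similarities into one score. This is an assumed reading of the abstract, not the patented attentive scoring function, and every name below is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_score(eval_advector, ref_advector):
    """eval/ref ad-vectors: lists of (value_vector, routing_vector),
    one pair per style class. Routing vectors attend over the style
    classes; value vectors carry the voice characteristics compared."""
    routing_logits = [dot(er, rr)
                      for (_, er), (_, rr) in zip(eval_advector, ref_advector)]
    weights = softmax(routing_logits)
    value_sims = [dot(ev, rv)
                  for (ev, _), (rv, _) in zip(eval_advector, ref_advector)]
    return sum(w * s for w, s in zip(weights, value_sims))
```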

    TEXT INDEPENDENT SPEAKER RECOGNITION

    Publication Number: US20230113617A1

    Publication Date: 2023-04-13

    Application Number: US18078476

    Filing Date: 2022-12-09

    Applicant: GOOGLE LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
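    The dual-model verification described above can be sketched as a simple score fusion: outputs of a text-dependent (TD) model (e.g., on the hotword) and a text-independent (TI) model (on the full utterance) are combined before thresholding. The weights and threshold below are illustrative assumptions, not values from the patent.

```python
def combined_verification(td_score, ti_score, td_weight=0.6, threshold=0.7):
    """Fuse text-dependent and text-independent speaker scores.
    td_weight and threshold are hypothetical tuning parameters."""
    fused = td_weight * td_score + (1.0 - td_weight) * ti_score
    return fused >= threshold
```

    Prefetching then follows naturally: content for every candidate user can be requested while the fused score is still being computed, and only the identified user's content is served.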

    Text independent speaker recognition

    Publication Number: US11527235B2

    Publication Date: 2022-12-13

    Application Number: US17046994

    Filing Date: 2019-12-02

    Applicant: Google LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.

    NOISY STUDENT TEACHER TRAINING FOR ROBUST KEYWORD SPOTTING

    Publication Number: US20220284891A1

    Publication Date: 2022-09-08

    Application Number: US17190779

    Filing Date: 2021-03-03

    Applicant: GOOGLE LLC

    Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
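    The two ingredients above (aggressive spectral augmentation that masks time frames and frequency bins, and a student trained against the teacher's soft labels on the augmented audio) can be sketched as follows. This is a minimal illustration under assumed details; mask counts, widths, and the loss form are hypothetical.

```python
import math
import random

def spec_augment(spectrogram, n_time_masks=1, n_freq_masks=1,
                 max_width=2, rng=None):
    """Zero out random time frames and frequency bins.
    spectrogram: list of frames, each a list of frequency-bin values."""
    rng = rng or random.Random(0)
    frames = [row[:] for row in spectrogram]
    n_frames, n_bins = len(frames), len(frames[0])
    for _ in range(n_time_masks):          # mask whole time frames
        t0 = rng.randrange(n_frames)
        for t in range(t0, min(t0 + max_width, n_frames)):
            frames[t] = [0.0] * n_bins
    for _ in range(n_freq_masks):          # mask whole frequency bins
        f0 = rng.randrange(n_bins)
        for f in range(f0, min(f0 + max_width, n_bins)):
            for t in range(n_frames):
                frames[t][f] = 0.0
    return frames

def distillation_loss(teacher_probs, student_probs, eps=1e-9):
    """Cross-entropy of student predictions against teacher soft labels;
    minimizing this updates the student toward the teacher."""
    return -sum(t * math.log(s + eps)
                for t, s in zip(teacher_probs, student_probs))
```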

    TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING

    Publication Number: US20220122611A1

    Publication Date: 2022-04-21

    Application Number: US17567590

    Filing Date: 2022-01-03

    Applicant: GOOGLE LLC

    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolate one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation) using a mask generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
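    The core masking step in the pipeline above (frequency transform → speaker-conditioned mask → masked spectrogram → inverse transform) reduces to an element-wise product. A minimal sketch, assuming a magnitude spectrogram and a mask with values in [0, 1]; the trained voice filter model that produces the mask from the spectrogram and speaker embedding is not modeled here.

```python
def apply_voice_filter(spectrogram, mask):
    """Element-wise masking of a magnitude spectrogram. In the described
    system, the mask would come from a trained voice filter model
    conditioned on the target speaker's embedding; here it is an input."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrogram, mask)]
```

    The masked spectrogram is then passed through the inverse of the original frequency transformation to recover the refined audio isolating the target speaker.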
