-
Publication No.: US20250166614A1
Publication Date: 2025-05-22
Application No.: US19034304
Filing Date: 2025-01-22
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Yuan Wang, Yu Zhang, Jesse Emond
IPC: G10L15/06, G06N3/0464, G06N3/09
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples, where each positive audio data example includes a respective augmented copy of the received audio data. For each positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting that sequence of encoder outputs into a contrastive loss space. The method also includes determining an L2 distance between each corresponding pair of encoder outputs in the projected sequences for the two positive audio data examples, and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each positive audio data example and updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
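The loss computation described in this abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the NumPy arrays, and the (T, D) shapes are assumptions standing in for real projected encoder outputs.

```python
import numpy as np

def per_utterance_consistency_loss(proj_a: np.ndarray, proj_b: np.ndarray) -> float:
    """Average L2 distance between corresponding projected encoder outputs.

    proj_a, proj_b: (T, D) projected encoder-output sequences for the two
    augmented (positive) copies of the same utterance.
    """
    # L2 distance between each corresponding pair of frames -> shape (T,)
    frame_dists = np.linalg.norm(proj_a - proj_b, axis=-1)
    # Per-utterance consistency loss: the average of those frame distances.
    return float(frame_dists.mean())

# Identical projections give zero loss; projections offset by 1 in every
# dimension give sqrt(D) per frame, hence sqrt(D) after averaging.
loss = per_utterance_consistency_loss(np.zeros((4, 3)), np.ones((4, 3)))
```

In training, this term would be added to the supervised loss, pushing the encoder toward producing the same representation for both augmented views of an utterance.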
-
Publication No.: US20230223009A1
Publication Date: 2023-07-13
Application No.: US18187330
Filing Date: 2023-03-21
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
CPC classification number: G10L15/005, G10L15/16, G10L15/26, G06F40/58, G10L15/063, G06N3/049
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each training data sample of each training data set, the method includes transliterating the corresponding transcription from the respective native script into corresponding transliterated text, representing the respective native language of the corresponding audio, in a target script, and associating that transliterated text with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
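The normalization step can be sketched as below. The character-level Cyrillic-to-Latin table and all function names are toy assumptions for illustration; a production system would use a full rule-based or learned transliteration model rather than a lookup table.

```python
# Toy character-level transliteration table (Cyrillic -> Latin). Assumed
# for illustration only; not the patent's transliteration mechanism.
RU_TO_LATIN = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"}

def transliterate(text: str, table: dict) -> str:
    # Map each native-script character into the target script, passing
    # through anything the table does not cover (spaces, digits, ...).
    return "".join(table.get(ch, ch) for ch in text)

def normalize_sample(audio: bytes, native_transcription: str, table: dict) -> dict:
    # Keep the original audio; replace the native-script transcription
    # with its target-script counterpart to form a normalized sample.
    return {"audio": audio, "transcript": transliterate(native_transcription, table)}

sample = normalize_sample(b"...", "привет", RU_TO_LATIN)
# sample["transcript"] == "privet"
```

Because every language's transcriptions end up in one shared target script, a single output vocabulary can serve all languages in the multilingual model.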
-
Publication No.: US11417322B2
Publication Date: 2022-08-16
Application No.: US16712492
Filing Date: 2019-12-12
Applicant: Google LLC
Inventor: Bhuvana Ramabhadran, Min Ma, Pedro J. Moreno Mengibar, Jesse Emond, Brian E. Roark
Abstract: Methods, systems, and apparatus, including computer programs stored on a computer-readable storage medium, for transliteration for speech recognition training and scoring. In some implementations, language examples are accessed, some of which include words in a first script and words in one or more other scripts. At least portions of some of the language examples are transliterated to the first script to generate a training data set. A language model is generated based on occurrences of the different sequences of words in the training data set in the first script. The language model is used to perform speech recognition for an utterance.
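The final step, building a language model from word-sequence occurrences in the transliterated training set, can be illustrated with a simple bigram count. The count-based bigram form is an assumption for illustration; the abstract does not specify the language-model type.

```python
from collections import Counter

def bigram_counts(sentences: list) -> Counter:
    """Count word bigrams across training sentences (all in one script)."""
    counts = Counter()
    for sentence in sentences:
        # Boundary markers so first and last words are modeled too.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

# After transliteration, mixed-script text is in a single script,
# so occurrences of the same word are counted together.
counts = bigram_counts(["namaste duniya", "namaste friends"])
# counts[("<s>", "namaste")] == 2
```

Counting over the single-script corpus is what makes transliteration pay off: without it, the same word written in two scripts would fragment into two unrelated n-gram histories.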
-
Publication No.: US11615779B2
Publication Date: 2023-03-28
Application No.: US17152760
Filing Date: 2021-01-19
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each training data sample of each training data set, the method includes transliterating the corresponding transcription from the respective native script into corresponding transliterated text, representing the respective native language of the corresponding audio, in a target script, and associating that transliterated text with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
-
Publication No.: US20220310065A1
Publication Date: 2022-09-29
Application No.: US17655903
Filing Date: 2022-03-22
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Gary Wang, Yu Zhang, Jesse Emond
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples, where each positive audio data example includes a respective augmented copy of the received audio data. For each positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting that sequence of encoder outputs into a contrastive loss space. The method also includes determining an L2 distance between each corresponding pair of encoder outputs in the projected sequences for the two positive audio data examples, and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each positive audio data example and updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
-
Publication No.: US12230249B2
Publication Date: 2025-02-18
Application No.: US17655903
Filing Date: 2022-03-22
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Yuan Wang, Yu Zhang, Jesse Emond
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples, where each positive audio data example includes a respective augmented copy of the received audio data. For each positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting that sequence of encoder outputs into a contrastive loss space. The method also includes determining an L2 distance between each corresponding pair of encoder outputs in the projected sequences for the two positive audio data examples, and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each positive audio data example and updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
-
Publication No.: US20210233510A1
Publication Date: 2021-07-29
Application No.: US17152760
Filing Date: 2021-01-19
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each training data sample of each training data set, the method includes transliterating the corresponding transcription from the respective native script into corresponding transliterated text, representing the respective native language of the corresponding audio, in a target script, and associating that transliterated text with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for speech utterances spoken in any of the different native languages associated with the plurality of training data sets.