-
Publication No.: US20250166614A1
Publication Date: 2025-05-22
Application No.: US19034304
Filing Date: 2025-01-22
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Yuan Wang, Yu Zhang, Jesse Emond
IPC: G10L15/06, G06N3/0464, G06N3/09
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples, where each positive audio data example includes a respective augmented copy of the received audio data. For each positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting that sequence of encoder outputs into a contrastive loss space. The method also includes determining an L2 distance between each corresponding pair of encoder outputs in the projected sequences for the two positive audio data examples, and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each positive audio data example and updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
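The loss computation described in this abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the NumPy arrays, and the (T, D) shapes are assumptions standing in for real projected encoder outputs.

```python
import numpy as np

def per_utterance_consistency_loss(proj_a: np.ndarray, proj_b: np.ndarray) -> float:
    """Average L2 distance between corresponding projected encoder outputs.

    proj_a, proj_b: (T, D) projected encoder-output sequences for the two
    augmented (positive) copies of the same utterance.
    """
    # L2 distance between each corresponding pair of frames -> shape (T,)
    frame_dists = np.linalg.norm(proj_a - proj_b, axis=-1)
    # Per-utterance consistency loss: the average of those frame distances.
    return float(frame_dists.mean())

# Identical projections give zero loss; projections offset by 1 in every
# dimension give sqrt(D) per frame, hence sqrt(D) after averaging.
loss = per_utterance_consistency_loss(np.zeros((4, 3)), np.ones((4, 3)))
```

In training, this term would be added to the supervised loss, pushing the encoder toward producing the same representation for both augmented views of an utterance.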
-
Publication No.: US20230223009A1
Publication Date: 2023-07-13
Application No.: US18187330
Filing Date: 2023-03-21
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
CPC classification number: G10L15/005, G10L15/16, G10L15/26, G06F40/58, G10L15/063, G06N3/049
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each training data sample of each training data set, the method includes transliterating the corresponding transcription from the respective native script into corresponding transliterated text, representing the respective native language of the corresponding audio, in a target script, and associating that transliterated text with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
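The normalization step can be sketched as below. The character-level Cyrillic-to-Latin table and all function names are toy assumptions for illustration; a production system would use a full rule-based or learned transliteration model rather than a lookup table.

```python
# Toy character-level transliteration table (Cyrillic -> Latin). Assumed
# for illustration only; not the patent's transliteration mechanism.
RU_TO_LATIN = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"}

def transliterate(text: str, table: dict) -> str:
    # Map each native-script character into the target script, passing
    # through anything the table does not cover (spaces, digits, ...).
    return "".join(table.get(ch, ch) for ch in text)

def normalize_sample(audio: bytes, native_transcription: str, table: dict) -> dict:
    # Keep the original audio; replace the native-script transcription
    # with its target-script counterpart to form a normalized sample.
    return {"audio": audio, "transcript": transliterate(native_transcription, table)}

sample = normalize_sample(b"...", "привет", RU_TO_LATIN)
# sample["transcript"] == "privet"
```

Because every language's transcriptions end up in one shared target script, a single output vocabulary can serve all languages in the multilingual model.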
-
Publication No.: US11417322B2
Publication Date: 2022-08-16
Application No.: US16712492
Filing Date: 2019-12-12
Applicant: Google LLC
Inventor: Bhuvana Ramabhadran, Min Ma, Pedro J. Moreno Mengibar, Jesse Emond, Brian E. Roark
Abstract: Methods, systems, and apparatus, including computer programs stored on a computer-readable storage medium, for transliteration for speech recognition training and scoring. In some implementations, language examples are accessed, some of which include words in a first script and words in one or more other scripts. At least portions of some of the language examples are transliterated to the first script to generate a training data set. A language model is generated based on occurrences of the different sequences of words in the training data set in the first script. The language model is used to perform speech recognition for an utterance.
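The final step, building a language model from word-sequence occurrences in the transliterated training set, can be illustrated with a simple bigram count. The count-based bigram form is an assumption for illustration; the abstract does not specify the language-model type.

```python
from collections import Counter

def bigram_counts(sentences: list) -> Counter:
    """Count word bigrams across training sentences (all in one script)."""
    counts = Counter()
    for sentence in sentences:
        # Boundary markers so first and last words are modeled too.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

# After transliteration, mixed-script text is in a single script,
# so occurrences of the same word are counted together.
counts = bigram_counts(["namaste duniya", "namaste friends"])
# counts[("<s>", "namaste")] == 2
```

Counting over the single-script corpus is what makes transliteration pay off: without it, the same word written in two scripts would fragment into two unrelated n-gram histories.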
-
Publication No.: US11615779B2
Publication Date: 2023-03-28
Application No.: US17152760
Filing Date: 2021-01-19
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each training data sample of each training data set, the method includes transliterating the corresponding transcription from the respective native script into corresponding transliterated text, representing the respective native language of the corresponding audio, in a target script, and associating that transliterated text with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
-
Publication No.: US20220310065A1
Publication Date: 2022-09-29
Application No.: US17655903
Filing Date: 2022-03-22
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Gary Wang, Yu Zhang, Jesse Emond
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples, where each positive audio data example includes a respective augmented copy of the received audio data. For each positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting that sequence of encoder outputs into a contrastive loss space. The method also includes determining an L2 distance between each corresponding pair of encoder outputs in the projected sequences for the two positive audio data examples, and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each positive audio data example and updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
-
Publication No.: US12230249B2
Publication Date: 2025-02-18
Application No.: US17655903
Filing Date: 2022-03-22
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Yuan Wang, Yu Zhang, Jesse Emond
Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples, where each positive audio data example includes a respective augmented copy of the received audio data. For each positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting that sequence of encoder outputs into a contrastive loss space. The method also includes determining an L2 distance between each corresponding pair of encoder outputs in the projected sequences for the two positive audio data examples, and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each positive audio data example and updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
-
Publication No.: US20210233510A1
Publication Date: 2021-07-29
Application No.: US17152760
Filing Date: 2021-01-19
Applicant: Google LLC
Inventor: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
Abstract: A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each training data sample of each training data set, the method includes transliterating the corresponding transcription from the respective native script into corresponding transliterated text, representing the respective native language of the corresponding audio, in a target script, and associating that transliterated text with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for speech utterances spoken in any of the different native languages associated with the plurality of training data sets.