Patent search ap:("Google LLC") AND inv:"Gary Wang" Page 1

1.

Granted invention patent
Using non-parallel voice conversion for speech conversion models (legal status: active)

Publication number: US12190862B2

Publication date: 2025-01-07

Application number: US17660487

Application date: 2022-04-25

Applicant: Google LLC

Inventor: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy

IPC: G10L15/06, G10L13/02, G10L15/197, G10L15/22, G10L19/038, G10L21/003, G10L15/16, G10L19/00

Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.
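The consistency term described in this abstract can be sketched as follows. This is a minimal illustration only: the abstract says the loss is "based on the first and second probability distributions" without naming a divergence, so the use of KL divergence here is an assumption, and the function names `kl_divergence` and `consistency_loss` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same hypotheses.

    eps guards against log(0); it is an illustrative choice, not from the source.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def consistency_loss(real_steps, synth_steps):
    """Average per-output-step divergence for one training utterance pair.

    real_steps:  one distribution per output step for the non-synthetic speech.
    synth_steps: one distribution per output step for the voice-converted speech.
    Assumption: both sequences have the same number of output steps.
    """
    if not real_steps or len(real_steps) != len(synth_steps):
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(kl_divergence(p, q) for p, q in zip(real_steps, synth_steps))
    return total / len(real_steps)


if __name__ == "__main__":
    real = [[0.5, 0.5], [0.9, 0.1]]
    synth = [[0.9, 0.1], [0.9, 0.1]]
    print(consistency_loss(real, synth))
```

The loss is zero when the recognizer assigns identical distributions to both members of the pair and grows as they diverge, which is the behavior a consistency term is meant to penalize.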

2.

Invention application
USING NON-PARALLEL VOICE CONVERSION FOR SPEECH CONVERSION MODELS (legal status: active)

Publication number: US20250095639A1

Publication date: 2025-03-20

Application number: US18962686

Application date: 2024-11-27

Applicant: Google LLC

Inventor: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy

IPC: G10L15/06, G10L13/02, G10L15/16, G10L15/197, G10L15/22, G10L19/00, G10L19/038, G10L21/003

Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.

3.

Invention publication
USING TEXT-INJECTION TO RECOGNIZE SPEECH WITHOUT TRANSCRIPTION (legal status: under examination, published)

Publication number: US20240304178A1

Publication date: 2024-09-12

Application number: US18439630

Application date: 2024-02-12

Applicant: Google LLC

Inventor: Andrew M Rosenberg, Yacob Yochai Blau, Bhuvana Ramabhadran, Genady Beryozkin, Gary Wang, Zhehuai Chen, Rohan Agrawal, Parisa Haghani

IPC: G10L15/06, G10L15/22, G10L15/26

CPC classification number: G10L15/063, G10L15/22, G10L15/26

Abstract: A method includes receiving training data including transcribed speech utterances spoken in a general domain, modified speech utterances in a target domain, and unspoken textual utterances corresponding to the transcriptions of the modified speech utterances in the target domain. The modified speech utterances include utterances spoken in the target domain that have been modified to obfuscate one or more classes of sensitive information recited in the utterances. The method also includes generating a corresponding alignment output for each unspoken textual utterance of the received training data using an alignment model. The method also includes training a speech recognition model on the alignment outputs generated for the corresponding unspoken textual utterances, the un-transcribed speech utterances, and the transcribed speech utterances to teach the speech recognition model to learn to recognize speech in the target domain and phrases within the one or more classes of sensitive information.

4.

Invention application
Supervised and Unsupervised Training with Contrastive Loss Over Sequences (legal status: active)

Publication number: US20220310065A1

Publication date: 2022-09-29

Application number: US17655903

Application date: 2022-03-22

Applicant: Google LLC

Inventor: Andrew Rosenberg, Bhuvana Ramabhadran, Zhehuai Chen, Gary Wang, Yu Zhang, Jesse Emond

IPC: G10L15/06, G10L15/16, G10L15/22, G10L13/02, G06N3/08

Abstract: A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples. Here, each positive audio data example includes a respective augmented copy of the received audio data. For each respective positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting the respective sequence of encoder outputs for the positive data example into a contrastive loss space. The method also includes determining an L2 distance between each corresponding encoder output in the projected sequences of encoder outputs for the positive audio data examples and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each respective positive audio data example. The method also includes updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
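The per-utterance consistency loss in this abstract (an L2 distance per pair of corresponding projected encoder outputs, averaged over the sequence) can be sketched as below. The function names and the plain-list representation of the projected outputs are assumptions made for illustration; the encoder, the augmentation, and the projection into the contrastive loss space are outside the sketch.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two projected encoder outputs."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def per_utterance_consistency_loss(proj_a, proj_b):
    """Mean L2 distance between corresponding projected encoder outputs.

    proj_a, proj_b: projected encoder-output sequences for the two positive
    (augmented) examples of one utterance. Assumption: equal sequence length.
    """
    if not proj_a or len(proj_a) != len(proj_b):
        raise ValueError("expected two non-empty sequences of equal length")
    distances = [l2_distance(u, v) for u, v in zip(proj_a, proj_b)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    first = [[0.0, 0.0], [1.0, 1.0]]
    second = [[3.0, 4.0], [1.0, 1.0]]
    print(per_utterance_consistency_loss(first, second))  # 2.5
```

In the example, the two frame distances are 5.0 and 0.0, so the averaged loss is 2.5. Per the abstract, this term would be combined with a supervised loss when updating the model; how the two are weighted is not stated in the source.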

5.

Invention publication
Using Non-Parallel Voice Conversion for Speech Conversion Models (legal status: under examination, published)

Publication number: US20230298565A1

Publication date: 2023-09-21

Application number: US17660487

Application date: 2022-04-25

Applicant: Google LLC

Inventor: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy

IPC: G10L15/06, G10L15/197, G10L13/02, G10L19/038, G10L15/22

CPC classification number: G10L15/063, G10L15/197, G10L13/02, G10L19/038, G10L15/22, G10L2015/0635, G10L2019/0001

Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation. The method also includes determining a consistent loss term for the corresponding training utterance pair based on the first and second probability distributions and updating parameters of the speech recognition model based on the consistent loss term.

6.

Invention application
Advancing the Use of Text and Speech in ASR Pretraining With Consistency and Contrastive Losses (legal status: active)

Publication number: US20230013587A1

Publication date: 2023-01-19

Application number: US17722264

Application date: 2022-04-15

Applicant: Google LLC

Inventor: Andrew Rosenberg, Zhehuai Chen, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar, Gary Wang, Yu Zhang

IPC: G10L19/00, G10L13/02, G10L15/26

Abstract: A method includes receiving training data that includes unspoken text utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. Each unspoken text utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. Each transcribed non-synthetic speech utterance is paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances, the un-transcribed non-synthetic speech utterances, and the transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.
