Patent search: ap:("Google LLC") AND inv:"Andrew Rosenberg", page 1

1.

Invention publication
Using Aligned Text and Speech Representations to Train Automatic Speech Recognition Models without Transcribed Speech Data (Pending, published)

Publication No.: US20240029715A1

Publication date: 2024-01-25

Application No.: US18355508

Filing date: 2023-07-20

Applicant: Google LLC

Inventors: Andrew Rosenberg, Zhehuai Chen, Ankur Bapna, Yu Zhang, Bhuvana Ramabhadran

IPC: G10L15/06

CPC: G10L15/063

Abstract: A method includes receiving training data that includes unspoken textual utterances in a target language. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. The method also includes generating a corresponding alignment output for each unspoken textual utterance using an alignment model trained on transcribed speech utterances in one or more training languages, each different from the target language. The method also includes generating a corresponding encoded textual representation for each alignment output using a text encoder and training a speech recognition model on the encoded textual representations generated for the alignment outputs. Training the speech recognition model teaches it to recognize speech in the target language.
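The abstract does not say what form an "alignment output" takes. A minimal sketch, assuming it is a frame-level sequence produced by repeating each text unit for a predicted number of frames; the function name and the durations are hypothetical, and a real system would obtain durations from the alignment model:

```python
from typing import List, Sequence


def upsample_by_durations(tokens: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each text token for its (hypothetical) predicted number of frames."""
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    frames: List[str] = []
    for token, dur in zip(tokens, durations):
        if dur < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([token] * dur)  # one copy of the token per speech frame
    return frames


# Toy example: three phoneme-like tokens with made-up durations.
print(upsample_by_durations(["h", "e", "y"], [2, 3, 1]))
```

Upsampling text to the frame rate is what lets a text encoder produce representations with roughly the same length as encoded speech.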

2.

Granted invention patent
Instantaneous learning in text-to-speech during dialog (In force)

Publication No.: US11676572B2

Publication date: 2023-06-13

Application No.: US17190456

Filing date: 2021-03-03

Applicant: Google LLC

Inventors: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski

IPC: G10L17/02, G10L13/08, G10L15/187

CPC: G10L13/08, G10L15/187

Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different from the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting whichever of the user pronunciation or the TTS pronunciation of the particular word is associated with the highest confidence. The method also includes providing TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.
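The pronunciation decision reduces to picking the candidate with the higher confidence. A minimal sketch, assuming each candidate already carries a confidence score; how those scores are computed from the pronunciation-related features is not stated in the abstract, and the class, field names, phoneme strings, and tie-breaking rule below are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PronunciationCandidate:
    source: str        # "user" or "tts"
    phonemes: str      # pronunciation as a space-separated phoneme string
    confidence: float  # score from some hypothetical confidence model


def pronunciation_decision(user: PronunciationCandidate,
                           tts: PronunciationCandidate) -> PronunciationCandidate:
    """Select the higher-confidence candidate; on a tie, keep the TTS pronunciation."""
    return user if user.confidence > tts.confidence else tts


user = PronunciationCandidate("user", "k ae r ah m eh l", 0.82)
tts = PronunciationCandidate("tts", "k aa r m ah l", 0.64)
print(pronunciation_decision(user, tts).source)  # user
```

Falling back to the TTS pronunciation on ties is only one design choice; a system could equally prefer the user's pronunciation.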

3.

Granted invention patent
Generating diverse and natural text-to-speech samples (In force)

Publication No.: US11475874B2

Publication date: 2022-10-18

Application No.: US17163007

Filing date: 2021-01-29

Applicant: Google LLC

Inventors: Yu Zhang, Bhuvana Ramabhadran, Andrew Rosenberg, Yonghui Wu, Byungha Chun, Ron Weiss, Yuan Cao

IPC: G10L25/30, G10L25/00, G10L17/00, G10L13/047, G10L25/18, G06N3/08, G10L15/06, G10L13/10

Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
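The step "assigns a quantized embedding to the latent feature" can be illustrated with nearest-neighbour vector quantization, as in VQ-VAE-style models. This is an assumed technique, not one the abstract names; the codebook values, function names, and the simple list concatenation below are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(latent: Vector, codebook: Sequence[Vector]) -> Tuple[int, List[float]]:
    """Return the index and value of the codebook entry closest (Euclidean) to latent."""
    best_idx = min(range(len(codebook)), key=lambda i: math.dist(latent, codebook[i]))
    return best_idx, list(codebook[best_idx])


def decoder_input(speech_embedding: Vector, quantized: Vector) -> List[float]:
    """Concatenate a speech-unit embedding with its quantized latent embedding."""
    return list(speech_embedding) + list(quantized)


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
idx, q = quantize([0.9, 1.2], codebook)
print(idx, q, decoder_input([0.5, -0.5], q))
```

Because the quantized embedding is drawn from a finite codebook, sampling different codes at inference time is one way such a model could produce varied prosody for the same text.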

4.

Granted invention patent
Consistency prediction on streaming sequence models (In force)

Publication No.: US11929060B2

Publication date: 2024-03-12

Application No.: US17170836

Filing date: 2021-02-08

Applicant: Google LLC

Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar

IPC: G10L15/06, G06N3/04, G06N3/044, G06N3/045, G06N3/08, G06N3/088, G10L13/02, G10L15/16, G10L15/197

CPC: G10L15/063, G06N3/044, G06N3/045, G06N3/088, G10L13/02, G10L15/16, G10L15/197, G10L2015/0635

Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs, each including a non-synthetic speech representation and a synthetic speech representation of the same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model. The method also includes updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair.
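The abstract does not name the divergence used for the consistent loss term. A minimal sketch, assuming KL divergence between the two per-step distributions, summed over output steps; the function names and toy distributions are hypothetical, and a training framework would also decide whether gradients flow through both distributions:

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same hypotheses."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(real_steps: Sequence[Sequence[float]],
                     synth_steps: Sequence[Sequence[float]]) -> float:
    """Sum a per-output-step divergence between non-synthetic and synthetic predictions."""
    if len(real_steps) != len(synth_steps):
        raise ValueError("both representations need the same number of output steps")
    return sum(kl_divergence(p, q) for p, q in zip(real_steps, synth_steps))


real = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # distributions for non-synthetic speech
synth = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]  # distributions for synthetic speech
print(round(consistency_loss(real, synth), 4))
```

The loss is zero when the model predicts identical distributions for the real and synthetic versions of an utterance, which is the behaviour the training encourages.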

5.

Granted invention patent
Synthesized data augmentation using voice conversion and speech recognition models (In force)

Publication No.: US11335324B2

Publication date: 2022-05-17

Application No.: US17008278

Filing date: 2020-08-31

Applicant: Google LLC

Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg

IPC: G10L15/06, G10L15/16, G10L13/04, G10L13/047, G10L15/22, G10L13/08

Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in the voice of the target speaker that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.

6.

Invention application
Improving Speech Recognition with Speech Synthesis-based Model Adaptation (In force)

Publication No.: US20230058447A1

Publication date: 2023-02-23

Application No.: US17445537

Filing date: 2021-08-20

Applicant: Google LLC

Inventors: Andrew Rosenberg, Bhuvana Ramabhadran

IPC: G10L21/007, G10L15/26, G10L25/30, G06N3/08

Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to recognize real/human speech in the target domain.

7.

Invention application
Training Speech Synthesis to Generate Distinct Speech Sounds (In force)

Publication No.: US20230009613A1

Publication date: 2023-01-12

Application No.: US17756995

Filing date: 2019-12-13

Applicant: Google LLC

Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yu Zhang

IPC: G10L13/047, G10L13/08, G10L15/16, G10L15/06

Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
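A minimal sketch of a predicted phone label loss, assuming the alignment step has already produced exactly one reference phone id per time step and that the loss is cross-entropy between the reference ids and the predicted phone distributions. Both assumptions, and the function name and toy values, are illustrative; the abstract does not specify the loss:

```python
import math
from typing import Sequence


def phone_label_loss(predicted: Sequence[Sequence[float]], reference: Sequence[int]) -> float:
    """Mean cross-entropy of reference phone ids under per-step predicted distributions."""
    if len(predicted) != len(reference):
        raise ValueError("alignment must give one reference label per time step")
    total = 0.0
    for probs, ref in zip(predicted, reference):
        total += -math.log(max(probs[ref], 1e-12))  # clamp to avoid log(0)
    return total / len(reference)


pred = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]  # predicted phone distributions per step
ref = [0, 1]                                 # aligned reference phone ids
print(round(phone_label_loss(pred, ref), 4))
```

Penalizing phone confusions this way pushes the TTS model toward acoustically distinct outputs for different speech sounds, in addition to any spectrogram reconstruction loss.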

8.

Invention application
Generating Diverse and Natural Text-To-Speech Samples (In force)

Publication No.: US20220246132A1

Publication date: 2022-08-04

Application No.: US17163007

Filing date: 2021-01-29

Applicant: Google LLC

Inventors: Yu Zhang, Bhuvana Ramabhadran, Andrew Rosenberg, Yonghui Wu, Byungha Chun, Ron Weiss, Yuan Cao

IPC: G10L13/047, G10L25/18, G10L13/10, G10L15/06, G06N3/08

Abstract: A method of generating diverse and natural text-to-speech (TTS) samples includes receiving text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.

9.

Invention publication
Using Speech Recognition to Improve Cross-Language Speech Synthesis (Pending, published)

Publication No.: US20240282292A1

Publication date: 2024-08-22

Application No.: US18654278

Filing date: 2024-05-03

Applicant: Google LLC

Inventors: Zhehuai Chen, Bhuvana Ramabhadran, Andrew Rosenberg, Yu Zhang, Pedro J. Moreno Mengibar

IPC: G10L13/047, G10L13/08, G10L13/10

CPC: G10L13/047, G10L13/086, G10L13/10

Abstract: A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.

10.

Invention publication
Instantaneous Learning in Text-to-Speech During Dialog (Pending, published)

Publication No.: US20230274727A1

Publication date: 2023-08-31

Application No.: US18312576

Filing date: 2023-05-04

Applicant: Google LLC

Inventors: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski

IPC: G10L13/08, G10L15/187

CPC: G10L13/08, G10L15/187

Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different from the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting whichever of the user pronunciation or the TTS pronunciation of the particular word is associated with the highest confidence. The method also includes providing TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.