Patent search: ap:("Google LLC") AND inv:"Nanxin Chen"

1.

Published invention application
RESIDUAL ADAPTERS FOR FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION (Pending, published)

Publication (announcement) number: US20240233704A9

Publication (announcement) date: 2024-07-11

Application number: US18493770

Application date: 2023-10-24

Applicant: Google LLC

Inventors: Nobuyuki Morioka, Byungha Chun, Nanxin Chen, Yu Zhang, Yifan Ding

IPC: G10L13/027

CPC classification number: G10L13/027

Abstract: A method for residual adapters for few-shot text-to-speech speaker adaptation includes obtaining a text-to-speech (TTS) model configured to convert text into representations of synthetic speech, the TTS model pre-trained on an initial training data set. The method further includes augmenting the TTS model with a stack of residual adapters. The method includes receiving an adaption training data set including one or more spoken utterances spoken by a target speaker, each spoken utterance in the adaptation training data set paired with corresponding input text associated with a transcription of the spoken utterance. The method also includes adapting, using the adaption training data set, the TTS model augmented with the stack of residual adapters to learn how to synthesize speech in a voice of the target speaker by optimizing the stack of residual adapters while parameters of the TTS model are frozen.
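The training pattern in this abstract (a frozen pre-trained model plus a trainable stack of residual adapters) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption for illustration: a single frozen linear layer stands in for the TTS model, one bottleneck adapter of the form `h + W_up·relu(W_down·h)` with a zero-initialised up-projection stands in for the adapter stack, and the "target speaker" data are synthetic vectors. The names `ResidualAdapter` and `adapt` are hypothetical; the abstract does not specify the adapter architecture.

```python
import copy
import random


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


class ResidualAdapter:
    """Hypothetical bottleneck adapter: y = h + W_up @ relu(W_down @ h)."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = [[rng.gauss(0.0, 0.5) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as an identity
        # map, so the augmented model initially reproduces the frozen model.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def forward(self, h):
        a = matvec(self.w_down, h)
        z = [max(0.0, v) for v in a]
        y = [hi + ui for hi, ui in zip(h, matvec(self.w_up, z))]
        return y, a, z


def adapt(epochs=300, lr=0.05, seed=0):
    """Full-batch gradient descent on the adapter only; the base stays frozen."""
    rng = random.Random(seed)
    dim, bottleneck = 4, 2
    # Frozen "pre-trained" layer standing in for the TTS model.
    w_base = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    w_base_before = copy.deepcopy(w_base)
    adapter = ResidualAdapter(dim, bottleneck, rng)
    # Toy adaptation set: the "target speaker" is the base output shifted by 0.3.
    data = []
    for _ in range(8):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        h = matvec(w_base, x)  # base is frozen, so its output can be precomputed
        data.append((h, [v + 0.3 for v in h]))

    losses = []
    for _ in range(epochs):
        g_up = [[0.0] * bottleneck for _ in range(dim)]
        g_down = [[0.0] * dim for _ in range(bottleneck)]
        total = 0.0
        for h, target in data:
            y, a, z = adapter.forward(h)
            err = [yi - ti for yi, ti in zip(y, target)]
            total += 0.5 * sum(e * e for e in err)
            for i in range(dim):
                for k in range(bottleneck):
                    g_up[i][k] += err[i] * z[k]
            for k in range(bottleneck):
                if a[k] > 0.0:  # ReLU passes gradient only where it is active
                    dz = sum(err[i] * adapter.w_up[i][k] for i in range(dim))
                    for j in range(dim):
                        g_down[k][j] += dz * h[j]
        losses.append(total / len(data))
        # Update adapter parameters only; w_base is never written.
        for i in range(dim):
            for k in range(bottleneck):
                adapter.w_up[i][k] -= lr * g_up[i][k] / len(data)
        for k in range(bottleneck):
            for j in range(dim):
                adapter.w_down[k][j] -= lr * g_down[k][j] / len(data)
    return losses, w_base == w_base_before
```

Only `w_up` and `w_down` receive updates; `w_base` is read but never written, which mirrors the abstract's condition that the TTS model parameters stay frozen while the adapters are optimized.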

2.

Published invention application
PARAMETER-EFFICIENT MODEL REPROGRAMMING FOR CROSS-LINGUAL SPEECH RECOGNITION (Pending, published)

Publication (announcement) number: US20240185841A1

Publication (announcement) date: 2024-06-06

Application number: US18490808

Application date: 2023-10-20

Applicant: Google LLC

Inventors: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman

IPC: G10L15/065, G10L15/00

CPC classification number: G10L15/065, G10L15/005

Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
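A minimal sketch of the reprogramming idea, assuming both modules are additive trainable vectors: one added to the input features (input reprogramming) and one added to a hidden representation (latent reprogramming). The two frozen 2x2 matrices `W1` and `W2`, the shift that defines the toy "second language" data, and all function names are hypothetical; the abstract does not describe the form of either module.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def transpose(w):
    return [list(col) for col in zip(*w)]


# Frozen stand-in for the first-language ASR model: two linear layers.
W1 = [[1.0, 0.2], [0.0, 0.8]]
W2 = [[0.5, 0.0], [0.3, 1.0]]


def forward(x, delta_in, delta_lat):
    """Frozen model wrapped with additive input and latent reprogramming."""
    x_prog = [xi + di for xi, di in zip(x, delta_in)]   # input reprogramming
    h = matvec(W1, x_prog)
    h_prog = [hi + li for hi, li in zip(h, delta_lat)]  # latent reprogramming
    return matvec(W2, h_prog)


def mean_loss(data, delta_in, delta_lat):
    total = 0.0
    for x, target in data:
        y = forward(x, delta_in, delta_lat)
        total += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return total / len(data)


def reprogram(data, steps=300, lr=0.2):
    """Gradient descent on the two reprogramming vectors; W1 and W2 stay fixed."""
    delta_in, delta_lat = [0.0, 0.0], [0.0, 0.0]
    w1_t, w2_t = transpose(W1), transpose(W2)
    n = len(data)
    for _ in range(steps):
        g_in, g_lat = [0.0, 0.0], [0.0, 0.0]
        for x, target in data:
            y = forward(x, delta_in, delta_lat)
            err = [yi - ti for yi, ti in zip(y, target)]
            g_h = matvec(w2_t, err)  # dL/d(h_prog) = W2^T err
            g_x = matvec(w1_t, g_h)  # dL/d(x_prog) = W1^T W2^T err
            g_lat = [g + d for g, d in zip(g_lat, g_h)]
            g_in = [g + d for g, d in zip(g_in, g_x)]
        delta_lat = [d - lr * g / n for d, g in zip(delta_lat, g_lat)]
        delta_in = [d - lr * g / n for d, g in zip(delta_in, g_in)]
    return delta_in, delta_lat


# Toy "second-language" data: the frozen model applied to shifted inputs.
SHIFT = [0.5, -0.4]
XS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.3]]
DATA = [(x, matvec(W2, matvec(W1, [xi + si for xi, si in zip(x, SHIFT)])))
        for x in XS]
```

Only four numbers are trained here. Because the toy mismatch is a pure input shift, the additive modules can absorb it exactly; real cross-lingual speech data would not be this convenient.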

3.

Published invention application
CONDITIONAL OUTPUT GENERATION THROUGH DATA DENSITY GRADIENT ESTIMATION (Pending, published)

Publication (announcement) number: US20230325658A1

Publication (announcement) date: 2023-10-12

Application number: US18010426

Application date: 2021-09-02

Applicant: Google LLC

Inventors: Nanxin Chen, Byungha Chun, William Chan, Ron J. Weiss, Mohammad Norouzi, Yu Zhang, Yonghui Wu

IPC: G06V10/82, G06N3/08, G10L13/02, G10L25/18, G10L25/30, G06V10/764, G06V10/26

CPC classification number: G06N3/08, G06V10/26, G06V10/764, G06V10/82, G10L13/02, G10L25/18, G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating outputs conditioned on network inputs using neural networks. In one aspect, a method comprises obtaining the network input; initializing a current network output; and generating the final network output by updating the current network output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing a model input for the iteration comprising (i) the current network output and (ii) the network input using a noise estimation neural network that is configured to process the model input to generate a noise output, wherein the noise output comprises a respective noise estimate for each value in the current network output; and updating the current network output using the noise estimate and the noise level for the iteration.
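The iterative update described in this abstract resembles a denoising-diffusion sampling loop. The sketch below uses the DDPM/WaveGrad-style update y ← (y − (1 − α_n)/√(1 − ᾱ_n)·ε̂)/√α_n, adding fresh Gaussian noise on every step except the last; that specific formula, the linear noise schedule, and the toy data are assumptions and are not taken from the abstract. Instead of a trained noise estimation network it uses a hypothetical `oracle_noise_estimator` that is exact for the toy deterministic target defined by the hypothetical `clean_output_for`, so the loop should return that target.

```python
import math
import random


def linear_schedule(n_iters=50, beta_start=1e-4, beta_end=0.05):
    """Hypothetical noise schedule: betas rising linearly over the iterations."""
    betas = [beta_start + (beta_end - beta_start) * i / (n_iters - 1)
             for i in range(n_iters)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def clean_output_for(network_input):
    """Toy stand-in for the data distribution (an assumption)."""
    return [2.0 * v - 1.0 for v in network_input]


def oracle_noise_estimator(current, network_input, alpha_bar):
    """Stand-in for the trained noise estimation network.

    For the deterministic toy data, the ideal per-value noise estimate is
    (y_n - sqrt(alpha_bar) * y0) / sqrt(1 - alpha_bar).
    """
    y0 = clean_output_for(network_input)
    s = math.sqrt(alpha_bar)
    return [(y - s * c) / math.sqrt(1.0 - alpha_bar) for y, c in zip(current, y0)]


def generate(network_input, n_iters=50, seed=0):
    rng = random.Random(seed)
    betas, alphas, alpha_bars = linear_schedule(n_iters)
    # Initialise the current network output with Gaussian noise.
    y = [rng.gauss(0.0, 1.0) for _ in network_input]
    for n in reversed(range(n_iters)):  # highest noise level first
        eps = oracle_noise_estimator(y, network_input, alpha_bars[n])
        coef = (1.0 - alphas[n]) / math.sqrt(1.0 - alpha_bars[n])
        y = [(yi - coef * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:
            # Fresh noise on every step except the last (DDPM-style sampling).
            sigma = math.sqrt((1.0 - alpha_bars[n - 1])
                              / (1.0 - alpha_bars[n]) * betas[n])
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y
```

Because the oracle is exact, this only exercises the update arithmetic; with a learned network the output would be a sample rather than a fixed target.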

4.

Published invention application
RESIDUAL ADAPTERS FOR FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION (Pending, published)

Publication (announcement) number: US20240135915A1

Publication (announcement) date: 2024-04-25

Application number: US18493770

Application date: 2023-10-23

Applicant: Google LLC

Inventors: Nobuyuki Morioka, Byungha Chun, Nanxin Chen, Yu Zhang, Yifan Ding

IPC: G10L13/027

CPC classification number: G10L13/027

Abstract: A method for residual adapters for few-shot text-to-speech speaker adaptation includes obtaining a text-to-speech (TTS) model configured to convert text into representations of synthetic speech, the TTS model pre-trained on an initial training data set. The method further includes augmenting the TTS model with a stack of residual adapters. The method includes receiving an adaption training data set including one or more spoken utterances spoken by a target speaker, each spoken utterance in the adaptation training data set paired with corresponding input text associated with a transcription of the spoken utterance. The method also includes adapting, using the adaption training data set, the TTS model augmented with the stack of residual adapters to learn how to synthesize speech in a voice of the target speaker by optimizing the stack of residual adapters while parameters of the TTS model are frozen.

5.

Published invention application
END-TO-END SPEECH WAVEFORM GENERATION THROUGH DATA DENSITY GRADIENT ESTIMATION (Pending, published)

Publication (announcement) number: US20230252974A1

Publication (announcement) date: 2023-08-10

Application number: US18010438

Application date: 2021-09-02

Applicant: Google LLC

Inventors: Byungha Chun, Mohammad Norouzi, Nanxin Chen, Ron J. Weiss, William Chan, Yu Zhang, Yonghui Wu

IPC: G10L13/08, G10L21/0208

CPC classification number: G10L13/08, G10L21/0208

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating waveforms conditioned on phoneme sequences. In one aspect, a method comprises: obtaining a phoneme sequence; processing the phoneme sequence using an encoder neural network to generate a hidden representation of the phoneme sequence; generating, from the hidden representation, a conditioning input; initializing a current waveform output; and generating a final waveform output that defines an utterance of the phoneme sequence by a speaker by updating the current waveform output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing (i) the current waveform output and (ii) the conditioning input using a noise estimation neural network to generate a noise output; and updating the current waveform output using the noise output and the noise level for the iteration.
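The first half of this pipeline (phoneme sequence, encoder, hidden representation, conditioning input) can be sketched as follows. The embedding table, the 3-tap smoothing "encoder", and the per-phoneme durations used to upsample the hidden representation to one conditioning vector per waveform sample are all hypothetical; the abstract does not say how the conditioning input is derived from the hidden representation. The resulting list would then be passed, together with the current waveform, to the noise estimation network at each refinement iteration.

```python
# Hypothetical 2-dimensional embedding table for a few phoneme symbols.
PHONEME_EMBEDDINGS = {
    "HH": [1.0, 0.0],
    "AH": [0.0, 1.0],
    "L": [0.5, 0.5],
    "OW": [-1.0, 0.5],
}


def encode(phonemes, kernel=(0.25, 0.5, 0.25)):
    """Toy encoder: embedding lookup followed by a 3-tap smoothing convolution
    with zero padding, giving one hidden vector per phoneme."""
    emb = [PHONEME_EMBEDDINGS[p] for p in phonemes]
    dim = len(emb[0])
    zero = [0.0] * dim
    padded = [zero] + emb + [zero]
    hidden = []
    for t in range(len(emb)):
        window = padded[t:t + 3]
        hidden.append([sum(k * v[d] for k, v in zip(kernel, window))
                       for d in range(dim)])
    return hidden


def conditioning_input(hidden, durations):
    """Upsample to waveform resolution by repeating each phoneme's hidden
    vector for its (hypothetical) duration in samples."""
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    cond = []
    for h, n in zip(hidden, durations):
        for _ in range(n):
            cond.append(list(h))  # copy so later edits cannot alias frames
    return cond
```

Repetition is the simplest upsampling choice; a learned or smoothed alignment would replace `conditioning_input` without changing the rest of the pipeline.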
