Patent search ap:("Google LLC") AND inv:"Yifan Ding" Page 1

1.

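Invention Publication
SPEECH-TO-SPEECH TRANSLATION WITH MONOLINGUAL DATA (Pending, Published)

Publication (announcement) number: US20240289563A1

Publication (announcement) date: 2024-08-29

Application number: US18589358

Application date: 2024-02-27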

Applicant: GOOGLE LLC

Inventor: Michelle Tadmor Ramanovich, Eliya Nachmani, Alon Levkovitch, Byungha Chun, Yifan Ding, Nadav Bar, Chulayuth Asawaroengchai

IPC: G06F40/58, G10L15/00, G10L15/06, G10L25/18

CPC classification number: G06F40/58, G10L15/005, G10L15/063, G10L25/18, G10L2015/0635

Abstract: Training and/or utilizing a Speech-To-Speech Translation (S2ST) system that can be used to generate, based on processing source audio data that captures a spoken utterance in a source language, target audio data that includes a synthetic spoken utterance that is spoken in a target language and that corresponds, both linguistically and para-linguistically, to the spoken utterance in the source language. Implementations that are directed to training the S2ST system utilize an unsupervised approach, with monolingual speech data, in training the S2ST system.

2.

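Invention Publication
RESIDUAL ADAPTERS FOR FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION (Pending, Published)

Publication (announcement) number: US20240135915A1

Publication (announcement) date: 2024-04-25

Application number: US18493770

Application date: 2023-10-23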

Applicant: Google LLC

Inventor: Nobuyuki Morioka, Byungha Chun, Nanxin Chen, Yu Zhang, Yifan Ding

IPC: G10L13/027

CPC classification number: G10L13/027

Abstract: A method for residual adapters for few-shot text-to-speech speaker adaptation includes obtaining a text-to-speech (TTS) model configured to convert text into representations of synthetic speech, the TTS model pre-trained on an initial training data set. The method further includes augmenting the TTS model with a stack of residual adapters. The method includes receiving an adaptation training data set including one or more spoken utterances spoken by a target speaker, each spoken utterance in the adaptation training data set paired with corresponding input text associated with a transcription of the spoken utterance. The method also includes adapting, using the adaptation training data set, the TTS model augmented with the stack of residual adapters to learn how to synthesize speech in a voice of the target speaker by optimizing the stack of residual adapters while parameters of the TTS model are frozen.

3.

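Invention Publication
RESIDUAL ADAPTERS FOR FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION (Pending, Published)

Publication (announcement) number: US20240233704A9

Publication (announcement) date: 2024-07-11

Application number: US18493770

Application date: 2023-10-24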

Applicant: Google LLC

Inventor: Nobuyuki Morioka, Byungha Chun, Nanxin Chen, Yu Zhang, Yifan Ding

IPC: G10L13/027

CPC classification number: G10L13/027

Abstract: A method for residual adapters for few-shot text-to-speech speaker adaptation includes obtaining a text-to-speech (TTS) model configured to convert text into representations of synthetic speech, the TTS model pre-trained on an initial training data set. The method further includes augmenting the TTS model with a stack of residual adapters. The method includes receiving an adaptation training data set including one or more spoken utterances spoken by a target speaker, each spoken utterance in the adaptation training data set paired with corresponding input text associated with a transcription of the spoken utterance. The method also includes adapting, using the adaptation training data set, the TTS model augmented with the stack of residual adapters to learn how to synthesize speech in a voice of the target speaker by optimizing the stack of residual adapters while parameters of the TTS model are frozen.
