Patent search ap:("Google LLC") AND inv:"Takaaki Saeki" Page 1

1.

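Invention publication
MASSIVE MULTILINGUAL SPEECH-TEXT JOINT SEMI-SUPERVISED LEARNING FOR TEXT-TO-SPEECH Under examination - published

Publication (announcement) number: US20240153484A1

Publication (announcement) date: 2024-05-09

Application number: US18494324

Application date: 2023-10-25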

Applicant: Google LLC

Inventor: Andrew M. Rosenberg , Takaaki Saeki , Zhehuai Chen , Byungha Chun , Bhuvana Ramabhadran

IPC: G10L13/047 , G10L15/06 , G10L15/16

CPC classification number: G10L13/047 , G10L15/063 , G10L15/16

Abstract: A method includes receiving training data that includes a plurality of sets of text-to-speech (TTS) spoken utterances each associated with a respective language and including TTS utterances of synthetic speech spoken that includes a corresponding reference speech representation paired with a corresponding input text sequence. For each TTS utterance in each set of the TTS spoken training utterances of the received training data, the method includes generating a corresponding TTS encoded textual representation for the corresponding input text sequence, generating a corresponding speech encoding for the corresponding TTS utterance of synthetic speech, generating a shared encoder output, generating a predicted speech representation for the corresponding TTS utterance of synthetic speech, and determining a reconstruction loss. The method also includes training a TTS model based on the reconstruction losses determined for the TTS utterances in each set of the TTS spoken training utterances.

2.

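Invention application
Scaling Multilingual Speech Synthesis with Zero Supervision of Found Data Granted

Publication (announcement) number: US20250078805A1

Publication (announcement) date: 2025-03-06

Application number: US18823661

Application date: 2024-09-03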

Applicant: Google LLC

Inventor: Andrew M Rosenberg , Takaaki Saeki , Francoise Beaufays , Bhuvana Ramabhadran

IPC: G10L13/02 , G10L25/30

Abstract: A method includes receiving training data that includes a plurality of sets of training utterances each associated with a respective language. Each training utterance includes a corresponding reference speech representation paired with a corresponding input text sequence. For each training utterance, the method includes generating a corresponding encoded textual representation for the corresponding input text sequence, generating a corresponding speech encoding for the corresponding reference speech representation, generating a shared encoder output, and determining a text-to-speech (TTS) loss based on the corresponding encoded textual representation, the corresponding speech encoding, and the shared encoder output. The method also includes training a TTS model based on the TTS losses determined for the training utterances in each set of the training utterances to teach the TTS model to learn how to synthesize speech in each of the respective languages.
