Patent search ap:("Google LLC") AND inv:"Liyang Jiang" Page 1

1.

Invention publication
Streaming Vocoder (Status: Under examination, published)

Publication (announcement) number: US20230267949A1

Publication (announcement) date: 2023-08-24

Application number: US18163848

Application date: 2023-02-02

Applicant: Google LLC

Inventor: Oleg Rybakov, Liyang Jiang, Fadi Biadsy

IPC: G10L21/10, G10L21/18

CPC classification number: G10L21/10, G10L21/18

Abstract: A method includes receiving a current spectrogram frame and reconstructing a phase of the current spectrogram frame by, for each corresponding committed spectrogram frame in a sequence of M number of committed spectrogram frames preceding the current spectrogram frame, obtaining a value of a committed phase of the corresponding committed spectrogram frame and estimating the phase of the current spectrogram frame based on a magnitude of the current spectrogram frame and the value of the committed phase of each corresponding committed spectrogram frame in the sequence of M number of committed spectrogram frames preceding the current spectrogram frame. The method also includes synthesizing, for the current spectrogram frame, a new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame.
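The abstract above does not say how the phase is estimated, so the following is only a minimal sketch under stated assumptions: a Griffin-Lim-style iteration in which the phases of the M committed frames are held fixed and only the current frame's phase is refined. The frame length, hop size, Hann window, iteration count, and all function names (`overlap_add`, `estimate_phase`, `synthesize_frame`, `stream`) are hypothetical choices for illustration, not taken from the patent.

```python
import numpy as np

FRAME = 64                    # window length in samples (assumed)
HOP = 16                      # hop size in samples (assumed)
WINDOW = np.hanning(FRAME)    # analysis/synthesis window (assumed)


def overlap_add(frames):
    """Inverse-transform a short run of complex frames by windowed overlap-add."""
    out = np.zeros(HOP * (len(frames) - 1) + FRAME)
    for i, spec in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += WINDOW * np.fft.irfft(spec, n=FRAME)
    return out


def estimate_phase(current_mag, committed_mags, committed_phases, n_iter=8):
    """Estimate the current frame's phase; committed phases are never changed."""
    committed = [m * np.exp(1j * p)
                 for m, p in zip(committed_mags, committed_phases)]
    # Initial guess: reuse the latest committed phase, or zeros for the first frame.
    if len(committed_phases):
        phase = committed_phases[-1].copy()
    else:
        phase = np.zeros_like(current_mag)
    for _ in range(n_iter):
        # Rebuild a waveform from the M committed frames plus the current guess,
        # then re-analyze the last frame and keep only its phase.
        signal = overlap_add(committed + [current_mag * np.exp(1j * phase)])
        phase = np.angle(np.fft.rfft(WINDOW * signal[-FRAME:]))
    return phase


def synthesize_frame(mag, phase):
    """Turn one magnitude/phase pair into a windowed time-domain frame."""
    return WINDOW * np.fft.irfft(mag * np.exp(1j * phase), n=FRAME)


def stream(mags, m=3):
    """Process magnitude frames one at a time, committing each phase once."""
    phases, frames = [], []
    for t, mag in enumerate(mags):
        lo = max(0, t - m)
        phase = estimate_phase(mag, mags[lo:t], phases[lo:t])
        phases.append(phase)                      # committed from here on
        frames.append(synthesize_frame(mag, phase))
    return phases, frames
```

In this sketch each call to `estimate_phase` touches only M + 1 frames, so the per-frame cost and latency stay bounded regardless of how long the stream runs; the caller would overlap-add the returned frames at `HOP`-sample offsets to form the output audio.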

2.

Granted invention patent
Synthesized data augmentation using voice conversion and speech recognition models (Status: Active)

Publication (announcement) number: US11335324B2

Publication (announcement) date: 2022-05-17

Application number: US17008278

Application date: 2020-08-31

Applicant: Google LLC

Inventor: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg

IPC: G10L15/06, G10L15/16, G10L13/04, G10L13/047, G10L15/22, G10L13/08

Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
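The data flow in the abstract above (adapt a TTS model on the target speaker's recordings, synthesize speech for the unspoken texts, then train the conversion model on the synthetic output) can be outlined as follows. This is only a structural sketch: the `Audio` placeholder type, the `SpokenUtterance` class, and the `adapt_tts` and `train_conversion_model` callables are hypothetical stand-ins supplied by the caller, not APIs or components named in the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Audio = List[float]  # placeholder for a speech representation (assumed)


@dataclass
class SpokenUtterance:
    transcription: str
    speech: Audio  # non-synthetic recording from the target speaker


def build_synthetic_training_set(
    spoken: List[SpokenUtterance],
    unspoken_texts: List[str],
    adapt_tts: Callable[[List[SpokenUtterance]], Callable[[str], Audio]],
) -> List[Tuple[str, Audio]]:
    """Adapt TTS on the spoken set, then synthesize every unspoken text."""
    tts = adapt_tts(spoken)  # adapted model: text -> speech in the target voice
    return [(text, tts(text)) for text in unspoken_texts]


def run_pipeline(
    spoken: List[SpokenUtterance],
    unspoken_texts: List[str],
    adapt_tts: Callable[[List[SpokenUtterance]], Callable[[str], Audio]],
    train_conversion_model: Callable[[List[Tuple[str, Audio]]], Any],
) -> Any:
    """Train the conversion model on the synthetic speech representations."""
    synthetic = build_synthetic_training_set(spoken, unspoken_texts, adapt_tts)
    return train_conversion_model(synthetic)
```

Passing the models in as callables keeps the sketch independent of any particular TTS or voice-conversion toolkit; only the ordering of the steps is taken from the abstract.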

3.

Invention application
Synthesized Data Augmentation Using Voice Conversion and Speech Recognition Models (Status: Active)

Publication (announcement) number: US20220068257A1

Publication (announcement) date: 2022-03-03

Application number: US17008278

Application date: 2020-08-31

Applicant: Google LLC

Inventor: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg

IPC: G10L13/047, G10L13/08, G10L15/16, G10L15/22

Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
