Synthesized data augmentation using voice conversion and speech recognition models

Invention Grant

US11335324B2 Synthesized data augmentation using voice conversion and speech recognition models (Active)

Patent Title: Synthesized data augmentation using voice conversion and speech recognition models
Application No.: US17008278

Application Date: 2020-08-31
Publication No.: US11335324B2

Publication Date: 2022-05-17
Inventor: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger
Main IPC: G10L15/06
IPC: G10L15/06; G10L15/16; G10L13/04; G10L13/047; G10L15/22; G10L13/08

Abstract:

A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
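The steps in the abstract can be sketched as a small data-flow outline. This is an illustrative sketch only, not the patented implementation: the names `SpokenUtterance`, `augment_and_train`, `toy_adapt_tts`, and `toy_train` are hypothetical, audio is represented as a plain list of floats, and the "TTS adaptation" and "training" functions are toy stand-ins that only show how data moves between the stages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Audio = List[float]  # placeholder for a speech representation (assumption)


@dataclass
class SpokenUtterance:
    """A transcription paired with non-synthetic speech from the target speaker."""
    transcription: str
    audio: Audio


def augment_and_train(
    spoken: List[SpokenUtterance],
    unspoken_texts: List[str],
    adapt_tts: Callable[[List[SpokenUtterance]], Callable[[str], Audio]],
    train_conversion_model: Callable[[List[Tuple[str, Audio]]], Dict],
) -> Dict:
    # Step 1: adapt a TTS model to the target speaker using the spoken set.
    synthesize = adapt_tts(spoken)
    # Step 2: generate one synthetic speech representation per unspoken text.
    synthetic = [(text, synthesize(text)) for text in unspoken_texts]
    # Step 3: train the speech conversion model on the synthetic data.
    return train_conversion_model(synthetic)


def toy_adapt_tts(spoken: List[SpokenUtterance]) -> Callable[[str], Audio]:
    """Toy stand-in: 'adaptation' just learns the mean sample value."""
    samples = [s for utt in spoken for s in utt.audio]
    offset = sum(samples) / len(samples) if samples else 0.0

    def synthesize(text: str) -> Audio:
        # One fake sample per word, shifted by the speaker offset.
        return [offset + len(word) for word in text.split()]

    return synthesize


def toy_train(examples: List[Tuple[str, Audio]]) -> Dict:
    """Toy stand-in: records how many synthetic examples were seen."""
    return {"num_examples": len(examples)}


spoken_set = [SpokenUtterance("hello", [1.0, 3.0])]
model = augment_and_train(
    spoken_set, ["turn on lights", "call mom"], toy_adapt_tts, toy_train
)
```

In a real system each callback would wrap an actual TTS model and an actual speech conversion model; the point of the sketch is only that the unspoken text utterances gain speech representations through the speaker-adapted TTS model before the conversion model is trained.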

Public/Granted literature

US20220068257A1 Synthesized Data Augmentation Using Voice Conversion and Speech Recognition Models Public/Granted day: 2022-03-03

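IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	.Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)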