On-device personalization of speech synthesis for training of speech model(s)

Invention Grant

US11545133B2 On-device personalization of speech synthesis for training of speech model(s) (Active)

Patent Title: On-device personalization of speech synthesis for training of speech model(s)
Application No.: US17082518

Application Date: 2020-10-28
Publication No.: US11545133B2

Publication Date: 2023-01-03
Inventor: Françoise Beaufays, Johan Schalkwyk, Khe Chai Sim
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Middleton Reutlinger
Main IPC: G10L15/18
IPC: G10L15/18; G10L15/07; G10L13/047; G10L13/033; G10L13/10

Abstract:

Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using an on-device TTS generator model, to generate synthesized speech audio data that includes synthesized speech of the textual segment; process the synthesized speech, using an on-device ASR model to generate predicted ASR output; and generate a gradient based on comparing the predicted ASR output to ground truth output corresponding to the textual segment. Processor(s) of the client device can also: process the synthesized speech audio data using an on-device TTS generator model to make a prediction; and generate a gradient based on the prediction. In these implementations, the generated gradient(s) can be used to update weight(s) of the respective on-device model(s) and/or transmitted to a remote system for use in remote updating of respective global model(s). The updated weight(s) and/or the updated model(s) can be transmitted to client device(s).
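
The first loop described in the abstract (textual segment, synthesized speech, predicted ASR output, gradient, weight update) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: `toy_tts` stands in for the on-device TTS generator model by emitting one fake audio frame per character, the ASR model is a single linear layer with a softmax, the comparison to ground truth is a cross-entropy loss, and the names `VOCAB`, `loss_and_gradient`, `apply_gradient`, and `transcribe` are hypothetical.

```python
import math

VOCAB = "abcdefghijklmnopqrstuvwxyz "  # toy symbol set (assumption)
N = len(VOCAB)


def toy_tts(text):
    """Hypothetical stand-in for the on-device TTS generator model.

    Emits one fake "audio frame" per character: a peak at the character's
    index plus a weaker peak at the next index, so neighbouring symbols blur.
    """
    frames = []
    for ch in text:
        idx = VOCAB.index(ch)
        frame = [0.0] * N
        frame[idx] = 1.0
        frame[(idx + 1) % N] = 0.3
        frames.append(frame)
    return frames


def asr_predict(weights, frame):
    """Toy ASR model: linear layer plus softmax over VOCAB for one frame."""
    logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_gradient(weights, frames, text):
    """Compare predicted ASR output to the ground-truth text.

    Returns the mean cross-entropy loss and its gradient w.r.t. the weights.
    """
    grad = [[0.0] * N for _ in range(N)]
    loss = 0.0
    for frame, ch in zip(frames, text):
        probs = asr_predict(weights, frame)
        target = VOCAB.index(ch)
        loss -= math.log(probs[target])
        for k in range(N):
            err = probs[k] - (1.0 if k == target else 0.0)
            for j in range(N):
                grad[k][j] += err * frame[j] / len(text)
    return loss / len(text), grad


def apply_gradient(weights, grad, lr=0.5):
    """Local weight update; the same gradient could instead be sent to a server."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]


def transcribe(weights, frames):
    """Greedy per-frame decoding of the toy ASR output."""
    chars = []
    for frame in frames:
        probs = asr_predict(weights, frame)
        chars.append(VOCAB[probs.index(max(probs))])
    return "".join(chars)


segment = "call mom"                      # a locally stored textual segment
frames = toy_tts(segment)                 # synthesized "speech" for the segment
weights = [[0.0] * N for _ in range(N)]   # untrained toy ASR model
loss_before, grad = loss_and_gradient(weights, frames, segment)
weights = apply_gradient(weights, grad)   # on-device update with the gradient
loss_after, _ = loss_and_gradient(weights, frames, segment)
```

In this sketch the gradient is applied locally. The alternative the abstract mentions, transmitting the gradient to a remote system for updating a global model, would replace the `apply_gradient` call with an upload step, which is not modeled here.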

Public/Granted literature

US20220115000A1 ON-DEVICE PERSONALIZATION OF SPEECH SYNTHESIS FOR TRAINING OF SPEECH RECOGNITION MODEL(S) Public/Granted day: 2022-04-14

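IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/18	..using natural language modelling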