Parallel tacotron non-autoregressive and controllable TTS

Invention Grant

US11908448B2 Parallel tacotron non-autoregressive and controllable TTS (Active)

Patent Title: Parallel tacotron non-autoregressive and controllable TTS
Application No.: US17327076

Application Date: 2021-05-21
Publication No.: US11908448B2

Publication Date: 2024-02-20
Inventor: Isaac Elias, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu, Byungha Chun
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant Griffith
Main IPC: G10L13/08
IPC: G10L13/08; G10L13/047; G06F40/126; G10L21/10; G06N3/08; G06N3/088; G06N3/044; G06N3/045; G06N3/048

Abstract:

A method for training a non-autoregressive TTS model includes receiving training data that includes a reference audio signal and a corresponding input text sequence. The method also includes encoding the reference audio signal into a variational embedding that disentangles the style/prosody information from the reference audio signal and encoding the input text sequence into an encoded text sequence. The method also includes predicting a phoneme duration for each phoneme in the input text sequence and determining a phoneme duration loss based on the predicted phoneme durations and a reference phoneme duration. The method also includes generating one or more predicted mel-frequency spectrogram sequences for the input text sequence and determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence. The method also includes training the TTS model based on the final spectrogram loss and the corresponding phoneme duration loss.
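The way the two losses in the abstract combine into one training objective can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not state the loss forms, so the mean absolute error for spectrograms, the mean squared error for durations, the averaging over predicted spectrogram sequences, and the `duration_weight` parameter are all assumptions, and every function name is hypothetical.

```python
from typing import Sequence

# A spectrogram is modeled as a list of frames, each a list of mel-channel values.
Spectrogram = Sequence[Sequence[float]]


def duration_loss(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean squared error between predicted and reference phoneme durations.

    Assumption: the abstract only says the loss is "based on" both quantities.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("duration sequences must be non-empty and equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


def spectrogram_loss(predicted: Spectrogram, reference: Spectrogram) -> float:
    """Mean absolute error between one predicted spectrogram and the reference.

    Assumption: an L1-style distance; the abstract does not name one.
    """
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(predicted, reference):
        for p, r in zip(pred_frame, ref_frame):
            total += abs(p - r)
            count += 1
    if count == 0:
        raise ValueError("spectrograms must be non-empty")
    return total / count


def final_spectrogram_loss(
    predictions: Sequence[Spectrogram], reference: Spectrogram
) -> float:
    """Average the per-sequence losses over all predicted spectrogram sequences."""
    if not predictions:
        raise ValueError("at least one predicted spectrogram is required")
    return sum(spectrogram_loss(p, reference) for p in predictions) / len(predictions)


def training_loss(
    predictions: Sequence[Spectrogram],
    reference_spectrogram: Spectrogram,
    predicted_durations: Sequence[float],
    reference_durations: Sequence[float],
    duration_weight: float = 1.0,  # hypothetical weighting, not from the patent
) -> float:
    """Combine the final spectrogram loss with the phoneme duration loss."""
    return final_spectrogram_loss(
        predictions, reference_spectrogram
    ) + duration_weight * duration_loss(predicted_durations, reference_durations)


# Toy usage: two predicted spectrograms (2 frames x 2 mel channels) and
# durations for two phonemes.
reference = [[0.0, 1.0], [2.0, 3.0]]
predictions = [
    [[0.0, 1.0], [2.0, 3.0]],  # matches the reference exactly
    [[1.0, 1.0], [2.0, 3.0]],  # one value off by 1.0
]
loss = training_loss(predictions, reference, [2.0, 4.0], [2.0, 2.0], duration_weight=0.5)
```

In a real model the summed loss would be backpropagated through the text encoder, duration predictor, and spectrogram decoder together; the sketch only shows how the scalar objective might be assembled.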

Published/Granted literature

US20220122582A1 Parallel Tacotron Non-Autoregressive and Controllable TTS. Publication/grant date: 2022-04-21

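IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/08	.Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination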