Invention Grant
- Patent Title: Parallel Tacotron non-autoregressive and controllable TTS
- Application No.: US17327076
- Application Date: 2021-05-21
- Publication No.: US11908448B2
- Publication Date: 2024-02-20
- Inventor: Isaac Elias, Jonathan Shen, Yu Zhang, Ye Jia, Ron J. Weiss, Yonghui Wu, Byungha Chun
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Agency: Honigman LLP
- Agent: Brett A. Krueger; Grant Griffith
- Main IPC: G10L13/08
- IPC: G10L13/08; G10L13/047; G06F40/126; G10L21/10; G06N3/08; G06N3/088; G06N3/044; G06N3/045; G06N3/048

Abstract:
A method for training a non-autoregressive TTS model includes receiving training data that includes a reference audio signal and a corresponding input text sequence. The method also includes encoding the reference audio signal into a variational embedding that disentangles the style/prosody information from the reference audio signal and encoding the input text sequence into an encoded text sequence. The method also includes predicting a phoneme duration for each phoneme in the input text sequence and determining a phoneme duration loss based on the predicted phoneme durations and a reference phoneme duration. The method also includes generating one or more predicted mel-frequency spectrogram sequences for the input text sequence and determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence. The method also includes training the TTS model based on the final spectrogram loss and the corresponding phoneme duration loss.
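The training objective described in the abstract, a final spectrogram loss combined with a phoneme duration loss, can be illustrated with a minimal sketch. The abstract does not state the distance measure or how the two terms are weighted, so the L1 (mean absolute error) distance, the `duration_weight` parameter, and all function names below are assumptions made for illustration only, not the patented implementation.

```python
from typing import List, Sequence

# A spectrogram is represented here as a list of frames, each a list of mel-channel values.
Spectrogram = List[List[float]]


def mean_abs_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute (L1) error between two equal-length sequences (assumed distance)."""
    if len(pred) != len(ref):
        raise ValueError("prediction and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def duration_loss(pred_durations: Sequence[float], ref_durations: Sequence[float]) -> float:
    """Loss between the predicted and the reference duration of each phoneme."""
    return mean_abs_error(pred_durations, ref_durations)


def spectrogram_loss(pred_seqs: List[Spectrogram], ref: Spectrogram) -> float:
    """Average loss over one or more predicted mel spectrogram sequences."""
    flat_ref = [value for frame in ref for value in frame]
    per_sequence = []
    for seq in pred_seqs:
        flat_pred = [value for frame in seq for value in frame]
        per_sequence.append(mean_abs_error(flat_pred, flat_ref))
    return sum(per_sequence) / len(per_sequence)


def training_loss(
    pred_seqs: List[Spectrogram],
    ref_mel: Spectrogram,
    pred_durations: Sequence[float],
    ref_durations: Sequence[float],
    duration_weight: float = 1.0,  # hypothetical weighting, not given in the abstract
) -> float:
    """Combine the spectrogram loss and the phoneme duration loss into one scalar."""
    return spectrogram_loss(pred_seqs, ref_mel) + duration_weight * duration_loss(
        pred_durations, ref_durations
    )


# Toy values: two phonemes, two predicted spectrograms of 2 frames x 2 mel channels.
ref_mel = [[1.0, 1.0], [1.0, 1.0]]
pred_seqs = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
total = training_loss(pred_seqs, ref_mel, [2.0, 3.0], [2.0, 4.0])
```

In an actual training setup the scalar returned by `training_loss` would be minimized with respect to the model parameters; the toy values above only show how the two terms are combined.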
Public/Granted literature
- US20220122582A1 Parallel Tacotron Non-Autoregressive and Controllable TTS (Public/Granted day: 2022-04-21)