Invention Application
- Patent Title: Unsupervised Parallel Tacotron Non-Autoregressive and Controllable Text-To-Speech
- Application No.: US17326542
- Application Date: 2021-05-21
- Publication No.: US20220301543A1
- Publication Date: 2022-09-22
- Inventor: Isaac Elias, Byungha Chun, Jonathan Shen, Ye Jia, Yu Zhang, Yonghui Wu
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Main IPC: G10L13/08
- IPC: G10L13/08; G10L13/04

Abstract:
A method for training a non-autoregressive TTS model includes obtaining a sequence representation of an encoded text sequence concatenated with a variational embedding. The method also includes using a duration model network to predict a phoneme duration for each phoneme represented by the encoded text sequence. Based on the predicted phoneme durations, the method also includes learning an interval representation and an auxiliary attention context representation. The method also includes upsampling, using the interval representation and the auxiliary attention context representation, the sequence representation into an upsampled output specifying a number of frames. The method also includes generating, based on the upsampled output, one or more predicted mel-frequency spectrogram sequences for the encoded text sequence. The method also includes determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence and training the TTS model based on the final spectrogram loss.
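
The duration-driven upsampling step in the abstract can be illustrated with a simplified sketch. This is not the claimed method: the abstract describes a learned interval representation and auxiliary attention context, whereas the code below only computes the raw frame-to-token boundary distances that such a representation could be built from, and substitutes plain repetition (a hard alignment) for the learned upsampling. All function names, the NumPy implementation, and the example values are assumptions made for illustration.

```python
import numpy as np

def token_boundaries(durations):
    """Return (starts, ends) frame positions of each token from per-token durations."""
    durations = np.asarray(durations, dtype=np.int64)
    ends = np.cumsum(durations)
    starts = ends - durations
    return starts, ends

def interval_features(durations):
    """Distances from every output frame to each token's start and end.

    Returns an integer array of shape (num_frames, num_tokens, 2).
    Hypothetical stand-in for the inputs to a learned interval representation.
    """
    starts, ends = token_boundaries(durations)
    num_frames = int(ends[-1]) if len(ends) else 0
    frames = np.arange(num_frames, dtype=np.int64)[:, None]  # (T, 1)
    return np.stack([frames - starts[None, :], ends[None, :] - frames], axis=-1)

def upsample_by_repetition(sequence, durations):
    """Repeat token k exactly durations[k] times (hard alignment, not learned)."""
    return np.repeat(np.asarray(sequence), np.asarray(durations, dtype=np.int64), axis=0)

# Example: 3 tokens with 2-dimensional representations and made-up durations.
seq = np.arange(6, dtype=np.float32).reshape(3, 2)
dur = [2, 1, 3]                      # predicted durations in frames (assumed values)
up = upsample_by_repetition(seq, dur)
feats = interval_features(dur)
```

Here `up` has one row per output frame (the sum of the durations), and `feats[t, k]` holds the distance of frame `t` from the start and to the end of token `k`. A learned upsampler would replace the repetition with soft weights computed from features of this kind.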
Public/Granted literature
- US11823656B2 Unsupervised parallel tacotron non-autoregressive and controllable text-to-speech, Public/Granted day: 2023-11-21