Invention Grant
- Patent Title: Speech synthesis prosody using a BERT model
- Application No.: US16867427
- Application Date: 2020-05-05
- Publication No.: US11881210B2
- Publication Date: 2024-01-23
- Inventor: Tom Marius Kenter, Manish Kumar Sharma, Robert Andrew James Clark, Aliaksei Severyn
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Agency: Honigman LLP
- Agent: Brett A. Krueger; Grant Griffith
- Main IPC: G10L15/16
- IPC: G10L15/16; G06N3/084; G10L15/02; G10L15/06

Abstract:
A method for generating a prosodic representation includes receiving a text utterance having one or more words. Each word has at least one syllable having at least one phoneme. The method also includes generating, using a Bidirectional Encoder Representations from Transformers (BERT) model, a sequence of wordpiece embeddings and selecting an utterance embedding for the text utterance, the utterance embedding representing an intended prosody. Each wordpiece embedding is associated with one of the one or more words of the text utterance. For each syllable, using the selected utterance embedding and a prosody model that incorporates the BERT model, the method also includes generating a corresponding prosodic syllable embedding for the syllable based on the wordpiece embedding associated with the word that includes the syllable and predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with the corresponding prosodic syllable embedding for the syllable.
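The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the patented implementation: every name here (`toy_vector`, `wordpiece_embeddings`, `prosodic_syllable_embedding`, `predict_duration`) is hypothetical, the hash-based vectors stand in for learned embeddings, and the sketch assumes one wordpiece embedding per word, whereas a real BERT model may split a word into several wordpieces.

```python
import hashlib
from dataclasses import dataclass
from typing import List

DIM = 4  # toy embedding size (assumption, not from the patent)


def toy_vector(key: str, dim: int = DIM) -> List[float]:
    """Deterministic stand-in for a learned embedding (NOT a real model)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


@dataclass
class Syllable:
    phonemes: List[str]


@dataclass
class Word:
    text: str
    syllables: List[Syllable]


def wordpiece_embeddings(words: List[Word]) -> List[List[float]]:
    """Placeholder for the BERT model: one embedding per word (simplification)."""
    return [toy_vector("wp:" + w.text) for w in words]


def prosodic_syllable_embedding(utt_emb: List[float], wp_emb: List[float]) -> List[float]:
    """Combine the utterance embedding with the word's wordpiece embedding.
    An element-wise mean is a placeholder for the learned prosody model."""
    return [(u + w) / 2 for u, w in zip(utt_emb, wp_emb)]


def predict_duration(syllable: Syllable, syl_emb: List[float]) -> float:
    """Encode each phoneme's (toy) linguistic features with the syllable
    embedding and sum a per-phoneme contribution, in seconds."""
    total = 0.0
    for ph in syllable.phonemes:
        feats = toy_vector("ph:" + ph)
        score = sum(f * s for f, s in zip(feats, syl_emb)) / DIM  # in [0, 1]
        total += 0.05 + 0.05 * score
    return total


def predict_syllable_durations(words: List[Word], utt_emb: List[float]) -> List[float]:
    """Walk the word -> syllable hierarchy and predict one duration per syllable."""
    durations = []
    for word, wp_emb in zip(words, wordpiece_embeddings(words)):
        for syl in word.syllables:
            syl_emb = prosodic_syllable_embedding(utt_emb, wp_emb)
            durations.append(predict_duration(syl, syl_emb))
    return durations


# Usage: "hello world" with a hypothetical "neutral" utterance embedding.
words = [
    Word("hello", [Syllable(["HH", "AH"]), Syllable(["L", "OW"])]),
    Word("world", [Syllable(["W", "ER", "L", "D"])]),
]
durations = predict_syllable_durations(words, toy_vector("utt:neutral"))
```

The sketch yields one duration per syllable. The point it illustrates is the conditioning structure: each syllable's prediction depends on both the selected utterance embedding and the embedding of the word that contains it.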
Public/Granted literature
- US20210350795A1 Speech Synthesis Prosody Using A BERT Model (Public/Granted day: 2021-11-11)