Patent search ap:("Google LLC") AND inv:"Vincent Wan" Page 1

1.

Granted invention patent
Predicting parametric vocoder parameters from prosodic features
Status: In force

Publication (announcement) number: US12125469B2

Publication (announcement) date: 2024-10-22

Application number: US18488735

Application date: 2023-10-17

Applicant: Google LLC

Inventor: Rakesh Iyer, Vincent Wan

IPC: G10L13/10, G10L13/027

CPC classification number: G10L13/027, G10L13/10

Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
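
Illustrative note: the data flow described in this abstract can be sketched roughly as follows. This is not taken from the patent. The class names, the flat feature encoding, and the `vocoder_model` and `parametric_vocoder` callables are all hypothetical placeholders chosen only to show which inputs go where.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProsodicFeatures:
    duration: float              # hypothetical unit: seconds
    pitch_contour: List[float]   # hypothetical: one value per frame
    energy_contour: List[float]  # hypothetical: one value per frame


@dataclass
class LinguisticSpecification:
    sentence: List[float]         # sentence-level features
    words: List[List[float]]      # one feature vector per word
    syllables: List[List[float]]  # one feature vector per syllable
    phonemes: List[List[float]]   # one feature vector per phoneme


def flatten_inputs(prosody: ProsodicFeatures,
                   spec: LinguisticSpecification) -> List[float]:
    """Concatenate every feature into one flat vector (illustrative encoding only)."""
    vec = [prosody.duration, *prosody.pitch_contour,
           *prosody.energy_contour, *spec.sentence]
    for level in (spec.words, spec.syllables, spec.phonemes):
        for unit in level:
            vec.extend(unit)
    return vec


def synthesize(prosody: ProsodicFeatures,
               spec: LinguisticSpecification,
               vocoder_model: Callable,
               parametric_vocoder: Callable):
    """Predict vocoder parameters, then pass them with the prosody to the vocoder."""
    predicted_params = vocoder_model(flatten_inputs(prosody, spec))
    return parametric_vocoder(predicted_params, prosody)
```

The point of the sketch is the last function: the prosodic features are used twice, once as input to the model that predicts the vocoder parameters and once as direct input to the parametric vocoder.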

2.

Invention application
Two-Level Text-To-Speech Systems Using Synthetic Training Data
Status: In force

Publication (announcement) number: US20230018384A1

Publication (announcement) date: 2023-01-19

Application number: US17305809

Application date: 2021-07-14

Applicant: Google LLC

Inventor: Lev Finkelstein, Chun-an Chan, Byungha Chun, Norman Casagrande, Yu Zhang, Robert Andrew James Clark, Vincent Wan

IPC: G10L13/08, G10L13/047

Abstract: A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.
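
Illustrative note: the training-data step in this abstract (re-rendering each recording in a second accent/dialect and pairing it with the original transcript) might be sketched as below. This is not taken from the patent. `synthesize_in_accent` is a hypothetical stand-in for whatever system produces the synthetic speech, and the accent identifier is an arbitrary example value.

```python
from typing import Any, Callable, List, Tuple


def build_synthetic_training_set(
    training_data: List[Tuple[Any, str]],
    synthesize_in_accent: Callable[[Any, str, str], Any],
    target_accent: str,
) -> List[Tuple[str, Any]]:
    """Pair each transcript with synthetic speech in the target accent.

    training_data holds (audio, transcript) pairs spoken by the target
    speaker in the first accent/dialect. The returned (transcript,
    synthetic_speech) pairs are what a TTS system would then be trained on.
    """
    return [
        (transcript, synthesize_in_accent(audio, transcript, target_accent))
        for audio, transcript in training_data
    ]
```

At inference time, per the abstract, the trained system is conditioned on a speaker embedding and an accent/dialect identifier; that step is not modeled here.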

3.

Granted invention patent
Attention-based clockwork hierarchical variational encoder
Status: In force

Publication (announcement) number: US12080272B2

Publication (announcement) date: 2024-09-03

Application number: US17756264

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

CPC classification number: G10L13/10, G10L25/30, G10L2013/105

Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.
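
Illustrative note: two ideas in this abstract, attention over the phoneme-level linguistic features of a syllable and turning a predicted syllable duration into a number of fixed-length frames, can be sketched as below. This is not taken from the patent. The dot-product scoring, the 5 ms frame length, and the round-up rule are assumptions made only for the example.

```python
import math
from typing import List


def attend(query: List[float], phoneme_features: List[List[float]]) -> List[float]:
    """Softmax-weighted average of phoneme feature vectors (dot-product scores)."""
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in phoneme_features]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(phoneme_features[0])
    return [
        sum((w / total) * feats[i] for w, feats in zip(weights, phoneme_features))
        for i in range(dim)
    ]


def frame_count(duration_ms: int, frame_ms: int = 5) -> int:
    """Number of fixed-length frames covering a predicted syllable duration.

    Rounds up and always returns at least one frame (assumed rule).
    """
    return max(1, -(-duration_ms // frame_ms))
```

In the method as described, the duration itself comes from decoding a prosodic syllable embedding using the attention output; here it is simply passed in as a number.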

4.

Invention application
Attention-Based Clockwork Hierarchical Variational Encoder
Status: In force

Publication (announcement) number: US20220415306A1

Publication (announcement) date: 2022-12-29

Application number: US17756264

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.

5.

Invention application
Two-Level Text-To-Speech Systems Using Synthetic Training Data
Status: In force

Publication (announcement) number: US20250078808A1

Publication (announcement) date: 2025-03-06

Application number: US18949095

Application date: 2024-11-15

Applicant: Google LLC

Inventor: Lev Finkelstein, Chun-an Chan, Byungha Chun, Norman Casagrande, Yu Zhang, Robert Andrew James Clark, Vincent Wan

IPC: G10L13/08, G10L13/047

Abstract: A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.

6.

Invention publication
Predicting Parametric Vocoder Parameters From Prosodic Features
Status: Under examination (published)

Publication (announcement) number: US20240046915A1

Publication (announcement) date: 2024-02-08

Application number: US18488735

Application date: 2023-10-17

Applicant: Google LLC

Inventor: Rakesh Iyer, Vincent Wan

IPC: G10L13/027, G10L13/10

CPC classification number: G10L13/027, G10L13/10

Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.

7.

Invention publication
Attention-Based Clockwork Hierarchical Variational Encoder
Status: Under examination (published)

Publication (announcement) number: US20240038214A1

Publication (announcement) date: 2024-02-01

Application number: US18487227

Application date: 2023-10-16

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

CPC classification number: G10L13/10, G10L25/30, G10L2013/105

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.

8.

Granted invention patent
Predicting parametric vocoder parameters from prosodic features
Status: In force

Publication (announcement) number: US11830474B2

Publication (announcement) date: 2023-11-28

Application number: US17647246

Application date: 2022-01-06

Applicant: Google LLC

Inventor: Rakesh Iyer, Vincent Wan

IPC: G10L13/10, G10L13/027

CPC classification number: G10L13/027, G10L13/10

Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.

9.

Invention application
Predicting Parametric Vocoder Parameters From Prosodic Features
Status: In force

Publication (announcement) number: US20220130371A1

Publication (announcement) date: 2022-04-28

Application number: US17647246

Application date: 2022-01-06

Applicant: Google LLC

Inventor: Rakesh Iyer, Vincent Wan

IPC: G10L13/027, G10L13/10

Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.

10.

Granted invention patent
Attention-based clockwork hierarchical variational encoder
Status: In force

Publication (announcement) number: US12272349B2

Publication (announcement) date: 2025-04-08

Application number: US18487227

Application date: 2023-10-16

Applicant: Google LLC

Inventor: Robert Clark, Chun-An Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.
