Two-level text-to-speech systems using synthetic training data

    Publication Number: US12260851B2

    Publication Date: 2025-03-25

    Application Number: US17305809

    Filing Date: 2021-07-14

    Applicant: Google LLC

    Abstract: A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.
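
    The abstract above describes a two-stage pipeline: synthetic cross-accent training data is generated first, then the TTS system is trained on it and conditioned at inference time on a speaker embedding and an accent/dialect identifier. The sketch below is a minimal, hypothetical illustration of that flow; every name in it (TrainingExample, synthesize_in_second_accent, train_tts, synthesize) is an assumed placeholder, not the patented implementation.

```python
# Illustrative sketch only: all names are hypothetical placeholders.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingExample:
    audio: List[float]   # training audio signal spoken by the target speaker in the first accent/dialect
    transcript: str      # corresponding transcript


def synthesize_in_second_accent(example: TrainingExample, accent_id: str) -> List[float]:
    """Produce a training synthesized speech representation of the same content,
    spoken by the target speaker but in the second accent/dialect."""
    return list(example.audio)  # identity pass-through purely for illustration


def build_training_pairs(data: List[TrainingExample], second_accent: str) -> List[Tuple[str, List[float]]]:
    """Pair each transcript with its synthesized second-accent representation."""
    return [(ex.transcript, synthesize_in_second_accent(ex, second_accent)) for ex in data]


def train_tts(pairs: List[Tuple[str, List[float]]]) -> None:
    """Train the TTS system on (transcript, synthesized representation) pairs."""
    for transcript, synthetic_audio in pairs:
        pass  # a real system would take an optimization step against the synthetic target


def synthesize(text: str, speaker_embedding: List[float], accent_id: str) -> List[float]:
    """Inference: condition the trained TTS on a speaker embedding and an accent/dialect
    identifier to produce a waveform that clones the target voice in the second accent."""
    return []  # output audio waveform would be generated here
```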

    Two-Level Speech Prosody Transfer
    Patent Application

    Publication Number: US20230064749A1

    Publication Date: 2023-03-02

    Application Number: US18054604

    Filing Date: 2022-11-11

    Applicant: Google LLC

    Abstract: A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.
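
    As a rough illustration of the two-level architecture described above, the sketch below wires a first TTS model, the encoder portion, and the decoder portion of the second TTS model together. The class names (FirstTTSModel, ProsodyEncoder, ExpressiveDecoder) and their methods are assumptions made for the example, not components named in the patent.

```python
# Illustrative sketch only: model classes are hypothetical stand-ins.
from typing import List


class FirstTTSModel:
    def synthesize_intermediate(self, text: str) -> List[float]:
        """Generate an intermediate synthesized speech representation that
        captures the intended prosody (but not the target voice)."""
        return []


class ProsodyEncoder:
    def encode(self, intermediate: List[float]) -> List[float]:
        """Encode the intermediate representation into an utterance embedding
        that specifies the intended prosody."""
        return []


class ExpressiveDecoder:
    def decode(self, text: str, utterance_embedding: List[float],
               speaker_embedding: List[float]) -> List[float]:
        """Produce expressive speech with the intended prosody and the speaker
        characteristics of the target voice."""
        return []


def two_level_prosody_transfer(text: str, speaker_embedding: List[float]) -> List[float]:
    first_tts = FirstTTSModel()
    encoder = ProsodyEncoder()     # encoder portion of the second TTS model
    decoder = ExpressiveDecoder()  # decoder portion of the second TTS model

    intermediate = first_tts.synthesize_intermediate(text)
    utterance_embedding = encoder.encode(intermediate)
    return decoder.decode(text, utterance_embedding, speaker_embedding)
```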

    Two-level speech prosody transfer
    Granted Patent

    Publication Number: US11514888B2

    Publication Date: 2022-11-29

    Application Number: US16992410

    Filing Date: 2020-08-13

    Applicant: Google LLC

    Abstract: A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.

    Predicting parametric vocoder parameters from prosodic features

    Publication Number: US11232780B1

    Publication Date: 2022-01-25

    Application Number: US17033783

    Filing Date: 2020-09-26

    Applicant: Google LLC

    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
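
    The sketch below mirrors the inputs named in the abstract as plain data structures and shows how they would be passed through a vocoder model and on to a parametric vocoder. The dataclasses and the predict_vocoder_params / ParametricVocoder names are hypothetical placeholders for illustration only.

```python
# Illustrative sketch only: data shapes and names are assumptions.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProsodicFeatures:
    durations: List[float]        # predicted durations for the text utterance
    pitch_contour: List[float]    # F0 values over the utterance
    energy_contour: List[float]   # energy values over the utterance


@dataclass
class LinguisticSpecification:
    sentence_features: Dict[str, float]
    word_features: List[Dict[str, float]]      # one entry per word
    syllable_features: List[Dict[str, float]]  # one entry per syllable
    phoneme_features: List[Dict[str, float]]   # one entry per phoneme


def predict_vocoder_params(prosody: ProsodicFeatures,
                           spec: LinguisticSpecification) -> List[float]:
    """Stand-in for the vocoder model: map prosodic features plus the
    hierarchical linguistic specification to parametric vocoder parameters."""
    return []


class ParametricVocoder:
    def generate(self, vocoder_params: List[float],
                 prosody: ProsodicFeatures) -> List[float]:
        """Render a synthesized speech waveform having the intended prosody."""
        return []
```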

    Clockwork Hierarchical Variational Encoder

    Publication Number: US20210134266A1

    Publication Date: 2021-05-06

    Application Number: US17147548

    Filing Date: 2021-01-13

    Applicant: Google LLC

    Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
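
    A speculative per-syllable sketch of the steps listed in the abstract: predict a duration, predict a pitch contour from that duration, and emit fixed-length pitch frames. The Syllable type, the helper functions, and the 5 ms frame length are assumptions made for illustration; the patent does not specify them.

```python
# Illustrative sketch only: types, helpers, and the frame length are assumptions.
from dataclasses import dataclass
from typing import List

FRAME_LENGTH_SEC = 0.005  # assumed fixed frame length for the predicted pitch frames


@dataclass
class Syllable:
    phoneme_features: List[List[float]]  # linguistic features per phoneme
    prosodic_embedding: List[float]      # prosodic syllable embedding


def predict_duration(syllable: Syllable, utterance_embedding: List[float]) -> float:
    """Encode each phoneme's linguistic features with the syllable's prosodic
    embedding (conditioned on the utterance embedding) to predict a duration."""
    return 0.2  # placeholder duration in seconds


def predict_pitch_contour(syllable: Syllable, duration: float) -> List[float]:
    """Predict the syllable's pitch contour given its predicted duration."""
    return [100.0]  # flat placeholder contour (Hz)


def pitch_frames_for_syllable(syllable: Syllable,
                              utterance_embedding: List[float]) -> List[float]:
    duration = predict_duration(syllable, utterance_embedding)
    contour = predict_pitch_contour(syllable, duration)
    # The number of fixed-length frames follows from the predicted duration;
    # each frame represents part of the predicted pitch contour.
    num_frames = max(1, int(duration / FRAME_LENGTH_SEC))
    return [contour[min(i * len(contour) // num_frames, len(contour) - 1)]
            for i in range(num_frames)]
```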

    Clockwork hierarchical variational encoder

    Publication Number: US10923107B2

    Publication Date: 2021-02-16

    Application Number: US16382722

    Filing Date: 2019-04-12

    Applicant: Google LLC

    Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
