Patent search ap:("Google LLC") AND inv:"Robert Clark" Page 1

1.

Granted invention patent
Self-training WaveNet for text-to-speech (Active)

Publication number: US11295725B2

Publication date: 2022-04-05

Application number: US16925230

Application date: 2020-07-09

Applicant: Google LLC

Inventor: Manish Sharma, Tom Marius Kenter, Robert Clark

IPC: G10L13/047, G10L25/30

Abstract: A method of self-training WaveNet includes receiving a plurality of recorded speech samples and training a first autoregressive neural network using the plurality of recorded speech samples. The trained first autoregressive neural network is configured to output synthetic speech as an audible representation of a text input. The method further includes generating a plurality of synthetic speech samples using the trained first autoregressive neural network. The method additionally includes training a second autoregressive neural network using the plurality of synthetic speech samples from the trained first autoregressive neural network and distilling the trained second autoregressive neural network into a feedforward neural network.
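
The four steps in this abstract can be read as a simple pipeline. The sketch below is illustrative only and is not taken from the patent: the function names, the `texts_to_synthesize` argument, and the toy stand-ins for training and distillation are all hypothetical, and real networks would replace the callables.

```python
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]
Sample = Tuple[str, Waveform]          # (text, audio) pair
Synthesizer = Callable[[str], Waveform]


def self_training_pipeline(
    recorded: Sequence[Sample],
    texts_to_synthesize: Sequence[str],   # assumed input, not named in the abstract
    train_autoregressive: Callable[[Sequence[Sample]], Synthesizer],
    distill: Callable[[Synthesizer], Synthesizer],
) -> Synthesizer:
    # Step 1: train the first autoregressive network on recorded speech.
    first = train_autoregressive(recorded)
    # Step 2: generate synthetic speech samples with the trained first network.
    synthetic = [(text, first(text)) for text in texts_to_synthesize]
    # Step 3: train a second autoregressive network on the synthetic samples.
    second = train_autoregressive(synthetic)
    # Step 4: distill the second network into a feedforward network.
    return distill(second)


def toy_train(samples: Sequence[Sample]) -> Synthesizer:
    # Toy stand-in for training: the "model" only remembers how many
    # samples it saw and emits that number once per input character.
    count = float(len(samples))
    return lambda text: [count] * len(text)


def toy_distill(model: Synthesizer) -> Synthesizer:
    # Toy stand-in for distillation: the "student" copies the teacher's output.
    return lambda text: list(model(text))


feedforward = self_training_pipeline(
    recorded=[("hi", [0.1, 0.2])],
    texts_to_synthesize=["abc", "de"],
    train_autoregressive=toy_train,
    distill=toy_distill,
)
```

Passing the training and distillation routines in as callables keeps the ordering of the four steps visible without committing to any particular model architecture.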

2.

Invention application
Self-Training WaveNet for Text-to-Speech (Active)

Publication number: US20220013105A1

Publication date: 2022-01-13

Application number: US16925230

Application date: 2020-07-09

Applicant: Google LLC

Inventor: Manish Sharma, Tom Marius Kenter, Robert Clark

IPC: G10L13/047, G10L25/30

Abstract: A method of self-training WaveNet includes receiving a plurality of recorded speech samples and training a first autoregressive neural network using the plurality of recorded speech samples. The trained first autoregressive neural network is configured to output synthetic speech as an audible representation of a text input. The method further includes generating a plurality of synthetic speech samples using the trained first autoregressive neural network. The method additionally includes training a second autoregressive neural network using the plurality of synthetic speech samples from the trained first autoregressive neural network and distilling the trained second autoregressive neural network into a feedforward neural network.

3.

Granted invention patent
Attention-based clockwork hierarchical variational encoder (Active)

Publication number: US12080272B2

Publication date: 2024-09-03

Application number: US17756264

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

CPC classification number: G10L13/10, G10L25/30, G10L2013/105

Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.
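
As a rough illustration of the per-syllable step described here, the sketch below attends over the phoneme feature vectors of one syllable, reads a duration off the attended context, and converts that duration into a count of fixed-length frames. The abstract does not specify the attention form, the duration readout, or the frame length, so dot-product attention, the linear readout `duration_weights`, and the 5 ms frame are all assumptions made for this example.

```python
import math
from typing import List

FRAME_MS = 5.0  # assumed fixed frame length; not stated in the abstract


def attend(query: List[float], phoneme_features: List[List[float]]) -> List[float]:
    """Dot-product attention: softmax-weighted average of phoneme feature vectors."""
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in phoneme_features]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(phoneme_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, phoneme_features)) / total
        for i in range(dim)
    ]


def predict_duration_ms(
    syllable_embedding: List[float],
    phoneme_features: List[List[float]],
    duration_weights: List[float],  # hypothetical linear readout
) -> float:
    """Decode a syllable duration from the attended phoneme context."""
    context = attend(syllable_embedding, phoneme_features)
    raw = sum(w * c for w, c in zip(duration_weights, context))
    return max(FRAME_MS, raw)  # keep at least one frame's worth of duration


def num_fixed_length_frames(duration_ms: float) -> int:
    """Number of fixed-length frames needed to cover the predicted duration."""
    return max(1, math.ceil(duration_ms / FRAME_MS))


duration = predict_duration_ms(
    syllable_embedding=[0.3, -0.1],
    phoneme_features=[[1.0, 2.0], [1.0, 2.0]],
    duration_weights=[10.0, 20.0],
)
frame_count = num_fixed_length_frames(duration)
```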

4.

Granted invention patent
Key frame networks (Active)

Publication number: US12046227B2

Publication date: 2024-07-23

Application number: US17659840

Application date: 2022-04-19

Applicant: Google LLC

Inventor: Tom Marius Kenter, Tobias Alexander Hawker, Robert Clark

IPC: G10L13/08, G10L15/02, G10L15/06, G10L15/187

CPC classification number: G10L13/08, G10L15/02, G10L15/063, G10L15/187, G10L2015/025

Abstract: A method for generating frame values using a key frame network includes receiving a text utterance having at least one phoneme, and for each respective phoneme of the at least one phoneme, predicting, using a predictive model, a fixed quantity of key frames. Each respective key frame of the fixed quantity of key frames includes a representation of a component of the respective phoneme. The method also includes generating, using the fixed quantity of key frames, a plurality of frame values. Here, each respective frame value of the plurality of frame values is representative of a fixed duration of audio.
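
One way to picture the step from a fixed quantity of key frames to a variable number of frame values is interpolation. The sketch below is an assumption-laden illustration, not the patented method: it treats each key frame as a single scalar (for example a pitch value), fixes the quantity at three per phoneme, takes the number of output frames as a given input, and uses linear interpolation, none of which the abstract specifies.

```python
from typing import List

KEY_FRAMES_PER_PHONEME = 3  # assumed fixed quantity (e.g. start, middle, end)


def interpolate_frames(key_frames: List[float], num_frames: int) -> List[float]:
    """Expand a phoneme's key frames into num_frames fixed-duration frame values."""
    if len(key_frames) != KEY_FRAMES_PER_PHONEME:
        raise ValueError("each phoneme must have the fixed quantity of key frames")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        # A single frame takes the middle key frame.
        return [key_frames[len(key_frames) // 2]]

    last = len(key_frames) - 1
    values = []
    for i in range(num_frames):
        # Map the frame index onto the key-frame axis [0, last].
        pos = i * last / (num_frames - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        # Linear blend between the two neighbouring key frames.
        values.append(key_frames[lo] * (1 - frac) + key_frames[lo + 1] * frac)
    return values


frame_values = interpolate_frames([100.0, 120.0, 110.0], num_frames=5)
```

Because the quantity of key frames is fixed per phoneme, the predictive model's output size does not depend on how long the phoneme lasts; only the interpolation step sees the frame count.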

5.

Invention publication
Key Frame Networks (Pending, published)

Publication number: US20230335110A1

Publication date: 2023-10-19

Application number: US17659840

Application date: 2022-04-19

Applicant: Google LLC

Inventor: Tom Marius Kenter, Tobias Alexander Hawker, Robert Clark

IPC: G10L13/08, G10L15/02, G10L15/06, G10L15/187

CPC classification number: G10L13/08, G10L15/02, G10L15/063, G10L15/187, G10L2015/025

Abstract: A method for generating frame values using a key frame network includes receiving a text utterance having at least one phoneme, and for each respective phoneme of the at least one phoneme, predicting, using a predictive model, a fixed quantity of key frames. Each respective key frame of the fixed quantity of key frames includes a representation of a component of the respective phoneme. The method also includes generating, using the fixed quantity of key frames, a plurality of frame values. Here, each respective frame value of the plurality of frame values is representative of a fixed duration of audio.

6.

Invention application
Attention-Based Clockwork Hierarchical Variational Encoder (Active)

Publication number: US20220415306A1

Publication date: 2022-12-29

Application number: US17756264

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

Abstract: A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230) and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.

7.

Invention publication
Attention-Based Clockwork Hierarchical Variational Encoder (Pending, published)

Publication number: US20240038214A1

Publication date: 2024-02-01

Application number: US18487227

Application date: 2023-10-16

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

CPC classification number: G10L13/10, G10L25/30, G10L2013/105

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.

8.

Granted invention patent
Attention-based clockwork hierarchical variational encoder (Active)

Publication number: US12272349B2

Publication date: 2025-04-08

Application number: US18487227

Application date: 2023-10-16

Applicant: Google LLC

Inventor: Robert Clark, Chun-An Chan, Vincent Wan

IPC: G10L13/10, G10L25/30

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.

9.

Invention application
Clockwork Hierarchical Variational Encoder (Active)

Publication number: US20210134266A1

Publication date: 2021-05-06

Application number: US17147548

Application date: 2021-01-13

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L13/047

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
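
The last step of this abstract, turning a predicted duration and pitch contour into fixed-length pitch frames, can be sketched as sampling the contour once per frame. The example below is a simplified assumption: it models the contour as a straight line between a hypothetical `start_hz` and `end_hz` and uses a 5 ms frame, whereas the abstract does not give the contour shape or the frame length.

```python
import math
from typing import List

FRAME_MS = 5.0  # assumed fixed frame length; not stated in the abstract


def pitch_frames(duration_ms: float, start_hz: float, end_hz: float) -> List[float]:
    """Sample a linear pitch contour at the centre of each fixed-length frame."""
    # The predicted duration decides how many fixed-length frames are produced.
    num_frames = max(1, math.ceil(duration_ms / FRAME_MS))
    frames = []
    for i in range(num_frames):
        # Relative position of this frame's centre within the syllable (0..1).
        center = (i + 0.5) / num_frames
        # Each frame holds the part of the contour it covers.
        frames.append(start_hz + (end_hz - start_hz) * center)
    return frames


syllable_pitch = pitch_frames(duration_ms=20.0, start_hz=100.0, end_hz=140.0)
```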

10.

Granted invention patent
Clockwork hierarchical variational encoder (Active)

Publication number: US10923107B2

Publication date: 2021-02-16

Application number: US16382722

Application date: 2019-04-12

Applicant: Google LLC

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

IPC: G10L13/10, G10L13/047

Abstract: A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
