Patent search ap:("GOOGLE LLC") AND inv:"Yonghui Wu" Page 2

11.

发明授权
Large-scale multilingual speech recognition with a streaming end-to-end model 有权

公开(公告)号：US11468244B2

公开(公告)日：2022-10-11

申请号：US16834342

申请日：2020-03-30

Applicant: Google LLC

Inventor： Anjuli Patricia Kannan , Tara N. Sainath , Yonghui Wu , Ankur Bapna , Arindrima Datta

IPC: G10L15/00 , G06F40/40

Abstract: A method of transcribing speech using a multilingual end-to-end (E2E) speech recognition model includes receiving audio data for an utterance spoken in a particular native language, obtaining a language vector identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features derived from the audio data to generate a transcription for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different than the particular native language. The method also includes providing the transcription for output.

12.

发明申请
Unsupervised Parallel Tacotron Non-Autoregressive and Controllable Text-To-Speech 有权

公开(公告)号：US20220301543A1

公开(公告)日：2022-09-22

申请号：US17326542

申请日：2021-05-21

Applicant: Google LLC

Inventor： Isaac Elias , Byungha Chun , Jonathan Shen , Ye Jia , Yu Zhang , Yonghui Wu

IPC: G10L13/08 , G10L13/04

Abstract: A method for training a non-autoregressive TTS model includes obtaining a sequence representation of an encoded text sequence concatenated with a variational embedding. The method also includes using a duration model network to predict a phoneme duration for each phoneme represented by the encoded text sequence. Based on the predicted phoneme durations, the method also includes learning an interval representation and an auxiliary attention context representation. The method also includes upsampling, using the interval representation and the auxiliary attention context representation, the sequence representation into an upsampled output specifying a number of frames. The method also includes generating, based on the upsampled output, one or more predicted mel-frequency spectrogram sequences for the encoded text sequence. The method also includes determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence and training the TTS model based on the final spectrogram loss.

13.

发明申请
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS 有权

公开(公告)号：US20220005465A1

公开(公告)日：2022-01-06

申请号：US17448119

申请日：2021-09-20

Applicant: Google LLC

Inventor： Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A.u. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/16 , G10L15/22 , G10L15/02 , G06N3/08 , G10L15/06 , G10L25/30

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

14.

发明申请
NEURAL MACHINE TRANSLATION SYSTEMS 有权

公开(公告)号：US20210390271A1

公开(公告)日：2021-12-16

申请号：US17459111

申请日：2021-08-27

Applicant: Google LLC

Inventor： Mohammad Norouzi , Zhifeng Chen , Yonghui Wu , Michael Schuster , Quoc V. Le

IPC: G06F40/58 , G06F40/44 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural machine translation. The method comprises obtaining a first sequence of words in a source language, generating a modified sequence of words in the source language by inserting a word boundary symbol only at the beginning of each word in the first sequence of words and not at the end of each word, dividing the modified sequence of words into wordpieces using a wordpiece model, generating, from the wordpieces, an input sequence of input tokens for a neural machine translation system; and generating an output sequence of words using the neural machine translation system based on the input sequence of input tokens.

15.

发明申请
SELECTIVE TEXT PREDICTION FOR ELECTRONIC MESSAGING 审中-公开

公开(公告)号：US20190197101A1

公开(公告)日：2019-06-27

申请号：US15852916

申请日：2017-12-22

Applicant: Google LLC

Inventor： Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay LU , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Gregory Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammour Al-Rfou'

IPC: G06F17/27 , G06F3/0484 , G06N99/00

CPC classification number: G06F17/276 , G06F3/0237 , G06F3/04842 , G06F17/273 , G06F17/274 , G06F17/277 , G06F17/2785 , G06F21/6245 , G06N20/00

Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.

16.

发明授权
End-to-end text-to-speech conversion 有权

公开(公告)号：US12190860B2

公开(公告)日：2025-01-07

申请号：US18516069

申请日：2023-11-21

Applicant: Google LLC

Inventor： Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao

IPC: G10L13/06 , G06N3/045 , G06N3/08 , G06N3/084 , G10L13/04 , G10L13/08 , G10L15/16 , G10L25/18 , G10L25/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

17.

发明授权
Synthesizing speech from text using neural networks 有权

公开(公告)号：US12148444B2

公开(公告)日：2024-11-19

申请号：US17222736

申请日：2021-04-05

Applicant: Google LLC

Inventor： Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Michael Schuster , Navdeep Jaitly , Zongheng Yang , Zhifeng Chen , Yu Zhang , Yuxuan Wang , Russell John Wyatt Skerry-Ryan , Ryan M. Rifkin , Ioannis Agiomyrgiannakis

IPC: G10L13/047 , G06N3/045 , G06N3/08 , G06N5/046 , G06N7/01 , G10L13/08 , G10L25/18 , G10L25/30

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

18.

发明授权
Text-to-speech using duration prediction 有权

公开(公告)号：US12100382B2

公开(公告)日：2024-09-24

申请号：US17492543

申请日：2021-10-01

Applicant: Google LLC

Inventor： Yu Zhang , Isaac Elias , Byungha Chun , Ye Jia , Yonghui Wu , Mike Chrzanowski , Jonathan Shen

IPC: G10L13/027 , G10L13/04

CPC classification number: G10L13/027 , G10L13/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, synthesizing audio data from text data using duration prediction. One of the methods includes processing an input text sequence that includes a respective text element at each of multiple input time steps using a first neural network to generate a modified input sequence comprising, for each input time step, a representation of the corresponding text element in the input text sequence; processing the modified input sequence using a second neural network to generate, for each input time step, a predicted duration of the corresponding text element in the output audio sequence; upsampling the modified input sequence according to the predicted durations to generate an intermediate sequence comprising a respective intermediate element at each of a plurality of intermediate time steps; and generating an output audio sequence using the intermediate sequence.

19.

发明授权
Fast emit low-latency streaming ASR with sequence-level emission regularization utilizing forward and backward probabilities between nodes of an alignment lattice 有权

公开(公告)号：US12094453B2

公开(公告)日：2024-09-17

申请号：US17447285

申请日：2021-09-09

Applicant: Google LLC

Inventor： Jiahui Yu , Chung-cheng Chiu , Bo Li , Shuo-yiin Chang , Tara Sainath , Wei Han , Anmol Gulati , Yanzhang He , Arun Narayanan , Yonghui Wu , Ruoming Pang

IPC: G10L15/06 , G10L15/16 , G10L15/187 , G10L15/22 , G10L15/30

CPC classification number: G10L15/063 , G10L15/16 , G10L15/22 , G10L15/30 , G10L15/187

Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.

20.

发明公开
Parallel Tacotron Non-Autoregressive and Controllable TTS 审中-公开

公开(公告)号：US20240161730A1

公开(公告)日：2024-05-16

申请号：US18421116

申请日：2024-01-24

Applicant: Google LLC

Inventor： Isaac Elias , Jonathan Shen , Yu Zhang , Ye Jia , Ron J. Weiss , Yonghui Wu , Byungha Chun

IPC: G10L13/08 , G06F40/126 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/088 , G10L13/047 , G10L21/10

CPC classification number: G10L13/08 , G06F40/126 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/088 , G10L13/047 , G10L21/10 , G06N3/048

Abstract: A method for training a non-autoregressive TTS model includes receiving training data that includes a reference audio signal and a corresponding input text sequence. The method also includes encoding the reference audio signal into a variational embedding that disentangles the style/prosody information from the reference audio signal and encoding the input text sequence into an encoded text sequence. The method also includes predicting a phoneme duration for each phoneme in the input text sequence and determining a phoneme duration loss based on the predicted phoneme durations and a reference phoneme duration. The method also includes generating one or more predicted mel-frequency spectrogram sequences for the input text sequence and determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence. The method also includes training the TTS model based on the final spectrogram loss and the corresponding phoneme duration loss.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification