Patent search ap:("Google LLC") AND inv:"Yonghui Wu" Page 7

61.

发明申请
TRAINING OF LARGE NEURAL NETWORKS 有权

公开(公告)号：US20240378441A1

公开(公告)日：2024-11-14

申请号：US18661447

申请日：2024-05-10

Applicant: Google LLC

Inventor： Slav Petrov , Yonghui Wu , Andrew M. Dai , David Richard So , Dmitry Lepikhin , Erica Ann Moreira , Gaurav Mishra , Jonathan Hudson Clark , Maxim Krikun , Melvin Jose Johnson Premkumar , Nan Du , Orhan Firat , Rohan Anil , Siamak Shakeri , Xavier Garcia , Yanping Huang , Yong Cheng , Yuanzhong Xu , Yujing Zhang , Zachary Alexander Nado , Eric Jun Jie Ni , Kefan Xiao , Vladimir Feinberg , Jin Young Sohn , Aurko Roy

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.

62.

发明授权
Speech recognition with sequence-to-sequence models 有权

公开(公告)号：US12106749B2

公开(公告)日：2024-10-01

申请号：US17448119

申请日：2021-09-20

Applicant: Google LLC

Inventor： Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. u. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen

IPC: G10L15/00 , G06N3/08 , G10L15/02 , G10L15/06 , G10L15/16 , G10L15/22 , G10L25/30 , G10L15/26

CPC classification number: G10L15/16 , G06N3/08 , G10L15/02 , G10L15/063 , G10L15/22 , G10L25/30 , G10L2015/025 , G10L15/26

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

63.

发明公开
Contrastive Learning and Masked Modeling for End-To-End Self-Supervised Pre-Training 审中-公开

公开(公告)号：US20240104352A1

公开(公告)日：2024-03-28

申请号：US18012391

申请日：2022-07-28

Applicant: Google LLC

Inventor： Yu Zhang , Yu-An Chung , Wei Han , Chung-Cheng Chiu , Weikeng Qin , Ruoming Pang , Yonghui Wu

IPC: G06N3/0455

CPC classification number: G06N3/0455

Abstract: Provided are improved end-to-end self-supervised pre-training frameworks that leverage a combination of contrastive and masked modeling loss terms. In particular, the present disclosure provides framework that combines contrastive learning and masked modeling, where the former trains the model to discretize input data (e.g., continuous signals such as continuous speech signals) into a finite set of discriminative tokens, and the latter trains the model to learn contextualized representations via solving a masked prediction task consuming the discretized tokens. In contrast to certain existing masked modeling-based pre-training frameworks which rely on an iterative re-clustering and re-training process or other existing frameworks which concatenate two separately trained modules, the proposed framework can enable a model to be optimized in an end-to-end fashion by solving the two self-supervised tasks (the contrastive task and masked modeling) simultaneously.

64.

发明授权
Synthesis of speech from text in a voice of a target speaker using neural networks 有权

公开(公告)号：US11848002B2

公开(公告)日：2023-12-19

申请号：US17813361

申请日：2022-07-19

Applicant: Google LLC

Inventor： Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen

IPC: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L13/02

CPC classification number: G10L13/04 , G10L17/04 , G10L19/00 , G06N3/08 , G10L2013/021

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

65.

发明公开
SELECTIVE TEXT PREDICTION FOR ELECTRONIC MESSAGING 审中-公开

公开(公告)号：US20230385543A1

公开(公告)日：2023-11-30

申请号：US18447186

申请日：2023-08-09

Applicant: Google LLC

Inventor： Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay LU , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Greg Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammour Al-Rfou'

IPC: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/62 , G06F3/023 , G06F40/30 , G06F40/232 , G06F40/253 , G06F40/284

CPC classification number: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/6245 , G06F40/284 , G06F40/30 , G06F40/232 , G06F40/253 , G06F3/0237

Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.

66.

发明公开
CONTRASTIVE CAPTIONING NEURAL NETWORKS 审中-公开

公开(公告)号：US20230351149A1

公开(公告)日：2023-11-02

申请号：US18141340

申请日：2023-04-28

Applicant: Google LLC

Inventor： Jiahui Yu , Zirui Wang , Vijay Vasudevan , Ho Man Yeung , Seyed Mojtaba Seyedhosseini Tarzjani , Yonghui Wu

IPC: G06N3/04

CPC classification number: G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing multi-modal inputs using contrastive captioning neural networks.

67.

发明授权
Selective text prediction for electronic messaging 有权

公开(公告)号：US11755834B2

公开(公告)日：2023-09-12

申请号：US15852916

申请日：2017-12-22

Applicant: Google LLC

Inventor： Paul Roland Lambert , Timothy Youngjin Sohn , Jacqueline Amy Tsay , Gagan Bansal , Cole Austin Bevis , Kaushik Roy , Justin Tzi-jay Lu , Katherine Anna Evans , Tobias Bosch , Yinan Wang , Matthew Vincent Dierker , Gregory Russell Bullock , Ettore Randazzo , Tobias Kaufmann , Yonghui Wu , Benjamin N. Lee , Xu Chen , Brian Strope , Yun-hsuan Sung , Do Kook Choe , Rami Eid Sammouf Al-Rfou'

IPC: G06F40/274 , G06F3/04842 , G06N20/00 , G06F21/62 , G06F3/023 , G06F40/30 , G06F40/232 , G06F40/253 , G06F40/284

CPC classification number: G06F40/274 , G06F3/0237 , G06F3/04842 , G06F21/6245 , G06F40/232 , G06F40/253 , G06F40/284 , G06F40/30 , G06N20/00

Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.

68.

发明公开
END-TO-END SPEECH WAVEFORM GENERATION THROUGH DATA DENSITY GRADIENT ESTIMATION 审中-公开

公开(公告)号：US20230252974A1

公开(公告)日：2023-08-10

申请号：US18010438

申请日：2021-09-02

Applicant: Google LLC

Inventor： Byungha Chun , Mohammad Norouzi , Nanxin Chen , Ron J. Weiss , William Chan , Yu Zhang , Yonghui Wu

IPC: G10L13/08 , G10L21/0208

CPC classification number: G10L13/08 , G10L21/0208

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating waveforms conditioned on phoneme sequences. In one aspect, a method comprises: obtaining a phoneme sequence; processing the phoneme sequence using an encoder neural network to generate a hidden representation of the phoneme sequence; generating, from the hidden representation, a conditioning input; initializing a current waveform output; and generating a final waveform output that defines an utterance of the phoneme sequence by a speaker by updating the current waveform output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing (i) the current waveform output and (ii) the conditioning input using a noise estimation neural network to generate a noise output; and updating the current waveform output using the noise output and the noise level for the iteration.

69.

发明申请
ASYNCHRONOUS DISTRIBUTED DATA FLOW FOR MACHINE LEARNING WORKLOADS 有权

公开(公告)号：US20230118303A1

公开(公告)日：2023-04-20

申请号：US18082415

申请日：2022-12-15

Applicant: Google LLC

Inventor： Jeffrey Adgate Dean , Sudip Roy , Michael Acheson Isard , Aakanksha Chowdhery , Brennan Saeta , Chandramohan Amyangot Thekkath , Daniel William Hurt , Hyeontaek Lim , Laurent El Shafey , Parker Edward Schuh , Paul Ronald Barham , Ruoming Pang , Ryan Sepassi , Sanjay Ghemawat , Yonghui Wu

IPC: G06F9/48 , G06N3/063 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.

70.

发明申请
MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION 有权

公开(公告)号：US20220130374A1

公开(公告)日：2022-04-28

申请号：US17572238

申请日：2022-01-10

Applicant: Google LLC

Inventor： Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen

IPC: G10L15/00 , G10L15/16 , G10L15/07

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output score indicating the likelihood of linguistic units for each of multiple different language or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification