-
Publication No.: US20210042620A1
Publication Date: 2021-02-11
Application No.: US16989787
Filing Date: 2020-08-10
Applicant: Google LLC
Inventor: Zhifeng Chen , Yanping Huang , Youlong Cheng , HyoukJoong Lee , Dehao Chen , Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
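The micro-batch schedule this abstract describes (forward pass over every micro-batch through the sequence of composite layers, then a backward pass back to the first composite layer) can be illustrated with a minimal sketch. Everything here — `ScalarLayer`, `pipeline_step`, the scalar forward/backward math — is illustrative and not from the patent; in the actual system each composite layer runs on its own computing device with overlapping micro-batch execution.

```python
class ScalarLayer:
    """Toy stand-in for a composite layer: y = w * x per element."""
    def __init__(self, w):
        self.w = w

    def forward(self, xs):
        return [self.w * x for x in xs]

    def backward(self, grads_out):
        # dy/dx = w, so upstream gradients are scaled by w.
        return [self.w * g for g in grads_out]


def pipeline_step(layers, mini_batch, num_micro_batches):
    """Partition a mini-batch into micro-batches, run the forward pass
    until the final composite layer has outputs for every micro-batch,
    then run the backward pass until the first composite layer has
    gradients for every micro-batch."""
    size = len(mini_batch) // num_micro_batches
    micro_batches = [mini_batch[i * size:(i + 1) * size]
                     for i in range(num_micro_batches)]

    # Forward pass: propagate each micro-batch through all layers.
    outputs = []
    for mb in micro_batches:
        acts = mb
        for layer in layers:
            acts = layer.forward(acts)
        outputs.append(acts)

    # Backward pass: propagate gradients (seeded with 1.0) back
    # through the layers in reverse order, once per micro-batch.
    input_grads = []
    for acts in outputs:
        grads = [1.0] * len(acts)
        for layer in reversed(layers):
            grads = layer.backward(grads)
        input_grads.append(grads)
    return outputs, input_grads
```

With two composite layers of weights 2 and 3, a mini-batch `[1, 2, 3, 4]` split into two micro-batches yields outputs `[[6, 12], [18, 24]]` and input gradients of 6 everywhere.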
-
Publication No.: US20200380952A1
Publication Date: 2020-12-03
Application No.: US16855042
Filing Date: 2020-04-22
Applicant: Google LLC
Inventor: Yu Zhang , Ron J. Weiss , Byungha Chun , Yonghui Wu , Zhifeng Chen , Russell John Wyatt Skerry-Ryan , Ye Jia , Andrew M. Rosenberg , Bhuvana Ramabhadran
IPC: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
-
Publication No.: US10573293B2
Publication Date: 2020-02-25
Application No.: US16447862
Filing Date: 2019-06-20
Applicant: Google LLC
Inventor: Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
Publication No.: US20190188566A1
Publication Date: 2019-06-20
Application No.: US16328207
Filing Date: 2017-08-25
Applicant: GOOGLE LLC
Inventor: Michael Schuster , Samuel Bengio , Navdeep Jaitly , Zhifeng Chen , Dale Eric Schuurmans , Mohammad Norouzi , Yonghui Wu
Abstract: A method includes obtaining data identifying a machine learning model to be trained to perform a machine learning task, the machine learning model being configured to receive an input example and to process the input example in accordance with current values of a plurality of model parameters to generate a model output for the input example; obtaining initial training data for training the machine learning model, the initial training data comprising a plurality of training examples and, for each training example, a ground truth output that should be generated by the machine learning model by processing the training example; generating modified training data from the initial training data; and training the machine learning model on the modified training data.
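The training-data pipeline in this abstract (training examples with ground-truth outputs, a modification step, then training on the modified data) can be sketched as below. The particular modification shown — randomly replacing ground-truth tokens with vocabulary samples — is one assumed illustration, not the patent's specific scheme, and all names are hypothetical.

```python
import random


def modify_targets(ground_truth, vocab, swap_prob, rng):
    """Example modification (an assumption, not the patent's method):
    replace each ground-truth token with a random vocabulary token
    with probability swap_prob."""
    return [rng.choice(vocab) if rng.random() < swap_prob else t
            for t in ground_truth]


def generate_modified_training_data(initial_data, vocab, swap_prob, seed=0):
    """Map (example, ground_truth) pairs to (example, modified_target)
    pairs; the model is then trained on the modified data."""
    rng = random.Random(seed)
    return [(x, modify_targets(y, vocab, swap_prob, rng))
            for x, y in initial_data]
```

With `swap_prob=0.0` the modified data equals the initial data, and with `swap_prob=1.0` every target token is resampled from the vocabulary.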
-
Publication No.: US12254865B2
Publication Date: 2025-03-18
Application No.: US18418246
Filing Date: 2024-01-20
Applicant: Google LLC
Inventor: Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
-
Publication No.: US20240428006A1
Publication Date: 2024-12-26
Application No.: US18211967
Filing Date: 2023-06-20
Applicant: GOOGLE LLC
Inventor: Jian Li , Zhifeng Chen , Yanping Huang , Yuanzhong Xu , Tao Wang , YaGuang Li
IPC: G06F40/40
Abstract: Implementations relate to asymmetric quantization of large language models (LLMs). Processor(s) of a system can: obtain a trained LLM, wherein the trained LLM includes a plurality of layers, each layer comprising a respective plurality of weights; for each layer of the plurality of layers: calculate an optimal clipping range for the respective plurality of weights, and clip one or more weights of the respective plurality of weights that lie outside of the optimal clipping range to produce a clipped layer; quantize the LLM to generate a quantized LLM, wherein the instructions to quantize include instructions to map weights of the plurality of clipped layers of the LLM from continuous values to discrete values; and provide the quantized LLM for downstream processing.
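The clip-then-quantize flow described above can be sketched per layer. The grid search minimizing mean squared reconstruction error is an assumed stand-in for the patent's "optimal clipping range" computation, and every function name here is illustrative.

```python
def quantize_asymmetric(weights, lo, hi, num_bits=8):
    """Asymmetric quantization: clip weights outside [lo, hi], then map
    continuous values to discrete integer levels 0 .. 2**num_bits - 1
    via a scale and zero point."""
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = []
    for w in weights:
        w = max(lo, min(hi, w))  # clip to the chosen range
        q.append(max(0, min(qmax, round(w / scale) + zero_point)))
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map discrete levels back to continuous values."""
    return [(qi - zero_point) * scale for qi in q]


def optimal_clipping_range(weights, num_bits=8, steps=50):
    """Grid-search a clipping range [lo, hi] that minimizes the mean
    squared error between the weights and their de-quantized values.
    The MSE criterion is an assumption; the patent's criterion may
    differ."""
    wmin, wmax = min(weights), max(weights)
    best, best_err = (wmin, wmax), float("inf")
    for i in range(1, steps + 1):
        frac = i / steps
        lo, hi = wmin * frac, wmax * frac
        if hi <= lo:
            continue
        q, s, z = quantize_asymmetric(weights, lo, hi, num_bits)
        deq = dequantize(q, s, z)
        err = sum((w - d) ** 2 for w, d in zip(weights, deq))
        if err < best_err:
            best, best_err = (lo, hi), err
    return best
```

Shrinking the range trades clipping error on outlier weights against finer resolution for the rest, which is why a narrower-than-full range can be optimal at low bit widths.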
-
Publication No.: US20240303464A1
Publication Date: 2024-09-12
Application No.: US18598876
Filing Date: 2024-03-07
Applicant: Google LLC
Inventor: Nan Du , Tao Wang , Yanqi Zhou , Tao Lei , Yuanzhong Xu , Andrew Mingbo Dai , Zhifeng Chen , Dewen Zeng , Yingwei Cui
Abstract: A method includes providing a first set of data objects to a first skip router of a neural network (NN). The NN includes a first NN layer and a second NN layer. The first set of data objects is subdivided into a first set of skip objects and a first set of non-skip objects based on a first skip logic implemented by the first skip router and a first context of each data object in the first set of data objects. A first set of processed objects is generated based on the first set of non-skip objects and a first layer logic implemented by the first NN layer. Predictions are generated based on a second set of data objects and a second layer logic implemented by the second NN layer. The second set of data objects includes the first set of processed objects and the first set of skip objects.
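The skip-router dataflow can be sketched as below, with integers standing in for data objects, a predicate standing in for the learned skip logic, and a pure function standing in for the layer logic; all names are illustrative, not from the patent.

```python
def skip_router(data_objects, skip_logic):
    """Subdivide data objects into a set of skip objects and a set of
    non-skip objects based on a per-object skip decision."""
    skip, non_skip = [], []
    for obj in data_objects:
        (skip if skip_logic(obj) else non_skip).append(obj)
    return skip, non_skip


def layer_with_skip(data_objects, skip_logic, layer_fn):
    """Process only the non-skip objects through the layer; the skip
    objects bypass it, and both sets rejoin to form the input of the
    next layer."""
    skip, non_skip = skip_router(data_objects, skip_logic)
    processed = [layer_fn(x) for x in non_skip]
    return processed + skip  # the second layer sees both sets
```

Skipping a layer entirely for part of the stream saves compute on objects whose context suggests the layer would add little, at the cost of those objects receiving less processing.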
-
Publication No.: US12032920B2
Publication Date: 2024-07-09
Application No.: US17056554
Filing Date: 2020-03-07
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Melvin Johnson , Fadi Biadsy , Ron Weiss , Wolfgang Macherey
Abstract: The present disclosure provides systems and methods that train and use machine-learned models such as, for example, sequence-to-sequence models, to perform direct and text-free speech-to-speech translation. In particular, aspects of the present disclosure provide an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.
-
Publication No.: US20240161732A1
Publication Date: 2024-05-16
Application No.: US18418246
Filing Date: 2024-01-20
Applicant: Google LLC
Inventor: Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen
CPC classification number: G10L15/005 , G10L15/07 , G10L15/16 , G10L2015/0631
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
-
Publication No.: US20240112667A1
Publication Date: 2024-04-04
Application No.: US18525475
Filing Date: 2023-11-30
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
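The pipeline in this abstract (reference audio → speaker vector via a speaker encoder → audio representation of the input text in the target speaker's voice via a spectrogram generation engine) can be sketched with toy stand-ins for the trained engines. The averaging "encoder" and per-character "spectrogram" below are placeholders for trained networks, not the patent's models.

```python
def speaker_vector(audio_frames):
    """Toy stand-in for the speaker encoder engine: averages the input
    frames. The real engine is a network trained to distinguish
    speakers from one another."""
    dim = len(audio_frames[0])
    return [sum(f[i] for f in audio_frames) / len(audio_frames)
            for i in range(dim)]


def spectrogram(text, spk_vec):
    """Toy stand-in for the spectrogram generation engine: emits one
    'frame' per character, conditioned on the speaker vector."""
    return [[ord(c) * 0.01 + v for v in spk_vec] for c in text]


def synthesize(audio_of_target, input_text):
    """End-to-end dataflow from the abstract: audio representation of
    the target speaker -> speaker vector -> audio representation of
    the input text in that speaker's voice."""
    vec = speaker_vector(audio_of_target)
    return spectrogram(input_text, vec)
```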