-
Publication No.: US10679148B2
Publication Date: 2020-06-09
Application No.: US16402787
Filing Date: 2019-05-03
Applicant: Google LLC
Inventor: Zhifeng Chen , Michael Schuster , Melvin Jose Johnson Premkumar , Yonghui Wu , Quoc V. Le , Maxim Krikun , Thorsten Brants
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing machine learning tasks. One method includes receiving (i) a model input, and (ii) data identifying a first machine learning task to be performed on the model input to generate a first type of model output for the model input; augmenting the model input with an identifier for the first machine learning task to generate an augmented model input; and processing the augmented model input using a machine learning model. An exemplary system applying implicit bridging for machine learning tasks, as described in this specification, trains a machine learning model to perform certain types of machine learning tasks without requiring explicit training data for those types of tasks to be used during training.
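A minimal sketch of the input augmentation step the abstract describes: a task-identifier token is prepended to the model input so a single jointly trained model can be steered toward the requested task. The token format, tokenization, and the translation example are illustrative assumptions, not taken from the patent.

def augment_with_task_id(tokens: list[str], task_id: str) -> list[str]:
    """Prepend a task-identifier token (here, a target-language tag) to the model input."""
    return [f"<2{task_id}>"] + tokens

# Example: request Spanish output from a jointly trained model. Implicit
# bridging means the model may handle this pair even if no explicit
# English-Spanish training data was used, having learned related pairs.
model_input = "How are you ?".split()
augmented = augment_with_task_id(model_input, "es")
print(augmented)  # ['<2es>', 'How', 'are', 'you', '?']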
-
Publication No.: US20200160836A1
Publication Date: 2020-05-21
Application No.: US16684483
Filing Date: 2019-11-14
Applicant: GOOGLE LLC
Inventor: Zhifeng Chen , Bo Li , Eugene Weinstein , Yonghui Wu , Pedro J. Moreno Mengibar , Ron J. Weiss , Khe Chai Sim , Tara N. Sainath , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer-readable medium, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
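A minimal PyTorch sketch of a cluster-adaptive layer of the kind a multi-dialect acoustic model trained with cluster adaptive training might use: per-cluster weight matrices are interpolated by a dialect-dependent mixing vector. The shapes, the two-cluster setup, and the class name are illustrative assumptions.

import torch
import torch.nn as nn

class ClusterAdaptiveLinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, num_clusters: int):
        super().__init__()
        # One weight matrix per cluster; the dialect decides how to mix them.
        self.cluster_weights = nn.Parameter(
            torch.randn(num_clusters, out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor, mix: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) acoustic features; mix: (batch, num_clusters) weights.
        w = torch.einsum("bc,coi->boi", mix, self.cluster_weights)
        return torch.einsum("boi,bi->bo", w, x) + self.bias

features = torch.randn(2, 80)             # e.g. 80-dim filterbank frames
dialect_mix = torch.tensor([[1.0, 0.0],   # utterance from dialect cluster 0
                            [0.3, 0.7]])  # utterance mixing both clusters
layer = ClusterAdaptiveLinear(80, 512, num_clusters=2)
hidden = layer(features, dialect_mix)     # (2, 512) dialect-adapted activations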
-
Publication No.: US12175963B2
Publication Date: 2024-12-24
Application No.: US18525475
Filing Date: 2023-11-30
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
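A minimal PyTorch sketch of the conditioning path the abstract describes: a speaker encoder maps reference audio to a fixed speaker vector, and a spectrogram generator consumes the input text together with that vector. All dimensions and module internals are illustrative assumptions rather than the patented architecture.

import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    def __init__(self, mel_dim=80, embed_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(mel_dim, embed_dim, batch_first=True)

    def forward(self, reference_mels):                 # (batch, frames, mel_dim)
        _, (h, _) = self.lstm(reference_mels)
        return torch.nn.functional.normalize(h[-1], dim=-1)  # (batch, embed_dim) speaker vector

class SpectrogramGenerator(nn.Module):
    def __init__(self, vocab=256, text_dim=128, embed_dim=256, mel_dim=80):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, text_dim)
        self.decoder = nn.LSTM(text_dim + embed_dim, 512, batch_first=True)
        self.to_mel = nn.Linear(512, mel_dim)

    def forward(self, text_ids, speaker_vec):          # (batch, T), (batch, embed_dim)
        t = self.text_emb(text_ids)
        s = speaker_vec.unsqueeze(1).expand(-1, t.size(1), -1)
        h, _ = self.decoder(torch.cat([t, s], dim=-1))
        return self.to_mel(h)                          # mel frames in the target voice

speaker_vec = SpeakerEncoder()(torch.randn(1, 120, 80))   # reference audio of the target speaker
mels = SpectrogramGenerator()(torch.randint(0, 256, (1, 40)), speaker_vec)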
-
Publication No.: US20240404506A1
Publication Date: 2024-12-05
Application No.: US18797760
Filing Date: 2024-08-08
Applicant: Google LLC
Inventor: Yu Zhang , Ron J. Weiss , Byungha Chun , Yonghui Wu , Zhifeng Chen , Russell John Wyatt Skerry-Ryan , Ye Jia , Andrew M. Rosenberg , Bhuvana Ramabhadran
IPC: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
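A minimal PyTorch sketch, assuming a Tacotron-style interface, of how a cross-lingual request might be assembled: the text and its language identifier come from the first language, while the speaker embedding comes from a native speaker of a second language. The TTSModel class, its argument names, and all sizes are hypothetical.

import torch
import torch.nn as nn

class TTSModel(nn.Module):
    def __init__(self, vocab=256, num_langs=8, dim=256, mel_dim=80):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, dim)
        self.lang_emb = nn.Embedding(num_langs, dim)
        self.decoder = nn.GRU(dim * 3, 512, batch_first=True)
        self.to_mel = nn.Linear(512, mel_dim)

    def forward(self, text_ids, lang_id, speaker_embedding):
        t = self.text_emb(text_ids)                           # (B, T, dim) text in the first language
        l = self.lang_emb(lang_id).unsqueeze(1).expand_as(t)  # language of the input text
        s = speaker_embedding.unsqueeze(1).expand(-1, t.size(1), -1)  # voice characteristics
        h, _ = self.decoder(torch.cat([t, l, s], dim=-1))
        return self.to_mel(h)                                 # audio features cloning the target voice

model = TTSModel()
english_text = torch.randint(0, 256, (1, 30))  # input text sequence (first language)
target_speaker = torch.randn(1, 256)           # embedding of a native speaker of a second language
mels = model(english_text, torch.tensor([0]), target_speaker)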
-
Publication No.: US20240311402A1
Publication Date: 2024-09-19
Application No.: US18136634
Filing Date: 2023-04-19
Applicant: GOOGLE LLC
Inventor: Martin Baeuml , Yanping Huang , Wenhao Jia , Chang Lan , Yuanzhong Xu , Junwhan Ahn , Alexander Bailey , Leif Schelin , Trevor Strohman , Emanuel Taropa , Sidharth Mudgal , Yanyan Zheng , Zhifeng Chen , Ahmad Beirami
IPC: G06F16/332 , G06F40/40
CPC classification number: G06F16/3322 , G06F16/3329 , G06F40/40
Abstract: Implementations relate to reducing latency in generating and/or rendering natural language (NL) output generated using a large language model (LLM). Processor(s) of a system can: receive NL based input associated with a client device, and generate the NL based output utilizing the LLM. The NL based output can be a stream of NL based output in that it includes a plurality of segments, and is generated on a segment-by-segment basis. In some implementations, a first segment of the stream of NL based output is selected for inclusion in the stream of NL based output as a second segment (and any subsequent segment) is being generated to reduce latency in evaluating the NL based output as a whole prior to rendering thereof. In some versions of those implementations, the first segment is rendered as the second segment (and any subsequent segment) is being generated to further reduce latency in rendering thereof.
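A minimal pure-Python sketch of the segment-by-segment idea in the abstract: each completed segment is evaluated and rendered while later segments are still being generated, instead of waiting for the full response. generate_segments() and is_acceptable() are hypothetical stand-ins for an LLM decoding loop and a per-segment check.

import time
from typing import Iterator

def generate_segments(prompt: str) -> Iterator[str]:
    """Placeholder for an LLM producing NL based output one segment at a time."""
    for segment in ["Sure, here is a summary.", "First, ...", "Second, ..."]:
        time.sleep(0.1)          # stands in for per-segment decoding latency
        yield segment

def is_acceptable(segment: str) -> bool:
    """Placeholder for a per-segment evaluation (e.g. a lightweight classifier)."""
    return bool(segment.strip())

def stream_response(prompt: str) -> None:
    for segment in generate_segments(prompt):
        if is_acceptable(segment):        # evaluate only this segment,
            print(segment, flush=True)    # render it immediately,
        # ...while the generator is already producing the next segment.

stream_response("Summarize the meeting notes.")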
-
Publication No.: US20240020491A1
Publication Date: 2024-01-18
Application No.: US18374071
Filing Date: 2023-09-28
Applicant: Google LLC
Inventor: Zhifeng Chen , Macduff Richard Hughes , Yonghui Wu , Michael Schuster , Xu Chen , Llion Owen Jones , Niki J. Parmar , George Foster , Orhan Firat , Ankur Bapna , Wolfgang Macherey , Melvin Jose Johnson Premkumar
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
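A minimal PyTorch sketch of the pipeline named in the abstract: a bidirectional recurrent encoder, a multi-headed attention module producing attention context vectors, and a recurrent decoder that emits distributions over target-language tokens. Layer and vocabulary sizes are illustrative assumptions, not the patented configuration.

import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, dim=256, heads=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim // 2, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.attention = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.decoder = nn.LSTM(dim * 2, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        enc, _ = self.encoder(self.src_emb(src_ids))   # (B, S, dim) encoding vectors
        tgt = self.tgt_emb(tgt_ids)                    # (B, T, dim)
        # Multi-headed attention over the encodings yields context vectors per decoder position.
        context, _ = self.attention(query=tgt, key=enc, value=enc)
        dec, _ = self.decoder(torch.cat([tgt, context], dim=-1))
        return self.out(dec)                           # (B, T, tgt_vocab) logits

model = TinyNMT()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
probs = logits.softmax(dim=-1)   # distributions over elements of the second language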
-
Publication No.: US11809834B2
Publication Date: 2023-11-07
Application No.: US17459041
Filing Date: 2021-08-27
Applicant: Google LLC
Inventor: Zhifeng Chen , Macduff Richard Hughes , Yonghui Wu , Michael Schuster , Xu Chen , Llion Owen Jones , Niki J. Parmar , George Foster , Orhan Firat , Ankur Bapna , Wolfgang Macherey , Melvin Jose Johnson Premkumar
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
-
Publication No.: US20230178068A1
Publication Date: 2023-06-08
Application No.: US18161217
Filing Date: 2023-01-30
Applicant: Google LLC
Inventor: Yu Zhang , Ron J. Weiss , Byungha Chun , Yonghui Wu , Zhifeng Chen , Russell John Wyatt Skerry-Ryan , Ye Jia , Andrew M. Rosenberg , Bhuvana Ramabhadran
IPC: G10L13/047
CPC classification number: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
-
Publication No.: US11646019B2
Publication Date: 2023-05-09
Application No.: US17443557
Filing Date: 2021-07-27
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Tara N. Sainath , Yonghui Wu , Patrick An Phu Nguyen , Zhifeng Chen , Chung-Cheng Chiu , Anjuli Patricia Kannan
IPC: G10L15/197 , G10L15/16 , G10L15/06 , G10L15/02 , G10L15/22
CPC classification number: G10L15/197 , G10L15/02 , G10L15/063 , G10L15/16 , G10L15/22 , G10L2015/025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
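A minimal PyTorch sketch of an N-best-list loss of the kind the abstract mentions: the model's scores over the N-best decoded hypotheses are renormalized and used to weight each hypothesis's word-error count, so training shifts probability toward low-error hypotheses. The example log-probabilities and error counts are made up.

import torch

def nbest_expected_error_loss(log_probs: torch.Tensor,
                              word_errors: torch.Tensor) -> torch.Tensor:
    """log_probs: (N,) model log-probabilities of the N-best hypotheses.
    word_errors: (N,) word-error counts of each hypothesis against the reference."""
    weights = torch.softmax(log_probs, dim=0)           # renormalize over the N-best list
    relative_errors = word_errors - word_errors.mean()  # common variance-reduction step
    return torch.sum(weights * relative_errors)

log_probs = torch.tensor([-2.1, -2.5, -3.0], requires_grad=True)
word_errors = torch.tensor([0.0, 1.0, 3.0])   # hypothesis 0 matches the reference transcription
loss = nbest_expected_error_loss(log_probs, word_errors)
loss.backward()   # gradients favor raising the probability of hypothesis 0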
-
Publication No.: US20220383076A1
Publication Date: 2022-12-01
Application No.: US17828778
Filing Date: 2022-05-31
Applicant: Google LLC
Inventor: Jonathon Shlens , Vijay Vasudevan , Jiquan Ngiam , Benjamin James Caine , Zhengdong Zhang , Zhifeng Chen , Hao-Tien Chiang , David Joseph Weiss , Jeffrey Ling , Ashish Venugopal
IPC: G06N3/04
Abstract: A method for performing one or more tasks, wherein each of the one or more tasks includes predicting behavior of one or more agents in an environment, the method comprising: obtaining a three-dimensional (3D) input tensor representing behaviors of the one or more agents in the environment across a plurality of time steps; generating an encoded representation of the 3D input tensor by processing the 3D input tensor using an encoder neural network, wherein the 3D input tensor comprises a plurality of observed cells and a plurality of masked cells; and processing the encoded representation of the 3D input tensor using a decoder neural network to generate a 4D output tensor.
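A minimal PyTorch sketch of the tensor shapes named in the abstract: a 3D input tensor of per-agent behavior over time (with some cells masked), an encoder that produces an encoded representation, and a decoder that emits a 4D output tensor. The interpretation of the 4D axes (agents x future steps x trajectory modes x coordinates) is an illustrative assumption.

import torch
import torch.nn as nn

class BehaviorPredictor(nn.Module):
    def __init__(self, num_agents=8, time_steps=10, feat=4,
                 future=6, modes=3, hidden=128):
        super().__init__()
        self.num_agents, self.future, self.modes = num_agents, future, modes
        self.encoder = nn.Sequential(
            nn.Flatten(),                                        # (B, agents*time*feat)
            nn.Linear(num_agents * time_steps * feat, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, num_agents * future * modes * 2)

    def forward(self, x, mask):
        # x:    (B, agents, time, feat) observed behavior (a 3D tensor per example)
        # mask: (B, agents, time, 1) with 1 for observed cells and 0 for masked cells
        enc = self.encoder(x * mask)                             # encoded representation
        out = self.decoder(enc)
        # 4D output per example: (agents, future steps, modes, xy coordinates)
        return out.view(-1, self.num_agents, self.future, self.modes, 2)

x = torch.randn(2, 8, 10, 4)
mask = (torch.rand(2, 8, 10, 1) > 0.2).float()   # a few masked cells
pred = BehaviorPredictor()(x, mask)              # shape (2, 8, 6, 3, 2)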
-