-
Publication number: US20220121945A1
Publication date: 2022-04-21
Application number: US17567740
Filing date: 2022-01-03
Applicant: Google LLC
Inventors: Zhifeng Chen, Yanping Huang, Youlong Cheng, HyoukJoong Lee, Dehao Chen, Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for the final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
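The micro-batch pipeline training described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the composite layers are stand-in functions, and the function names (`split_into_micro_batches`, `pipeline_step`) are assumptions, not identifiers from the patent.

```python
def split_into_micro_batches(mini_batch, num_micro_batches):
    """Partition a mini-batch of training examples into micro-batches."""
    size = len(mini_batch) // num_micro_batches
    return [mini_batch[i * size:(i + 1) * size]
            for i in range(num_micro_batches)]

def pipeline_step(mini_batch, composite_layers, num_micro_batches):
    """Forward pass: run every micro-batch through the sequence of
    composite layers until the final layer's activations are computed."""
    micro_batches = split_into_micro_batches(mini_batch, num_micro_batches)
    activations = []
    for mb in micro_batches:
        x = mb
        for layer in composite_layers:
            x = layer(x)  # stand-in for the composite layer's computation
        activations.append(x)
    return activations
```

In the actual scheme each composite layer sits on its own device, so different micro-batches can occupy different pipeline stages concurrently; the sequential loop above only shows the dataflow, not the overlap.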
-
Publication number: US11144831B2
Publication date: 2021-10-12
Application number: US16906034
Filing date: 2020-06-19
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
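The evolutionary loop in this abstract (sample candidates, mutate the fittest, add the child, evict the least recently trained) can be sketched as below. The `fitness` and `mutate` callables are illustrative placeholders, assuming architectures can be represented as hashable values; the patent's workers run this concurrently, which the single loop omits.

```python
import random
from collections import deque

def evolve(population, fitness, mutate, steps, sample_size, rng=random):
    """Aging-evolution sketch.

    population: deque of candidate architectures, with the least
    recently trained candidate on the left."""
    for _ in range(steps):
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=fitness)  # best measure of fitness
        population.append(mutate(parent))  # new candidate joins on the right
        population.popleft()               # evict the least recently trained
    return population
```

Eviction by age rather than by worst fitness is the distinctive choice here: every candidate eventually leaves the population, which regularizes the search.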
-
Publication number: US20240311405A1
Publication date: 2024-09-19
Application number: US18337316
Filing date: 2023-06-19
Applicant: GOOGLE LLC
Inventors: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
IPC classification: G06F16/332
CPC classification: G06F16/3329
Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified. This, in turn, can mitigate occurrences of computational and/or network inefficiencies that result from a user issuing a follow-up request to cure the inaccuracies and/or under-specification of a generated response.
-
Publication number: US20240112027A1
Publication date: 2024-04-04
Application number: US18477546
Filing date: 2023-09-28
Applicant: Google LLC
Inventors: Yanqi Zhou, Yanping Huang, Yifeng Lu, Andrew M. Dai, Siamak Shakeri, Zhifeng Chen, James Laudon, Quoc V. Le, Da Huang, Nan Du, David Richard So, Daiyi Peng, Yingwei Cui, Jeffrey Adgate Dean, Chang Lan
IPC classification: G06N3/08
CPC classification: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing neural architecture search for machine learning models. In one aspect, a method comprises receiving training data for a machine learning task, generating a plurality of candidate neural networks for performing the machine learning task, wherein each candidate neural network comprises a plurality of instances of a layer block composed of a plurality of layers, for each candidate neural network, selecting a respective type for each of the plurality of layers from a set of layer types, training the candidate neural network and evaluating performance scores for the trained candidate neural networks as applied to the machine learning task, and determining a final neural network for performing the machine learning task based at least on the performance scores for the candidate neural networks.
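The block-wise search in this abstract (choose a layer type per layer in a repeated block, score each candidate, keep the best) can be sketched as follows. The layer-type names and the evaluation function are illustrative stand-ins; real candidates would be trained networks scored on the task.

```python
import random

def search(layer_types, layers_per_block, num_candidates, evaluate,
           rng=random):
    """Sample candidate blocks, score each, return the best block.

    A candidate is a tuple of layer types, one per layer in the block;
    the full network repeats this block several times."""
    candidates = [
        tuple(rng.choice(layer_types) for _ in range(layers_per_block))
        for _ in range(num_candidates)
    ]
    scores = {c: evaluate(c) for c in candidates}
    return max(scores, key=scores.get)  # final architecture choice
```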
-
Publication number: US20230222318A1
Publication date: 2023-07-13
Application number: US18009841
Filing date: 2021-06-30
Applicant: Google LLC
Inventors: Dmitry Lepikhin, Yanping Huang, Orhan Firat, Maxim Krikun, Dehao Chen, Noam M. Shazeer, HyoukJoong Lee, Yuanzhong Xu, Zhifeng Chen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes an attention neural network configured to perform the machine learning task, the attention neural network including one or more attention layers, each attention layer comprising an attention sub-layer and a feed-forward sub-layer. Some or all of the attention layers have a feed-forward sub-layer that applies conditional computation to the inputs to the sub-layer.
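A feed-forward sub-layer that "applies conditional computation" activates only part of its parameters per input: a gating rule routes each input to one of several expert feed-forward functions. The toy gate and experts below are illustrative assumptions, not the patent's routing scheme.

```python
def conditional_ffn(inputs, experts, gate):
    """Conditional-computation sketch: route each input to the single
    expert selected by gate(x), so only that expert's parameters are
    exercised for that input."""
    return [experts[gate(x)](x) for x in inputs]
```

Because each input touches one expert, the sub-layer's total parameter count can grow with the number of experts while the per-input compute stays roughly constant.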
-
Publication number: US20200320399A1
Publication date: 2020-10-08
Application number: US16906034
Filing date: 2020-06-19
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
-
Publication number: US20240330334A1
Publication date: 2024-10-03
Application number: US18225990
Filing date: 2023-07-25
Applicant: GOOGLE LLC
Inventors: Sidharth Mudgal, Ahmad Beirami, Jilin Chen, Alex Beutel, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Trevor Strohman
IPC classification: G06F16/332, G06F40/284
CPC classification: G06F16/3329, G06F40/284
Abstract: Implementations relate to reducing latency in generating and/or rendering a given stream of natural language (NL) based output generated using a large language model (LLM). Processor(s) of a system can: receive NL based input associated with a client device, generate the stream of NL based output utilizing the LLM that is responsive to the NL based input and that is for a given dialog context of an ongoing dialog, and cause the stream of NL based output to be rendered at the client device. Notably, the processor(s) can employ attribute classifier(s) and a multi-objective scorer to implement a blockwise controlled decoding technique in generating the stream of NL based output utilizing the LLM. By implementing the blockwise controlled decoding technique, the processor(s) can reduce latency in generating and/or rendering the stream of NL based output generated utilizing the LLM.
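Blockwise controlled decoding, as described in this abstract, can be sketched as: sample several candidate blocks of tokens at each step, score them with attribute classifiers combined by a multi-objective scorer, and commit the best block before continuing. The sampler, classifier, and weights below are assumptions for illustration, not the patent's components.

```python
def blockwise_decode(prefix, sample_blocks, classifiers, weights, steps):
    """sample_blocks(text) -> list of candidate next blocks (strings).

    Each step commits the candidate block whose extension of the text
    maximizes the weighted sum of the attribute classifiers (the
    multi-objective score)."""
    text = prefix
    for _ in range(steps):
        candidates = sample_blocks(text)
        best = max(candidates,
                   key=lambda b: sum(w * clf(text + b)
                                     for w, clf in zip(weights, classifiers)))
        text += best  # commit the best-scoring block and continue
    return text
```

Scoring whole blocks rather than individual tokens is what recovers latency: the classifiers run once per block instead of once per token.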
-
Publication number: US20230259784A1
Publication date: 2023-08-17
Application number: US18140442
Filing date: 2023-04-27
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
-
Publication number: US11669744B2
Publication date: 2023-06-06
Application number: US17475137
Filing date: 2021-09-14
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
-
Publication number: US20220004879A1
Publication date: 2022-01-06
Application number: US17475137
Filing date: 2021-09-14
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.