-
Publication number: US20220121945A1
Publication date: 2022-04-21
Application number: US17567740
Filing date: 2022-01-03
Applicant: Google LLC
Inventors: Zhifeng Chen, Yanping Huang, Youlong Cheng, HyoukJoong Lee, Dehao Chen, Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for the final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
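The micro-batch pipeline training described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the composite layers are stand-in functions, and the function names (`split_into_micro_batches`, `pipeline_step`) are assumptions, not identifiers from the patent.

```python
def split_into_micro_batches(mini_batch, num_micro_batches):
    """Partition a mini-batch of training examples into micro-batches."""
    size = len(mini_batch) // num_micro_batches
    return [mini_batch[i * size:(i + 1) * size]
            for i in range(num_micro_batches)]

def pipeline_step(mini_batch, composite_layers, num_micro_batches):
    """Forward pass: run every micro-batch through the sequence of
    composite layers until the final layer's activations are computed."""
    micro_batches = split_into_micro_batches(mini_batch, num_micro_batches)
    activations = []
    for mb in micro_batches:
        x = mb
        for layer in composite_layers:
            x = layer(x)  # stand-in for the composite layer's computation
        activations.append(x)
    return activations
```

In the actual scheme each composite layer sits on its own device, so different micro-batches can occupy different pipeline stages concurrently; the sequential loop above only shows the dataflow, not the overlap.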
-
Publication number: US11144831B2
Publication date: 2021-10-12
Application number: US16906034
Filing date: 2020-06-19
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
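The evolutionary loop in this abstract (sample candidates, mutate the fittest, add the child, evict the least recently trained) can be sketched as below. The `fitness` and `mutate` callables are illustrative placeholders, assuming architectures can be represented as hashable values; the patent's workers run this concurrently, which the single loop omits.

```python
import random
from collections import deque

def evolve(population, fitness, mutate, steps, sample_size, rng=random):
    """Aging-evolution sketch.

    population: deque of candidate architectures, with the least
    recently trained candidate on the left."""
    for _ in range(steps):
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=fitness)  # best measure of fitness
        population.append(mutate(parent))  # new candidate joins on the right
        population.popleft()               # evict the least recently trained
    return population
```

Eviction by age rather than by worst fitness is the distinctive choice here: every candidate eventually leaves the population, which regularizes the search.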
-
Publication number: US20240311405A1
Publication date: 2024-09-19
Application number: US18337316
Filing date: 2023-06-19
Applicant: GOOGLE LLC
Inventors: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
IPC classification: G06F16/332
CPC classification: G06F16/3329
Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified. This, in turn, can mitigate occurrences of computational and/or network inefficiencies that result from a user issuing a follow-up request to cure the inaccuracies and/or under-specification of a generated response.
-
Publication number: US20240112027A1
Publication date: 2024-04-04
Application number: US18477546
Filing date: 2023-09-28
Applicant: Google LLC
Inventors: Yanqi Zhou, Yanping Huang, Yifeng Lu, Andrew M. Dai, Siamak Shakeri, Zhifeng Chen, James Laudon, Quoc V. Le, Da Huang, Nan Du, David Richard So, Daiyi Peng, Yingwei Cui, Jeffrey Adgate Dean, Chang Lan
IPC classification: G06N3/08
CPC classification: G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing neural architecture search for machine learning models. In one aspect, a method comprises receiving training data for a machine learning task, generating a plurality of candidate neural networks for performing the machine learning task, wherein each candidate neural network comprises a plurality of instances of a layer block composed of a plurality of layers, for each candidate neural network, selecting a respective type for each of the plurality of layers from a set of layer types, training the candidate neural network and evaluating performance scores for the trained candidate neural networks as applied to the machine learning task, and determining a final neural network for performing the machine learning task based at least on the performance scores for the candidate neural networks.
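The block-wise search in this abstract (choose a layer type per layer in a repeated block, score each candidate, keep the best) can be sketched as follows. The layer-type names and the evaluation function are illustrative stand-ins; real candidates would be trained networks scored on the task.

```python
import random

def search(layer_types, layers_per_block, num_candidates, evaluate,
           rng=random):
    """Sample candidate blocks, score each, return the best block.

    A candidate is a tuple of layer types, one per layer in the block;
    the full network repeats this block several times."""
    candidates = [
        tuple(rng.choice(layer_types) for _ in range(layers_per_block))
        for _ in range(num_candidates)
    ]
    scores = {c: evaluate(c) for c in candidates}
    return max(scores, key=scores.get)  # final architecture choice
```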
-
Publication number: US20230222318A1
Publication date: 2023-07-13
Application number: US18009841
Filing date: 2021-06-30
Applicant: Google LLC
Inventors: Dmitry Lepikhin, Yanping Huang, Orhan Firat, Maxim Krikun, Dehao Chen, Noam M. Shazeer, HyoukJoong Lee, Yuanzhong Xu, Zhifeng Chen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes an attention neural network configured to perform the machine learning task, the attention neural network including one or more attention layers, each attention layer comprising an attention sub-layer and a feed-forward sub-layer. Some or all of the attention layers have a feed-forward sub-layer that applies conditional computation to the inputs to the sub-layer.
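A feed-forward sub-layer that "applies conditional computation" activates only part of its parameters per input: a gating rule routes each input to one of several expert feed-forward functions. The toy gate and experts below are illustrative assumptions, not the patent's routing scheme.

```python
def conditional_ffn(inputs, experts, gate):
    """Conditional-computation sketch: route each input to the single
    expert selected by gate(x), so only that expert's parameters are
    exercised for that input."""
    return [experts[gate(x)](x) for x in inputs]
```

Because each input touches one expert, the sub-layer's total parameter count can grow with the number of experts while the per-input compute stays roughly constant.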
-
Publication number: US20200320399A1
Publication date: 2020-10-08
Application number: US16906034
Filing date: 2020-06-19
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
-
Publication number: US20240330334A1
Publication date: 2024-10-03
Application number: US18225990
Filing date: 2023-07-25
Applicant: GOOGLE LLC
Inventors: Sidharth Mudgal, Ahmad Beirami, Jilin Chen, Alex Beutel, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Trevor Strohman
IPC classification: G06F16/332, G06F40/284
CPC classification: G06F16/3329, G06F40/284
Abstract: Implementations relate to reducing latency in generating and/or rendering a given stream of natural language (NL) based output generated using a large language model (LLM). Processor(s) of a system can: receive NL based input associated with a client device, generate the stream of NL based output utilizing the LLM that is responsive to the NL based input and that is for a given dialog context of an ongoing dialog, and cause the stream of NL based output to be rendered at the client device. Notably, the processor(s) can employ attribute classifier(s) and a multi-objective scorer to implement a blockwise controlled decoding technique in generating the stream of NL based output utilizing the LLM. By implementing the blockwise controlled decoding technique, the processor(s) can reduce latency in generating and/or rendering the stream of NL based output generated utilizing the LLM.
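Blockwise controlled decoding, as described in this abstract, can be sketched as: sample several candidate blocks of tokens at each step, score them with attribute classifiers combined by a multi-objective scorer, and commit the best block before continuing. The sampler, classifier, and weights below are assumptions for illustration, not the patent's components.

```python
def blockwise_decode(prefix, sample_blocks, classifiers, weights, steps):
    """sample_blocks(text) -> list of candidate next blocks (strings).

    Each step commits the candidate block whose extension of the text
    maximizes the weighted sum of the attribute classifiers (the
    multi-objective score)."""
    text = prefix
    for _ in range(steps):
        candidates = sample_blocks(text)
        best = max(candidates,
                   key=lambda b: sum(w * clf(text + b)
                                     for w, clf in zip(weights, classifiers)))
        text += best  # commit the best-scoring block and continue
    return text
```

Scoring whole blocks rather than individual tokens is what recovers latency: the classifiers run once per block instead of once per token.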
-
Publication number: US20230259784A1
Publication date: 2023-08-17
Application number: US18140442
Filing date: 2023-04-27
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
-
Publication number: US11669744B2
Publication date: 2023-06-06
Application number: US17475137
Filing date: 2021-09-14
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.
-
Publication number: US20220004879A1
Publication date: 2022-01-06
Application number: US17475137
Filing date: 2021-09-14
Applicant: Google LLC
Inventors: Yanping Huang, Alok Aggarwal, Quoc V. Le, Esteban Alberto Real
Abstract: A method for receiving training data for training a neural network (NN) to perform a machine learning (ML) task and for determining, using the training data, an optimized NN architecture for performing the ML task is described. Determining the optimized NN architecture includes: maintaining population data comprising, for each candidate architecture in a population of candidate architectures, (i) data defining the candidate architecture, and (ii) data specifying how recently a neural network having the candidate architecture has been trained while determining the optimized neural network architecture; and repeatedly performing multiple operations using each of a plurality of worker computing units to generate a new candidate architecture based on a selected candidate architecture having the best measure of fitness, adding the new candidate architecture to the population, and removing from the population the candidate architecture that was trained least recently.