TRAINING GIANT NEURAL NETWORKS USING PIPELINE PARALLELISM

    Publication No.: US20220121945A1

    Publication Date: 2022-04-21

    Application No.: US17567740

    Filing Date: 2022-01-03

    Applicant: Google LLC

    IPC Classification: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.

    TRAINING GIANT NEURAL NETWORKS USING PIPELINE PARALLELISM

    Publication No.: US20210042620A1

    Publication Date: 2021-02-11

    Application No.: US16989787

    Filing Date: 2020-08-10

    Applicant: Google LLC

    IPC Classification: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.

    Training giant neural networks using pipeline parallelism

    Publication No.: US11232356B2

    Publication Date: 2022-01-25

    Application No.: US16989787

    Filing Date: 2020-08-10

    Applicant: Google LLC

    IPC Classification: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
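
    All three publications share the abstract above, which describes a pipeline-parallel training step: partition the network into N composite layers, assign each to a computing device, split a mini-batch into micro-batches, run a forward pass until the final composite layer has produced output activations for every micro-batch, then run a backward pass until the first composite layer has gradients for every micro-batch. The sketch below illustrates only that schedule and is not the patented implementation: the names (CompositeLayer, pipeline_train_step) are invented for this example, device placement is simulated with plain Python objects on a single machine, and micro-batches run one after another rather than overlapping across devices as a real pipeline would.

# Minimal sketch of the training step in the abstract. Not the patented
# implementation: names are illustrative, devices are simulated on one machine,
# and micro-batches are processed sequentially instead of being overlapped.
from typing import Callable, List

class CompositeLayer:
    """A contiguous slice of the network's layers, nominally placed on one device."""
    def __init__(self, forward: Callable[[float], float], grad: Callable[[float], float]):
        self.forward = forward  # elementwise forward function of this slice
        self.grad = grad        # derivative of that function, used in the backward pass

def pipeline_train_step(composite_layers: List[CompositeLayer],
                        mini_batch: List[float],
                        num_micro_batches: int) -> List[List[float]]:
    # 1. Partition the mini-batch into micro-batches.
    size = max(1, len(mini_batch) // num_micro_batches)
    micro_batches = [mini_batch[i:i + size] for i in range(0, len(mini_batch), size)]

    # 2. Forward pass: run every micro-batch through the sequence of composite
    #    layers until the final layer has produced output activations for each
    #    micro-batch, stashing each layer's inputs for the backward pass.
    stashed_inputs, final_outputs = [], []
    for mb in micro_batches:
        inputs_per_layer, x = [], list(mb)
        for layer in composite_layers:
            inputs_per_layer.append(x)
            x = [layer.forward(v) for v in x]
        stashed_inputs.append(inputs_per_layer)
        final_outputs.append(x)

    # 3. Backward pass: propagate gradients from the final composite layer back
    #    to the first, one micro-batch at a time, via the chain rule.
    input_gradients = []
    for m in range(len(micro_batches)):
        g = [1.0] * len(final_outputs[m])  # d(loss)/d(final outputs), taken as 1 here
        for k in reversed(range(len(composite_layers))):
            g = [gv * composite_layers[k].grad(iv)
                 for gv, iv in zip(g, stashed_inputs[m][k])]
        input_gradients.append(g)
    return input_gradients

if __name__ == "__main__":
    # Two composite layers (each standing in for several real layers) and a
    # mini-batch of 8 examples split into 4 micro-batches of 2 examples each.
    layers = [CompositeLayer(lambda x: 2.0 * x, lambda x: 2.0),
              CompositeLayer(lambda x: x + 1.0, lambda x: 1.0)]
    print(pipeline_train_step(layers, [float(i) for i in range(8)], num_micro_batches=4))

    In an actual pipeline-parallel deployment each composite layer would reside on its own accelerator, and the forward step of one micro-batch on device d would overlap with the forward step of the next micro-batch on device d-1; the per-micro-batch inputs stashed above correspond to the activations a device must retain until its part of the backward pass runs.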