-
Publication Number: US20220121945A1
Publication Date: 2022-04-21
Application Number: US17567740
Filing Date: 2022-01-03
Applicant: Google LLC
Inventors: Zhifeng Chen, Yanping Huang, Youlong Cheng, HyoukJoong Lee, Dehao Chen, Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for the final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
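The abstract above outlines a micro-batch pipeline schedule for model-parallel training: split each mini-batch into micro-batches, run the forward pass through every composite layer for all micro-batches, then run the backward pass to the first composite layer. The Python sketch below is a minimal, sequential simulation of that schedule, not the patented implementation; all names (Stage, pipeline_step) and the toy stages are illustrative assumptions.

```python
# Minimal sketch of the micro-batch schedule the abstract describes.
# Hypothetical names throughout; devices are simulated, not real.
from typing import Callable, List


class Stage:
    """One composite layer: a contiguous slice of network layers
    that the partitioning assigns to a single (simulated) device."""

    def __init__(self, forward: Callable, backward: Callable):
        self.forward = forward    # activations_in -> activations_out
        self.backward = backward  # grad_out -> grad_in


def pipeline_step(stages: List[Stage], micro_batches: list) -> list:
    """One training step: a full forward pass for every micro-batch,
    then a full backward pass, mirroring the order in the abstract."""
    n_stages, n_micro = len(stages), len(micro_batches)

    # Forward: compute output activations of the final composite layer
    # for every micro-batch. acts[m][s] is the input to stage s for
    # micro-batch m, stashed so the backward pass can reuse it.
    acts = [[None] * (n_stages + 1) for _ in range(n_micro)]
    for m, mb in enumerate(micro_batches):
        acts[m][0] = mb
        for s, stage in enumerate(stages):
            acts[m][s + 1] = stage.forward(acts[m][s])

    # Backward: propagate gradients until the first composite layer
    # has produced input gradients for every micro-batch.
    grads = []
    for m in range(n_micro):
        g = acts[m][-1]  # stand-in for the loss gradient of micro-batch m
        for stage in reversed(stages):
            g = stage.backward(g)
        grads.append(g)
    return grads


# Toy usage: two composite layers acting on scalars.
stages = [Stage(lambda x: x * 2, lambda g: g * 2),
          Stage(lambda x: x + 1, lambda g: g)]
print(pipeline_step(stages, micro_batches=[1.0, 2.0, 3.0]))
```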
-
Publication Number: US20230222318A1
Publication Date: 2023-07-13
Application Number: US18009841
Filing Date: 2021-06-30
Applicant: Google LLC
Inventors: Dmitry Lepikhin, Yanping Huang, Orhan Firat, Maxim Krikun, Dehao Chen, Noam M. Shazeer, HyoukJoong Lee, Yuanzhong Xu, Zhifeng Chen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing a machine learning task on a network input to generate a network output. In one aspect, one of the systems includes an attention neural network configured to perform the machine learning task, the attention neural network including one or more attention layers, each attention layer comprising an attention sub-layer and a feed-forward sub-layer. Some or all of the attention layers have a feed-forward sub-layer that applies conditional computation to the inputs to the sub-layer.
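The abstract says the feed-forward sub-layer applies conditional computation but does not name the mechanism here; a common instantiation is mixture-of-experts routing, where each token activates only one of several expert feed-forward networks. The sketch below shows that assumed form with top-1 routing for brevity; moe_ffn, w_gate, and experts are hypothetical names, not the patent's API.

```python
# Minimal sketch of a conditionally computed feed-forward sub-layer,
# assuming a top-1 mixture-of-experts routing (an assumption; the
# abstract does not specify the mechanism).
import numpy as np


def moe_ffn(x: np.ndarray, w_gate: np.ndarray, experts: list) -> np.ndarray:
    """Route each token to a single expert feed-forward network.

    x:       [tokens, d_model] sub-layer input
    w_gate:  [d_model, n_experts] router weights
    experts: list of (w_in [d_model, d_ff], w_out [d_ff, d_model])
    """
    logits = x @ w_gate                        # [tokens, n_experts]
    choice = logits.argmax(axis=-1)            # top-1 expert per token
    gate = np.exp(logits - logits.max(-1, keepdims=True))
    gate = gate / gate.sum(-1, keepdims=True)  # softmax router scores

    out = np.zeros_like(x)
    for e, (w_in, w_out) in enumerate(experts):
        mask = choice == e                     # tokens routed to expert e
        if mask.any():
            h = np.maximum(x[mask] @ w_in, 0)  # expert FFN with ReLU
            out[mask] = gate[mask, e:e + 1] * (h @ w_out)
    return out


# Toy usage: 5 tokens routed across 4 experts.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts, tokens = 8, 16, 4, 5
x = rng.normal(size=(tokens, d_model))
w_gate = rng.normal(size=(d_model, n_experts))
experts = [(rng.normal(size=(d_model, d_ff)),
            rng.normal(size=(d_ff, d_model))) for _ in range(n_experts)]
print(moe_ffn(x, w_gate, experts).shape)  # (5, 8)
```

Because each token multiplies through only its chosen expert's weights, the per-token compute stays roughly constant as experts are added, which is the usual motivation for conditional computation in very large models.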
-
Publication Number: US20210042620A1
Publication Date: 2021-02-11
Application Number: US16989787
Filing Date: 2020-08-10
Applicant: Google LLC
Inventors: Zhifeng Chen, Yanping Huang, Youlong Cheng, HyoukJoong Lee, Dehao Chen, Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for the final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
-
Publication Number: US11232356B2
Publication Date: 2022-01-25
Application Number: US16989787
Filing Date: 2020-08-10
Applicant: Google LLC
Inventors: Zhifeng Chen, Yanping Huang, Youlong Cheng, HyoukJoong Lee, Dehao Chen, Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for the final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.