-
Publication No.: US20240378416A1
Publication date: 2024-11-14
Application No.: US18444267
Filing date: 2024-02-16
Applicant: Google LLC
Inventor: Blake Alan Hechtman, Sameer Kumar
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of the plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch, determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
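The forward-pass steps in this abstract (per-replica statistics, per-sub-group distributed statistics, then normalization) can be illustrated with a small single-process sketch. It is not the patented implementation. It assumes scalar activations, equal-sized batches on every device, population variance, and no learned scale or shift. The function names, the `sub_groups` layout, and the `eps` value are hypothetical.

```python
from statistics import fmean


def per_replica_stats(outputs):
    """Mean and population variance of one device's layer outputs."""
    mean = fmean(outputs)
    var = fmean([(x - mean) ** 2 for x in outputs])
    return mean, var


def distributed_stats(stats):
    """Combine per-replica (mean, variance) pairs from one sub-group.

    Assumption: every device holds the same number of examples, so the
    pooled variance is mean(var_i + mean_i^2) - group_mean^2.
    """
    group_mean = fmean([m for m, _ in stats])
    group_var = fmean([v + m * m for m, v in stats]) - group_mean ** 2
    return group_mean, group_var


def distributed_batch_norm(device_outputs, sub_groups, eps=1e-5):
    """Normalize each device's outputs with its sub-group's statistics.

    device_outputs: one list of scalar activations per device.
    sub_groups: lists of device indices (hypothetical grouping).
    """
    normalized = [None] * len(device_outputs)
    for group in sub_groups:
        stats = [per_replica_stats(device_outputs[d]) for d in group]
        mean, var = distributed_stats(stats)
        for d in group:
            normalized[d] = [(x - mean) / (var + eps) ** 0.5
                             for x in device_outputs[d]]
    return normalized
```

With four devices split into two sub-groups, `distributed_batch_norm(outputs, [[0, 1], [2, 3]])` normalizes devices 0 and 1 with one shared mean and variance and devices 2 and 3 with another, so statistics are never exchanged across sub-groups.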
-
Publication No.: US11907825B2
Publication date: 2024-02-20
Application No.: US16659543
Filing date: 2019-10-21
Applicant: Google LLC
Inventor: Blake Alan Hechtman, Sameer Kumar
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of the plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch, determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
-
Publication No.: US11715010B2
Publication date: 2023-08-01
Application No.: US16543410
Filing date: 2019-08-16
Applicant: Google LLC
Inventor: Bjarke Hammersholt Roune, Sameer Kumar, Norman Paul Jouppi
IPC: G06N3/084, G06N20/00, G06F18/2115, G06F18/23, G06F18/214
CPC classification number: G06N3/084, G06F18/2115, G06F18/2148, G06F18/23, G06N20/00
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.
-
Publication No.: US20220292399A1
Publication date: 2022-09-15
Application No.: US17637200
Filing date: 2020-09-04
Applicant: Google LLC
Inventor: Bjarke Hammersholt Roune, Sameer Kumar
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors and similarly structured data that are generated in parallel, for example, on nodes organized in a mesh or torus topology defined by connections in at least two dimensions between the nodes. The methods provide parallel computation and communication between nodes in the topology.
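The abstract does not spell out the reduction schedule. As a rough illustration only, the sketch below simulates a generic dimension-by-dimension all-reduce on a two-dimensional grid of nodes: each row is summed independently, then the row results are summed along the other dimension. This is a commonly used pattern and an assumption here, not the claimed method. The function name and the nested-list grid layout are hypothetical.

```python
def all_reduce_2d(grid):
    """Sum gradient vectors over a rows x cols grid, one dimension at a time.

    grid[r][c] is the gradient vector (list of floats) held by node (r, c).
    Returns a grid of the same shape in which every node holds the global sum.
    """
    rows, cols = len(grid), len(grid[0])
    # Phase 1: each row reduces on its own; rows are independent of one
    # another, so on real hardware they could run in parallel.
    row_sums = [[sum(vals) for vals in zip(*grid[r])] for r in range(rows)]
    # Phase 2: reduce the per-row results along the second dimension.
    total = [sum(vals) for vals in zip(*row_sums)]
    # Every node receives its own copy of the final gradient vector.
    return [[list(total) for _ in range(cols)] for _ in range(rows)]
```

Reducing one dimension at a time keeps every transfer between directly connected neighbors, which is why this pattern suits mesh and torus interconnects.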
-
Publication No.: US20210049408A1
Publication date: 2021-02-18
Application No.: US16543410
Filing date: 2019-08-16
Applicant: Google LLC
Inventor: Bjarke Hammersholt Roune, Sameer Kumar, Norman Paul Jouppi
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.