TRAINING NEURAL NETWORKS USING DISTRIBUTED BATCH NORMALIZATION

    Publication Number: US20240378416A1

    Publication Date: 2024-11-14

    Application Number: US18444267

    Filing Date: 2024-02-16

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of the plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch, determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
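The forward pass described in the abstract can be illustrated with a small NumPy simulation. This is a minimal sketch, not the patented implementation: it assumes equal per-device batch sizes, a list of in-memory arrays standing in for device replicas, and simple averaging of per-replica statistics within each sub-group (combining variances via E[x²] − E[x]²). All function and parameter names are illustrative.

```python
import numpy as np

def per_replica_stats(batch):
    # Each device computes a mean and variance over its own local batch.
    return batch.mean(axis=0), batch.var(axis=0)

def distributed_batch_norm(replica_batches, sub_groups, eps=1e-5):
    # replica_batches: list of per-device activations, each of shape (B, F)
    # sub_groups: lists of device indices; statistics are shared per group
    means, variances = [], []
    for batch in replica_batches:
        m, v = per_replica_stats(batch)
        means.append(m)
        variances.append(v)

    outputs = [None] * len(replica_batches)
    for group in sub_groups:
        # Combine per-replica statistics into distributed statistics.
        # With equal batch sizes, the group mean is the mean of means,
        # and the group variance follows from E[x^2] - (E[x])^2.
        group_mean = np.mean([means[i] for i in group], axis=0)
        group_ex2 = np.mean([variances[i] + means[i] ** 2 for i in group], axis=0)
        group_var = group_ex2 - group_mean ** 2
        for i in group:
            # Each device normalizes its own outputs with the shared
            # distributed statistics of its sub-group.
            outputs[i] = (replica_batches[i] - group_mean) / np.sqrt(group_var + eps)
    return outputs
```

With this pairing, the concatenated outputs of all devices in a sub-group have approximately zero mean and unit variance, which is the effect of computing batch normalization over the sub-group's combined batch rather than each device's local batch alone.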

    Training neural networks using distributed batch normalization

    Publication Number: US11907825B2

    Publication Date: 2024-02-20

    Application Number: US16659543

    Filing Date: 2019-10-21

    Applicant: Google LLC

    CPC classification number: G06N3/044 G06N3/04 G06N3/08 G06N3/084 G06V10/82

    Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of the plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch, determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.

    Cross replica reduction on networks having degraded nodes

    Publication Number: US11715010B2

    Publication Date: 2023-08-01

    Application Number: US16543410

    Filing Date: 2019-08-16

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.
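The reduction steps in the abstract can be sketched for a single topology dimension. This is a simplified illustration under stated assumptions, not the patented method: groups along one dimension are represented as lists of node ids, a group containing a degraded node is treated as a forwarding group, each forwarding group is paired one-to-one with a critical group, and the first node of each critical group serves as the representative node. All names are illustrative.

```python
import numpy as np

def reduce_with_degraded_nodes(gradients, groups, degraded):
    # gradients: dict mapping node id -> individual gradient vector
    # groups: groups of node ids along one dimension of the topology
    # degraded: set of node ids that cannot take part in a reduction
    critical = [g for g in groups if not any(n in degraded for n in g)]
    forwarding = [g for g in groups if any(n in degraded for n in g)]
    assert critical, "at least one group must be free of degraded nodes"

    # Each receiving node in a critical group updates its individual
    # gradient with the intermediate gradient forwarded by a healthy
    # node of the paired forwarding group.
    for fgroup, cgroup in zip(forwarding, critical):
        for src, dst in zip(fgroup, cgroup):
            if src not in degraded:
                gradients[dst] = gradients[dst] + gradients[src]

    # Reduce each critical group and store the partial final gradient
    # on that group's representative node.
    partials = []
    for cgroup in critical:
        partial = sum(gradients[n] for n in cgroup)
        gradients[cgroup[0]] = partial
        partials.append(partial)
    return partials
```

The key idea this illustrates is that degraded nodes are routed around rather than waited on: their groups only forward, so every healthy node's gradient still reaches a reduction, and the partial final gradients end up on representative nodes of the critical groups.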

    CROSS REPLICA REDUCTION ON NETWORKS HAVING DEGRADED NODES

    Publication Number: US20210049408A1

    Publication Date: 2021-02-18

    Application Number: US16543410

    Filing Date: 2019-08-16

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.
