Invention Application
- Patent Title: TRAINING NEURAL NETWORKS USING DISTRIBUTED BATCH NORMALIZATION
- Application No.: US18444267
- Application Date: 2024-02-16
- Publication No.: US20240378416A1
- Publication Date: 2024-11-14
- Inventor: Blake Alan Hechtman, Sameer Kumar
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Main IPC: G06N3/044
- IPC: G06N3/044 ; G06N3/04 ; G06N3/08 ; G06N3/084 ; G06V10/82

Abstract:
Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of a plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch, determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
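The forward-pass steps named in the abstract can be illustrated with a minimal single-process sketch. It is not the claimed implementation: devices are simulated as plain Python lists, each layer output is a single scalar feature, every device is assumed to hold a batch of the same size, and the rule used to combine per-replica statistics (averaging first and second moments) is an assumption, since the abstract does not state one. The names `per_replica_stats`, `distributed_stats`, `distributed_batch_norm`, and the `eps` parameter are hypothetical.

```python
import math

def per_replica_stats(outputs):
    """Mean and (biased) variance of one device's layer outputs."""
    n = len(outputs)
    mean = sum(outputs) / n
    var = sum((x - mean) ** 2 for x in outputs) / n
    return mean, var

def distributed_stats(stats):
    """Combine per-replica (mean, variance) pairs for one sub-group.

    Assumes equal batch sizes on every device, so the pooled moments
    are plain averages of the per-replica moments.
    """
    k = len(stats)
    dist_mean = sum(m for m, _ in stats) / k
    # E[x^2] on each replica is var + mean^2; average it across replicas.
    second_moment = sum(v + m * m for m, v in stats) / k
    return dist_mean, second_moment - dist_mean ** 2

def distributed_batch_norm(device_outputs, sub_groups, eps=1e-5):
    """Normalize each device's outputs with its sub-group's statistics."""
    normalized = [None] * len(device_outputs)
    for group in sub_groups:
        stats = [per_replica_stats(device_outputs[d]) for d in group]
        mean, var = distributed_stats(stats)
        scale = 1.0 / math.sqrt(var + eps)
        for d in group:
            normalized[d] = [(x - mean) * scale for x in device_outputs[d]]
    return normalized

# Four simulated devices split into two sub-groups of two devices each.
device_outputs = [[1.0, 2.0], [3.0, 4.0], [10.0, 12.0], [14.0, 16.0]]
sub_groups = [[0, 1], [2, 3]]
normalized = distributed_batch_norm(device_outputs, sub_groups)
```

In a real multi-device setting the averaging inside `distributed_stats` would be carried out by a collective reduction across the devices of each sub-group rather than by a loop in one process; the sketch only shows the arithmetic each device ends up applying.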