Invention Grant
- Patent Title: Dynamic network bandwidth in distributed deep learning training
- Application No.: US16925192
- Application Date: 2020-07-09
- Publication No.: US11886969B2
- Publication Date: 2024-01-30
- Inventor: Wei Zhang, Xiaodong Cui, Abdullah Kayi, Alper Buyuktosunoglu
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Donald J. O'Brien
- Main IPC: G06N20/20
- IPC: G06N20/20; H04L12/24; G06N3/02; H04L41/16

Abstract:
Embodiments of a method are disclosed. The method includes performing distributed deep learning training on a batch of training data. The method also includes determining training times representing an amount of time between a beginning batch time and an end batch time. Further, the method includes modifying a communication aspect of the communication straggler to reduce a future network communication time for the communication straggler to send a future result of the distributed deep learning training on a new batch of training data in response to the centralized parameter server determining that the learner is the communication straggler.
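The flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not state how the communication straggler is identified or which communication aspect is modified. The sketch assumes, purely for illustration, that the slowest learner is flagged when its batch time exceeds a multiple of the median, and that the modified aspect is a bandwidth share (suggested only by the patent title). The names `LearnerReport`, `find_straggler`, `rebalance_bandwidth`, `slack`, and `boost` are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class LearnerReport:
    """Timing for one learner on one batch (hypothetical structure)."""
    learner_id: str
    begin_batch_time: float  # seconds
    end_batch_time: float    # seconds

    @property
    def training_time(self) -> float:
        # Time between the beginning batch time and the end batch time.
        return self.end_batch_time - self.begin_batch_time


def find_straggler(reports, slack=1.5):
    """Return the id of the slowest learner if its training time exceeds
    slack * median, else None. The threshold rule is an assumption."""
    median = statistics.median(r.training_time for r in reports)
    slowest = max(reports, key=lambda r: r.training_time)
    if slowest.training_time > slack * median:
        return slowest.learner_id
    return None


def rebalance_bandwidth(shares, straggler_id, boost=0.1):
    """Move up to `boost` of the total bandwidth to the straggler, taken
    proportionally from the other learners. Shares are fractions of the
    total; their sum is unchanged. This policy is an assumption."""
    others_total = sum(s for lid, s in shares.items() if lid != straggler_id)
    if others_total <= 0:
        return dict(shares)  # nothing to take from
    take = min(boost, others_total)
    new_shares = {}
    for lid, share in shares.items():
        if lid == straggler_id:
            new_shares[lid] = share + take
        else:
            new_shares[lid] = share - take * share / others_total
    return new_shares


# Example: a server-side step after one batch.
reports = [
    LearnerReport("A", 0.0, 1.0),
    LearnerReport("B", 0.0, 1.1),
    LearnerReport("C", 0.0, 3.0),
]
shares = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
straggler = find_straggler(reports)
if straggler is not None:
    shares = rebalance_bandwidth(shares, straggler)
```

In this sketch the adjustment is applied before the next batch, so the flagged learner would send its future result with a larger share of the bandwidth.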
Public/Granted literature:
- US20220012642A1 DYNAMIC NETWORK BANDWIDTH IN DISTRIBUTED DEEP LEARNING TRAINING (Public/Granted day: 2022-01-13)