Publication No.: US12189717B1
Publication Date: 2025-01-07
Application No.: US17105998
Application Date: 2020-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Can Karakus, Rahul Raghavendra Huilgol, Anirudh Subramanian, Fei Wu, Christopher Cade Daniel, Akhil Mehra, Ajay Paidi, Yutong Zhang, Indu Thangakrishnan, Luis Alves Pereira Quintela
Abstract: Automatic partitioning of a machine learning model may be performed for training across multiple processing devices. A training job for a machine learning model may specify a number of partitions for the model. An optimization parameter may be determined for the machine learning model. Different partitions of the machine learning model to be trained across the multiple processing devices may be determined based on the specified number of partitions and the optimization parameter. A schedule for executing the training job may be generated according to the respective partitions of the machine learning model. The training job may be executed according to the schedule.
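
The flow described in the abstract (a requested partition count, an optimization parameter, partitions, then a schedule) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the model is treated as a linear chain of layers, the optimization parameter is modeled as a per-layer cost (for example memory or compute time) to be balanced, partitions are contiguous groups of layers, and the schedule is a simple forward-only pipeline over microbatches. The function names `partition_layers` and `pipeline_schedule` are hypothetical.

```python
from typing import List, Tuple


def partition_layers(costs: List[float], num_partitions: int) -> List[List[int]]:
    """Split layer indices into contiguous groups so the costliest group is as cheap as possible.

    `costs` stands in for the optimization parameter (assumed: one cost per layer).
    """
    n = len(costs)
    if not 1 <= num_partitions <= n:
        raise ValueError("num_partitions must be between 1 and the number of layers")

    # prefix[i] is the total cost of layers 0..i-1, so a group j..i-1 costs prefix[i] - prefix[j].
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][i]: smallest achievable maximum group cost when the first i layers form k groups.
    best = [[inf] * (n + 1) for _ in range(num_partitions + 1)]
    # cut[k][i]: index where the last of those k groups starts.
    cut = [[0] * (n + 1) for _ in range(num_partitions + 1)]
    best[0][0] = 0.0

    for k in range(1, num_partitions + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                candidate = max(best[k - 1][j], prefix[i] - prefix[j])
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    cut[k][i] = j

    # Walk the recorded cut points backwards to recover the groups.
    groups: List[List[int]] = []
    end = n
    for k in range(num_partitions, 0, -1):
        start = cut[k][end]
        groups.append(list(range(start, end)))
        end = start
    groups.reverse()
    return groups


def pipeline_schedule(num_partitions: int, num_microbatches: int) -> List[Tuple[int, int, int]]:
    """Return (step, partition, microbatch) triples for a forward-only pipeline.

    Partition p handles microbatch m at step m + p, so each partition starts
    one step after the partition that feeds it (an assumed, simplified schedule).
    """
    schedule = []
    for step in range(num_partitions + num_microbatches - 1):
        for partition in range(num_partitions):
            microbatch = step - partition
            if 0 <= microbatch < num_microbatches:
                schedule.append((step, partition, microbatch))
    return schedule


if __name__ == "__main__":
    layer_costs = [4.0, 2.0, 2.0, 4.0, 4.0]  # hypothetical per-layer costs
    parts = partition_layers(layer_costs, num_partitions=3)
    print("partitions:", parts)
    for step, partition, microbatch in pipeline_schedule(len(parts), num_microbatches=2):
        print(f"step {step}: device {partition} runs microbatch {microbatch}")
```

A balanced contiguous split is only one plausible reading of "based on the specified number of partitions and the optimization parameter". A real system would likely also account for communication between devices and for backward passes, which this sketch omits.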