Publication No.: US12277495B1
Publication Date: 2025-04-15
Application No.: US17301320
Application Date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Thiam Khean Hah, Yongseok Koh
Abstract: Systems and methods are disclosed for performing gradient exchange among processing nodes configured as an N-dimensional hyper-rectangle network. Each processing node can operate as a collective parameter server node capable of performing collective compute operations. For each dimension in a sequence of dimensions, all processing nodes on a same edge can perform a scatter-reduce operation using their respective collective parameter serving engines. The amount of data reduced in each dimension is inversely proportional to the number of processing nodes in that dimension. After the scatter-reduce operation has been performed for all dimensions, all processing nodes on the same edge can perform an all-gather operation for each dimension in the reverse sequence of dimensions.
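
The dimension-by-dimension exchange described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the function names (`lines_along`, `scatter_reduce`, `all_gather`, `hyper_rect_all_reduce`) are hypothetical, an "edge" is assumed to mean the set of nodes whose coordinates differ only in the current dimension, the reduction is assumed to be a sum, and each gradient length is assumed to be divisible by the total number of nodes.

```python
from itertools import product
from math import prod


def lines_along(shape, d):
    """Yield groups of node coordinates that differ only in dimension d."""
    axes = [range(n) if k != d else [0] for k, n in enumerate(shape)]
    for base in product(*axes):
        yield [base[:d] + (i,) + base[d + 1:] for i in range(shape[d])]


def scatter_reduce(data, shape, d):
    """Each node keeps 1/shape[d] of its data, summed over its line in dimension d."""
    n = shape[d]
    out = {}
    for group in lines_along(shape, d):
        size = len(data[group[0]]) // n
        for i, node in enumerate(group):
            # Node i along this line owns slice i of the reduced result.
            out[node] = [
                sum(data[peer][k] for peer in group)
                for k in range(i * size, (i + 1) * size)
            ]
    return out


def all_gather(data, shape, d):
    """Each node receives the concatenated shards of every node on its line."""
    out = {}
    for group in lines_along(shape, d):
        gathered = [x for node in group for x in data[node]]
        for node in group:
            out[node] = list(gathered)
    return out


def hyper_rect_all_reduce(grads, shape):
    """Sum gradients across all nodes (assumed semantics of the exchange)."""
    length = len(next(iter(grads.values())))
    if length % prod(shape) != 0:
        raise ValueError("gradient length must be divisible by the node count")
    data = {coord: list(g) for coord, g in grads.items()}
    for d in range(len(shape)):            # scatter-reduce, forward order
        data = scatter_reduce(data, shape, d)
    for d in reversed(range(len(shape))):  # all-gather, reverse order
        data = all_gather(data, shape, d)
    return data
```

In this sketch, a 2 x 3 grid with 12-element gradients leaves each node holding 6 elements after the first scatter-reduce and 2 after the second, which matches the abstract's statement that the data handled in a dimension shrinks with the number of nodes in that dimension. The all-gather passes then rebuild the full 12-element sum on every node.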