-
Publication No.: US20230315654A1
Publication date: 2023-10-05
Application No.: US18250515
Filing date: 2020-11-30
Applicant: Intel Corporation
Inventor: Guokai Ma, Zhouhai Ye, Feng Zou, Xiaojie Deng
IPC: G06F13/16, G06F15/173
CPC classification number: G06F13/1673, G06F15/17375
Abstract: A method of performing ring allreduce operations is disclosed. The method includes sending a chunk of a message in a receive buffer at a current index of a send buffer to a next node in a virtual ring of nodes, receiving a chunk of the message from a previous node in the virtual ring of nodes and storing the chunk at the current index of the receive buffer, and reducing a chunk in a send buffer at a previous index of the receive buffer and a chunk in the receive buffer at a previous index of the receive buffer and storing a result at the previous index of the receive buffer. The method includes repeating the sending, receiving and storing, and reducing and storing steps until all chunks of the message are reduced, and then sending reduced chunks to the next node and receiving reduced chunks from the previous node.
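Illustration (not part of the patent record): the abstract builds on the ring allreduce pattern, in which each node repeatedly sends one chunk to its successor, receives one chunk from its predecessor, and reduces it, before the fully reduced chunks are circulated around the ring. The sketch below simulates the textbook version of that pattern in a single process with a sum reduction. It does not reproduce the specific send/receive buffer indexing the patent claims; the function name `ring_allreduce`, the equal-size chunking, and the use of Python lists in place of network buffers are all assumptions made for illustration.

```python
from typing import List


def ring_allreduce(node_data: List[List[float]]) -> List[List[float]]:
    """Simulate textbook ring allreduce (sum) across n nodes in one process.

    node_data[r] is the message held by node r. Every message must have the
    same length, divisible by n, so it splits into n equal chunks (assumption).
    """
    n = len(node_data)
    length = len(node_data[0])
    if any(len(v) != length for v in node_data) or length % n != 0:
        raise ValueError("messages must share a length divisible by node count")
    size = length // n
    bufs = [list(v) for v in node_data]  # per-node working buffers

    def chunk(i: int) -> slice:
        return slice(i * size, (i + 1) * size)

    # Phase 1 (reduce-scatter): n - 1 steps of send, receive, reduce.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all nodes "send" simultaneously.
        sent = [((r - step) % n, bufs[r][chunk((r - step) % n)]) for r in range(n)]
        for r in range(n):
            idx, payload = sent[(r - 1) % n]  # chunk arriving from previous node
            s = chunk(idx)
            bufs[r][s] = [a + b for a, b in zip(bufs[r][s], payload)]

    # Node r now holds the fully reduced chunk (r + 1) % n.
    # Phase 2 (allgather): n - 1 steps circulating the reduced chunks.
    for step in range(n - 1):
        sent = [((r + 1 - step) % n, bufs[r][chunk((r + 1 - step) % n)]) for r in range(n)]
        for r in range(n):
            idx, payload = sent[(r - 1) % n]
            bufs[r][chunk(idx)] = payload  # overwrite with the reduced chunk

    return bufs


if __name__ == "__main__":
    for rank, buf in enumerate(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])):
        print(rank, buf)
```

Each node transmits roughly 2(n - 1)/n of the message in total, independent of the number of nodes, which is why the ring layout is commonly used for large gradient buffers.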
-
2.
Publication No.: US20240037378A1
Publication date: 2024-02-01
Application No.: US18255391
Filing date: 2020-12-24
Applicant: Intel Corporation
Inventor: Guokai Ma, Jiong Gong, Dhiraj Kalamkar, Rachitha Prem Seelin, Hongzhen Liu, Akshay Jain, Liangang Zhang
Abstract: Systems, apparatuses and methods may provide for technology that identifies an embedding table associated with a neural network. The neural network is associated with a plurality of compute nodes. The technology further identifies a number of entries of the embedding table, and determines whether to process gradients associated with the embedding table as dense gradients or sparse gradients based on the number of entries.
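Illustration (not part of the patent record): the abstract describes choosing between dense and sparse gradient handling for an embedding table based on its number of entries. A minimal sketch of such a decision rule follows. The abstract does not state the cutoff or which side of it maps to which mode, so the threshold value, the direction of the comparison (small tables dense, large tables sparse), and the names `choose_gradient_mode` and `plan_tables` are all hypothetical.

```python
from typing import Dict

# Hypothetical cutoff; the patent abstract gives no numeric value.
DEFAULT_ENTRY_THRESHOLD = 100_000


def choose_gradient_mode(num_entries: int,
                         threshold: int = DEFAULT_ENTRY_THRESHOLD) -> str:
    """Return "dense" or "sparse" for a table with num_entries rows.

    Assumption: a small table is cheap to exchange in full between compute
    nodes, while a large table touches few rows per step, so only the
    touched rows are exchanged.
    """
    if num_entries < 0:
        raise ValueError("num_entries must be non-negative")
    return "dense" if num_entries <= threshold else "sparse"


def plan_tables(table_sizes: Dict[str, int],
                threshold: int = DEFAULT_ENTRY_THRESHOLD) -> Dict[str, str]:
    """Map each embedding table name to its chosen gradient mode."""
    return {name: choose_gradient_mode(n, threshold)
            for name, n in table_sizes.items()}


if __name__ == "__main__":
    # Table names and sizes are made up for the example.
    print(plan_tables({"country": 250, "user_id": 40_000_000}))
```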
-