-
Publication No.: US20220217071A1
Publication date: 2022-07-07
Application No.: US17702652
Filing date: 2022-03-23
Applicant: Intel Corporation
Inventor: Gengbin ZHENG, Maria GARZARAN
Abstract: Methods and apparatus for an efficient topology-aware tree search algorithm for a broadcast operation. A broadcast tree is built for a broadcast operation in a network having a hierarchical structure whose nodes are logically partitioned at group and switch levels. Lists of visited nodes (vnodes) and unvisited nodes (unodes) are initialized. Beginning at a root node, search iterations are performed in a progressive manner to build the tree, wherein a given search iteration finds a unode that can be reached earliest from a vnode, moves that unode from the unode list to the vnode list, and adds new unodes to the unode list based on the location of the unode. Beginning with the switch the root node is connected to, the algorithm progressively adds nodes from other switches in the root group and then from other groups and switches within those other groups, and continues until all nodes have been visited.
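The search loop described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not Intel's implementation: the three latency constants, the rule that a sender stays busy until its message arrives, the names `Node` and `build_broadcast_tree`, and the exact unode-expansion rule (the rest of a switch on the first visit to that switch, one entry node per switch on the first visit to a group, one entry node per other group from the root) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical per-message costs (not from the patent): cheaper the closer two
# nodes sit in the group/switch hierarchy.
SAME_SWITCH, SAME_GROUP, OTHER_GROUP = 1.0, 2.0, 4.0


@dataclass(frozen=True)
class Node:
    rank: int
    group: int
    switch: int  # switch index within its group


def latency(a: Node, b: Node) -> float:
    if a.group != b.group:
        return OTHER_GROUP
    return SAME_SWITCH if a.switch == b.switch else SAME_GROUP


def build_broadcast_tree(nodes: List[Node], root: Node) -> Dict[int, int]:
    """Greedy earliest-arrival search; returns {child rank: parent rank}."""
    by_switch: Dict[Tuple[int, int], List[Node]] = {}
    for n in nodes:
        by_switch.setdefault((n.group, n.switch), []).append(n)
    switches: Dict[int, List[int]] = {}
    for g, s in by_switch:
        switches.setdefault(g, []).append(s)

    ready: Dict[int, float] = {root.rank: 0.0}  # visited ranks -> next free send time
    vnodes: List[Node] = [root]
    unodes: Dict[int, Node] = {}                 # unvisited candidates, insertion-ordered
    parent: Dict[int, int] = {}
    seen_switches, seen_groups = set(), set()

    def add(m: Node) -> None:
        if m.rank not in ready:                  # never re-add a visited node
            unodes[m.rank] = m

    def expand(n: Node) -> None:
        # Assumed rule: grow the unode list based on where n sits.
        if (n.group, n.switch) not in seen_switches:
            seen_switches.add((n.group, n.switch))
            for m in by_switch[(n.group, n.switch)]:
                add(m)                                   # rest of n's switch
        if n.group not in seen_groups:
            seen_groups.add(n.group)
            for s in switches[n.group]:
                add(by_switch[(n.group, s)][0])          # one entry node per switch
        if n.rank == root.rank:
            for g in switches:
                add(by_switch[(g, switches[g][0])][0])   # one entry node per group

    expand(root)
    while unodes:
        best = None
        for v in vnodes:                    # find the unode reachable earliest
            for u in unodes.values():
                t = ready[v.rank] + latency(v, u)
                if best is None or t < best[0]:
                    best = (t, v, u)
        t, v, u = best
        del unodes[u.rank]                  # move u from unodes to vnodes
        parent[u.rank] = v.rank
        ready[v.rank] = t                   # assumed: sender busy until delivery
        ready[u.rank] = t
        vnodes.append(u)
        expand(u)
    return parent
```

Because every iteration scans all (vnode, unode) pairs, the sketch is quadratic per step; it is meant to show the visited/unvisited bookkeeping and the switch-then-group-then-remote-group expansion, not an efficient search.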
-
2.
Publication No.: US20190213146A1
Publication date: 2019-07-11
Application No.: US16353759
Filing date: 2019-03-14
Applicant: Intel Corporation
Inventor: Nusrat ISLAM, Gengbin ZHENG, Sayantan SUR, Maria GARZARAN, Akhil LANGER
IPC: G06F13/16, G06F16/901
CPC classification number: G06F13/16, G06F16/9024, G06F2213/16
Abstract: Examples include a computing system having an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter, a processing device, and a memory device. The memory device stores instructions that, in response to execution by the processing device, cause the processing device to represent a plurality of triggered operations of collective communication for high-performance computing executable by the I/O device as a directed acyclic graph stored in the memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traverse the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traverse the directed acyclic graph using a second process to assign a completion counter and a trigger counter for each vertex.
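A hedged Python sketch of the two traversals follows. It assumes a model loosely in the style of Portals 4 triggered operations, in which an operation fires when its trigger counter reaches a threshold and increments its completion counter by one when it finishes. The sharing rule used in the first pass (an operation may reuse its predecessor's completion counter when it is that predecessor's only successor, so the counter simply keeps counting along the chain) and the name `assign_counters` are assumptions, and the sketch only accepts DAGs in which every operation has at most one predecessor, such as the sends of a broadcast tree.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Per op: (completion counter, trigger counter or None for "fire at once", threshold)
Assignment = Dict[str, Tuple[int, Optional[int], int]]


def assign_counters(succ: Dict[str, List[str]]) -> Assignment:
    """Two-pass counter assignment for a DAG given as op -> successor ops."""
    verts = set(succ) | {w for ws in succ.values() for w in ws}
    pred: Dict[str, List[str]] = {v: [] for v in verts}
    for u, ws in succ.items():
        for w in ws:
            pred[w].append(u)
    if any(len(p) > 1 for p in pred.values()):
        raise ValueError("sketch only handles ops with at most one predecessor")

    # Pass 1 (assumed rule): mark ops that are the sole successor of their
    # sole predecessor; they can keep counting on the predecessor's counter.
    shares = {v for v in verts
              if len(pred[v]) == 1 and len(succ.get(pred[v][0], [])) == 1}

    # Pass 2: breadth-first walk from the roots assigning counters.
    comp: Dict[str, int] = {}
    value_after: Dict[str, int] = {}  # counter value once this op completes
    result: Assignment = {}
    next_ctr = 0
    queue = deque(sorted(v for v in verts if not pred[v]))
    while queue:
        v = queue.popleft()
        if v in shares:
            p = pred[v][0]
            comp[v] = comp[p]                 # reuse the chain's counter
            value_after[v] = value_after[p] + 1
        else:
            comp[v] = next_ctr                # fresh completion counter
            next_ctr += 1
            value_after[v] = 1
        if pred[v]:
            p = pred[v][0]
            # Fire once the predecessor's completion has been counted.
            result[v] = (comp[v], comp[p], value_after[p])
        else:
            result[v] = (comp[v], None, 0)
        queue.extend(sorted(succ.get(v, [])))
    if len(result) != len(verts):
        raise ValueError("graph has a cycle")
    return result
```

For the tree `r -> {a, b}`, `a -> c -> d`, the chain `a, c, d` ends up on a single completion counter, with `c` and `d` triggering on it at thresholds 1 and 2, so five operations use three counters.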
-