EFFICIENT TOPOLOGY-AWARE TREE SEARCH ALGORITHM FOR A BROADCAST OPERATION

    Publication No.: US20220217071A1

    Publication date: 2022-07-07

    Application No.: US17702652

    Application date: 2022-03-23

    Abstract: Methods and apparatus for an efficient topology-aware tree search algorithm for a broadcast operation. A broadcast tree is built for a broadcast operation in a network having a hierarchical structure including nodes logically partitioned at group and switch levels. Lists of visited nodes (vnodes) and unvisited nodes (unodes) are initialized. Beginning at a root node, search iterations are performed in a progressive manner to build the tree, wherein a given search iteration finds a unode that can be reached earliest from a vnode, moves the unode that is found from the unode list to the vnode list, and adds new unodes to the unode list based on the location of the unode. Beginning with the switch to which the root node is connected, the algorithm progressively adds nodes from other switches in the root group, then from other groups and the switches within those groups, and continues until all nodes have been visited.
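The search loop in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Node` type, the three latency constants, and the `send_overhead` parameter are hypothetical, and for brevity every non-root node starts in the unode list instead of being added progressively by location as the abstract describes.

```python
from typing import NamedTuple


class Node(NamedTuple):
    group: int   # group-level partition
    switch: int  # switch within the group
    index: int   # node attached to that switch


def link_cost(a: Node, b: Node) -> float:
    """Hypothetical latencies: same switch < same group < cross group."""
    if a.group != b.group:
        return 4.0
    if a.switch != b.switch:
        return 2.0
    return 1.0


def build_broadcast_tree(nodes, root, send_overhead=0.5):
    """Greedy search: repeatedly attach the unode reachable earliest from a vnode.

    Returns (parent, arrival): the tree as child -> parent, and the time
    at which each node receives the message.
    """
    ready = {root: 0.0}    # vnode -> earliest time it can start its next send
    arrival = {root: 0.0}  # node -> time the message reaches it
    parent = {}
    unodes = [n for n in nodes if n != root]
    while unodes:
        # Earliest reachable (time, unode, vnode); ties broken by node order.
        t, u, v = min(
            (ready[v] + link_cost(v, u), u, v) for v in ready for u in unodes
        )
        parent[u] = v
        arrival[u] = t
        ready[v] += send_overhead  # sender is busy while injecting the message
        ready[u] = t               # receiver can forward once the message arrives
        unodes.remove(u)           # move u from the unode list to the vnode list
    return parent, arrival
```

With these assumed costs, a root on switch 0 of group 0 first reaches its same-switch neighbors, then other switches in its group, then other groups, which matches the ordering the abstract describes. The exhaustive `min` over all vnode/unode pairs is chosen for readability; the progressive unode list in the abstract is what keeps the real search small.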

    MINIMIZING USAGE OF HARDWARE COUNTERS IN TRIGGERED OPERATIONS FOR COLLECTIVE COMMUNICATION

    Publication No.: US20190213146A1

    Publication date: 2019-07-11

    Application No.: US16353759

    Application date: 2019-03-14

    CPC classification number: G06F13/16 G06F16/9024 G06F2213/16

    Abstract: Examples include a computing system having an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter; a processing device; and a memory device. The memory device stores instructions that, in response to execution by the processing device, cause the processing device to represent a plurality of triggered operations of collective communication for high-performance computing executable by the I/O device as a directed acyclic graph stored in the memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traverse the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traverse the directed acyclic graph using a second process to assign a completion counter and a trigger counter for each vertex.
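The two-pass traversal can be sketched in Python as follows. The abstract does not state the rule for deciding which vertices may share a completion counter, so this sketch assumes one: vertices with an identical set of successors share a counter, and a vertex's trigger counter is the completion counter of its predecessors, with a threshold equal to the number of predecessors. The function name `assign_counters` and the data layout are hypothetical.

```python
from collections import defaultdict


def assign_counters(vertices, edges):
    """Assign completion and trigger counters to triggered operations.

    vertices: iterable of operation names.
    edges: iterable of (a, b) pairs meaning operation b depends on a.
    Returns (completion, trigger): completion maps vertex -> counter id;
    trigger maps vertex -> (counter id, threshold), or None for a vertex
    with no dependencies.
    """
    vertices = list(vertices)
    succ = defaultdict(set)
    pred = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)

    # First pass: mark vertices that can share a completion counter.
    # Assumed rule: an identical successor set means the same share key.
    share_key = {v: frozenset(succ[v]) for v in vertices}

    # Second pass: assign one completion counter per share key ...
    counter_of_key = {}
    completion = {}
    for v in vertices:
        key = share_key[v]
        if key not in counter_of_key:
            counter_of_key[key] = len(counter_of_key)
        completion[v] = counter_of_key[key]

    # ... and derive each vertex's trigger counter from its predecessors.
    trigger = {}
    for v in vertices:
        counters = {completion[p] for p in pred[v]}
        if len(counters) > 1:
            # Outside the assumed rule: predecessors do not share a counter.
            raise ValueError(f"{v!r} would need more than one trigger counter")
        trigger[v] = (counters.pop(), len(pred[v])) if counters else None
    return completion, trigger
```

For a graph in which A and B both feed C, and C feeds D and E, this sketch uses three counters in place of five: A and B share one, C has its own, and D and E share one. C triggers when the counter shared by A and B reaches 2.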
