Scalable in-network computation for massively-parallel shared-memory processors
摘要:
A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
信息查询
0/0