Abstract:
A network device in a communication network includes a controller and processing circuitry. The controller is configured to manage execution of an operation whose execution depends on inputs from a group of one or more work-request initiators. The processing circuitry is configured to read one or more values, which are set by the work-request initiators in one or more memory locations that are accessible to the work-request initiators and to the network device, and to trigger execution of the operation in response to verifying that the one or more values read from the one or more memory locations indicate that the work-request initiators in the group have provided the respective inputs.
Abstract:
A network device in a communication network includes a controller and processing circuitry. The controller is configured to manage execution of an operation whose execution depends on inputs from a group of one or more work-request initiators. The processing circuitry is configured to read one or more values, which are set by the work-request initiators in one or more memory locations that are accessible to the work-request initiators and to the network device, and to trigger execution of the operation in response to verifying that the one or more values read from the one or more memory locations indicate that the work-request initiators in the group have provided the respective inputs.
Abstract:
A switch in a data network is configured to mediate data exchanges among network elements. The apparatus further includes a processor, which organizes the network elements into a hierarchical tree having a root node network element, vertex node network elements child node network elements that include leaf node network elements. The leaf node network elements are originate aggregation data and transmit the aggregation data to respective parent vertex node network elements. The vertex node network elements combine the aggregation data from at least a portion of the child node network elements, and transmit the combined aggregation data from the vertex node network elements to parent vertex node network elements. The root node network element is operative for initiating a reduction operation on the aggregation data.
Abstract:
A method for network access of remote memory directly from a local instruction stream using conventional loads and stores. In cases where network IO access (a network phase) cannot overlap a compute phase, a direct network access from the instruction stream greatly decreases latency in CPU processing. The network is treated as yet another memory that can be directly read from, or written to, by the CPU. Network access can be done directly from the instruction stream using regular loads and stores. Example scenarios where synchronous network access can be beneficial are SHMEM (symmetric hierarchical memory access) usages (where the program directly reads/writes remote memory), and scenarios where part of system memory (for example DDR) can reside over a network and made accessible by demand to different CPUs.
Abstract:
A network device includes a first interface, a second interface, and circuitry. The first interface is configured to communicate at least with a memory. The second interface is configured to communicate over a network with a peer network device. The circuitry is configured to receive a request to transfer data over the network between the memory and the peer network device in accordance with (i) a pattern of offsets to be accessed in the memory and (ii) a memory key representing a memory space to be accessed using the pattern, and to transfer the data in accordance with the request.
Abstract:
A network interface device for a host computer includes a network interface, configured to transmit and receive data packets to and from a network. Packet processing logic transfers data to and from the data packets transmitted and received via the network interface by direct memory access (DMA) from and to a system memory of the host computer. A memory controller includes a first memory interface configured to be connected to the system memory and a second memory interface, configured to be connected to a host complex of the host computer. Switching logic alternately couples the first memory interface to the packet processing logic in a DMA configuration and to the second memory interface in a pass-through configuration.
Abstract:
An all-to-all communication operation which is carried out in a fabric of networked entities by defining in each of the entities a plurality of memory regions of contiguous memory addresses holding messages therein, and exchanging the messages repeatedly with all the other entities. Relatively small messages are copied using a CPU and larger messages are transmitted using scatter/gather facilities.
Abstract:
A method for network access of remote memory directly from a local instruction stream using conventional loads and stores. In cases where network IO access (a network phase) cannot overlap a compute phase, a direct network access from the instruction stream greatly decreases latency in CPU processing. The network is treated as yet another memory that can be directly read from, or written to, by the CPU. Network access can be done directly from the instruction stream using regular loads and stores. Example scenarios where synchronous network access can be beneficial are SHMEM (symmetric hierarchical memory access) usages (where the program directly reads/writes remote memory), and scenarios where part of system memory (for example DDR) can reside over a network and made accessible by demand to different CPUs.
Abstract:
A network interface device for a host computer includes a network interface, configured to transmit and receive data packets to and from a network. Packet processing logic transfers data to and from the data packets transmitted and received via the network interface by direct memory access (DMA) from and to a system memory of the host computer. A memory controller includes a first memory interface configured to be connected to the system memory and a second memory interface, configured to be connected to a host complex of the host computer. Switching logic alternately couples the first memory interface to the packet processing logic in a DMA configuration and to the second memory interface in a pass-through configuration.
Abstract:
A method for communication includes posting, by a software process, a set of buffers in a memory of a host processor and creating in the memory a list of labels associated respectively with the buffers. The software process pushes a first part of the list to a network interface controller (NIC), while retaining a second part of the list in the memory under control of the software process. Upon receiving a message containing a label, sent over a network, the NIC compares the label to the labels in the first part of the list and, upon finding a match to the label, writes data conveyed by the message to a buffer in the memory. Upon a failure to find the match in the first part of the list, the NIC passes the message from the NIC to the software process for handling using the second part of the list.