Abstract:
Systems, apparatuses and methods may provide for detecting an outbound communication and identifying a context of the outbound communication. Additionally, a completion status of the outbound communication may be tracked relative to the context. In one example, tracking the completion status includes incrementing a sent messages counter associated with the context in response to the outbound communication, detecting an acknowledgement of the outbound communication based on a network response to the outbound communication, incrementing a received acknowledgements counter associated with the context in response to the acknowledgement, comparing the sent messages counter to the received acknowledgements counter, and triggering a per-context memory ordering operation if the sent messages counter and the received acknowledgements counter have matching values.
Abstract:
Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
Abstract:
Technologies for improving throughput in a network include a node switch. The node switch is to obtain expected performance data indicative of an expected data transfer performance of the node switch. The node switch is also to obtain measured performance data indicative of a measured data transfer performance of the node switch, compare the measured performance data to the expected performance data to determine whether the measured data transfer performance satisfies the expected data transfer performance, determine, as a function of whether the measured data transfer performance satisfies the expected data transfer performance, whether to force a unit of data through a non-minimal path to a destination, and send, in response to a determination to force the unit of data to be sent through a non-minimal path, the unit of data to an output port of the node switch associated with the non-minimal path. Other embodiments are also described.
Abstract:
Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
Abstract:
Systems, apparatuses and methods may provide for detecting an outbound communication and identifying a context of the outbound communication. Additionally, a completion status of the outbound communication may be tracked relative to the context. In one example, tracking the completion status includes incrementing a sent messages counter associated with the context in response to the outbound communication, detecting an acknowledgement of the outbound communication based on a network response to the outbound communication, incrementing a received acknowledgements counter associated with the context in response to the acknowledgement, comparing the sent messages counter to the received acknowledgements counter, and triggering a per-context memory ordering operation if the sent messages counter and the received acknowledgements counter have matching values.
Abstract:
Technologies for improving throughput in a network include a node switch. The node switch is to obtain expected performance data indicative of an expected data transfer performance of the node switch. The node switch is also to obtain measured performance data indicative of a measured data transfer performance of the node switch, compare the measured performance data to the expected performance data to determine whether the measured data transfer performance satisfies the expected data transfer performance, determine, as a function of whether the measured data transfer performance satisfies the expected data transfer performance, whether to force a unit of data through a non-minimal path to a destination, and send, in response to a determination to force the unit of data to be sent through a non-minimal path, the unit of data to an output port of the node switch associated with the non-minimal path. Other embodiments are also described.
Abstract:
Technologies for densely packaging network components for large scale indirect topologies include group of switches. The group of switches includes a stack of node switches that includes a first set of ports and a stack of global switches that includes a second set of ports. The stack of node switches are oriented orthogonally to the stack of global switches. Additionally, the first set of ports are oriented towards the second set of ports and the node switches are connected to the global switches through the first and second sets of ports. Other embodiments are also described and claimed.
Abstract:
Technologies for adaptive routing based on aggregated congestion information include a network switch that includes a plurality of output ports. The network switch is configured to determine a maximum local occupancy count for each output port based on a maximum local occupancy count of output buffer queues of each output port, a local congestion value based on the maximum local occupancy count, and a remote congestion value for a corresponding remote input buffer queue of a remote computing device communicatively coupled to a corresponding output port. The network switch is further configured to determine, for each output port, a total congestion value as a function of the local congestion value and the remote congestion value and enqueue the network packet into one of the output buffer queues of one of the output ports based on the total congestion values of the output ports. Other embodiments are described herein.
Abstract:
Technologies for handling message passing interface receive operations include a compute node to determine a plurality of parameters of a receive entry to be posted and determine whether the plurality of parameters includes a wildcard entry. The compute node generates a hash based on at least one parameter of the plurality of parameters in response to determining that the plurality of parameters does not include the wildcard entry and appends the receive entry to a list in a bin of a posted receive data structure, wherein the bin is determined based on the generated hash. The compute node further tracks the wildcard entry in the posted receive data structure in response to determining the plurality of parameters includes the wildcard entry and appends the receive entry to a wildcard list of the posted receive data structure in response to tracking the wildcard entry.
Abstract:
Technologies for one-side remote memory access communication include multiple computing nodes in communication over a network. A receiver computing node receives a message from a sender node and extracts a segment identifier from the message. The receiver computing node determines, based on the segment identifier, a segment start address associated with a partitioned global address space (PGAS) segment of its local memory. The receiver computing node may index a segment table stored in the local memory or in a host fabric interface. The receiver computing node determines a local destination address within the PGAS segment based on the segment start address and an offset included in the message. The receiver computing node performs a remote memory access operation at the local destination address. The receiver computing node may perform those operations in hardware by the host fabric interface of the receiver computing node. Other embodiments are described and claimed.