Abstract:
In one embodiment, an apparatus comprises a fabric controller of a first computing node. The fabric controller is to receive, from a second computing node via a network fabric that couples the first computing node to the second computing node, a request to execute a kernel on a field-programmable gate array (FPGA) of the first computing node; instruct the FPGA to execute the kernel; and send a result of the execution of the kernel to the second computing node via the network fabric.
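For illustration only, a minimal Python sketch of the receive/execute/respond flow this abstract describes; the names (KernelRequest, FabricController, the vec_add kernel) are hypothetical and not taken from the patent.

```python
# Sketch of the fabric-controller flow described above; all names are
# illustrative assumptions, not taken from the patent.
from dataclasses import dataclass

@dataclass
class KernelRequest:
    sender: str      # identity of the requesting (second) node
    kernel_id: str   # which FPGA kernel to run
    args: tuple      # kernel arguments

class FabricController:
    def __init__(self, fpga_kernels):
        self.fpga_kernels = fpga_kernels  # kernel_id -> callable (stands in for the FPGA)

    def handle(self, req: KernelRequest):
        # 1. Receive the request from the second node via the fabric.
        # 2. Instruct the FPGA to execute the named kernel.
        result = self.fpga_kernels[req.kernel_id](*req.args)
        # 3. Send the result back over the network fabric (stubbed as a return).
        return {"to": req.sender, "result": result}

ctrl = FabricController({"vec_add": lambda a, b: [x + y for x, y in zip(a, b)]})
print(ctrl.handle(KernelRequest("node-2", "vec_add", ([1, 2], [3, 4]))))
```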
Abstract:
Technologies for facilitating remote memory requests in accelerator devices are disclosed. The accelerator device includes circuitry to receive, from a kernel of the present accelerator device, a request to establish a logical communication path between the kernel of the present accelerator device and a target accelerator device kernel, based on one or more physical communication paths; the request is received through an application programming interface exposed to the high-level software language in which the kernel is implemented. The communication protocol supported by the accelerator device may allow kernels operating on the accelerator device to send memory requests for memory locations at remote devices, with the communication protocol performing all of the operations necessary to carry out the memory request.
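The following sketch illustrates one way such a kernel-facing API could look; open_channel, remote_read, and the path-selection rule are assumptions made for illustration, not the patent's interface.

```python
# Hypothetical sketch of the kernel-facing API described above.
class PhysicalPath:
    def __init__(self, remote_memory):
        self.remote_memory = remote_memory  # stands in for a remote device

    def read(self, addr, size):
        return self.remote_memory[addr:addr + size]

class Channel:
    def __init__(self, paths):
        self.paths = paths  # physical paths backing one logical path

    def remote_read(self, addr, size):
        # The communication protocol, not the kernel, picks a physical
        # path and carries out the memory request at the remote device.
        path = self.paths[addr % len(self.paths)]
        return path.read(addr, size)

def open_channel(target_kernel, paths):
    # API exposed to the high-level language the kernel is written in:
    # establish a logical communication path over physical paths.
    return Channel(paths)

mem = bytes(range(64))
chan = open_channel("remote-kernel-0", [PhysicalPath(mem)])
print(chan.remote_read(8, 4))  # kernel issues a remote memory read
```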
Abstract:
Described herein are devices and techniques for distributing application data. A device can communicate with one or more hardware switches. The device can receive, from a software stack, a multicast message including a constraint that indicates how application data is to be distributed. The constraint includes a listing of a set of nodes and a number of nodes to which the application data is to be distributed. The device may receive, from the software stack, the application data for distribution to a plurality of nodes, the plurality of nodes being a subset of the set of nodes whose size equals the number of nodes. The device may select the plurality of nodes from the set of nodes. The device also may distribute a copy of the application data to the plurality of nodes based on the constraint. Also described are other embodiments.
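A minimal sketch of the constrained distribution step, assuming a simple constraint shape and a first-N selection policy; both are illustrative, and a real device could select nodes by load, locality, or other criteria.

```python
# Sketch of constrained multicast distribution; Constraint and select()
# are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Constraint:
    candidate_nodes: list  # the listed set of nodes
    count: int             # number of nodes to receive a copy

def select(constraint):
    # Choose a subset whose size equals the requested count.
    return constraint.candidate_nodes[:constraint.count]

def distribute(constraint, app_data, send):
    for node in select(constraint):
        send(node, app_data)  # one copy per selected node

c = Constraint(candidate_nodes=["n1", "n2", "n3", "n4"], count=2)
distribute(c, b"payload", lambda node, data: print(node, data))
```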
Abstract:
In one embodiment, a processor includes a plurality of cores including a first core to be reserved for execution in a protected domain, the first core to be hidden from an operating system. The processor may further include a filter coupled to the plurality of cores, where the filter includes a plurality of fields each associated with one of the plurality of cores to indicate whether an interrupt of the protected domain is to be directed to the corresponding core. Other embodiments are described and claimed.
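To make the filter's role concrete, here is a sketch that treats each filter field as one bit per core; the one-bit encoding and the routing rule are assumptions for illustration only.

```python
# Sketch of the per-core interrupt filter described above; the bitmask
# encoding is an illustrative assumption.
HIDDEN_CORE = 0  # core reserved for the protected domain

def route_interrupt(filter_bits, core, from_protected_domain):
    # Each filter field indicates whether protected-domain interrupts
    # may be directed to the corresponding core.
    allowed = bool(filter_bits & (1 << core))
    if from_protected_domain:
        return allowed              # deliver only where the filter permits
    return core != HIDDEN_CORE      # OS interrupts never reach the hidden core

filter_bits = 0b0001                # only core 0 accepts protected interrupts
print(route_interrupt(filter_bits, 0, True))   # True
print(route_interrupt(filter_bits, 0, False))  # False: hidden from the OS
```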
Abstract:
Technologies for providing efficient access to pooled accelerator devices include an accelerator sled. The accelerator sled includes an accelerator device and a controller connected to the accelerator device. The controller is to provide, to a compute sled, accelerator abstraction data. The accelerator abstraction data represents the accelerator device as one or more logical devices, each logical device having one or more memory regions accessible by the compute sled, and defines an access mode usable to access each corresponding memory region. The controller is further to receive, from the compute sled, a request to perform an operation on an identified memory region of the accelerator device with a corresponding access mode. Additionally, the controller is to convert the request from a first format to a second format that is different from the first format and is usable by the accelerator device to perform the operation. Additionally, the controller is to perform, in response to the request, the operation on the identified memory region of the accelerator device with the corresponding access mode. Other embodiments are also described and claimed.
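A sketch of the controller's conversion step, assuming a dictionary-shaped first format from the compute sled and a tuple-shaped second format for the accelerator; both formats are hypothetical stand-ins.

```python
# Sketch of the request conversion described above; both formats and
# the abstraction table are illustrative assumptions.
ABSTRACTION = {
    "logical-0": {"region-a": {"mode": "mmio", "base": 0x1000}},
}

def convert(request):
    # First format: what the compute sled sends.
    region = ABSTRACTION[request["device"]][request["region"]]
    # Second format: what the accelerator device consumes.
    return (request["op"], region["base"], region["mode"])

def perform(op, base, mode, memory):
    if op == "read":
        return memory.get(base)

memory = {0x1000: 42}
req = {"device": "logical-0", "region": "region-a", "op": "read"}
print(perform(*convert(req), memory))  # -> 42
```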
Abstract:
Mechanisms to enable management controllers to learn the control plane hierarchy in data center environments. The data center is configured in a physical hierarchy including multiple pods, racks, trays, and sleds and associated switches. Management controllers at various levels in a control plane hierarchy and associated with switches in the physical hierarchy are configured to add their IP addresses to DHCP (Dynamic Host Configuration Protocol) responses that are generated by a DHCP server in response to DHCP requests for IP addresses initiated by DHCP clients, including manageability controllers, compute nodes, and storage nodes in the data center. As the DHCP response traverses each of multiple switches along a forwarding path from the DHCP server to the DHCP client, an IP address of the manageability controller associated with the switch is inserted. Upon receipt at the DHCP client, the inserted IP addresses are extracted and used to automate learning of the control plane hierarchy.
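A sketch of the insertion-along-the-path mechanism, with the response modeled as a simple dictionary; the field names and hop order are assumptions, not the patent's wire format.

```python
# Sketch of controllers stamping their IPs into a DHCP response as it
# crosses each switch; the option layout is an illustrative assumption.
def forward_response(response, switches):
    for switch in switches:  # forwarding path from DHCP server to client
        response["controller_ips"].append(switch["controller_ip"])
    return response

def client_learn_hierarchy(response):
    # The client extracts the inserted IPs; their order mirrors the
    # pod/rack/tray levels of the control plane hierarchy.
    return list(response["controller_ips"])

resp = {"your_ip": "10.0.0.7", "controller_ips": []}
path = [{"controller_ip": "10.1.0.1"}, {"controller_ip": "10.1.1.1"}]
print(client_learn_hierarchy(forward_response(resp, path)))
```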
Abstract:
A computing system can include a platform firmware to monitor hardware errors and to notify an operating system when a corrective action is to be performed to address a hardware error. The computing system can also include an extended error log to describe a hardware error. The computing system can further include an action record to direct the operating system to perform the corrective action to address the hardware error.
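The following sketch illustrates how an extended error log and an action record could fit together; the record fields and the "offline_page" action are hypothetical examples, not the patent's definitions.

```python
# Sketch of the firmware-to-OS flow described above; the record fields
# and action names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ExtendedErrorLog:
    component: str   # e.g., a DIMM or PCIe device
    severity: str
    details: str

@dataclass
class ActionRecord:
    action: str      # corrective action the OS should perform
    target: str

def os_handle(log: ExtendedErrorLog, record: ActionRecord):
    # The OS reads the log to understand the error, then performs the
    # corrective action named by the action record.
    print(f"error on {log.component} ({log.severity}): {log.details}")
    print(f"performing {record.action} on {record.target}")

os_handle(ExtendedErrorLog("DIMM3", "corrected", "single-bit ECC"),
          ActionRecord("offline_page", "0x7f3000"))
```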
Abstract:
Methods and apparatus relating to data streaming accelerators are described. In an embodiment, a hardware accelerator such as a Data Streaming Accelerator (DSA) logic circuitry performs data movement and/or data transformation for data to be transferred between a processor (having one or more processor cores) and a storage device. Other embodiments are also disclosed and claimed.
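A sketch of descriptor-driven data movement in the spirit of such an accelerator; the Descriptor fields and the software-emulated engine are assumptions for illustration, not the DSA descriptor format.

```python
# Sketch of descriptor-driven data movement; fields are illustrative.
from dataclasses import dataclass

@dataclass
class Descriptor:
    op: str       # "memmove", or a transformation such as "fill"
    src: int
    dst: int
    length: int

def engine_run(desc, buf, fill_byte=0):
    # The accelerator, not a CPU core, would execute this work.
    if desc.op == "memmove":
        buf[desc.dst:desc.dst + desc.length] = buf[desc.src:desc.src + desc.length]
    elif desc.op == "fill":
        buf[desc.dst:desc.dst + desc.length] = bytes([fill_byte]) * desc.length

buf = bytearray(b"hello---")
engine_run(Descriptor("memmove", src=0, dst=5, length=3), buf)
print(buf)  # bytearray(b'hellohel')
```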
Abstract:
A host fabric interface (HFI) apparatus, including: an HFI to communicatively couple to a fabric; and a remote hardware acceleration (RHA) engine to: query an orchestrator via the fabric to identify a remote resource having an accelerator; and send a remote accelerator request to the remote resource via the fabric.
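A minimal sketch of the RHA engine's two steps (query, then request); the Orchestrator class and request shape are hypothetical illustrations of the flow the abstract names.

```python
# Sketch of the RHA engine flow; all names are illustrative assumptions.
class Orchestrator:
    def __init__(self, inventory):
        self.inventory = inventory  # accelerator type -> resource address

    def find(self, accel_type):
        return self.inventory.get(accel_type)

def rha_offload(orchestrator, accel_type, payload, send):
    # 1. Query the orchestrator via the fabric for a remote resource
    #    having the desired accelerator.
    resource = orchestrator.find(accel_type)
    if resource is None:
        raise RuntimeError("no remote accelerator available")
    # 2. Send the remote accelerator request to that resource.
    return send(resource, {"accelerator": accel_type, "payload": payload})

orch = Orchestrator({"fpga": "node-7"})
print(rha_offload(orch, "fpga", b"job", lambda dst, req: (dst, req)))
```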
Abstract:
In one embodiment, a block data transfer interface employing an offload data transfer engine in accordance with the present description includes an offload data transfer engine executing a data transfer command set to transfer a block of data in a transfer data path from a source memory to a new region of a destination memory, wherein the transfer data path bypasses a central processing unit to minimize or reduce involvement of the central processing unit in the block transfer. In response to a successful transfer indication, a logical address is re-mapped to a physical address of the new region of the destination memory, instead of a physical address of the original region of the destination memory. In one embodiment, the re-mapping is performed by a central processing unit. In another embodiment, the re-mapping is performed by the offload data transfer engine. Other aspects are described herein.
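A sketch of the transfer-then-remap scheme, with a dictionary standing in for the logical-to-physical mapping and a stub for the offload engine; both are assumptions for illustration.

```python
# Sketch of transfer-then-remap; the page-table dict and engine stub
# stand in for real hardware.
def offload_transfer(src_mem, dst_mem, src, new_region, length):
    # The offload engine copies the block, bypassing the CPU.
    dst_mem[new_region:new_region + length] = src_mem[src:src + length]
    return True  # successful-transfer indication

def remap(page_table, logical_addr, new_region):
    # On success, point the logical address at the NEW region rather
    # than the original one; either the CPU or the engine may do this.
    page_table[logical_addr] = new_region

src_mem = bytearray(b"block-of-data!!!")
dst_mem = bytearray(32)
page_table = {0x100: 0}           # logical 0x100 -> original region 0
if offload_transfer(src_mem, dst_mem, 0, 16, 16):
    remap(page_table, 0x100, 16)  # now maps to the new region
print(page_table, bytes(dst_mem[16:32]))
```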