ACCELERATOR CONTROLLER HUB
    1.
    Invention Application

    Publication No.: US20210042254A1

    Publication Date: 2021-02-11

    Application No.: US17083200

    Filing Date: 2020-10-28

    IPC Classification: G06F13/40 G06F13/42

    Abstract: Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on-package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link, and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVLink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs, or accelerators with integrated ACHs, support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.
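The data path claimed above can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the names `Accelerator`, `ACH`, `connect`, and `rdma_write`, and the dict-backed memories, are all invented here for illustration and do not come from the patent. The sketch only shows the key property of the abstract: an RDMA write moves a payload from the initiator's accelerator memory into the target's accelerator memory without staging it in host CPU memory.

```python
# Toy model of the ACH RDMA data path; all names are illustrative assumptions.

class Accelerator:
    def __init__(self, name):
        self.name = name
        self.memory = {}                 # accelerator-local memory, by address

class ACH:
    """Accelerator controller hub: forwards RDMA traffic between its local
    accelerator and a peer ACH reached over an HPAL fabric link."""
    def __init__(self, accelerator):
        self.accelerator = accelerator
        self.peer = None                 # remote ACH on the HPAL fabric

    def connect(self, other):
        self.peer, other.peer = other, self

    def rdma_write(self, local_addr, remote_addr):
        # Copy directly from local accelerator memory into the remote
        # accelerator's memory; the host CPU never touches the payload.
        payload = self.accelerator.memory[local_addr]
        self.peer.accelerator.memory[remote_addr] = payload

# Initiator gpu0 pushes a buffer into target gpu1's memory.
gpu0, gpu1 = Accelerator("gpu0"), Accelerator("gpu1")
ach0, ach1 = ACH(gpu0), ACH(gpu1)
ach0.connect(ach1)
gpu0.memory[0x10] = b"tensor"
ach0.rdma_write(0x10, 0x20)
```

In this sketch the "router" role collapses to a single peer link; a fuller model would select among HDL, PCIe, and HPAL ports per destination.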

OVERLAPPED RENDEZVOUS MEMORY REGISTRATION
    2.
    Invention Application

    Publication No.: US20190102236A1

    Publication Date: 2019-04-04

    Application No.: US15721854

    Filing Date: 2017-09-30

    IPC Classification: G06F9/54 G06F9/30

    Abstract: Methods, software, and systems for improved data transfer operations using overlapped rendezvous memory registration. Techniques are disclosed for transferring data between a first process operating as a sender and a second process operating as a receiver. The sender sends a PUT request message to the receiver including payload data stored in a send buffer and first and second match indicia. Subsequent to or in conjunction with sending the PUT request message, the send buffer is exposed on the sender. The first match indicia is used to determine whether the PUT request is expected or unexpected. If the PUT request is unexpected, an RMA GET operation is performed using the second match indicia to pull data from the send buffer and write the data to a memory region in the user space of the process associated with the receiver. The RMA GET operation may be retried one or more times in the event that the send buffer has yet to be exposed. If the PUT request message is expected, the data payload with the PUT request is written to a receive buffer on the receiver determined using the first match indicia. The techniques include implementations using the Portals APIs and Message Passing Interface (MPI) applications and provide an improved rendezvous protocol.
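The expected/unexpected branching in the abstract can be sketched in a few dozen lines. All names here (`Sender`, `Receiver`, `put`, `rma_get`, `post_receive`, the retry count of 3) are hypothetical, not from the patent or the Portals API, and the transport is modeled synchronously: the buffer is exposed "in conjunction with" the PUT, so the retry loop stands in for the asynchronous case in which exposure completes only after the PUT arrives.

```python
# Minimal sketch of the overlapped rendezvous protocol; names are hypothetical
# and the transport is modeled synchronously for clarity.

class Sender:
    def __init__(self):
        self.exposed = {}         # second match indicia -> exposed send buffer

    def put(self, receiver, payload, match1, match2):
        # Expose the send buffer in conjunction with sending the PUT request.
        # (In a real asynchronous transport, exposure may complete after the
        # PUT arrives, which is why the receiver retries its RMA GET.)
        self.exposed[match2] = payload
        receiver.on_put(self, payload, match1, match2)

    def rma_get(self, match2):
        # Target-initiated read; None means the buffer is not yet exposed.
        return self.exposed.get(match2)

class Receiver:
    def __init__(self):
        self.posted = set()       # first match indicia of posted receives
        self.user_memory = {}     # delivered payloads, by first match indicia

    def post_receive(self, match1):
        self.posted.add(match1)

    def on_put(self, sender, payload, match1, match2):
        if match1 in self.posted:
            # Expected: the eager payload lands in the matched receive buffer.
            self.user_memory[match1] = payload
        else:
            # Unexpected: pull the data from the exposed send buffer with an
            # RMA GET, retrying while exposure is still pending.
            for _ in range(3):
                data = sender.rma_get(match2)
                if data is not None:
                    self.user_memory[match1] = data
                    break
```

A usage pass: a receive posted for `"tag-a"` takes the eager (expected) path, while a PUT with no matching receive falls through to the RMA GET (unexpected) path; both end with the payload in the receiver's user-space memory.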

LOW LATENCY CLUSTER COMPUTING
    3.
    Invention Application
    LOW LATENCY CLUSTER COMPUTING (Granted)

    Publication No.: US20140129635A1

    Publication Date: 2014-05-08

    Application No.: US13994478

    Filing Date: 2011-12-30

    IPC Classification: H04L29/08

    Abstract: An embodiment includes a low-latency mechanism for performing a checkpoint on a distributed application. More specifically, an embodiment of the invention includes processing a first application on a compute node, which is included in a cluster, to produce first computed data and then storing the first computed data in volatile memory included locally in the compute node; halting the processing of the first application, based on an initiated checkpoint, and storing first state data corresponding to the halted first application in the volatile memory; storing the first state information and the first computed data in non-volatile memory included locally in the compute node; and resuming processing of the halted first application and then continuing the processing of the first application to produce second computed data while simultaneously pulling the first state information and the first computed data from the non-volatile memory to an input/output (IO) node.

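The halt / persist-to-NVM / resume-while-draining sequence in the abstract can be sketched as follows. The names (`ComputeNode`, `IONode`, `checkpoint`) and the dict-backed "memories" are illustrative assumptions, not structures from the patent; the point of the sketch is that the application resumes as soon as the snapshot reaches local non-volatile memory, while a background thread overlaps the transfer to the IO node with continued computation.

```python
import threading

# Hypothetical sketch of the checkpoint flow; all names are illustrative.

class IONode:
    """Remote IO node that receives checkpoint data off the critical path."""
    def __init__(self):
        self.store = {}

class ComputeNode:
    def __init__(self, io_node):
        self.volatile = {}               # local DRAM
        self.nvm = {}                    # local non-volatile memory
        self.io_node = io_node

    def compute(self, label, value):
        self.volatile[label] = value     # first/second computed data

    def checkpoint(self, app_state):
        # 1. Halt the application and stage its state in volatile memory.
        self.volatile["app_state"] = app_state
        # 2. Persist state and computed data to local non-volatile memory.
        self.nvm = dict(self.volatile)
        # 3. Resume immediately; a background thread drains the local NVM
        #    snapshot to the IO node while computation continues.
        drain = threading.Thread(
            target=lambda: self.io_node.store.update(self.nvm))
        drain.start()
        return drain

io_node = IONode()
node = ComputeNode(io_node)
node.compute("first", [1, 2, 3])
drain = node.checkpoint(app_state="pc=42")
node.compute("second", [4, 5, 6])        # overlaps with the drain to the IO node
drain.join()
```

Because the NVM snapshot is taken before the application resumes, data computed after the checkpoint ("second" above) never appears in what the IO node receives.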

Low latency cluster computing
    5.
    Granted Patent
    Low latency cluster computing (Granted)

    Publication No.: US09560117B2

    Publication Date: 2017-01-31

    Application No.: US13994478

    Filing Date: 2011-12-30

    Abstract: An embodiment includes a low-latency mechanism for performing a checkpoint on a distributed application. More specifically, an embodiment of the invention includes processing a first application on a compute node, which is included in a cluster, to produce first computed data and then storing the first computed data in volatile memory included locally in the compute node; halting the processing of the first application, based on an initiated checkpoint, and storing first state data corresponding to the halted first application in the volatile memory; storing the first state information and the first computed data in non-volatile memory included locally in the compute node; and resuming processing of the halted first application and then continuing the processing of the first application to produce second computed data while simultaneously pulling the first state information and the first computed data from the non-volatile memory to an input/output (IO) node.
