专利检索 ap:"SAYANTAN SUR" 第 1 页

1.

发明申请
ACCELERATOR CONTROLLER HUB 有权

公开(公告)号：US20210042254A1

公开(公告)日：2021-02-11

申请号：US17083200

申请日：2020-10-28

申请人： Pratik Marolia , Andrew Herdrich , Rajesh Sankaran , Rahul Pal , David Puffer , Sayantan Sur , Ajaya Durg

发明人： Pratik Marolia , Andrew Herdrich , Rajesh Sankaran , Rahul Pal , David Puffer , Sayantan Sur , Ajaya Durg

IPC分类号： G06F13/40 , G06F13/42

摘要： Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVlink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.

2.

发明申请
OVERLAPPED RENDEZVOUS MEMORY REGISTRATION 审中-公开

公开(公告)号：US20190102236A1

公开(公告)日：2019-04-04

申请号：US15721854

申请日：2017-09-30

申请人： Sayantan Sur , Keith Underwood , Ravindra Babu Ganapathi , Andrew Friedley

发明人： Sayantan Sur , Keith Underwood , Ravindra Babu Ganapathi , Andrew Friedley

IPC分类号： G06F9/54 , G06F9/30

摘要： Methods, software, and systems for improved data transfer operations using overlapped rendezvous memory registration. Techniques are disclosed for transferring data between a first process operating as a sender and a second process operating as a receiver. The sender sends a PUT request message to the receiver including payload data stored in a send buffer and first and second match indicia. Subsequent to or in conjunction with sending the PUT request message, the send buffer is exposed on the sender. The first match indicia is used to determine whether the PUT request is expected or unexpected. If the PUT request is unexpected, an RMA GET operation is performed using the second matching indicia to pull data from the send buffer and write the data to a memory region in the user space of the process associated with the receiver. The RMA GET operation may be retried one or more times in the event that the send buffer has yet to be exposed. If the PUT request message is expected, the data payload with the PUT request is written to a receive buffer on the receiver determined using the first match indicia. The techniques included implementations using the Portals APIs and Message Passing Interface (MPI) applications and provide an improved rendezvous protocol.

3.

发明申请
LOW LATENCY CLUSTER COMPUTING 有权
标题翻译：低功能集群计算

公开(公告)号：US20140129635A1

公开(公告)日：2014-05-08

申请号：US13994478

申请日：2011-12-30

申请人： Mark S. Hefty , Arlin Davis , Robert Woodruff , Sayantan Sur , Shiow-wen Cheng

发明人： Mark S. Hefty , Arlin Davis , Robert Woodruff , Sayantan Sur , Shiow-wen Cheng

IPC分类号： H04L29/08

CPC分类号： H04L67/10 , G06F9/06 , G06F11/00 , G06F11/1407 , G06F11/1438 , G06F11/1464 , G06F11/1466 , G06F11/1471 , G06F13/14

摘要： An embodiment includes a low-latency mechanism for performing a checkpoint on a distributed application. More specifically, an embodiment of the invention includes processing a first application on a compute node, which is included in a cluster, to produce first computed data and then storing the first computed data in volatile memory included locally in the compute node; halting the processing of the first application, based on an initiated checkpoint, and storing first state data corresponding to the halted first application in the volatile memory; storing the first state information and the first computed data in non-volatile memory included locally in the compute node; and resuming processing of the halted first application and then continuing the processing the first application to produce second computed data while simultaneously pulling the first state information and the first computed data from the non-volatile memory to an input/output (IO) node.

摘要翻译： 一个实施例包括用于在分布式应用上执行检查点的低延迟机制。更具体地，本发明的实施例包括处理包括在群集中的计算节点上的第一应用以产生第一计算数据，然后将第一计算数据存储在本地包括在计算节点中的易失性存储器中; 基于发起的检查点停止第一应用的处理，并将对应于停止的第一应用的第一状态数据存储在易失性存储器中; 将第一状态信息和第一计算数据存储在本地包括在计算节点中的非易失性存储器中; 以及恢复停止的第一应用的处理，然后继续处理第一应用以产生第二计算数据，同时将第一状态信息和第一计算数据从非易失性存储器提取到输入/输出（IO）节点。

4.

发明申请
AT LEAST ONE MECHANISM TO PERMIT, AT LEAST IN PART, ALLOCATION AND/OR CONFIGURATION, AT LEAST IN PART, OF AT LEAST ONE NETWORK-ASSOCIATED OBJECT 有权
标题翻译：至少一个机制允许，至少部分，分配和/或配置，至少在一个网络相关对象的一部分

公开(公告)号：US20150305006A1

公开(公告)日：2015-10-22

申请号：US14254609

申请日：2014-04-16

申请人： WILLIAM R. MAGRO , TODD M. RIMMER , ROBERT J. WOODRUFF , MARK S. HEFTY , SAYANTAN SUR

发明人： WILLIAM R. MAGRO , TODD M. RIMMER , ROBERT J. WOODRUFF , MARK S. HEFTY , SAYANTAN SUR

IPC分类号： H04W72/04

CPC分类号： H04L49/20 , G06F9/4488 , G06F9/50 , H04L49/10 , H04L61/103 , H04L63/10 , H04W12/06

摘要： In an embodiment, at least one interface mechanism may be provided. The mechanism may permit, at least in part, at least one process allocate, at least in part, and/or configure, at least in part, at least one network-associated object. Such allocation and/or configuration, at least in part, may be in accordance with at least one parameter set that may correspond, at least in part, to at least one query issued by the at least one process via the mechanism. Many modifications are possible without departing from this embodiment.

摘要翻译： 在一个实施例中，可以提供至少一个接口机构。该机制可以至少部分地允许至少一个过程至少部分地分配和/或配置至少一个网络相关对象。至少部分地，这样的分配和/或配置可以与至少一个可以至少部分地对应于由至少一个进程通过机制发出的至少一个查询的至少一个参数集合进行。在不脱离本实施例的情况下，可以进行许多修改。

5.

发明授权
Low latency cluster computing 有权
标题翻译：低延迟集群计算

公开(公告)号：US09560117B2

公开(公告)日：2017-01-31

申请号：US13994478

申请日：2011-12-30

申请人： Mark S. Hefty , Arlin Davis , Robert Woodruff , Sayantan Sur , Shiow-wen Cheng

发明人： Mark S. Hefty , Arlin Davis , Robert Woodruff , Sayantan Sur , Shiow-wen Cheng

IPC分类号： G06F13/28 , H04L29/08 , G06F13/14 , G06F9/06 , G06F11/14

CPC分类号： H04L67/10 , G06F9/06 , G06F11/00 , G06F11/1407 , G06F11/1438 , G06F11/1464 , G06F11/1466 , G06F11/1471 , G06F13/14

摘要： An embodiment includes a low-latency mechanism for performing a checkpoint on a distributed application. More specifically, an embodiment of the invention includes processing a first application on a compute node, which is included in a cluster, to produce first computed data and then storing the first computed data in volatile memory included locally in the compute node; halting the processing of the first application, based on an initiated checkpoint, and storing first state data corresponding to the halted first application in the volatile memory; storing the first state information and the first computed data in non-volatile memory included locally in the compute node; and resuming processing of the halted first application and then continuing the processing the first application to produce second computed data while simultaneously pulling the first state information and the first computed data from the non-volatile memory to an input/output (IO) node.

摘要翻译： 一个实施例包括用于在分布式应用上执行检查点的低延迟机制。更具体地，本发明的实施例包括处理包括在群集中的计算节点上的第一应用以产生第一计算数据，然后将第一计算数据存储在本地包括在计算节点中的易失性存储器中; 基于发起的检查点停止第一应用的处理，并将对应于停止的第一应用的第一状态数据存储在易失性存储器中; 将第一状态信息和第一计算数据存储在本地包括在计算节点中的非易失性存储器中; 以及恢复停止的第一应用的处理，然后继续处理第一应用以产生第二计算数据，同时将第一状态信息和第一计算数据从非易失性存储器提取到输入/输出（IO）节点。

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类