Messaging In A Parallel Computer Using Remote Direct Memory Access ('RDMA')
    1.
    Invention application
    Status: Lapsed

    Publication No.: US20120331065A1

    Publication date: 2012-12-27

    Application No.: US13167911

    Filing date: 2011-06-24

    IPC classification: G06F15/16

    CPC classification: G06F15/167 G06F15/17331

    Abstract: Messaging in a parallel computer using remote direct memory access (‘RDMA’), including: receiving a send work request; responsive to the send work request: translating a local virtual address on the first node from which data is to be transferred to a physical address on the first node from which data is to be transferred; creating a local RDMA object that includes a counter set to the size of a messaging acknowledgment field; sending, from a messaging unit in the first node to a messaging unit in a second node, a message that includes an RDMA read operation request, the physical address of the local RDMA object, and the physical address on the first node from which data is to be transferred; and receiving, by the first node responsive to the second node's execution of the RDMA read operation request, acknowledgment data in the local RDMA object.
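
The send path in this abstract can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the patented implementation: the class names `Node` and `RdmaObject`, the 8-byte acknowledgment width, the dictionary-based "physical memory", and the rule that arriving acknowledgment bytes decrement the counter are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

ACK_FIELD_SIZE = 8  # assumed width of the acknowledgment field, in bytes


@dataclass
class RdmaObject:
    """Local RDMA object: a counter plus room for acknowledgment data."""
    counter: int
    ack: bytearray = field(default_factory=bytearray)

    def deliver(self, data: bytes) -> None:
        # Assumption: every arriving acknowledgment byte decrements the counter.
        self.ack += data
        self.counter -= len(data)

    @property
    def complete(self) -> bool:
        return self.counter == 0


class Node:
    def __init__(self, page_table: dict):
        self.page_table = page_table  # virtual address -> physical address
        self.memory = {}              # physical address -> payload bytes
        self.objects = {}             # physical address -> RdmaObject
        self.received = []            # payloads pulled from peers

    def send(self, virtual_addr: int, peer: "Node", obj_addr: int) -> RdmaObject:
        """First node: handle a send work request."""
        data_addr = self.page_table[virtual_addr]   # translate virtual to physical
        obj = RdmaObject(counter=ACK_FIELD_SIZE)    # counter = acknowledgment size
        self.objects[obj_addr] = obj
        message = {"op": "rdma_read", "obj_addr": obj_addr, "data_addr": data_addr}
        peer.handle(message, origin=self)           # messaging unit to messaging unit
        return obj

    def handle(self, message: dict, origin: "Node") -> None:
        """Second node: execute the RDMA read, then write the acknowledgment."""
        payload = origin.memory[message["data_addr"]]
        self.received.append(payload)
        ack = len(payload).to_bytes(ACK_FIELD_SIZE, "big")
        origin.objects[message["obj_addr"]].deliver(ack)


first = Node(page_table={0x1000: 0x9000})
second = Node(page_table={})
first.memory[0x9000] = b"hello"
obj = first.send(0x1000, second, obj_addr=0xA000)
```

In this sketch the sender learns that the transfer finished by checking `obj.complete`, which becomes true once the acknowledgment bytes written by the second node bring the counter to zero.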

    Messaging in a parallel computer using remote direct memory access (‘RDMA’)

    Publication No.: US08490113B2

    Publication date: 2013-07-16

    Application No.: US13167911

    Filing date: 2011-06-24

    IPC classification: G06F13/00

    CPC classification: G06F15/167 G06F15/17331

    Abstract: Messaging in a parallel computer using remote direct memory access (‘RDMA’), including: receiving a send work request; responsive to the send work request: translating a local virtual address on the first node from which data is to be transferred to a physical address on the first node from which data is to be transferred; creating a local RDMA object that includes a counter set to the size of a messaging acknowledgment field; sending, from a messaging unit in the first node to a messaging unit in a second node, a message that includes an RDMA read operation request, the physical address of the local RDMA object, and the physical address on the first node from which data is to be transferred; and receiving, by the first node responsive to the second node's execution of the RDMA read operation request, acknowledgment data in the local RDMA object.

    Remote Direct Memory Access ('RDMA') In A Parallel Computer
    3.
    Invention application
    Status: Under examination (published)

    Publication No.: US20120331243A1

    Publication date: 2012-12-27

    Application No.: US13167950

    Filing date: 2011-06-24

    IPC classification: G06F12/00

    Abstract: Remote direct memory access (‘RDMA’) in a parallel computer, the parallel computer including a plurality of nodes, each node including a messaging unit, including: receiving an RDMA read operation request that includes a virtual address representing a memory region at which to receive data to be transferred from a second node to the first node; responsive to the RDMA read operation request: translating the virtual address to a physical address; creating a local RDMA object that includes a counter set to the size of the memory region; sending a message that includes a DMA write operation request, the physical address of the memory region on the first node, the physical address of the local RDMA object on the first node, and a remote virtual address on the second node; and receiving the data to be transferred from the second node.
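
A rough simulation of the read path described above may help make the role of the counter concrete. Everything named here is an assumption for illustration: the classes `LocalNode` and `RemoteNode`, the 4-byte chunk size, the `size` field carried in the request, and the idea that each arriving chunk decrements the counter until it reaches zero. The abstract itself only states that the counter is set to the size of the memory region.

```python
class RemoteNode:
    """Second node: owns the source data, addressed by virtual address."""

    def __init__(self, page_table: dict, memory: bytearray):
        self.page_table = page_table  # virtual address -> physical offset
        self.memory = memory

    def dma_write(self, request: dict, target: "LocalNode", chunk: int = 4) -> None:
        """Serve a DMA write request by pushing the data in small chunks."""
        src = self.page_table[request["remote_virtual_addr"]]
        size = request["size"]  # assumed field, not listed in the abstract
        for offset in range(0, size, chunk):
            end = min(offset + chunk, size)
            piece = bytes(self.memory[src + offset:src + end])
            target.deposit(request["region_addr"] + offset, piece, request["obj_addr"])


class LocalNode:
    """First node: issues the read and counts the bytes that arrive."""

    def __init__(self, page_table: dict, memory_size: int):
        self.page_table = page_table  # virtual address -> physical offset
        self.memory = bytearray(memory_size)
        self.counters = {}            # RDMA object address -> bytes still expected

    def rdma_read(self, virtual_addr, size, remote, remote_virtual_addr, obj_addr):
        region_addr = self.page_table[virtual_addr]  # translate virtual to physical
        self.counters[obj_addr] = size               # counter = size of the region
        request = {
            "op": "dma_write",
            "region_addr": region_addr,
            "obj_addr": obj_addr,
            "remote_virtual_addr": remote_virtual_addr,
            "size": size,
        }
        remote.dma_write(request, target=self)
        return self.counters[obj_addr] == 0          # True once all bytes landed

    def deposit(self, phys_addr: int, piece: bytes, obj_addr: int) -> None:
        self.memory[phys_addr:phys_addr + len(piece)] = piece
        self.counters[obj_addr] -= len(piece)


local = LocalNode(page_table={0x2000: 16}, memory_size=64)
remote = RemoteNode(page_table={0x7000: 0}, memory=bytearray(b"abcdefghij"))
done = local.rdma_read(0x2000, 10, remote, 0x7000, obj_addr=0)
```

Setting the counter to the region size lets the first node detect completion without a separate completion message, at least in this simplified model.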

    Calculating A Checksum With Inactive Networking Components In A Computing System
    4.
    Invention application
    Status: In force

    Publication No.: US20130212253A1

    Publication date: 2013-08-15

    Application No.: US13370059

    Filing date: 2012-02-09

    IPC classification: G06F15/173

    CPC classification: H04L43/04 H04L1/00 H04L1/0061

    Abstract: Calculating a checksum utilizing inactive networking components in a computing system, including: identifying, by a checksum distribution manager, an inactive networking component, wherein the inactive networking component includes a checksum calculation engine for computing a checksum; sending, to the inactive networking component by the checksum distribution manager, metadata describing a block of data to be transmitted by an active networking component; calculating, by the inactive networking component, a checksum for the block of data; transmitting, to the checksum distribution manager from the inactive networking component, the checksum for the block of data; and sending, by the active networking component, a data communications message that includes the block of data and the checksum for the block of data.
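
The division of labor in this abstract can be sketched as follows. The sketch is hypothetical throughout: `NetworkingComponent` and `ChecksumDistributionManager` are illustrative class names, the metadata is assumed to be an offset and length into a shared buffer, and CRC-32 from Python's `zlib` stands in for whatever checksum the hardware engine would compute.

```python
import zlib
from dataclasses import dataclass


@dataclass
class NetworkingComponent:
    name: str
    active: bool
    has_checksum_engine: bool = True

    def calculate_checksum(self, memory: bytes, metadata: dict) -> int:
        """Idle component: checksum the block that the metadata describes."""
        start = metadata["offset"]
        block = memory[start:start + metadata["length"]]
        return zlib.crc32(block)  # CRC-32 is a stand-in, not the patent's choice

    def send(self, block: bytes, checksum: int) -> dict:
        """Active component: emit a message carrying the block and its checksum."""
        return {"sender": self.name, "payload": block, "checksum": checksum}


class ChecksumDistributionManager:
    def __init__(self, components):
        self.components = components

    def find_inactive(self):
        """Identify an inactive component that has a checksum engine."""
        for component in self.components:
            if not component.active and component.has_checksum_engine:
                return component
        return None

    def checksum_for(self, memory: bytes, metadata: dict) -> int:
        helper = self.find_inactive()
        if helper is None:
            raise RuntimeError("no inactive component with a checksum engine")
        # Only the metadata is handed over; the result comes back to the manager.
        return helper.calculate_checksum(memory, metadata)


memory = b"....payload...."
nic0 = NetworkingComponent("nic0", active=True)
nic1 = NetworkingComponent("nic1", active=False)
manager = ChecksumDistributionManager([nic0, nic1])
metadata = {"offset": 4, "length": 7}
checksum = manager.checksum_for(memory, metadata)
message = nic0.send(memory[4:11], checksum)
```

The point of the arrangement, as modeled here, is that `nic0` never spends time computing the checksum; it only attaches the value the idle `nic1` produced.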

    Establishing A Data Communications Connection Between A Lightweight Kernel In A Compute Node Of A Parallel Computer And An Input-Output ('I/O') Node Of The Parallel Computer
    5.
    Invention application
    Status: Under examination (published)

    Publication No.: US20120331153A1

    Publication date: 2012-12-27

    Application No.: US13166536

    Filing date: 2011-06-22

    IPC classification: G06F15/16

    CPC classification: G06F15/80 G06F15/17356

    Abstract: Establishing a data communications connection between a lightweight kernel in a compute node of a parallel computer and an input-output (‘I/O’) node of the parallel computer, including: configuring the compute node with the network address and port value for data communications with the I/O node; establishing a queue pair on the compute node, the queue pair identified by a queue pair number (‘QPN’); receiving, in the I/O node on the parallel computer from the lightweight kernel, a connection request message; establishing by the I/O node on the I/O node a queue pair identified by a QPN for communications with the compute node; and establishing by the I/O node the requested connection by sending to the lightweight kernel a connection reply message.
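
The handshake in this abstract can be illustrated with a short simulation. The names `QueuePair`, `IoNode`, and `LightweightKernel`, the message dictionaries, the example address and port, and the QPN numbering are all assumptions made for the sketch; no real verbs library is called.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuePair:
    qpn: int                          # queue pair number identifying this end
    remote_qpn: Optional[int] = None  # filled in once the peer's QPN is known


class IoNode:
    def __init__(self, address: str, port: int):
        self.address, self.port = address, port
        self._next_qpn = itertools.count(100)  # arbitrary starting QPN
        self.queue_pairs = {}

    def on_connection_request(self, request: dict) -> dict:
        """Create a queue pair for the requester and answer with a reply."""
        qp = QueuePair(qpn=next(self._next_qpn), remote_qpn=request["qpn"])
        self.queue_pairs[qp.qpn] = qp
        return {"type": "connection_reply", "qpn": qp.qpn}


class LightweightKernel:
    def __init__(self):
        self.io_address = None
        self.io_port = None

    def configure(self, address: str, port: int) -> None:
        """Record the I/O node's network address and port value."""
        self.io_address, self.io_port = address, port

    def connect(self, io_node: IoNode) -> QueuePair:
        if (io_node.address, io_node.port) != (self.io_address, self.io_port):
            raise ConnectionError("I/O node does not match configured address")
        qp = QueuePair(qpn=1)  # queue pair on the compute node
        request = {"type": "connection_request", "qpn": qp.qpn}
        reply = io_node.on_connection_request(request)
        qp.remote_qpn = reply["qpn"]  # connection is established on reply
        return qp


kernel = LightweightKernel()
kernel.configure("10.0.0.1", 7100)
io_node = IoNode("10.0.0.1", 7100)
qp = kernel.connect(io_node)
```

After the reply, each side holds a queue pair that knows the other side's QPN, which is the minimum state this sketch treats as "connected".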

    Calculating a checksum with inactive networking components in a computing system
    6.
    Granted invention patent
    Status: In force

    Publication No.: US08914498B2

    Publication date: 2014-12-16

    Application No.: US13370059

    Filing date: 2012-02-09

    IPC classification: G06F15/173

    CPC classification: H04L43/04 H04L1/00 H04L1/0061

    Abstract: Calculating a checksum utilizing inactive networking components in a computing system, including: identifying, by a checksum distribution manager, an inactive networking component, wherein the inactive networking component includes a checksum calculation engine for computing a checksum; sending, to the inactive networking component by the checksum distribution manager, metadata describing a block of data to be transmitted by an active networking component; calculating, by the inactive networking component, a checksum for the block of data; transmitting, to the checksum distribution manager from the inactive networking component, the checksum for the block of data; and sending, by the active networking component, a data communications message that includes the block of data and the checksum for the block of data.

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
    7.
    Granted invention patent
    Status: In force

    Publication No.: US09086962B2

    Publication date: 2015-07-21

    Application No.: US13524602

    Filing date: 2012-06-15

    IPC classification: G06F11/07 G06F9/52 G06F11/30

    Abstract: Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
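
One way to picture the aggregation step is the sketch below. It rests on several assumptions that the abstract does not specify: the subset is taken as the first `job_size` nodes, the job leader is the lowest-numbered node in the subset, an exit status is just a node id and an integer code, and the aggregate reports the highest code plus the list of failing nodes. The `run` callable is a hypothetical stand-in for executing the application on one node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitStatus:
    node: int  # compute node that produced the status
    code: int  # 0 means that node's portion of the job succeeded


def aggregate(statuses):
    """Fold per-node exit statuses into a single summary for the job."""
    failed = sorted(s.node for s in statuses if s.code != 0)
    return {
        "nodes": len(statuses),
        "failed_nodes": failed,
        "exit_code": max((s.code for s in statuses), default=0),
    }


def run_job(all_nodes, job_size, run):
    """Pick a subset and a leader, run the job, and aggregate at the leader."""
    subset = all_nodes[:job_size]                       # identify the subset
    leader = subset[0]                                  # assumed leader choice
    statuses = [ExitStatus(n, run(n)) for n in subset]  # execute and collect
    return leader, aggregate(statuses)                  # leader sends one summary


leader, summary = run_job(list(range(8)), 4, lambda n: 3 if n == 2 else 0)
```

Reporting one summary from the leader, rather than one message per node, is the property the sketch is meant to show; the particular summary fields are illustrative only.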

    Configuring Compute Nodes In A Parallel Computer Using Remote Direct Memory Access ('RDMA')
    8.
    Invention application
    Status: Under examination (published)

    Publication No.: US20130185381A1

    Publication date: 2013-07-18

    Application No.: US13351419

    Filing date: 2012-01-17

    IPC classification: G06F15/16

    Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the target compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.
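
The prepare, ready, then write sequence can be sketched in a few lines. This is a simplified model under explicit assumptions: `SourceNode` and `TargetNode` are hypothetical names, "preparing" is modeled as allocating a receive buffer, and the ready messages are assumed to be collected by the source before it writes, since the source is the node that acts on them.

```python
class TargetNode:
    def __init__(self, name: str):
        self.name = name
        self.config = None  # receive buffer, allocated during preparation

    def prepare(self, size: int) -> dict:
        """Allocate space for the configuration and return a ready message."""
        self.config = bytearray(size)
        return {"type": "ready", "node": self.name}


class SourceNode:
    def broadcast(self, config: bytes, targets) -> list:
        """Initiate the broadcast, wait for ready messages, then write."""
        # Initiation: every target is told how much data to expect.
        ready = {t.prepare(len(config))["node"] for t in targets}
        for target in targets:
            if target.name not in ready:
                raise RuntimeError(f"{target.name} never reported ready")
            # Stand-in for the RDMA write into the target's memory.
            target.config[:] = config
        return sorted(ready)


blob = b"\x01\x02\x03"
targets = [TargetNode("n1"), TargetNode("n2")]
written = SourceNode().broadcast(blob, targets)
```

Waiting for the ready messages before writing is what guarantees, in this model, that every target has a buffer in place when the data arrives.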

    Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’)

    Publication No.: US10474625B2

    Publication date: 2019-11-12

    Application No.: US13351419

    Filing date: 2012-01-17

    Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the target compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.

    Aggregating Job Exit Statuses Of A Plurality Of Compute Nodes Executing A Parallel Application
    10.
    Invention application
    Status: In force

    Publication No.: US20130339805A1

    Publication date: 2013-12-19

    Application No.: US13524602

    Filing date: 2012-06-15

    IPC classification: G06F9/46 G06F11/07

    Abstract: Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
