Re-executing launcher program upon termination of launched programs in MIMD mode booted SIMD partitions
    1.
    Granted patent
    Status: Lapsed

    Publication No.: US07979674B2

    Publication date: 2011-07-12

    Application No.: US11749397

    Filing date: 2007-05-16

    IPC classification: G06F9/46

    CPC classification: G06F9/5061

    Abstract: Executing MIMD programs on a SIMD machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing one or more SIMD partitions, booting one or more SIMD partitions in MIMD mode; establishing a MIMD partition; executing by launcher programs a plurality of MIMD programs on two or more of the compute nodes of the MIMD partition; and re-executing a launcher program by an operating system on a compute node in the MIMD partition upon termination of the MIMD program executed by the launcher program.
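The re-execution step in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: `launcher`, `node_operating_system`, and `run_partition` are hypothetical names, programs are modeled as plain Python callables, and the nodes are simulated one after another rather than run in parallel.

```python
from collections import deque


def launcher(node_id, pending, results):
    """Hypothetical launcher: start the next MIMD program assigned to this node.

    Returns False when nothing is left to launch, True once the launched
    program has terminated.
    """
    if not pending:
        return False
    program = pending.popleft()
    # A single-threaded node runs exactly one program at a time.
    results.append((node_id, program(node_id)))
    return True


def node_operating_system(node_id, programs, results):
    """Hypothetical per-node OS loop: re-execute the launcher each time the
    program it launched terminates. Returns the number of programs launched."""
    pending = deque(programs)
    launched = 0
    while launcher(node_id, pending, results):
        launched += 1
    return launched


def run_partition(programs_by_node):
    """Simulate a partition booted in MIMD mode: every node gets its own programs."""
    results = []
    for node_id in sorted(programs_by_node):
        node_operating_system(node_id, programs_by_node[node_id], results)
    return results
```

Each node can be handed a different list of programs, which is the MIMD property; the loop in `node_operating_system` stands in for the operating system bringing the launcher back after every termination.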

    Executing Multiple Instructions Multiple Data (‘MIMD’) programs on a Single Instruction Multiple Data (‘SIMD’) machine
    2.
    Granted patent
    Status: Lapsed

    Publication No.: US07831802B2

    Publication date: 2010-11-09

    Application No.: US11780072

    Filing date: 2007-07-19

    IPC classification: G06F15/76

    CPC classification: G06F15/161

    Abstract: Executing Multiple Instructions Multiple Data (‘MIMD’) programs on a Single Instruction Multiple Data (‘SIMD’) machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing a SIMD partition comprising a plurality of the compute nodes; booting the SIMD partition in MIMD mode; executing by launcher programs a plurality of MIMD programs on compute nodes in the SIMD partition; and re-executing a launcher program by an operating system on a compute node in the SIMD partition upon termination of the MIMD program executed by the launcher program.

    Executing Multiple Instructions Multiple Data ('MIMD') Programs on a Single Instruction Multiple Data ('SIMD') Machine
    3.
    Patent application
    Status: Lapsed

    Publication No.: US20090024830A1

    Publication date: 2009-01-22

    Application No.: US11780072

    Filing date: 2007-07-19

    IPC classification: G06F15/00

    CPC classification: G06F15/161

    Abstract: Executing Multiple Instructions Multiple Data (‘MIMD’) programs on a Single Instruction Multiple Data (‘SIMD’) machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing a SIMD partition comprising a plurality of the compute nodes; booting the SIMD partition in MIMD mode; executing by launcher programs a plurality of MIMD programs on compute nodes in the SIMD partition; and re-executing a launcher program by an operating system on a compute node in the SIMD partition upon termination of the MIMD program executed by the launcher program.

    Messaging In A Parallel Computer Using Remote Direct Memory Access ('RDMA')
    6.
    Patent application
    Status: Lapsed

    Publication No.: US20120331065A1

    Publication date: 2012-12-27

    Application No.: US13167911

    Filing date: 2011-06-24

    IPC classification: G06F15/16

    CPC classification: G06F15/167, G06F15/17331

    Abstract: Messaging in a parallel computer using remote direct memory access (‘RDMA’), including: receiving a send work request; responsive to the send work request: translating a local virtual address on the first node from which data is to be transferred to a physical address on the first node from which data is to be transferred from; creating a local RDMA object that includes a counter set to the size of a messaging acknowledgment field; sending, from a messaging unit in the first node to a messaging unit in a second node, a message that includes a RDMA read operation request, the physical address of the local RDMA object, and the physical address on the first node from which data is to be transferred from; and receiving, by the first node responsive to the second node's execution of the RDMA read operation request, acknowledgment data in the local RDMA object.
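The message flow in this abstract can be walked through with a toy model. This is a sketch under assumptions, not the patented implementation: `Node`, `rdma_send`, and the `ACK` payload are hypothetical, physical memory is a `bytearray`, the page table is a dict, and counting the counter down as acknowledgment bytes arrive is a guess at how the counter is used.

```python
ACK = b"ACK!"  # hypothetical acknowledgment payload


class Node:
    """Toy compute node: a flat bytearray stands in for physical memory."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.page_table = {}  # virtual address -> physical address


def rdma_send(first, second, virtual_addr, length, rdma_obj_addr, dest_addr):
    """Follow the steps of the abstract; returns the final counter value."""
    # 1. Translate the local virtual address to a physical address.
    src = first.page_table[virtual_addr]
    # 2. Create the local RDMA object; its counter starts at the size of
    #    the acknowledgment field.
    counter = len(ACK)
    # 3. Message to the second node: read request plus both physical addresses.
    message = {"op": "rdma_read", "rdma_obj": rdma_obj_addr, "src": src, "len": length}
    # 4. The second node executes the RDMA read against the first node's memory.
    data = first.memory[message["src"]:message["src"] + message["len"]]
    second.memory[dest_addr:dest_addr + length] = data
    # 5. Acknowledgment data lands in the local RDMA object. Decrementing the
    #    counter per byte received is an assumption of this sketch.
    first.memory[message["rdma_obj"]:message["rdma_obj"] + len(ACK)] = ACK
    counter -= len(ACK)
    return counter
```

A counter value of zero on return means the whole acknowledgment field has been written, so the first node can treat the send as complete without a separate reply message.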

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
    7.
    Granted patent
    Status: In force

    Publication No.: US09086962B2

    Publication date: 2015-07-21

    Application No.: US13524602

    Filing date: 2012-06-15

    IPC classification: G06F11/07, G06F9/52, G06F11/30

    Abstract: Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
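The aggregation step can be sketched as follows. The abstract does not say how statuses are combined, so the policy here (highest exit code wins, failing nodes are listed) is an assumption, as are the names `aggregate_exit_statuses` and `run_job` and the choice of the lowest node id as job leader.

```python
def aggregate_exit_statuses(statuses):
    """Combine per-node exit statuses into one job-level status.

    `statuses` maps node id -> integer exit code. The policy (highest exit
    code wins, failing nodes are listed) is an assumption of this sketch.
    """
    failed = sorted(node for node, code in statuses.items() if code != 0)
    return {
        "nodes": len(statuses),
        "exit_code": max(statuses.values(), default=0),
        "failed_nodes": failed,
    }


def run_job(subset, application):
    """Run `application` on every node in `subset`, then aggregate.

    Returns (job leader id, aggregated status).
    """
    leader = min(subset)  # hypothetical leader choice: lowest node id
    statuses = {node: application(node) for node in subset}
    return leader, aggregate_exit_statuses(statuses)
```

For example, `run_job([3, 1, 2], app)` where `app` fails only on node 2 reports a single non-zero status for the whole subset instead of three separate ones.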

    Configuring Compute Nodes In A Parallel Computer Using Remote Direct Memory Access ('RDMA')
    8.
    Patent application
    Status: Under examination (published)

    Publication No.: US20130185381A1

    Publication date: 2013-07-18

    Application No.: US13351419

    Filing date: 2012-01-17

    IPC classification: G06F15/16

    Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the target compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.
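The prepare, ready, and write sequence can be modeled in a few lines. This is a sketch under assumptions: `rdma_broadcast_config` is a hypothetical name, target memory is a `bytearray` per node, the targets are assumed to learn the payload size when the broadcast is initiated, and the ready messages are assumed to be collected by the source node, which is what the handshake appears to require.

```python
def rdma_broadcast_config(config, target_ids):
    """Simulate the broadcast; returns each target's memory after the write."""
    # Initiation: assume the source announces the payload size to the targets.
    size = len(config)
    # Each target prepares a receive buffer of the announced size.
    buffers = {target: bytearray(size) for target in target_ids}
    # Each target reports that it is ready (assumed to go to the source).
    ready = set()
    for target in target_ids:
        ready.add(target)
    # The source writes only once every target has reported ready.
    if ready != set(target_ids):
        raise RuntimeError("not every target is ready")
    for target in target_ids:
        buffers[target][:] = config  # stands in for the RDMA write
    return buffers
```

Waiting for every ready message before writing matters because an RDMA write places data directly in the target's memory, so the buffer has to exist before the source touches it.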

    Collectively loading an application in a parallel computer
    9.
    Granted patent
    Status: In force

    Publication No.: US09229782B2

    Publication date: 2016-01-05

    Application No.: US13431248

    Filing date: 2012-03-27

    IPC classification: G06F9/46, G06F9/50

    CPC classification: G06F9/5072, G06F2209/549

    Abstract: Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
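The four steps of this abstract (identify a subset, select a leader, retrieve the application, broadcast it) can be sketched as below. All names are hypothetical: `collectively_load` stands in for the control system, `storage` is a dict acting as computer memory, and taking the lowest-numbered nodes as the subset and the first of them as leader is an arbitrary choice for the sketch.

```python
def collectively_load(all_nodes, job_size, storage, app_name):
    """Simulate a collective load; returns (leader, images per node, storage reads)."""
    # Control system: identify the subset of nodes that will run the job.
    subset = sorted(all_nodes)[:job_size]
    # Control system: select one node of the subset as job leader.
    leader = subset[0]
    # Only the job leader retrieves the application from storage.
    image = storage[app_name]
    storage_reads = 1
    # The job leader broadcasts the application to the subset (itself included).
    loaded = {node: image for node in subset}
    return leader, loaded, storage_reads
```

The point of the design is visible in `storage_reads`: storage is read once no matter how many nodes take part, whereas having every node load the application itself would cost one read per node.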

    Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’)

    Publication No.: US10474625B2

    Publication date: 2019-11-12

    Application No.: US13351419

    Filing date: 2012-01-17

    Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the target compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.