Collectively Loading An Application In A Parallel Computer
    2.
    Patent application
    Status: in force

    Publication No.: US20130263138A1

    Publication date: 2013-10-03

    Application No.: US13431248

    Filing date: 2012-03-27

    IPC classification: G06F9/46

    CPC classification: G06F9/5072 G06F2209/549

    Abstract: Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.

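The four steps in the abstract above (identify a subset, select a job leader, have the leader retrieve the application, broadcast it to the subset) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `ComputeNode`, `collective_load`, the "first N nodes" subset rule, and the "lowest id wins" leader rule are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputeNode:
    """Hypothetical stand-in for a compute node; holds the loaded image."""
    node_id: int
    image: Optional[bytes] = None


def collective_load(nodes, job_size, storage, app_name):
    """Simulate the four steps of the abstract on an in-memory node list."""
    # Step 1: the control system identifies a subset of nodes for the job.
    # Assumption: simply take the first `job_size` nodes.
    subset = nodes[:job_size]
    # Step 2: the control system selects one member as job leader.
    # Assumption: lowest node_id wins; the abstract does not say how.
    leader = min(subset, key=lambda n: n.node_id)
    # Step 3: only the leader reads the application from storage.
    leader.image = storage[app_name]
    reads = 1
    # Step 4: the leader broadcasts the image to the rest of the subset.
    for node in subset:
        if node is not leader:
            node.image = leader.image
    return leader, subset, reads
```

The point of the design is that storage is read once, by the leader, no matter how many nodes run the job; the remaining nodes receive the image over the interconnect. A real system would presumably use a hardware or library collective rather than the plain loop shown here.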

    Resource management on a computer system utilizing hardware and environmental factors
    3.
    Granted patent
    Status: lapsed

    Publication No.: US08225324B2

    Publication date: 2012-07-17

    Application No.: US12121096

    Filing date: 2008-05-15

    IPC classification: G06F9/46

    Abstract: A method for resource management on a computer system utilizing hardware and environmental information. A caller interacts with an application program interface to handle information requests with a persistent data storage device to combine information involving hardware resource information, environmental data and other system information, including historical, present and predicted values. Application execution decisions may then be made regarding hardware for the calling entity. The method may be implemented as a computer process.

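As a rough illustration of the idea in the abstract above, the sketch below combines hardware information (free cores) with environmental data (temperature) across present and predicted readings to make a placement decision. Everything here is an assumption made for illustration: `Reading`, `ResourceStore`, `choose_node`, the choice of temperature as the environmental factor, and the selection rule are hypothetical, and the persistent data store is modelled as an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    node: str
    kind: str          # "historical", "present", or "predicted"
    temperature_c: float
    free_cores: int


class ResourceStore:
    """Hypothetical persistent store, modelled here as an in-memory list."""

    def __init__(self, readings):
        self._readings = list(readings)

    def query(self, node):
        """Return the readings for one node, keyed by kind."""
        return {r.kind: r for r in self._readings if r.node == node}

    def nodes(self):
        return sorted({r.node for r in self._readings})


def choose_node(store, cores_needed, max_temp_c):
    """Pick a node with enough free cores whose present and predicted
    temperatures are both under the limit; prefer the coolest prediction."""
    candidates = []
    for node in store.nodes():
        info = store.query(node)
        present, predicted = info.get("present"), info.get("predicted")
        if present is None or predicted is None:
            continue
        if present.free_cores < cores_needed:
            continue
        if max(present.temperature_c, predicted.temperature_c) > max_temp_c:
            continue
        candidates.append((predicted.temperature_c, node))
    return min(candidates)[1] if candidates else None
```

A caller would ask `choose_node(store, cores_needed=4, max_temp_c=80.0)` and launch on the returned node, or defer the job if the result is `None`.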

    Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job
    4.
    Granted patent
    Status: in force

    Publication No.: US08140889B2

    Publication date: 2012-03-20

    Application No.: US12861426

    Filing date: 2010-08-23

    IPC classification: G06F11/00

    CPC classification: G06F11/2035 G06F11/203

    Abstract: Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.

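The recovery flow in the abstract above (detect a connectivity failure, select an alternative connected node with a live link to an active I/O node, assign it to the block, re-launch) might look roughly like the following. The `Link` and `Block` records, the function names, and the "first healthy link wins" rule are hypothetical choices for this sketch, not details from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Hypothetical record of one compute-node-to-I/O-node connection."""
    compute_node: str
    io_node: str
    link_up: bool
    io_node_active: bool


@dataclass
class Block:
    name: str
    connected_node: str


def pick_alternative(links, failed_node) -> Optional[str]:
    """Return a compute node whose link is up and whose I/O node is active."""
    for link in links:
        if (link.compute_node != failed_node
                and link.link_up and link.io_node_active):
            return link.compute_node
    return None


def relaunch(block, links, launch):
    """Reassign the block's connected node, then re-launch the job."""
    alternative = pick_alternative(links, block.connected_node)
    if alternative is None:
        return False            # nothing healthy to fail over to
    block.connected_node = alternative
    launch(block)               # `launch` is a caller-supplied callback
    return True
```

Both conditions matter: a node whose link is up but whose I/O node is down would fail the job again in the same way.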

    Resource Management on a Computer System Utilizing Hardware and Environmental Factors
    5.
    Patent application
    Status: lapsed

    Publication No.: US20090288094A1

    Publication date: 2009-11-19

    Application No.: US12121096

    Filing date: 2008-05-15

    IPC classification: G06F9/46

    Abstract: A method for resource management on a computer system utilizing hardware and environmental information. A caller interacts with an application program interface to handle information requests with a persistent data storage device to combine information involving hardware resource information, environmental data and other system information, including historical, present and predicted values. Application execution decisions may then be made regarding hardware for the calling entity. The method may be implemented as a computer process.

    Scaling and Managing Work Requests on a Massively Parallel Machine
    6.
    Patent application
    Status: in force

    Publication No.: US20090288085A1

    Publication date: 2009-11-19

    Application No.: US12121262

    Filing date: 2008-05-15

    IPC classification: G06F9/46

    Abstract: A method, computer program product and computer system for scaling and managing requests on a massively parallel machine, such as one running in MIMD mode on a SIMD machine. A submit mux (multiplexer) is used to federate work requests and to forward the requests to the management node. A resource arbiter receives and manages these work requests. A MIMD job controller works with the resource arbiter to manage the work requests on the SIMD partition. The SIMD partition may utilize a mux of its own to federate the work requests and the computer nodes. Instructions are also provided to control and monitor the work requests.

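The division of labour described in the abstract above, where a submit mux funnels many work requests into one stream and a resource arbiter decides which requests run where, can be sketched as below. This is a simplified model under assumptions: the class names, the batching rule, and the first-come-first-served dispatch are hypothetical, and the MIMD-on-SIMD aspect and the MIMD job controller are not modelled at all.

```python
from collections import deque


class ResourceArbiter:
    """Hypothetical arbiter: hands free nodes to requests, queues the rest."""

    def __init__(self, free_nodes):
        self.free_nodes = deque(free_nodes)
        self.pending = deque()
        self.running = {}   # request id -> node

    def receive(self, batch):
        self.pending.extend(batch)
        self._dispatch()

    def finish(self, request_id):
        """Release the node used by a finished request and dispatch again."""
        self.free_nodes.append(self.running.pop(request_id))
        self._dispatch()

    def _dispatch(self):
        while self.pending and self.free_nodes:
            self.running[self.pending.popleft()] = self.free_nodes.popleft()


class SubmitMux:
    """Hypothetical mux: gathers requests from many clients into one stream."""

    def __init__(self, arbiter, batch_size):
        self.arbiter = arbiter
        self.batch_size = batch_size
        self.buffer = []

    def submit(self, request_id):
        self.buffer.append(request_id)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Forward any buffered requests to the arbiter in one batch."""
        if self.buffer:
            self.arbiter.receive(self.buffer)
            self.buffer = []
```

Batching in the mux means the management side sees a few large messages instead of one connection per client, which is the scaling concern the title refers to.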

    Scaling and managing work requests on a massively parallel machine
    8.
    Granted patent
    Status: in force

    Publication No.: US08918624B2

    Publication date: 2014-12-23

    Application No.: US12121262

    Filing date: 2008-05-15

    IPC classification: G06F9/30 G06F9/48 G06F9/38

    Abstract: A method, computer program product and computer system for scaling and managing requests on a massively parallel machine, such as one running in MIMD mode on a SIMD machine. A submit mux (multiplexer) is used to federate work requests and to forward the requests to the management node. A resource arbiter receives and manages these work requests. A MIMD job controller works with the resource arbiter to manage the work requests on the SIMD partition. The SIMD partition may utilize a mux of its own to federate the work requests and the computer nodes. Instructions are also provided to control and monitor the work requests.

    DYNAMICALLY REASSIGNING A CONNECTED NODE TO A BLOCK OF COMPUTE NODES FOR RE-LAUNCHING A FAILED JOB
    9.
    Patent application
    Status: in force

    Publication No.: US20120047393A1

    Publication date: 2012-02-23

    Application No.: US12861426

    Filing date: 2010-08-23

    IPC classification: G06F11/20

    CPC classification: G06F11/2035 G06F11/203

    Abstract: Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.

    Collectively loading an application in a parallel computer
    10.
    Granted patent
    Status: in force

    Publication No.: US09229782B2

    Publication date: 2016-01-05

    Application No.: US13431248

    Filing date: 2012-03-27

    IPC classification: G06F9/46 G06F9/50

    CPC classification: G06F9/5072 G06F2209/549

    Abstract: Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
