Parsing an application to find serial and parallel data segments to minimize mitigation overhead between serial and parallel compute nodes
    1.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US08595736B2

    Publication date: 2013-11-26

    Application No.: US13440065

    Filing date: 2012-04-05

    IPC classification: G06F9/46

    Abstract: Methods, systems, and products are disclosed for executing an application on a parallel computer having a plurality of nodes. Executing an application on a parallel computer includes: booting up a first subset of a plurality of nodes in a serial processing mode; booting up a second subset of the plurality of nodes in a parallel processing mode; profiling, prior to application execution, an application to identify serial segments of the application, parallel segments of the application, and application data utilized by each of the serial segments and the parallel segments; and executing the application on the plurality of nodes, including migrating, in dependence upon the profile for the application upon encountering the parallel segments during execution, only specific portions of the application and the application data from the nodes booted up in the serial processing mode to the nodes booted up in the parallel processing mode.
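
    Note: the profile-driven migration described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming the profile is a list of segments, each tagged as serial or parallel and listing the data it uses. The names `Segment` and `plan_migrations` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One profiled segment of the application (hypothetical structure)."""
    name: str
    kind: str          # "serial" or "parallel"
    data_keys: tuple   # application data the profile says this segment uses


def plan_migrations(profile, app_data):
    """For each parallel segment, select only the data its profile entry lists.

    Serial segments stay on the serial-mode nodes, so nothing is migrated for them.
    """
    plan = []
    for seg in profile:
        if seg.kind == "parallel":
            payload = {key: app_data[key] for key in seg.data_keys}
            plan.append((seg.name, payload))
    return plan


profile = [
    Segment("load", "serial", ("config",)),
    Segment("solve", "parallel", ("matrix",)),
    Segment("report", "serial", ("config", "result")),
]
app_data = {"config": {"n": 4}, "matrix": [[1, 2], [3, 4]], "result": None}
plan = plan_migrations(profile, app_data)
```

    In this sketch only the `matrix` entry would travel to the parallel-mode nodes when the `solve` segment is reached; `config` and `result` stay where they are.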

    RESOURCE LEAK RECOVERY IN A MULTI-NODE COMPUTER SYSTEM
    4.
    Type: Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20100085871A1

    Publication date: 2010-04-08

    Application No.: US12244075

    Filing date: 2008-10-02

    IPC classification: G06F11/00

    Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.
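
    Note: a minimal sketch of the clean-state comparison described in this abstract, assuming resources are tracked as simple numeric counters of what is available. The class names, method names, and resource keys below are hypothetical; the patent does not specify them.

```python
class GlobalResourceManager:
    """Stand-in for the global resource manager; it only records leak reports."""

    def __init__(self):
        self.reports = []

    def remove_leak(self, node_id, leaks):
        # A real manager would reclaim the resources; here we just log the request.
        self.reports.append((node_id, leaks))


class ResourceMonitor:
    """Keeps a clean-state snapshot and compares it with the state after a job."""

    def __init__(self, node_id, clean_state, manager):
        self.node_id = node_id
        self.clean_state = dict(clean_state)
        self.manager = manager

    def check_after_job(self, current_state):
        # A leak is any resource with less availability than in the clean state.
        leaks = {
            name: clean - current_state.get(name, 0)
            for name, clean in self.clean_state.items()
            if current_state.get(name, 0) < clean
        }
        if leaks:
            self.manager.remove_leak(self.node_id, leaks)
        return leaks


manager = GlobalResourceManager()
monitor = ResourceMonitor(
    "node-7", {"free_memory_mb": 2048, "free_file_handles": 1024}, manager
)
leaks = monitor.check_after_job({"free_memory_mb": 1900, "free_file_handles": 1024})
```

    Here the monitor would report a shortfall of 148 MB of memory on `node-7` and nothing for file handles, since those match the clean state.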

    Process Migration Based on Service Availability in a Multi-Node Environment
    5.
    Type: Invention patent application
    Legal status: In force

    Publication No.: US20090320023A1

    Publication date: 2009-12-24

    Application No.: US12145219

    Filing date: 2008-06-24

    IPC classification: G06F9/46

    CPC classification: G06F9/5088

    Abstract: A process on a highly distributed parallel computing system is disclosed. When a first compute node in a first pool is ready to hand off a task to a second pool for further processing, the first compute node may first determine whether a node is available in the second pool. If no node is available from the second pool, then the first compute node may begin performing a primary task assigned to the second pool of nodes, up to the point where a service available exclusively to the nodes of the second pool is required. In the interim, however, one of the nodes of the second pool may become available. Alternatively, an application program running on a compute node may be configured with an exception handling routine that catches exceptions and migrates the application to a compute node where a necessary service is available, as such exceptions occur.
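
    Note: the hand-off rule in this abstract can be illustrated with a small sketch, assuming a task is an ordered list of steps and each step names the service it needs, if any. The function `hand_off` and the step and service names are hypothetical.

```python
def hand_off(task_steps, second_pool_free, exclusive_service):
    """Split a task between the first-pool node and the second pool.

    Returns (steps run locally on the first-pool node, steps left for the second pool).
    """
    if second_pool_free:
        # A second-pool node is available, so the whole task is handed off.
        return [], list(task_steps)

    local = []
    for index, (step, service) in enumerate(task_steps):
        if service == exclusive_service:
            # Stop at the first step that needs the second pool's exclusive service.
            return local, list(task_steps[index:])
        local.append((step, service))
    return local, []


steps = [("parse", None), ("filter", None), ("write", "io"), ("close", None)]
local, remaining = hand_off(steps, second_pool_free=False, exclusive_service="io")
```

    With no second-pool node free, the first node would run `parse` and `filter` itself and leave `write` onward for whichever second-pool node becomes available in the interim.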

    Executing A Distributed Java Application On A Plurality Of Compute Nodes
    6.
    Type: Invention patent application
    Legal status: In force

    Publication No.: US20090271784A1

    Publication date: 2009-10-29

    Application No.: US12109238

    Filing date: 2008-04-24

    IPC classification: G06F9/46

    Abstract: Methods, systems, and products are disclosed for executing a distributed Java application on a plurality of compute nodes. The Java application includes a plurality of jobs distributed among the plurality of compute nodes. The plurality of compute nodes are connected together for data communications through a data communication network. Each of the plurality of compute nodes has installed upon it a Java Virtual Machine (‘JVM’) capable of supporting at least one job of the Java application. Executing a distributed Java application on a plurality of compute nodes includes: tracking, by an application manager, JVM environment variables for the JVMs installed on the plurality of compute nodes; and configuring, by the application manager, the plurality of jobs for execution on the plurality of compute nodes in dependence upon the JVM environment variables for the JVMs installed on the plurality of compute nodes.

    Query Optimization in a Parallel Computer System to Reduce Network Traffic
    7.
    Type: Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20090043728A1

    Publication date: 2009-02-12

    Application No.: US11834813

    Filing date: 2007-08-07

    IPC classification: G06F17/30

    Abstract: An apparatus and method for a database query optimizer to optimize a query that uses multiple networks. The query optimizer optimizes a query to reduce network traffic on a network or node that is overloaded or above an established parameter in a node/network attribute table. The query optimization to reduce network traffic may result in a sub-optimal query in other respects such as execution time. The result is a query optimizer that rewrites or optimizes a query to execute on multiple nodes or networks to reduce traffic on a network or node according to the loading characteristics and assigned attributes of a node or network.
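
    Note: a minimal sketch of the trade-off this abstract describes, assuming the optimizer picks among candidate plans whose per-network traffic has already been estimated. The attribute-table layout, the network names, and the function `choose_plan` are hypothetical.

```python
def choose_plan(plans, network_attrs):
    """Pick the plan that puts the least traffic on overloaded networks.

    Estimated execution time is used only to break ties, so the chosen plan
    may be slower than the fastest candidate.
    """
    overloaded = {
        name for name, attrs in network_attrs.items()
        if attrs["load"] > attrs["max_load"]
    }

    def cost(plan):
        traffic = sum(
            amount for net, amount in plan["traffic"].items() if net in overloaded
        )
        return (traffic, plan["est_seconds"])

    return min(plans, key=cost)


# Hypothetical node/network attribute table: current load vs. the established limit.
network_attrs = {
    "torus": {"load": 0.95, "max_load": 0.80},
    "tree": {"load": 0.30, "max_load": 0.80},
}
plans = [
    {"name": "fast", "est_seconds": 2.0, "traffic": {"torus": 500, "tree": 10}},
    {"name": "light", "est_seconds": 3.5, "traffic": {"torus": 50, "tree": 400}},
]
best = choose_plan(plans, network_attrs)
```

    Because `torus` exceeds its limit, the sketch would select the `light` plan even though it is estimated to run longer, which mirrors the sub-optimal execution time the abstract accepts.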

    Scheduling jobs of a multi-node computer system based on environmental impact
    8.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US09015726B2

    Publication date: 2015-04-21

    Application No.: US12418044

    Filing date: 2009-04-03

    IPC classification: G06F9/50, G06F9/48

    Abstract: Embodiments of the invention provide techniques for scheduling jobs on a multi-node computing system based on the predicted environmental impact of executing the jobs. In one embodiment, a plurality of job plans may be generated for processing a requested job on the multi-node computing system. The environmental impacts resulting from executing each job plan may be estimated by matching the job plans to stored data based on standardized executions of job plans. Further, environmental impacts may be estimated by matching the job plans to stored data based on actual environmental measurements obtained during prior executions of the job plan on the multi-node computer system. The job may be executed using a job plan selected based on predicted environmental impacts and time performance.
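
    Note: the plan selection in this abstract might look roughly like the sketch below. It assumes each job plan has a signature that can be matched against stored impact data, that measured data is preferred over standardized data when both exist, and that impact and time are combined with a simple weight. All three are assumptions made for illustration, as are the names `estimate_impact` and `select_plan`.

```python
def estimate_impact(plan, measured, standardized):
    """Match a plan to stored impact data by its signature.

    Assumption: measurements from prior executions take precedence over
    figures from standardized executions.
    """
    signature = plan["signature"]
    return measured.get(signature, standardized[signature])


def select_plan(plans, measured, standardized, impact_weight=0.5):
    """Choose the plan with the lowest weighted sum of impact and run time."""

    def score(plan):
        impact = estimate_impact(plan, measured, standardized)
        return impact_weight * impact + (1 - impact_weight) * plan["est_minutes"]

    return min(plans, key=score)


# Hypothetical stored data: energy use per plan signature.
standardized = {"8-nodes": 1.2, "64-nodes": 5.0}
measured = {"64-nodes": 6.0}
plans = [
    {"signature": "8-nodes", "est_minutes": 40},
    {"signature": "64-nodes", "est_minutes": 10},
]
balanced = select_plan(plans, measured, standardized, impact_weight=0.5)
green = select_plan(plans, measured, standardized, impact_weight=0.9)
```

    With an even weighting the faster 64-node plan would be chosen; weighting impact at 0.9 would tip the choice to the slower 8-node plan.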

    Executing a distributed software application on a plurality of compute nodes according to a compilation history
    9.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US08281311B2

    Publication date: 2012-10-02

    Application No.: US12109248

    Filing date: 2008-04-24

    IPC classification: G06F9/46

    Abstract: Methods, systems, and products are disclosed for executing a distributed Java application on a plurality of compute nodes. The Java application includes a plurality of jobs distributed among the plurality of compute nodes. The plurality of compute nodes are connected together for data communications through a data communication network. Each of the plurality of compute nodes has installed upon it a Java Virtual Machine (‘JVM’) capable of supporting at least one job of the Java application. Executing a distributed Java application on a plurality of compute nodes includes: tracking, by an application manager, a just-in-time (‘JIT’) compilation history for the JVMs installed on the plurality of compute nodes; and configuring, by the application manager, the plurality of jobs for execution on the plurality of compute nodes in dependence upon the JIT compilation history for the JVMs installed on the plurality of compute nodes.

    GLOBAL DETECTION OF RESOURCE LEAKS IN A MULTI-NODE COMPUTER SYSTEM
    10.
    Type: Invention patent application
    Legal status: In force

    Publication No.: US20120246509A1

    Publication date: 2012-09-27

    Application No.: US13492634

    Filing date: 2012-06-08

    IPC classification: G06F11/07

    Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.