Scheduling jobs of a multi-node computer system based on environmental impact
    1.
    Granted patent
    Legal status: in force

    Publication number: US09015726B2

    Publication date: 2015-04-21

    Application number: US12418044

    Application date: 2009-04-03

    IPC classification: G06F9/50 G06F9/48

    Abstract: Embodiments of the invention provide techniques for scheduling jobs on a multi-node computing system based on the predicted environmental impact of executing the jobs. In one embodiment, a plurality of job plans may be generated for processing a requested job on the multi-node computing system. The environmental impacts resulting from executing each job plan may be estimated by matching the job plans to stored data based on standardized executions of job plans. Further, environmental impacts may be estimated by matching the job plans to stored data based on actual environmental measurements obtained during prior executions of the job plan on the multi-node computer system. The job may be executed using a job plan selected based on predicted environmental impacts and time performance.
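
    The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the plan names, the per-node energy table, and the deadline rule are all assumptions made for the example, standing in for the "stored data" and the trade-off between impact and time performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobPlan:
    name: str
    nodes: int            # number of compute nodes the plan would use
    est_runtime_s: float  # predicted wall-clock time in seconds


# Hypothetical stored data: joules drawn per node-second for each plan,
# standing in for measurements from standardized or prior executions.
STORED_IMPACT_J_PER_NODE_S = {"plan-wide": 120.0, "plan-narrow": 90.0}


def estimate_impact(plan):
    """Predicted energy in joules, looked up from the stored data."""
    return STORED_IMPACT_J_PER_NODE_S[plan.name] * plan.nodes * plan.est_runtime_s


def select_plan(plans, deadline_s):
    """Pick the lowest-impact plan that still meets the time limit."""
    feasible = [p for p in plans if p.est_runtime_s <= deadline_s]
    if not feasible:
        # No plan meets the deadline: fall back to the fastest plan.
        return min(plans, key=lambda p: p.est_runtime_s)
    return min(feasible, key=estimate_impact)


plans = [
    JobPlan("plan-wide", nodes=64, est_runtime_s=100.0),
    JobPlan("plan-narrow", nodes=16, est_runtime_s=350.0),
]
```

    With a generous deadline the narrower, slower plan wins because its estimated energy is lower; with a tight deadline only the wide plan remains feasible.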

    Reducing storage costs associated with backing up a database
    2.
    Granted patent
    Legal status: in force

    Publication number: US08818955B2

    Publication date: 2014-08-26

    Application number: US12947893

    Application date: 2010-11-17

    IPC classification: G06F7/00 G06F17/30 G06F11/14

    Abstract: Techniques are disclosed for backing up and/or restoring data. In one embodiment, a request is received to back up at least a first unit of data stored in a database. Upon determining that the first unit of data is at least partially derived from a second unit of data stored in the database, a backup may be generated that includes less than all of the first unit of data. Subsequently, the first unit of data may be fully restored from the backup.
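
    A minimal sketch of the idea, under simplifying assumptions: the "database" is a Python dict, the derived unit (`order_totals`) is derived entirely from one base unit (`orders`), and the derivation function is known at restore time. The abstract also covers partially derived data, which this toy example does not model; all names here are hypothetical.

```python
def derive_totals(orders):
    """Recompute per-customer totals from the base order rows."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def backup(db, derived_from):
    """Copy base units in full; for a derived unit store only its source name."""
    image = {}
    for name, data in db.items():
        if name in derived_from:
            image[name] = {"derived_from": derived_from[name]}
        else:
            image[name] = {"data": list(data)}
    return image


def restore(image, derivations):
    """Restore base units first, then rebuild each derived unit from its source."""
    db = {name: list(e["data"]) for name, e in image.items() if "data" in e}
    for name, entry in image.items():
        if "derived_from" in entry:
            db[name] = derivations[name](db[entry["derived_from"]])
    return db


db = {
    "orders": [("a", 5), ("b", 3), ("a", 2)],
    "order_totals": {"a": 7, "b": 3},
}
image = backup(db, {"order_totals": "orders"})
restored = restore(image, {"order_totals": derive_totals})
```

    The backup image holds no rows for `order_totals`, yet the restored database matches the original because the totals are recomputed from `orders`.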

    Management of persistent memory in a multi-node computer system
    3.
    Granted patent
    Legal status: in force

    Publication number: US08812818B2

    Publication date: 2014-08-19

    Application number: US13372609

    Application date: 2012-02-14

    IPC classification: G06F13/00 G06F9/26

    CPC classification: G06F12/1027 G06F12/1072

    Abstract: A method and apparatus creates and manages persistent memory (PM) in a multi-node computing system. A PM Manager in the service node creates and manages pools of nodes with various sizes of PM. A node manager uses the pools of nodes to load applications to the nodes according to the size of the available PM. The PM Manager can dynamically adjust the size of the PM according to the needs of the applications based on historical use or as determined by a system administrator. The PM Manager works with an operating system kernel on the nodes to provide persistent memory for application data and system metadata. The PM Manager uses the persistent memory to load applications to preserve data from one application to the next. Also, the data preserved in persistent memory may be system metadata such as file system data that will be available to subsequent applications.
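
    The pool lookup described here might be sketched as follows. The class name, the megabyte unit, and the "smallest pool that fits" rule are assumptions for illustration; the abstract does not say how a pool is chosen, and dynamic resizing of PM is not modeled.

```python
class PMManager:
    """Hypothetical sketch: pools of free nodes keyed by persistent-memory size."""

    def __init__(self):
        self.pools = {}  # pm_size_mb -> list of free node ids

    def add_node(self, node, pm_size_mb):
        """Register a free node in the pool for its PM size."""
        self.pools.setdefault(pm_size_mb, []).append(node)

    def allocate(self, required_pm_mb):
        """Take a node from the smallest pool whose PM size fits the request."""
        for size in sorted(self.pools):
            if size >= required_pm_mb and self.pools[size]:
                return self.pools[size].pop(0), size
        return None  # no free node has enough persistent memory


mgr = PMManager()
mgr.add_node("n0", 64)
mgr.add_node("n1", 256)
mgr.add_node("n2", 256)
```

    A request for 100 MB skips the 64 MB pool and takes a node from the 256 MB pool; once that pool is empty, the same request returns `None`.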

    Executing an application on a parallel computer
    4.
    Granted patent
    Legal status: in force

    Publication number: US08516494B2

    Publication date: 2013-08-20

    Application number: US12140023

    Application date: 2008-06-16

    IPC classification: G06F9/46

    CPC classification: G06F9/5038

    Abstract: Methods, apparatus, and products are disclosed for executing an application on a parallel computer that include: executing, by a current compute node, a current task of the application, including producing results; determining, by the current compute node in dependence upon current network characteristics and application characteristics, whether to transfer the results to a next compute node for further processing by a next task on the next compute node or to execute the next task for further processing of the results on the current compute node; transferring, by the current compute node, the results to the next compute node for further processing by the next task on the next compute node if the determination specifies transferring the results to the next node; and executing, by the current compute node, the next task for further processing of the results if the determination specifies executing the next task on the current compute node.
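
    The transfer-or-execute decision can be sketched with a simple cost comparison. The cost model below (latency plus size over bandwidth, compared with estimated task times) is an assumption chosen for illustration; the abstract only says the decision depends on network and application characteristics, and every parameter name is hypothetical.

```python
def should_transfer(result_bytes, bandwidth_bytes_per_s, latency_s,
                    local_next_task_s, remote_next_task_s):
    """Return True if shipping the results is predicted to finish sooner."""
    transfer_s = latency_s + result_bytes / bandwidth_bytes_per_s
    return transfer_s + remote_next_task_s < local_next_task_s


def dispatch(results, send_to_next_node, run_next_task, **estimates):
    """Either hand the results to the next node or run the next task here."""
    if should_transfer(len(results), **estimates):
        send_to_next_node(results)
        return "transferred"
    run_next_task(results)
    return "local"


sent, ran = [], []
fast_net = dict(bandwidth_bytes_per_s=1e6, latency_s=0.001,
                local_next_task_s=0.5, remote_next_task_s=0.1)
slow_net = dict(fast_net, bandwidth_bytes_per_s=1e3)
first = dispatch(b"x" * 1000, sent.append, ran.append, **fast_net)
second = dispatch(b"x" * 1000, sent.append, ran.append, **slow_net)
```

    On the fast network the transfer costs about 2 ms, so the results move to the next node; on the slow network the same payload takes about a second, so the next task runs locally.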

    Job scheduling and distribution on a partitioned compute tree based on job priority and network utilization
    5.
    Granted patent
    Legal status: lapsed

    Publication number: US08381220B2

    Publication date: 2013-02-19

    Application number: US11930611

    Application date: 2007-10-31

    CPC classification: H04L67/325

    Abstract: A method and apparatus optimizes job and data distribution on a multi-node computing system. A job scheduler distributes jobs and data to compute nodes according to priority and other resource attributes to ensure the most critical work is done on the nodes that are quickest to access and with less possibility of node communication failure. In a tree network configuration, the job scheduler distributes critical jobs and data to compute nodes that are located closest to the I/O nodes. Other resource attributes include network utilization, constant data state, and class routing.
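
    For the tree-network case, the placement rule can be sketched as pairing jobs in descending priority with nodes in ascending hop distance from an I/O node. This is an illustrative simplification: it assumes one job per node, uses hop count as the only attribute, and ignores network utilization, constant data state, and class routing. The job and node names are hypothetical.

```python
def place_jobs(job_priority, node_hops):
    """Map the most critical jobs to the nodes closest to an I/O node.

    job_priority: {job: priority}, higher means more critical.
    node_hops:    {node: hops from the node to its I/O node}.
    Jobs beyond the number of available nodes are left unplaced.
    """
    by_priority = sorted(job_priority, key=lambda j: -job_priority[j])
    by_distance = sorted(node_hops, key=lambda n: node_hops[n])
    return dict(zip(by_priority, by_distance))


placement = place_jobs(
    {"batch": 1, "critical": 9, "normal": 5},
    {"n3": 3, "n1": 1, "n2": 2},
)
```

    Here the highest-priority job lands on `n1`, one hop from the I/O node, and the lowest-priority job on `n3`.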

    Mechanism for process migration on a massively parallel computer
    6.
    Granted patent
    Legal status: lapsed

    Publication number: US08370844B2

    Publication date: 2013-02-05

    Application number: US11853927

    Application date: 2007-09-12

    Abstract: Embodiments of the invention provide a mechanism for process migration on a massively parallel computer system. In particular, embodiments of the invention may be used to update process state data for a migrated compute node, such as MPI (or other communication library) state data, across a full collection of compute nodes present in a given parallel system executing a parallel task. Migrating a process from one compute node to another may be useful to address a variety of sub-optimal operating conditions. For example, one or more processes may be migrated to cure network congestion resulting from a poorly mapped task or when a compute node is predicted to experience a hardware failure.

    Executing a distributed software application on a plurality of compute nodes according to a compilation history
    7.
    Granted patent
    Legal status: lapsed

    Publication number: US08281311B2

    Publication date: 2012-10-02

    Application number: US12109248

    Application date: 2008-04-24

    IPC classification: G06F9/46

    Abstract: Methods, systems, and products are disclosed for executing a distributed Java application on a plurality of compute nodes. The Java application includes a plurality of jobs distributed among the plurality of compute nodes. The plurality of compute nodes are connected together for data communications through a data communication network. Each of the plurality of compute nodes has installed upon it a Java Virtual Machine (‘JVM’) capable of supporting at least one job of the Java application. Executing a distributed Java application on a plurality of compute nodes includes: tracking, by an application manager, a just-in-time (‘JIT’) compilation history for the JVMs installed on the plurality of compute nodes; and configuring, by the application manager, the plurality of jobs for execution on the plurality of compute nodes in dependence upon the JIT compilation history for the JVMs installed on the plurality of compute nodes.
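
    One way an application manager might use a JIT compilation history is sketched below, written in Python for brevity rather than Java. The greedy rule (send each job to the free node whose JVM has already compiled the most of that job's methods), the one-job-per-node limit, and all names are assumptions; the abstract does not specify how the history influences configuration.

```python
def configure_jobs(job_methods, jit_history):
    """Greedily match each job to the node with the best JIT overlap.

    job_methods: {job: set of method names the job is expected to run}.
    jit_history: {node: set of methods that node's JVM has JIT-compiled}.
    """
    placement = {}
    free_nodes = set(jit_history)
    for job in sorted(job_methods):
        if not free_nodes:
            break  # more jobs than nodes: remaining jobs stay unplaced
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(free_nodes),
                   key=lambda n: len(job_methods[job] & jit_history[n]))
        placement[job] = best
        free_nodes.remove(best)
    return placement


placement = configure_jobs(
    {"jobA": {"parse", "sort"}, "jobB": {"render"}},
    {"node0": {"render"}, "node1": {"parse", "sort", "hash"}},
)
```

    `jobA` goes to `node1`, which has already compiled both of its methods, leaving `node0` for `jobB`.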

    GLOBAL DETECTION OF RESOURCE LEAKS IN A MULTI-NODE COMPUTER SYSTEM
    8.
    Patent application
    Legal status: in force

    Publication number: US20120246509A1

    Publication date: 2012-09-27

    Application number: US13492634

    Application date: 2012-06-08

    IPC classification: G06F11/07

    Abstract: A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak.
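
    The compare-against-clean-state step can be sketched as a dictionary diff. The resource names, the `GlobalResourceManager` class, and its `report_leak` method are hypothetical stand-ins; the abstract does not describe the manager's interface or how a leak is removed.

```python
def find_leaks(clean_state, current_state):
    """Return {resource: amount missing} for resources below their clean level."""
    return {
        name: clean - current_state.get(name, 0)
        for name, clean in clean_state.items()
        if current_state.get(name, 0) < clean
    }


class GlobalResourceManager:
    """Hypothetical stand-in that only records the leaks reported to it."""

    def __init__(self):
        self.reports = []

    def report_leak(self, node, leaks):
        self.reports.append((node, leaks))


def check_after_job(node, clean_state, current_state, manager):
    """Compare availability after a job and contact the manager on a leak."""
    leaks = find_leaks(clean_state, current_state)
    if leaks:
        manager.report_leak(node, leaks)
    return leaks


manager = GlobalResourceManager()
clean = {"free_mem_mb": 4096, "free_fds": 1024}
after_job = {"free_mem_mb": 3900, "free_fds": 1024}
leaks = check_after_job("node7", clean, after_job, manager)
```

    After the job, 196 MB of memory has not been returned, so the monitor reports that single leak for `node7`.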

    UTILIZING VIRTUAL PRIVATE NETWORKS TO PROVIDE OBJECT LEVEL SECURITY ON A MULTI-NODE COMPUTER SYSTEM
    9.
    Patent application
    Legal status: in force

    Publication number: US20120151573A1

    Publication date: 2012-06-14

    Application number: US13372653

    Application date: 2012-02-14

    IPC classification: G06F21/20 G06F17/30

    Abstract: The disclosure herein provides data security on a parallel computer system using virtual private networks connecting the nodes of the system. A mechanism sets up access control data in the nodes that describes a number of security classes. Each security class is associated with a virtual network. Each user on the system is associated with one of the security classes. Each database object to be protected is given an attribute of a security class. Database objects are loaded into the system nodes that match the security class of the database object. When a query executes on the system, the query is sent to a particular class or set of classes such that the query is only seen by those nodes that are authorized by the equivalent security class. In this way, the network is used to isolate data from users that do not have proper authorization to access the data.
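
    The isolation scheme can be modeled in a few lines, with plain dictionaries standing in for the virtual networks. This sketch assumes each user and each node belongs to exactly one security class (the abstract also allows a query to go to a set of classes), and the class, node, object, and user names are hypothetical.

```python
def load_objects(object_class, node_class):
    """Place each object on every node whose security class matches it."""
    placement = {node: [] for node in node_class}
    for obj, cls in object_class.items():
        for node, ncls in node_class.items():
            if ncls == cls:
                placement[node].append(obj)
    return placement


def run_query(user, user_class, node_class, placement):
    """Deliver the query only to nodes on the user's class network."""
    cls = user_class[user]
    seen = set()
    for node, ncls in node_class.items():
        if ncls == cls:  # nodes of other classes never see the query
            seen.update(placement[node])
    return sorted(seen)


node_class = {"n0": "public", "n1": "secret"}
object_class = {"catalog": "public", "payroll": "secret"}
user_class = {"alice": "secret", "bob": "public"}
placement = load_objects(object_class, node_class)
```

    A query from `bob` reaches only `n0` and therefore sees only `catalog`; `payroll` is isolated because the node holding it is never contacted.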

    NORMALIZING DATA ON DATABASE RESTORE
    10.
    Patent application
    Legal status: in force

    Publication number: US20120150806A1

    Publication date: 2012-06-14

    Application number: US12963675

    Application date: 2010-12-09

    IPC classification: G06F17/00

    Abstract: Techniques are disclosed for normalizing a database as part of a database restore. Embodiments may receive a database restore request indicating a previous state to which the database is to be restored. Responsive to the request, embodiments may restore the database to the previous state using backup data associated with the previous state, and normalize the restored database using historical database usage data based on one or more previous database operations.
