Optimizing collective operations
    1.
    Type: granted invention patent
    Status: in force

    Publication No.: US09424087B2

    Publication date: 2016-08-23

    Application No.: US12770286

    Filing date: 2010-04-29

    IPC classification: G06F15/173 G06F9/50 H04L29/08

    CPC classification: G06F9/5011 H04L67/10

    Abstract: Optimizing collective operations, including: receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by the one or more nodes to perform the collective operation is not available; and, if a resource needed by the one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation.
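
    The fallback flow in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the algorithm names, the resource names, the `ResourceUnavailable` exception and the `notify` callback are all hypothetical, and the "collective" is reduced to a local sum.

```python
from typing import Callable, List, Set, Tuple

Algorithm = Callable[[List[int]], int]


class ResourceUnavailable(Exception):
    """Raised when an algorithm's required resource is missing (hypothetical)."""


def make_algorithm(name: str, resource: str, available: Set[str]) -> Algorithm:
    """Build a toy 'optimized collective' that needs one named resource."""
    def run(values: List[int]) -> int:
        if resource not in available:
            raise ResourceUnavailable(f"{name} needs {resource}")
        return sum(values)  # stand-in for the real collective
    run.__name__ = name
    return run


def perform_collective(candidates: List[Algorithm], values: List[int],
                       notify: Callable[[str], None]) -> Tuple[str, int]:
    """Try candidates in order of preference; fall back when a resource is missing."""
    for algorithm in candidates:
        try:
            return algorithm.__name__, algorithm(values)
        except ResourceUnavailable as exc:
            notify(str(exc))  # tell the other nodes before selecting the next one
    raise RuntimeError("no usable algorithm for this collective operation type")
```

    With only a `torus_network` resource available, a candidate list of `hw_tree_allreduce` (needing `collective_network`) followed by `torus_allreduce` would record one notification and then run the second algorithm.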

    Performing collective operations in a distributed processing system
    2.
    Type: granted invention patent
    Status: in force

    Publication No.: US08949328B2

    Publication date: 2015-02-03

    Application No.: US13181601

    Filing date: 2011-07-13

    Abstract: Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system that includes a plurality of compute nodes and a plurality of tasks, where each task is assigned a unique rank and each compute node is coupled for data communications by at least two different networking topologies. At least one of the two networking topologies is a tiered tree topology having a root task and at least two child tasks, and the at least two child tasks are peers of one another in the same tier. Embodiments include, for each task, sending at least a portion of data corresponding to the task to all child tasks of the task through the tree topology; and sending at least a portion of the data corresponding to the task to all peers of the task at the same tier in the tree topology through the second topology.
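
    The two kinds of sends described above can be sketched as a send plan. The sketch assumes a complete tree laid out by rank (rank 0 is the root and the children of rank r are r*fanout+1 onward); that layout, the function names and the "tree"/"second" labels are illustrative assumptions, not details from the patent.

```python
from typing import List, Tuple


def children(rank: int, size: int, fanout: int = 2) -> List[int]:
    """Child ranks of `rank` in a complete tree stored in rank order."""
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < size]


def tier(rank: int, fanout: int = 2) -> int:
    """Depth of `rank` in the tree (the root is tier 0)."""
    depth = 0
    while rank > 0:
        rank = (rank - 1) // fanout
        depth += 1
    return depth


def peers(rank: int, size: int, fanout: int = 2) -> List[int]:
    """Other ranks that sit in the same tier as `rank`."""
    own_tier = tier(rank, fanout)
    return [r for r in range(size) if r != rank and tier(r, fanout) == own_tier]


def plan_sends(size: int, fanout: int = 2) -> List[Tuple[int, int, str]]:
    """List (source, destination, topology) for every send in one round."""
    sends: List[Tuple[int, int, str]] = []
    for rank in range(size):
        sends += [(rank, c, "tree") for c in children(rank, size, fanout)]
        sends += [(rank, p, "second") for p in peers(rank, size, fanout)]
    return sends
```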

    Dynamic administration of component event reporting in a distributed processing system
    3.
    Type: granted invention patent
    Status: in force

    Publication No.: US08621277B2

    Publication date: 2013-12-31

    Application No.: US12960990

    Filing date: 2010-12-06

    IPC classification: G06F11/00

    Abstract: Methods, systems and products are provided for dynamic administration of component event reporting in a distributed processing system, including: receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system; determining, by the events analyzer in dependence upon the received events and one or more event analysis rules, to change the event reporting rules of one or more components; and instructing, by the events analyzer, the one or more components to change the event reporting rules.
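
    One possible event analysis rule is sketched below: if a component reports more than a fixed number of events, the analyzer decides to raise that component's minimum reported severity. The rule itself, the `max_events` parameter and the `min_severity` field are assumptions chosen for illustration; the abstract does not specify any particular rule.

```python
from collections import Counter
from typing import Dict, Iterable


def choose_rule_changes(events: Iterable[dict], max_events: int) -> Dict[str, dict]:
    """Return new reporting rules for components that exceed max_events."""
    counts = Counter(event["component"] for event in events)
    return {
        component: {"min_severity": "ERROR"}  # instruction sent to the component
        for component, count in counts.items()
        if count > max_events
    }
```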

    Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally
    4.
    Type: granted invention patent
    Status: in force

    Publication No.: US08539166B2

    Publication date: 2013-09-17

    Application No.: US13416636

    Filing date: 2012-03-09

    IPC classification: G06F12/02

    CPC classification: G06F15/17331

    Abstract: Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally, the hybrid computing environment including a host computer and a plurality of accelerators, the host computer and the accelerators each having local memory shared remotely with the other, including: writing, to the shared memory of the host computer, packets of data representing changes in accelerator memory values; incrementing, in local memory and in remote shared memory on the host computer, a counter value representing the total number of packets written to the host computer; reading, by the host computer from the shared memory in the host computer, the written data packets; moving the read data to application memory; and incrementing, in both local memory and in remote shared memory on the accelerator, a counter value representing the total number of packets read by the host computer.
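
    The mirrored counters can be illustrated with a small single-process model. Each side keeps its own copy of both counters, and every increment is applied to the local copy and to the other side's copy, so either side can check progress without a remote read. The fixed-size ring buffer and all class and method names are assumptions made for this sketch.

```python
from typing import List, Optional


class Counters:
    """One side's local copies of the written/read packet totals."""

    def __init__(self) -> None:
        self.written = 0
        self.read = 0


class Channel:
    def __init__(self, slots: int) -> None:
        self.accelerator = Counters()
        self.host = Counters()
        self.host_buffer: List[Optional[str]] = [None] * slots  # host shared memory

    def accelerator_write(self, packet: str) -> bool:
        # Only the accelerator's local copies are consulted here.
        if self.accelerator.written - self.accelerator.read == len(self.host_buffer):
            return False  # buffer full: host has not caught up yet
        self.host_buffer[self.accelerator.written % len(self.host_buffer)] = packet
        self.accelerator.written += 1  # local increment
        self.host.written += 1         # remote write into host memory
        return True

    def host_read(self) -> Optional[str]:
        # Only the host's local copies are consulted here.
        if self.host.read == self.host.written:
            return None  # nothing new to read
        packet = self.host_buffer[self.host.read % len(self.host_buffer)]
        self.host.read += 1         # local increment
        self.accelerator.read += 1  # remote write into accelerator memory
        return packet
```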

    Compressing Result Data For A Compute Node In A Parallel Computer
    5.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20120331270A1

    Publication date: 2012-12-27

    Application No.: US13166183

    Filing date: 2011-06-22

    IPC classification: G06F15/76 G06F9/06

    CPC classification: G06F9/54

    Abstract: Compressing result data for a compute node in a parallel computer, the parallel computer including a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID.
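
    The deduplicating gather can be sketched as below. The sketch processes the nodes' results as a flat list with the logical root first, rather than walking a tree, and the entry layout (`data`, `count`, `node_id`) is an assumed representation of the gather buffer.

```python
from typing import Dict, Hashable, List, Tuple


def compressed_gather(results: List[Tuple[int, Hashable]]) -> List[dict]:
    """Gather (node_id, data) pairs, storing each distinct result only once."""
    gather_buffer: List[dict] = []
    position: Dict[Hashable, int] = {}  # result data -> index in gather_buffer
    for node_id, data in results:
        if data in position:
            # Result already written: only bump its counter.
            gather_buffer[position[data]]["count"] += 1
        else:
            # New result: write it, start its counter, record the node ID.
            position[data] = len(gather_buffer)
            gather_buffer.append({"data": data, "count": 1, "node_id": node_id})
    return gather_buffer
```

    When most nodes return the same status, the buffer holds a handful of entries instead of one per node.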

    Dynamic Administration Of Event Pools For Relevant Event And Alert Analysis During Event Storms
    6.
    Type: invention patent application
    Status: in force

    Publication No.: US20120144020A1

    Publication date: 2012-06-07

    Application No.: US12961687

    Filing date: 2010-12-07

    IPC classification: G06F15/173

    CPC classification: H04L43/0823 G06Q10/06

    Abstract: Dynamic administration of event pools for relevant event and alert analysis during event storms, including: receiving, by an events analyzer from an events queue, a plurality of events from one or more components of a distributed processing system, each event including an occurred time and a logged time; creating, by the events analyzer, an events pool; determining whether an arrival rate of the events from the components of the distributed processing system is greater than a predetermined threshold; if the arrival rate is greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their occurred time; and if the arrival rate is not greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their logged time.
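
    The threshold test can be sketched as a choice of timestamp. In this illustration an events pool is modeled as a half-open time window, and an event joins the pool when the chosen timestamp falls inside it; the window model, the field names `occurred` and `logged`, and the function name are assumptions.

```python
from typing import List, Tuple


def assign_events(events: List[dict], pool_start: float, pool_end: float,
                  arrival_rate: float, threshold: float) -> Tuple[str, List[dict]]:
    """Pick the timestamp by arrival rate, then select events inside the pool window."""
    # During a storm (rate above threshold) use the occurred time;
    # otherwise use the logged time.
    key = "occurred" if arrival_rate > threshold else "logged"
    pool = [e for e in events if pool_start <= e[key] < pool_end]
    return key, pool
```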

    Terminating An Accelerator Application Program In A Hybrid Computing Environment
    7.
    Type: invention patent application
    Status: in force

    Publication No.: US20110191785A1

    Publication date: 2011-08-04

    Application No.: US12699162

    Filing date: 2010-02-03

    IPC classification: G06F9/46

    CPC classification: G06F9/46

    Abstract: Terminating an accelerator application program in a hybrid computing environment that includes a host computer having a host computer architecture and an accelerator having an accelerator architecture, where the host computer and the accelerator are adapted to one another for data communications by a system level message passing module (‘SLMPM’), includes: receiving, by the SLMPM from a host application executing on the host computer, a request to terminate an accelerator application program executing on the accelerator; terminating, by the SLMPM, execution of the accelerator application program; returning, by the SLMPM to the host application, a signal indicating that execution of the accelerator application program was terminated; and performing, by the SLMPM, a cleanup of the execution environment associated with the terminated accelerator application program.

    Discovering a resource in a distributed computing system
    8.
    Type: granted invention patent
    Status: in force

    Publication No.: US09448850B2

    Publication date: 2016-09-20

    Application No.: US12722107

    Filing date: 2010-03-11

    IPC classification: G06F15/16 G06F9/50

    CPC classification: G06F9/5061

    Abstract: Sending, by a node requesting information regarding a resource to one or more nodes in a distributed computing system, an active message to perform a collective operation; contributing, by each node not having a resource, a value of zero to the collective operation; contributing, by a node having the resource, the node's rank; storing the result of the collective operation in a buffer of the requesting node; and identifying, in dependence upon the result of the collective operation, the rank of the node having the resource.
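
    The contribution scheme can be simulated locally. The sketch assumes the collective is a sum reduction, that exactly one node holds the resource, and that ranks start at 1 so that a result of zero unambiguously means "not found"; the abstract does not state the reduction operator or the rank numbering, so these are illustrative choices.

```python
from functools import reduce
from operator import add
from typing import Set


def contribution(rank: int, has_resource: bool) -> int:
    """A node contributes its rank if it holds the resource, else zero."""
    return rank if has_resource else 0


def discover_resource(size: int, holders: Set[int]) -> int:
    """Simulate the reduction; returns the holder's rank, or 0 if none."""
    contributions = [contribution(rank, rank in holders)
                     for rank in range(1, size + 1)]
    return reduce(add, contributions, 0)  # stand-in for the collective reduce
```

    With several holders, a sum would no longer identify a single rank; a maximum reduction is one alternative that still returns a valid holder.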

    Flexible event data content management for relevant event and alert analysis within a distributed processing system
    9.
    Type: granted invention patent
    Status: in force

    Publication No.: US09419650B2

    Publication date: 2016-08-16

    Application No.: US13166470

    Filing date: 2011-06-22

    Abstract: Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include capturing, by an interface connector, an event from a resource of the distributed processing system; inserting, by the interface connector, the event into an event database; receiving from the interface connector, by a notifier, a notification of insertion of the event into the event database; based on the received notification, tracking, by the notifier, the number of events indicated as inserted into the event database; receiving from the notifier, by a monitor, a cumulative notification indicating the number of events that have been inserted into the event database; in response to receiving the cumulative notification, retrieving, by the monitor, from the event database, events inserted into the event database; and processing, by the monitor, the retrieved events.
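
    The connector, notifier and monitor roles can be sketched as three small classes. The in-memory list standing in for the event database, the `batch_size` that triggers a cumulative notification, and every method name are assumptions; the abstract does not say when the notifier sends its cumulative notification.

```python
from typing import List


class Monitor:
    def __init__(self, database: List[str]) -> None:
        self.database = database
        self.next_index = 0
        self.processed: List[str] = []

    def on_cumulative_notification(self, count: int) -> None:
        """Retrieve the newly inserted events and process them."""
        batch = self.database[self.next_index:self.next_index + count]
        self.next_index += len(batch)
        self.processed.extend(batch)  # placeholder for real processing


class Notifier:
    def __init__(self, monitor: Monitor, batch_size: int) -> None:
        self.monitor = monitor
        self.batch_size = batch_size
        self.pending = 0  # insertions not yet reported to the monitor

    def on_insert(self) -> None:
        self.pending += 1
        if self.pending >= self.batch_size:
            self.monitor.on_cumulative_notification(self.pending)
            self.pending = 0


class InterfaceConnector:
    def __init__(self, database: List[str], notifier: Notifier) -> None:
        self.database = database
        self.notifier = notifier

    def capture(self, event: str) -> None:
        self.database.append(event)  # insert into the event database
        self.notifier.on_insert()    # tell the notifier about the insertion
```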

    Assigning a unique identifier to a communicator
    10.
    Type: granted invention patent
    Status: in force

    Publication No.: US09348661B2

    Publication date: 2016-05-24

    Application No.: US12721981

    Filing date: 2010-03-11

    IPC classification: G06F15/16 G06F9/54

    CPC classification: G06F9/54

    Abstract: Creating, by a parent master process of a parent communicator, a child communicator, including configuring the child communicator with a child master process, wherein a communicator includes a collection of one or more processes executing on compute nodes of a distributed computing system; determining, by the parent master process, whether a unique identifier is available to assign to the child communicator; if a unique identifier is available to assign to the child communicator, assigning, by the parent master process, the available unique identifier to the child communicator; and if a unique identifier is not available to assign to the child communicator: retrieving, by the parent master process, an available unique identifier from a master process of another communicator in a tree of communicators and assigning the retrieved unique identifier to the child communicator.
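
    The identifier assignment can be sketched with a tree of communicator objects, each holding a pool of free identifiers. The per-communicator pool, the search order (from the root of the tree downward) and all names are assumptions for illustration; the abstract only says that an identifier is retrieved from the master process of another communicator in the tree.

```python
from typing import Iterator, List, Optional


class Communicator:
    def __init__(self, name: str, uid: int,
                 id_pool: Optional[List[int]] = None,
                 parent: Optional["Communicator"] = None) -> None:
        self.name = name
        self.uid = uid
        self.id_pool: List[int] = list(id_pool or [])  # free identifiers held here
        self.parent = parent
        self.children: List["Communicator"] = []

    def _root(self) -> "Communicator":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def _walk(self) -> Iterator["Communicator"]:
        yield self
        for child in self.children:
            yield from child._walk()

    def create_child(self, name: str) -> "Communicator":
        """Create a child, using a local identifier or borrowing one from the tree."""
        if self.id_pool:
            uid = self.id_pool.pop(0)
        else:
            donors = (c for c in self._root()._walk()
                      if c is not self and c.id_pool)
            donor = next(donors, None)
            if donor is None:
                raise RuntimeError("no unique identifier available in the tree")
            uid = donor.id_pool.pop(0)
        child = Communicator(name, uid, parent=self)
        self.children.append(child)
        return child
```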
