Optimizing collective operations
    1.
    Granted invention patent
    Status: in force

    Publication No.: US09424087B2

    Publication date: 2016-08-23

    Application No.: US12770286

    Filing date: 2010-04-29

    IPC classification: G06F15/173 G06F9/50 H04L29/08

    CPC classification: G06F9/5011 H04L67/10

    Abstract: Optimizing collective operations, including: receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by the one or more nodes to perform the collective operation is not available; and, if a resource needed by the one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation.
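The fallback behaviour described in this abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation: the names `ResourceUnavailable`, `run_collective`, `hardware_allreduce`, and `software_allreduce` are hypothetical, and notifying the other nodes is modelled as a plain callback.

```python
class ResourceUnavailable(Exception):
    """Raised by a candidate operation when a resource it needs is missing."""


def run_collective(candidates, data, notify):
    """Try each candidate in order; on a missing resource, notify and move on."""
    for name, operation in candidates:
        try:
            return name, operation(data)
        except ResourceUnavailable as exc:
            # Stand-in for telling the other nodes that the resource is missing.
            notify(f"{name}: {exc}")
    raise RuntimeError("no candidate collective operation could run")


def hardware_allreduce(values):
    # Hypothetical fast path that depends on a resource we pretend is absent.
    raise ResourceUnavailable("collective network not available")


def software_allreduce(values):
    # Hypothetical fallback: every participant ends up with the sum.
    total = sum(values)
    return [total] * len(values)


notifications = []
chosen, result = run_collective(
    [("hardware_allreduce", hardware_allreduce),
     ("software_allreduce", software_allreduce)],
    [1, 2, 3],
    notifications.append,
)
```

Ordering the candidates from most to least optimized means the first one whose resources are present is the one that runs.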

    Performing collective operations in a distributed processing system
    2.
    Granted invention patent
    Status: in force

    Publication No.: US08949328B2

    Publication date: 2015-02-03

    Application No.: US13181601

    Filing date: 2011-07-13

    Abstract: Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system that includes a plurality of compute nodes and a plurality of tasks, where each task is assigned a unique rank and each compute node is coupled for data communications by at least two different networking topologies. At least one of the two networking topologies is a tiered tree topology having a root task and at least two child tasks, and the at least two child tasks are peers of one another in the same tier. Embodiments include, for each task, sending at least a portion of data corresponding to the task to all child tasks of the task through the tree topology; and sending at least a portion of the data corresponding to the task to all peers of the task at the same tier in the tree topology through the second topology.
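To make the child and peer relationships concrete, here is a small sketch that assumes the tiered tree is a binary tree laid out by rank in heap order. That layout is an assumption chosen for illustration; the abstract does not specify the tree shape, and the helper names `children`, `tier`, and `peers` are hypothetical.

```python
def children(rank, size):
    """Child ranks of `rank` in a heap-ordered binary tree of `size` tasks."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < size]


def tier(rank):
    """Depth of `rank` in the tree; the root (rank 0) is tier 0."""
    return (rank + 1).bit_length() - 1


def peers(rank, size):
    """Other ranks in the same tier, reachable over the second topology."""
    first = 2 ** tier(rank) - 1
    end = min(2 ** (tier(rank) + 1) - 1, size)  # exclusive upper bound
    return [r for r in range(first, end) if r != rank]


size = 7
# For each task: targets over the tree topology and over the second topology.
plan = {rank: (children(rank, size), peers(rank, size)) for rank in range(size)}
```

In this layout the root has no peers, and leaf tasks have peers but no children.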

    Processing unexpected messages at a compute node of a parallel computer
    3.
    Granted invention patent
    Status: in force

    Publication No.: US08930962B2

    Publication date: 2015-01-06

    Application No.: US13401975

    Filing date: 2012-02-22

    IPC classification: G06F9/54 G06F9/44

    CPC classification: G06F15/17306 G06F9/546

    Abstract: Methods, apparatuses, and computer program products for processing unexpected messages at a compute node of a parallel computer are provided. Embodiments include receiving, by the compute node, a portion of a message from another compute node of the parallel computer, the message comprising a plurality of separate portions; in response to receiving the portion of the message, determining, by the compute node, whether one of the applications executing on the compute node has indicated that the message is expected; if one of the applications executing on the compute node has not indicated that the message is expected, storing, by the compute node, the portion of the message in an unexpected message buffer within the compute node; and if one of the applications executing on the compute node has indicated that the message is expected, storing the portion of the message at a storage destination indicated by the message.
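The routing decision in this abstract can be sketched as follows. This is an illustrative model only: the dictionary layout of a node and of a message portion (`msg_id`, `offset`, `payload`, `dest`) is assumed, not taken from the patent.

```python
def make_node():
    # "expected" holds IDs of messages that an application has announced.
    return {"expected": set(), "unexpected": {}, "memory": {}}


def receive_portion(node, portion):
    """Store one portion of a multi-part message and report where it went."""
    msg_id = portion["msg_id"]
    if msg_id in node["expected"]:
        # Expected: write to the destination that the message itself names.
        destination = node["memory"].setdefault(portion["dest"], {})
        destination[portion["offset"]] = portion["payload"]
        return "expected"
    # Not expected yet: park the portion in the unexpected message buffer.
    node["unexpected"].setdefault(msg_id, {})[portion["offset"]] = portion["payload"]
    return "unexpected"


node = make_node()
first = receive_portion(
    node, {"msg_id": 7, "offset": 0, "payload": b"ab", "dest": "buf_a"})
node["expected"].add(7)  # an application now indicates it expects message 7
second = receive_portion(
    node, {"msg_id": 7, "offset": 2, "payload": b"cd", "dest": "buf_a"})
```

The sketch stops at the two storage paths; what happens to already-buffered portions once the message becomes expected is outside what the abstract states.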

    Configurable alert delivery for reducing the amount of alerts transmitted in a distributed processing system
    4.
    Granted invention patent
    Status: lapsed

    Publication No.: US08756462B2

    Publication date: 2014-06-17

    Application No.: US13114463

    Filing date: 2011-05-24

    IPC classification: G06F11/00 G06F11/30

    Abstract: Methods, systems, and computer program products for configurable alert delivery in a distributed processing system are provided. Embodiments include, for each alert generated by an incident analyzer, applying active alert filters to the alert; wherein applying the active alert filters to the alert includes: creating a list of all active alert filters and a set of all active listeners; and for each active alert filter, running the active alert filter; if the active alert filter indicates that the alert should not go to one or more of the active listeners, removing the one or more active listeners from the set of all active listeners; if the active listeners set is empty, stopping processing of the alert; and if the active listeners set is not empty, selecting the next active alert filter from the active alert filter list.
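The filter loop in this abstract maps naturally onto a short function. The sketch below assumes each filter is a callable that returns the listeners that should not receive the alert; that interface and the two example filters are hypothetical.

```python
def apply_alert_filters(alert, active_filters, active_listeners):
    """Return the listeners that should still receive `alert`."""
    filter_list = list(active_filters)   # list of all active alert filters
    remaining = set(active_listeners)    # set of all active listeners
    for alert_filter in filter_list:
        blocked = alert_filter(alert, remaining)
        remaining -= set(blocked)
        if not remaining:
            break                        # nobody left: stop processing the alert
    return remaining


def no_pager_for_low_severity(alert, listeners):
    # Hypothetical filter: low-severity alerts never reach the pager.
    return {"pager"} if alert["severity"] < 3 else set()


def mute_during_maintenance(alert, listeners):
    # Hypothetical filter: suppress everything during maintenance.
    return set(listeners) if alert.get("maintenance") else set()


filters = [no_pager_for_low_severity, mute_during_maintenance]
low = apply_alert_filters({"severity": 1}, filters, {"pager", "log"})
muted = apply_alert_filters(
    {"severity": 5, "maintenance": True}, filters, {"pager", "log"})
```

Stopping as soon as the listener set is empty avoids running filters whose result can no longer matter.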

    Selected alert delivery in a distributed processing system
    5.
    Granted invention patent
    Status: lapsed

    Publication No.: US08713581B2

    Publication date: 2014-04-29

    Application No.: US13282995

    Filing date: 2011-10-27

    IPC classification: G06F13/00

    Abstract: Methods, apparatuses, and computer program products for selected alert delivery in a distributed processing system are provided. Embodiments include receiving, by an incident analyzer, one or more events from one or more resources, each event identifying a location of the resource producing the event; creating, by the incident analyzer, potential alerts in dependence upon a location of the resource producing the event and location scoping rules; selecting for consolidation, by the incident analyzer, one or more of the potential alerts based on consolidation rules; and creating, by the incident analyzer, a consolidated alert based on the consolidation rules and the selected one or more potential alerts.

    Administering incident pools for event and alert analysis
    6.
    Granted invention patent
    Status: lapsed

    Publication No.: US08645757B2

    Publication date: 2014-02-04

    Application No.: US13116382

    Filing date: 2011-05-26

    IPC classification: G06F11/00

    Abstract: Administering incident pools, including: receiving, by an incident analyzer from an incident queue, a plurality of incidents from one or more components of the distributed processing system; assigning, by the incident analyzer, each received incident to a pool of incidents; assigning, by the incident analyzer, to each incident a particular combined minimum time for inclusion in one or more pools, each particular combined minimum time corresponding to a particular incident; in response to the pool closing, determining, by the incident analyzer, for each incident in the pool whether the incident has met its combined minimum time for inclusion in one or more pools; if the incident has been in the pool for its combined minimum time, including, by the incident analyzer, the incident in the closed pool; and if the incident has not been in the pool for its combined minimum time, including the incident in a next pool.
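The pool-closing rule can be sketched in a few lines. The sketch assumes the "combined minimum time" is measured from the moment an incident first entered any pool, so time accumulates across pools; the `Incident` fields and the `close_pool` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    arrived_at: float          # when the incident first entered a pool
    combined_min_time: float   # minimum total time it must spend in pools


def close_pool(pool, close_time):
    """Split a closing pool into incidents that stay and incidents carried over."""
    closed, next_pool = [], []
    for incident in pool:
        if close_time - incident.arrived_at >= incident.combined_min_time:
            closed.append(incident)      # met its minimum: part of the closed pool
        else:
            next_pool.append(incident)   # not yet: moves on to the next pool
    return closed, next_pool


pool = [Incident("disk_error", 0.0, 5.0), Incident("link_flap", 8.0, 5.0)]
closed_1, carried = close_pool(pool, close_time=10.0)
closed_2, carried_again = close_pool(carried, close_time=15.0)
```

Carrying late arrivals forward gives them the same analysis window as incidents that arrived early in a pool.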

    Dynamic administration of component event reporting in a distributed processing system
    7.
    Granted invention patent
    Status: in force

    Publication No.: US08621277B2

    Publication date: 2013-12-31

    Application No.: US12960990

    Filing date: 2010-12-06

    IPC classification: G06F11/00

    Abstract: Methods, systems and products are provided for dynamic administration of component event reporting in a distributed processing system, including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system; determining, by the events analyzer in dependence upon the received events and one or more event analysis rules, to change the event reporting rules of one or more components; and instructing, by the events analyzer, the one or more components to change the event reporting rules.

    Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally
    8.
    Granted invention patent
    Status: in force

    Publication No.: US08539166B2

    Publication date: 2013-09-17

    Application No.: US13416636

    Filing date: 2012-03-09

    IPC classification: G06F12/02

    CPC classification: G06F15/17331

    Abstract: Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally, the hybrid computing environment including a host computer and a plurality of accelerators, the host computer and the accelerators each having local memory shared remotely with the other, including: writing to the shared memory of the host computer packets of data representing changes in accelerator memory values; incrementing, in local memory and in remote shared memory on the host computer, a counter value representing the total number of packets written to the host computer; reading, by the host computer from the shared memory in the host computer, the written data packets; moving the read data to application memory; and incrementing, in both local memory and in remote shared memory on the accelerator, a counter value representing the total number of packets read by the host computer.
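The mirrored counters in this abstract can be modelled with two plain objects standing in for host and accelerator memory. This is a single-process illustration under assumed names (`Side`, `accelerator_write`, `host_drain`); real shared memory, ordering, and synchronization are not modelled.

```python
class Side:
    """Local view held by one party: both counters plus its packet area."""

    def __init__(self):
        self.written = 0   # total packets written to the host
        self.read = 0      # total packets read by the host
        self.inbox = []    # shared-memory packet area (used on the host only)


def accelerator_write(accel, host, packet):
    host.inbox.append(packet)   # remote write into host shared memory
    accel.written += 1          # local copy of the counter
    host.written += 1           # remote copy of the counter on the host


def host_drain(host, accel, app_memory):
    # The host compares two local counters, so no remote read is needed.
    while host.read < host.written:
        app_memory.append(host.inbox[host.read])
        host.read += 1          # local copy
        accel.read += 1         # remote copy on the accelerator


host, accel, app_memory = Side(), Side(), []
accelerator_write(accel, host, {"addr": 0x10, "value": 1})
accelerator_write(accel, host, {"addr": 0x14, "value": 2})
pending_before = accel.written - accel.read   # computed from local values only
host_drain(host, accel, app_memory)
pending_after = accel.written - accel.read
```

Each side pays for an extra remote write per update, and in exchange can answer "how many packets are outstanding?" from its own memory.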

    Initiating A Collective Operation In A Parallel Computer
    9.
    Invention patent application
    Status: lapsed

    Publication No.: US20130212145A1

    Publication date: 2013-08-15

    Application No.: US13369454

    Filing date: 2012-02-09

    IPC classification: G06F15/16

    CPC classification: G06F9/5066

    Abstract: Initiating a collective operation in a parallel computer that includes compute nodes coupled for data communications and organized in an operational group for collective operations with one compute node assigned as a root node, including: identifying, by a non-root compute node, a collective operation to execute in the operational group of compute nodes; initiating, by the non-root compute node, execution of the collective operation amongst the compute nodes of the operational group, including: sending, by the non-root compute node to one or more of the other compute nodes in the operational group, an active message, the active message including information configured to initiate execution of the collective operation amongst the compute nodes of the operational group; and executing, by the compute nodes of the operational group, the collective operation.

    Compressing Result Data For A Compute Node In A Parallel Computer
    10.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20120331270A1

    Publication date: 2012-12-27

    Application No.: US13166183

    Filing date: 2011-06-22

    IPC classification: G06F15/76 G06F9/06

    CPC classification: G06F9/54

    Abstract: Compressing result data for a compute node in a parallel computer, the parallel computer including a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID.
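The gather-buffer compression amounts to counting duplicate results. A minimal sketch follows, assuming result data is hashable and that the node ID recorded is that of the first node to contribute a given result (one reading of the abstract). `compress_gather` is a hypothetical name, and the tree traversal is reduced to a list with the logical root first.

```python
def compress_gather(results_by_node):
    """Build a gather buffer that stores each distinct result only once."""
    gather_buffer = {}
    for node_id, result in results_by_node:
        entry = gather_buffer.get(result)
        if entry is None:
            # New result data: write it, start its counter, record the node ID.
            gather_buffer[result] = {"count": 1, "node_id": node_id}
        else:
            # Already written: only the counter changes.
            entry["count"] += 1
    return gather_buffer


# The logical root (node 0) contributes first, then the other nodes.
gathered = compress_gather([(0, "ok"), (1, "ok"), (2, "fail"), (3, "ok")])
```

When most nodes report the same result, the buffer grows with the number of distinct results instead of the number of nodes.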