Query performance data on parallel computer system having compute nodes
    31.
    Invention grant
    Legal status: in force

    Publication number: US08250164B2

    Publication date: 2012-08-21

    Application number: US12760783

    Filing date: 2010-04-15

    IPC classification: G06F15/16

    Abstract: Embodiments of the invention provide a method for querying performance counter data on a massively parallel computing system, while minimizing the costs associated with interrupting computer processors and limited memory resources. DMA descriptors may be inserted into an injection FIFO of a remote compute node in the massively parallel computing system. Upon executing the DMA operations described by the DMA descriptors, performance counter data may be transferred from the remote compute node to a destination node.
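
The descriptor-driven query can be pictured with a toy, single-process model. Everything named here (`DmaDescriptor`, `Node`, `run_dma_engine`, and the counter layout at words 0..3) is a hypothetical stand-in and not the patented implementation. A querying node appends a descriptor to the remote node's injection FIFO, and the remote DMA engine copies the counter words to the destination; no handler code runs on the remote processor.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DmaDescriptor:
    """Hypothetical descriptor: copy `length` words starting at `src_offset`
    in the owning node's memory to `dest_node` starting at `dest_offset`."""
    src_offset: int
    length: int
    dest_node: "Node"
    dest_offset: int


class Node:
    def __init__(self, name: str, memory_words: int):
        self.name = name
        self.memory = [0] * memory_words
        self.injection_fifo: deque = deque()  # descriptors awaiting the DMA engine

    def run_dma_engine(self) -> None:
        """Drain the injection FIFO, performing each copy; the node's own
        processor code is never invoked (modeling an interrupt-free transfer)."""
        while self.injection_fifo:
            d = self.injection_fifo.popleft()
            data = self.memory[d.src_offset:d.src_offset + d.length]
            d.dest_node.memory[d.dest_offset:d.dest_offset + d.length] = data


# Assumed layout: the remote node keeps its performance counters at words 0..3.
remote = Node("remote", memory_words=16)
remote.memory[0:4] = [1200, 35, 7, 0]
collector = Node("collector", memory_words=16)

# The querying side inserts a descriptor into the remote node's injection FIFO.
remote.injection_fifo.append(DmaDescriptor(0, 4, collector, 8))
remote.run_dma_engine()
print(collector.memory[8:12])  # [1200, 35, 7, 0]
```

Because the descriptor carries the destination, the remote node needs no extra buffer to stage the reply, which is the memory-saving point the abstract makes.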

    Distributed Hardware Device Simulation
    32.
    Invention application
    Legal status: in force

    Publication number: US20120185230A1

    Publication date: 2012-07-19

    Application number: US13006696

    Filing date: 2011-01-14

    IPC classification: G06F17/50

    Abstract: Distributed hardware device simulation, including: identifying a plurality of hardware components of the hardware device; providing software components simulating the functionality of each hardware component, wherein the software components are installed on compute nodes of a distributed processing system; receiving, in at least one of the software components, one or more messages representing an input to the hardware component; simulating the operation of the hardware component with the software component, thereby generating an output of the software component representing the output of the hardware component; and sending, from the software component to at least one other software component, one or more messages representing the output of the hardware component.
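
The message flow described above can be sketched in one process, with a FIFO mailbox standing in for the network between compute nodes (an assumption; the abstract places each software component on its own node). `SoftwareComponent`, `simulate`, and the two-stage "doubler/incrementer" device are hypothetical illustrations.

```python
from collections import deque
from typing import Callable, Dict, List


class SoftwareComponent:
    """Simulates one hardware component. In the abstract each would run on a
    separate compute node; here all share one process (assumption)."""

    def __init__(self, name: str, behavior: Callable[[int], int], downstream: List[str]):
        self.name = name
        self.behavior = behavior      # models the hardware component's function
        self.downstream = downstream  # components that receive this one's output
        self.outputs: List[int] = []


def simulate(components: Dict[str, SoftwareComponent], entry: str, inputs: List[int]) -> None:
    # Each message is (destination component name, payload representing an input).
    mailbox = deque((entry, value) for value in inputs)
    while mailbox:
        dest, value = mailbox.popleft()
        comp = components[dest]
        result = comp.behavior(value)  # simulate the hardware operation
        comp.outputs.append(result)
        for nxt in comp.downstream:    # send messages representing the output
            mailbox.append((nxt, result))


# Hypothetical device: an 8-bit doubler feeding an 8-bit incrementer.
device = {
    "doubler": SoftwareComponent("doubler", lambda x: (2 * x) & 0xFF, ["incrementer"]),
    "incrementer": SoftwareComponent("incrementer", lambda x: (x + 1) & 0xFF, []),
}
simulate(device, "doubler", [3, 200])
print(device["incrementer"].outputs)  # [7, 145]
```

Swapping the in-process mailbox for real inter-node messaging would leave each component's `behavior` unchanged, which is what lets the components be distributed.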

    Pacing network traffic among a plurality of compute nodes connected using a data communications network
    33.
    Invention grant
    Legal status: in force

    Publication number: US08140704B2

    Publication date: 2012-03-20

    Application number: US12166748

    Filing date: 2008-07-02

    IPC classification: G06F15/16 H04L1/00

    CPC classification: H04L47/10 H04L47/283

    Abstract: Methods, apparatus, and products are disclosed for pacing network traffic among a plurality of compute nodes connected using a data communications network. The network has a plurality of network regions, and the plurality of compute nodes are distributed among these network regions. Pacing network traffic among a plurality of compute nodes connected using a data communications network includes: identifying, by a compute node for each region of the network, a roundtrip time delay for communicating with at least one of the compute nodes in that region; determining, by the compute node for each region, a pacing algorithm for that region in dependence upon the roundtrip time delay for that region; and transmitting, by the compute node, network packets to at least one of the compute nodes in at least one of the network regions in dependence upon the pacing algorithm for that region.
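
The abstract does not say which pacing algorithm is chosen per region, so the sketch below assumes a simple one: send at most `window` packets per measured round trip, spacing them evenly. The names (`PacingPolicy`, `choose_policies`, `send_times`) and the RTT figures are hypothetical; only the shape (measure RTT per region, derive a per-region policy, pace transmissions by it) follows the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PacingPolicy:
    """Hypothetical per-region pacing: at most `window` packets per round trip."""
    rtt_us: float
    window: int

    @property
    def gap_us(self) -> float:
        # Spread `window` packets evenly over one round trip.
        return self.rtt_us / self.window


def choose_policies(rtt_by_region: Dict[str, float], window: int = 4) -> Dict[str, PacingPolicy]:
    """Select a pacing policy for each region from its measured roundtrip delay."""
    return {region: PacingPolicy(rtt, window) for region, rtt in rtt_by_region.items()}


def send_times(policy: PacingPolicy, n_packets: int, start_us: float = 0.0) -> List[float]:
    """Departure times for n packets sent to a node in one region."""
    return [start_us + i * policy.gap_us for i in range(n_packets)]


# Assumed RTT measurements (microseconds) from one compute node to two regions.
policies = choose_policies({"near": 8.0, "far": 40.0})
print(send_times(policies["near"], 3))  # [0.0, 2.0, 4.0]
print(send_times(policies["far"], 3))   # [0.0, 10.0, 20.0]
```

Distant regions with longer round trips get wider gaps, so a node does not flood long paths whose acknowledgements return slowly.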

    Opportunistic queueing injection strategy for network load balancing
    36.
    Invention grant
    Legal status: in force

    Publication number: US07944842B2

    Publication date: 2011-05-17

    Application number: US11738034

    Filing date: 2007-04-20

    IPC classification: H04L12/26

    Abstract: Embodiments of the invention include a method, system, and article of manufacture that provide an opportunistic queuing injection strategy used for data communication between nodes of a parallel computer system. A message may be encapsulated into a set of data packets. When the packets are sent, an opportunistic injection queue may be configured to transmit them to multiple hardware injection ports. This approach allows for complete network link saturation. In a parallel system with network links in multiple dimensions, sending message packets using more than one dimension may substantially increase network throughput.
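
A minimal sketch of the idea, assuming the "opportunistic" choice means placing each packet on whichever injection port currently has the most free FIFO slots. That selection rule, the port names `x+`/`y+`/`z+`, and both functions are hypothetical; the patent may use a different policy.

```python
from typing import Dict, List


def packetize(message: bytes, payload_size: int) -> List[bytes]:
    """Encapsulate a message into fixed-size packet payloads (last may be short)."""
    return [message[i:i + payload_size] for i in range(0, len(message), payload_size)]


def opportunistic_inject(packets: List[bytes], port_capacity: Dict[str, int]) -> Dict[str, List[bytes]]:
    """Place each packet on the injection port with the most free slots.

    Hypothetical policy: `port_capacity` gives free FIFO slots per port
    (e.g. one port per network dimension); ties go to the first port listed.
    """
    free = dict(port_capacity)
    assignment: Dict[str, List[bytes]] = {port: [] for port in free}
    for pkt in packets:
        port = max(free, key=free.get)  # max() returns the first maximal key on ties
        if free[port] == 0:
            raise RuntimeError("all injection FIFOs full; caller must retry later")
        assignment[port].append(pkt)
        free[port] -= 1
    return assignment


packets = packetize(b"ABCDEFGHIJ", payload_size=2)  # 5 packets
result = opportunistic_inject(packets, {"x+": 2, "y+": 2, "z+": 1})
print({port: len(pkts) for port, pkts in result.items()})  # {'x+': 2, 'y+': 2, 'z+': 1}
```

Spreading one message over several dimensions' ports keeps every link busy instead of serializing the whole message behind a single FIFO.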

    Locating hardware faults in a data communications network of a parallel computer
    37.
    Invention grant
    Legal status: lapsed

    Publication number: US07646721B2

    Publication date: 2010-01-12

    Application number: US11279586

    Filing date: 2006-04-13

    IPC classification: H04L12/26

    CPC classification: H04L12/66

    Abstract: Locating hardware faults in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
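
The localization rule above (parent test tree fails, every child test tree passes, so the defect is on a link from the parent to one of its children) can be sketched as a tree walk. The test suite here is simulated by a hypothetical `make_test_suite` that fails whenever a tree contains a link from an assumed set of bad links; a real suite would exercise the hardware.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[int, List[int]]  # parent rank -> child ranks


def subtree_links(tree: Tree, root: int) -> Set[Tuple[int, int]]:
    """All (parent, child) links inside the test tree rooted at `root`."""
    links, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            links.add((node, child))
            stack.append(child)
    return links


def make_test_suite(bad_links: Set[Tuple[int, int]]):
    """Stand-in for a real network test suite (assumption): it fails on a test
    tree exactly when that tree contains a defective link."""
    return lambda tree, root: not (subtree_links(tree, root) & bad_links)


def locate_faults(tree: Tree, root: int, run_suite) -> List[int]:
    """Return parent nodes judged to have a defective link to one of their children."""
    suspects, stack = [], [root]
    while stack:
        parent = stack.pop()  # next compute node becomes root of the parent test tree
        children = tree.get(parent, [])
        stack.extend(children)
        if children and not run_suite(tree, parent) and all(run_suite(tree, c) for c in children):
            suspects.append(parent)
    return suspects


tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
suite = make_test_suite({(2, 6)})     # simulate one bad link between nodes 2 and 6
print(locate_faults(tree, 0, suite))  # [2]
```

Node 0 is not flagged even though its test tree fails, because one of its child trees (rooted at 2) also fails; the fault is therefore further down.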

    Profiling An Application For Power Consumption During Execution On A Compute Node
    38.
    Invention application
    Legal status: in force

    Publication number: US20100005326A1

    Publication date: 2010-01-07

    Application number: US12167302

    Filing date: 2008-07-03

    IPC classification: G06F1/32

    Abstract: Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
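
One simple way to combine the two profiles, assuming the hardware profile is expressed as energy per operation type and the application's operation counts are already known (for example from a trial run). The operation names, the nanojoule figures, and both functions are hypothetical illustrations, not values or interfaces from the patent.

```python
from typing import Dict

# Hypothetical hardware power consumption profile for one compute node:
# energy (nanojoules) consumed per operation of each type.
HARDWARE_PROFILE_NJ: Dict[str, float] = {
    "fp_op": 0.5,
    "mem_load": 2.0,
    "net_send": 10.0,
}


def application_power_profile(op_counts: Dict[str, int],
                              hw_profile: Dict[str, float]) -> Dict[str, float]:
    """Estimate an application's energy use per operation type, in nanojoules,
    by combining its operation counts with the node's hardware profile."""
    missing = set(op_counts) - set(hw_profile)
    if missing:
        raise KeyError(f"hardware profile lacks entries for: {sorted(missing)}")
    return {op: count * hw_profile[op] for op, count in op_counts.items()}


def report(profile: Dict[str, float]) -> str:
    """Format the application profile as a small text table with a total."""
    lines = [f"{op:>9}: {nj:10.1f} nJ" for op, nj in sorted(profile.items())]
    lines.append(f"{'total':>9}: {sum(profile.values()):10.1f} nJ")
    return "\n".join(lines)


# Operation counts are assumed to come from static analysis or a trial run.
app_counts = {"fp_op": 1_000_000, "mem_load": 250_000, "net_send": 1_000}
profile = application_power_profile(app_counts, HARDWARE_PROFILE_NJ)
print(report(profile))
```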

    Effecting a Broadcast with an Allreduce Operation on a Parallel Computer
    40.
    Invention application
    Legal status: lapsed

    Publication number: US20090037511A1

    Publication date: 2009-02-05

    Application number: US11832918

    Filing date: 2007-08-02

    IPC classification: G06F15/16

    CPC classification: G06F9/542 G06F2209/543

    Abstract: Methods, parallel computers, and computer program products are disclosed for effecting a broadcast with an allreduce operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, the compute nodes of the operational group coupled for data communications through a global combining network; and one compute node assigned to be a logical root. Embodiments include configuring, by the logical root node, a send buffer having a contribution to be broadcast to each ranked node in the operational group; configuring, by all ranked nodes other than the logical root, a receive buffer for receiving the contribution from the logical root; and repeatedly for each element of the contribution of the logical root in the send buffer: contributing, by the logical root, the element of the contribution in the send buffer; injecting, by all ranked nodes other than the logical root, one or more zeros corresponding to a size of the element; performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros, yielding a result for the allreduce operation; and storing in each receive buffer, by all ranked nodes other than the logical root, the result of the allreduce.
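
The trick works because `x | 0 == x`: if only the root contributes a nonzero value, a bitwise-OR allreduce hands every rank the root's element, which is exactly a broadcast. The sketch below simulates all ranks in one process; `allreduce_bor` is a hypothetical stand-in for a real collective over the global combining network (in MPI terms, an allreduce with the `MPI_BOR` operation), and elements are modeled as non-negative integers.

```python
from functools import reduce
from operator import or_
from typing import List


def allreduce_bor(contributions: List[int]) -> int:
    """Model of an allreduce with bitwise OR: every rank obtains the OR of all inputs."""
    return reduce(or_, contributions, 0)


def broadcast_via_allreduce(send_buffer: List[int], n_ranks: int, root: int) -> List[List[int]]:
    """Broadcast the root's send buffer element by element using bitwise-OR allreduce."""
    recv_buffers: List[List[int]] = [[] for _ in range(n_ranks)]
    for element in send_buffer:
        # Root contributes the element; every other rank injects a zero of the same size.
        contributions = [element if rank == root else 0 for rank in range(n_ranks)]
        result = allreduce_bor(contributions)  # x | 0 == x, so result == element
        for rank in range(n_ranks):
            if rank != root:  # only non-root ranks store into a receive buffer
                recv_buffers[rank].append(result)
    return recv_buffers


buffers = broadcast_via_allreduce([0x2A, 0xFF, 7], n_ranks=4, root=1)
print(buffers)  # [[42, 255, 7], [], [42, 255, 7], [42, 255, 7]]
```

The design choice pays off on hardware whose combining network accelerates reductions but not broadcasts: a reduction with zeros reuses that fast path.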