Performing a deterministic reduction operation in a compute node organized into a branched tree topology
    31.
    Invention patent (granted)
    Legal status: lapsed

    Publication No.: US08489859B2

    Publication date: 2013-07-16

    Application No.: US12790037

    Filing date: 2010-05-28

    IPC classification: G06F9/00

    CPC classification: G06F15/76 G06F15/17318

    Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.

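The handshake in this abstract can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the function name `deterministic_reduce`, the use of summation as the reduction, and the random shuffle standing in for "any order" are all hypothetical and not taken from the patent. One common reason to want a fixed order is that floating-point addition is not associative, so the same inputs summed in different orders can give different results.

```python
import random


def deterministic_reduce(contributions, ack_order, seed=None):
    """Simulate the root's side of the handshake (hypothetical sketch).

    contributions: dict mapping processor id -> numeric contribution.
    ack_order:     the predefined order in which the root acknowledges.
    seed:          only controls the simulated arrival order of dummy data.
    """
    rng = random.Random(seed)

    # Phase 1: dummy contributions reach the root in arbitrary order.
    dummy_arrivals = list(contributions)
    rng.shuffle(dummy_arrivals)
    awaiting_ack = set(dummy_arrivals)

    # Phases 2 and 3: the root acknowledges in the predefined order. A
    # processor may send its real contribution only after its ack, so the
    # real data reaches the root in that same predefined order.
    total = 0.0
    receive_log = []
    for proc in ack_order:
        if proc not in awaiting_ack:
            raise ValueError(f"no dummy contribution seen from {proc!r}")
        awaiting_ack.remove(proc)
        receive_log.append(proc)
        total += contributions[proc]  # reduce in the fixed order

    return total, receive_log


if __name__ == "__main__":
    values = {0: 1e16, 1: 1.0, 2: -1e16, 3: 1.0}
    for seed in range(3):
        print(deterministic_reduce(values, [0, 1, 2, 3], seed=seed))
```

Whatever order the dummy data arrives in, the reduction order is fixed by `ack_order`, so repeated runs give the same result.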

    Effecting hardware acceleration of broadcast operations in a parallel computer

    Publication No.: US08346883B2

    Publication date: 2013-01-01

    Application No.: US12782791

    Filing date: 2010-05-19

    IPC classification: G06F15/167

    CPC classification: G06F15/17318

    Abstract: Compute nodes of a parallel computer organized for collective operations via a network, each compute node having a receive buffer and establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.

    Profiling an application for power consumption during execution on a plurality of compute nodes
    34.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08250389B2

    Publication date: 2012-08-21

    Application No.: US12167302

    Filing date: 2008-07-03

    IPC classification: G06F1/32

    Abstract: Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.

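The profiling steps in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function name `profile_application`, the representation of an application as seconds spent per operation type, and the wattage figures are hypothetical and do not come from the patent.

```python
def profile_application(op_seconds, hardware_profile):
    """Combine an application's operation mix with a hardware power profile.

    op_seconds:       dict of operation type -> seconds the application
                      spends in that operation (hypothetical input format).
    hardware_profile: dict of operation type -> watts drawn by the node
                      hardware while performing that operation.
    """
    energy = {}
    for op, seconds in op_seconds.items():
        # Energy (joules) = power (watts) * time (seconds).
        energy[op] = hardware_profile[op] * seconds

    total_joules = sum(energy.values())
    total_seconds = sum(op_seconds.values())
    average_watts = total_joules / total_seconds if total_seconds else 0.0

    return {
        "energy_joules": energy,
        "total_joules": total_joules,
        "average_watts": average_watts,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    hardware = {"compute": 20.0, "memory": 10.0}
    application = {"compute": 2.0, "memory": 4.0}
    print(profile_application(application, hardware))
```

The returned dictionary plays the role of the reported power consumption profile; a real system would derive the per-operation durations from the application itself rather than take them as input.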

    Query performance data on parallel computer system having compute nodes
    35.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08250164B2

    Publication date: 2012-08-21

    Application No.: US12760783

    Filing date: 2010-04-15

    IPC classification: G06F15/16

    Abstract: Embodiments of the invention provide a method for querying performance counter data on a massively parallel computing system, while minimizing the costs associated with interrupting computer processors and limited memory resources. DMA descriptors may be inserted into an injection FIFO of a remote compute node in the massively parallel computing system. Upon executing the DMA operations described by the DMA descriptors, performance counter data may be transferred from the remote compute node to a destination node.


    Distributed Hardware Device Simulation
    36.
    Invention patent application
    Legal status: in force

    Publication No.: US20120185230A1

    Publication date: 2012-07-19

    Application No.: US13006696

    Filing date: 2011-01-14

    IPC classification: G06F17/50

    Abstract: Distributed hardware device simulation, including: identifying a plurality of hardware components of the hardware device; providing software components simulating the functionality of each hardware component, wherein the software components are installed on compute nodes of a distributed processing system; receiving, in at least one of the software components, one or more messages representing an input to the hardware component; simulating the operation of the hardware component with the software component, thereby generating an output of the software component representing the output of the hardware component; and sending, from the software component to at least one other software component, one or more messages representing the output of the hardware component.

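The message-passing structure in this abstract can be sketched in a single process. This is a toy under stated assumptions: the `Component` class, the `run` function, and the AND/NOT gate example are hypothetical, and an in-memory queue stands in for the messages that would travel between compute nodes of a distributed processing system.

```python
from collections import deque


class Component:
    """A software component simulating one hardware component."""

    def __init__(self, name, func, downstream=None):
        self.name = name
        self.func = func              # maps an input payload to an output
        self.downstream = downstream  # name of the next component, if any


def run(components, initial_messages):
    """Deliver messages until none remain; return the final outputs.

    components:       dict of component name -> Component.
    initial_messages: iterable of (target component name, payload).
    """
    queue = deque(initial_messages)
    outputs = []
    while queue:
        target, payload = queue.popleft()
        component = components[target]
        result = component.func(payload)  # simulate the hardware component
        if component.downstream is None:
            outputs.append((component.name, result))
        else:
            # Send the output as a message to another software component.
            queue.append((component.downstream, result))
    return outputs


if __name__ == "__main__":
    # Two simulated gates chained into a NAND: AND feeds NOT.
    parts = {
        "and": Component("and", lambda bits: bits[0] & bits[1], "not"),
        "not": Component("not", lambda bit: 1 - bit),
    }
    print(run(parts, [("and", (1, 1)), ("and", (1, 0))]))
```

Each component only sees messages addressed to it, which is the property that would let the components be placed on different nodes.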

    Pacing network traffic among a plurality of compute nodes connected using a data communications network
    37.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US08140704B2

    Publication date: 2012-03-20

    Application No.: US12166748

    Filing date: 2008-07-02

    IPC classification: G06F15/16 H04L1/00

    CPC classification: H04L47/10 H04L47/283

    Abstract: Methods, apparatus, and products are disclosed for pacing network traffic among a plurality of compute nodes connected using a data communications network. The network has a plurality of network regions, and the plurality of compute nodes are distributed among these network regions. Pacing network traffic among a plurality of compute nodes connected using a data communications network includes: identifying, by a compute node for each region of the network, a roundtrip time delay for communicating with at least one of the compute nodes in that region; determining, by the compute node for each region, a pacing algorithm for that region in dependence upon the roundtrip time delay for that region; and transmitting, by the compute node, network packets to at least one of the compute nodes in at least one of the network regions in dependence upon the pacing algorithm for that region.

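The three steps in this abstract (measure a roundtrip delay per region, pick a pacing rule per region, transmit according to that rule) can be sketched as follows. The pacing rule used here, spreading a fixed window of packets evenly over one roundtrip, is an assumption chosen for illustration and is not claimed to be the patent's algorithm; all function names and the region labels are hypothetical.

```python
def pacing_interval(rtt_seconds, window_packets=8):
    """Hypothetical rule: spread one window of packets over one roundtrip."""
    if rtt_seconds <= 0 or window_packets <= 0:
        raise ValueError("rtt_seconds and window_packets must be positive")
    return rtt_seconds / window_packets


def build_pacing_table(region_rtts, window_packets=8):
    """Map each network region to its inter-packet gap in seconds."""
    return {
        region: pacing_interval(rtt, window_packets)
        for region, rtt in region_rtts.items()
    }


def schedule_sends(pacing_table, region, n_packets, start=0.0):
    """Return the send time of each packet destined for the given region."""
    gap = pacing_table[region]
    return [start + i * gap for i in range(n_packets)]


if __name__ == "__main__":
    # Illustrative roundtrip delays in seconds, not measurements.
    table = build_pacing_table({"near": 0.008, "far": 0.080})
    print(table)
    print(schedule_sends(table, "far", 4))
```

Under this rule, regions with a longer roundtrip delay get a larger gap between packets, so traffic to distant regions is injected more slowly.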

    Opportunistic queueing injection strategy for network load balancing
    38.
    Invention patent (granted)
    Legal status: in force

    Publication No.: US07944842B2

    Publication date: 2011-05-17

    Application No.: US11738034

    Filing date: 2007-04-20

    IPC classification: H04L12/26

    Abstract: Embodiments of the invention include a method, system, and article of manufacture that provide an opportunistic queuing injection strategy used for data communication between nodes of a parallel computer system. A message may be encapsulated into a set of data packets. When the packets are sent, an opportunistic injection queue may be configured to transmit them to multiple hardware injection ports. This approach allows for complete network link saturation. In a parallel system with network links in multiple dimensions, sending message packets using more than one dimension may substantially increase network throughput.

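One way to picture an opportunistic injection queue is a shared queue whose packets go to whichever hardware injection port becomes free first. The sketch below simulates that idea; the function name `inject`, the uniform `service_time`, and the port labels are assumptions for illustration rather than details from the patent.

```python
import heapq


def inject(packets, ports, service_time=1.0):
    """Assign each packet to the port that becomes free earliest.

    Returns a list of (packet, port, start_time) tuples. Ties between
    equally free ports are broken by port name, which is an arbitrary
    choice made for this sketch.
    """
    free_at = [(0.0, port) for port in ports]
    heapq.heapify(free_at)

    assignment = []
    for packet in packets:
        start, port = heapq.heappop(free_at)  # earliest-free port
        assignment.append((packet, port, start))
        heapq.heappush(free_at, (start + service_time, port))
    return assignment


if __name__ == "__main__":
    # Six packets of one message spread over three injection ports.
    for row in inject(range(6), ["x+", "y+", "z+"]):
        print(row)
```

Because no packet waits for a particular port while another port sits idle, all links stay busy as long as packets remain in the queue.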

    Profiling An Application For Power Consumption During Execution On A Compute Node
    39.
    Invention patent application
    Legal status: in force

    Publication No.: US20100005326A1

    Publication date: 2010-01-07

    Application No.: US12167302

    Filing date: 2008-07-03

    IPC classification: G06F1/32

    Abstract: Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.


    Controlling Data Transfers from an Origin Compute Node to a Target Compute Node
    40.
    Invention patent application
    Legal status: lapsed

    Publication No.: US20080301704A1

    Publication date: 2008-12-04

    Application No.: US11754765

    Filing date: 2007-05-29

    IPC classification: G06F13/14

    CPC classification: G06F13/387

    Abstract: Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication.
