Processing communications events in parallel active messaging interface by awakening thread from wait state
    61.
    Type: granted invention patent
    Status: expired

    Publication No.: US08566841B2

    Publication date: 2013-10-22

    Application No.: US12943105

    Filing date: 2010-11-10

    IPC classification: G06F15/163

    Abstract: Processing data communications events in a parallel active messaging interface (‘PAMI’) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
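
The wait-and-wake behaviour described in this abstract can be sketched with a condition variable. This is only an illustration under stated assumptions, not the PAMI API: the `Context` class and its `post` and `advance` methods are hypothetical names, and an ordinary Python thread stands in for the advance function's thread of execution.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for a PAMI context: a queue of pending events."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        # A data communications event occurs: enqueue it and wake a waiter.
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def advance(self, handler, timeout=None):
        """Process one event, sleeping (not polling) while none is pending."""
        with self._cond:
            # wait_for returns at once if an event is already pending;
            # otherwise the thread sits in a wait state until post() notifies.
            if not self._cond.wait_for(lambda: len(self._events) > 0, timeout):
                return False  # timed out with nothing to do
            event = self._events.popleft()
        handler(event)  # process the event outside the lock
        return True


def demo():
    ctx = Context()
    handled = []
    worker = threading.Thread(target=ctx.advance, args=(handled.append, 5.0))
    worker.start()
    ctx.post("message-arrived")  # awakens the worker if it is waiting
    worker.join()
    return handled
```

Calling `demo()` starts one advancing thread, posts a single event, and returns the list of events that thread handled. The point of the design is that the thread consumes no CPU while its context is idle.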


    Performing a scatterv operation on a hierarchical tree network optimized for collective operations
    62.
    Type: granted invention patent
    Status: expired

    Publication No.: US08565089B2

    Publication date: 2013-10-22

    Application No.: US12748594

    Filing date: 2010-03-29

    CPC classification: G06F15/17318

    Abstract: Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.
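
The per-node logic in this abstract (keep your own portion, forward the rest only to children whose subtrees need it) can be sketched as a recursive function. The tree layout, the rank numbering, and the names `subtree_ranks`, `scatterv`, `TREE` and `DATA` are assumptions made for illustration; a real implementation would send messages between nodes instead of recursing in one process.

```python
def subtree_ranks(tree, node):
    """Return every rank in the subtree rooted at node (node included)."""
    ranks = {node}
    for child in tree.get(node, []):
        ranks |= subtree_ranks(tree, child)
    return ranks


def scatterv(tree, node, chunk, delivered):
    """Keep this node's portion of chunk, then forward the rest downward.

    chunk maps a destination rank to its payload; payloads may differ in
    length, which is what distinguishes scatterv from a plain scatter.
    """
    if node in chunk:
        delivered[node] = chunk[node]  # maintain the portion for this node
    for child in tree.get(node, []):
        below = subtree_ranks(tree, child)
        portion = {rank: data for rank, data in chunk.items() if rank in below}
        if portion:  # send only if something is for the child or nodes below it
            scatterv(tree, child, portion, delivered)
    return delivered


# Hypothetical tree: node 0 is the root; keys are parents, values are children.
TREE = {0: [1, 2], 1: [3, 4], 2: [5]}
# Variable-length payloads; ranks 2 and 4 receive nothing.
DATA = {0: b"a", 1: b"bb", 3: b"ccc", 5: b"dddd"}
RESULT = scatterv(TREE, 0, DATA, {})
```

In this sketch node 2 has no data of its own but still relays the portion destined for node 5, while node 4 is never contacted at all.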


    Profiling an application for power consumption during execution on a compute node
    63.
    Type: granted invention patent
    Status: in force

    Publication No.: US08539270B2

    Publication date: 2013-09-17

    Application No.: US13447501

    Filing date: 2012-04-16

    IPC classification: G06F1/26

    Abstract: Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
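
One simple way to read the steps in this abstract is as a weighted sum: count the operations an application performs and multiply each count by a per-operation cost from the hardware profile. The sketch below assumes exactly that model. The operation names, the energy figures, and the functions `profile_application` and `report` are invented for illustration and are not taken from the patent.

```python
# Hypothetical hardware profile: energy cost per operation, in nanojoules.
HARDWARE_PROFILE = {"int_add": 1.0, "fp_mul": 3.0, "mem_load": 5.0}


def profile_application(op_counts, hardware_profile):
    """Combine an application's operation counts with the hardware profile."""
    per_operation = {
        op: count * hardware_profile[op] for op, count in op_counts.items()
    }
    return {"per_operation": per_operation,
            "total_nj": sum(per_operation.values())}


def report(profile):
    """Render the application's power consumption profile as plain text."""
    lines = [f"{op}: {nj:.1f} nJ"
             for op, nj in sorted(profile["per_operation"].items())]
    lines.append(f"total: {profile['total_nj']:.1f} nJ")
    return "\n".join(lines)
```

For example, `report(profile_application({"int_add": 10, "mem_load": 2}, HARDWARE_PROFILE))` lists the estimated cost of each operation type followed by a total line.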


    Performing a deterministic reduction operation in a compute node organized into a branched tree topology
    65.
    Type: granted invention patent
    Status: expired

    Publication No.: US08489859B2

    Publication date: 2013-07-16

    Application No.: US12790037

    Filing date: 2010-05-28

    IPC classification: G06F9/00

    CPC classification: G06F15/76 G06F15/17318

    Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
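
The handshake in this abstract can be modelled in a few lines: dummy contributions may arrive in any order, but real contributions are combined in the order in which the root acknowledges them. The sketch below is a single-process simulation under assumptions of my own: the function name `deterministic_reduce`, the use of ascending processor id as the predefined order, and addition as the default reduction operator are all illustrative choices.

```python
import operator


def deterministic_reduce(contributions, arrival_order, op=operator.add):
    """Reduce per-processor values in a fixed order, whatever the arrival order.

    contributions maps processor id -> value; arrival_order is the (arbitrary)
    order in which the dummy contributions reached the root.
    """
    # Phase 1: dummy contributions arrive in any order; the root only
    # records which processors are ready.
    ready = set(arrival_order)
    if ready != set(contributions):
        raise ValueError("every processor must send a dummy contribution")

    # Phase 2: the root acknowledges in a predefined order (ascending id here).
    ack_order = sorted(contributions)

    # Phase 3: a processor sends real data only after its acknowledgement,
    # so real contributions reach the root in ack_order and are reduced so.
    result = None
    for proc in ack_order:
        value = contributions[proc]
        result = value if result is None else op(result, value)
    return result
```

A fixed order matters because floating-point addition is not associative: summing the same values in a different order can give a different result, so pinning the order makes repeated runs agree.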


    Effecting hardware acceleration of broadcast operations in a parallel computer

    Publication No.: US08346883B2

    Publication date: 2013-01-01

    Application No.: US12782791

    Filing date: 2010-05-19

    IPC classification: G06F15/167

    CPC classification: G06F15/17318

    Abstract: Compute nodes of a parallel computer organized for collective operations via a network, each compute node having a receive buffer and establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
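
The three deposits named in this abstract (data, data length, and an injection instruction that triggers the next hop) can be simulated as a chain. The sketch assumes a linear schedule and models each DMA write as a plain Python assignment; the `Node` class and the functions `deposit`, `run_dma_engine` and `broadcast` are hypothetical names, not part of any real DMA interface.

```python
class Node:
    def __init__(self, rank, buffer_size=64):
        self.rank = rank
        self.receive_buffer = bytearray(buffer_size)  # well-known data location
        self.data_length = 0                          # well-known length location
        self.injection_fifo = []                      # injection instructions


def deposit(data, target, remaining):
    """Model the deposits one sender makes into one target node."""
    target.receive_buffer[:len(data)] = data  # 1. broadcast data
    target.data_length = len(data)            # 2. broadcast data length
    if remaining:                             # 3. instruction for the next hop
        target.injection_fifo.append(list(remaining))


def run_dma_engine(nodes, node):
    """Execute queued injection instructions, forwarding along the chain."""
    while node.injection_fifo:
        remaining = node.injection_fifo.pop(0)
        data = bytes(node.receive_buffer[:node.data_length])
        successor = nodes[remaining[0]]
        deposit(data, successor, remaining[1:])
        run_dma_engine(nodes, successor)


def broadcast(nodes, schedule, payload):
    """schedule is a list of ranks: the root first, then targets in order."""
    if len(schedule) < 2:
        return
    first = nodes[schedule[1]]
    deposit(payload, first, schedule[2:])  # the root makes the first deposits
    run_dma_engine(nodes, first)
```

With `nodes = [Node(r) for r in range(4)]`, calling `broadcast(nodes, [0, 1, 2, 3], b"hello")` leaves the payload in the receive buffers of nodes 1 to 3. In the sketch each target forwards the data from its own buffer, so the root is involved only in the first hop.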

    Profiling an application for power consumption during execution on a plurality of compute nodes
    68.
    Type: granted invention patent
    Status: in force

    Publication No.: US08250389B2

    Publication date: 2012-08-21

    Application No.: US12167302

    Filing date: 2008-07-03

    IPC classification: G06F1/32

    Abstract: Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.


    Query performance data on parallel computer system having compute nodes
    69.
    Type: granted invention patent
    Status: in force

    Publication No.: US08250164B2

    Publication date: 2012-08-21

    Application No.: US12760783

    Filing date: 2010-04-15

    IPC classification: G06F15/16

    Abstract: Embodiments of the invention provide a method for querying performance counter data on a massively parallel computing system, while minimizing the costs associated with interrupting computer processors and limited memory resources. DMA descriptors may be inserted into an injection FIFO of a remote compute node in the massively parallel computing system. Upon executing the DMA operations described by the DMA descriptors, performance counter data may be transferred from the remote compute node to a destination node.
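
The idea in this abstract, reading a remote node's counters by queuing a DMA descriptor instead of interrupting its processor, can be sketched as below. Everything here is a simplified model under assumed names: `DmaDescriptor`, `ComputeNode`, `run_dma_engine` and `query_counters` are hypothetical, and a dictionary stands in for the destination node's memory.

```python
from dataclasses import dataclass, field


@dataclass
class DmaDescriptor:
    counter_names: tuple  # which performance counters to copy
    destination: dict     # stand-in for memory on the destination node


@dataclass
class ComputeNode:
    counters: dict = field(default_factory=dict)
    injection_fifo: list = field(default_factory=list)
    cpu_interrupts: int = 0  # stays at zero: the CPU is never involved

    def run_dma_engine(self):
        # The DMA engine drains the FIFO on its own, without the CPU.
        while self.injection_fifo:
            descriptor = self.injection_fifo.pop(0)
            for name in descriptor.counter_names:
                descriptor.destination[name] = self.counters[name]


def query_counters(remote, names):
    """Insert a descriptor into the remote FIFO and collect the result."""
    destination = {}
    remote.injection_fifo.append(DmaDescriptor(tuple(names), destination))
    remote.run_dma_engine()  # in hardware this would run asynchronously
    return destination
```

For instance, with `remote = ComputeNode(counters={"cycles": 100, "l1_misses": 7})`, the call `query_counters(remote, ["cycles"])` copies only the requested counter and leaves `remote.cpu_interrupts` untouched.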


    Distributed Hardware Device Simulation
    70.
    Type: invention patent application
    Status: in force

    Publication No.: US20120185230A1

    Publication date: 2012-07-19

    Application No.: US13006696

    Filing date: 2011-01-14

    IPC classification: G06F17/50

    Abstract: Distributed hardware device simulation, including: identifying a plurality of hardware components of the hardware device; providing software components simulating the functionality of each hardware component, wherein the software components are installed on compute nodes of a distributed processing system; receiving, in at least one of the software components, one or more messages representing an input to the hardware component; simulating the operation of the hardware component with the software component, thereby generating an output of the software component representing the output of the hardware component; and sending, from the software component to at least one other software component, one or more messages representing the output of the hardware component.
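
The message-passing scheme in this abstract can be illustrated with a half adder whose XOR and AND gates are separate software components. The sketch runs in one process and uses a queue where a distributed system would send messages between compute nodes; the `Component` class, the `simulate` loop and the `half_adder` wiring are assumptions made for illustration.

```python
from collections import deque


class Component:
    """Software component simulating one hardware component."""

    def __init__(self, func, n_inputs, targets):
        self.func = func
        self.n_inputs = n_inputs
        self.targets = targets  # list of (component name, input port)
        self.inputs = {}

    def receive(self, port, value):
        """Accept one input message; emit output messages once all inputs arrive."""
        self.inputs[port] = value
        if len(self.inputs) < self.n_inputs:
            return []
        output = self.func(*(self.inputs[p] for p in range(self.n_inputs)))
        self.inputs = {}
        return [(name, target_port, output) for name, target_port in self.targets]


def simulate(components, initial_messages):
    """Deliver messages until none remain; collect those leaving the device."""
    queue = deque(initial_messages)
    outputs = {}
    while queue:
        target, port, value = queue.popleft()
        if target not in components:
            outputs[target] = value  # an output pin of the simulated device
            continue
        queue.extend(components[target].receive(port, value))
    return outputs


def half_adder(a, b):
    components = {
        "xor": Component(lambda x, y: x ^ y, 2, [("sum", 0)]),
        "and": Component(lambda x, y: x & y, 2, [("carry", 0)]),
    }
    messages = [("xor", 0, a), ("xor", 1, b), ("and", 0, a), ("and", 1, b)]
    return simulate(components, messages)
```

Each call such as `half_adder(1, 1)` returns a dictionary with the `sum` and `carry` outputs. Because components interact only through messages, moving them onto separate compute nodes would change the transport but not the component logic.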
