Abstract:
Methods, apparatus, and products are disclosed for pacing network traffic among a plurality of compute nodes connected using a data communications network. The network has a plurality of network regions, and the plurality of compute nodes are distributed among these network regions. Pacing network traffic among a plurality of compute nodes connected using a data communications network includes: identifying, by a compute node for each region of the network, a roundtrip time delay for communicating with at least one of the compute nodes in that region; determining, by the compute node for each region, a pacing algorithm for that region in dependence upon the roundtrip time delay for that region; and transmitting, by the compute node, network packets to at least one of the compute nodes in at least one of the network regions in dependence upon the pacing algorithm for that region.
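The per-region pacing step can be pictured with a minimal sketch. The abstract does not say what the pacing algorithm is, so this example assumes the simplest possible rule: a fixed gap between packets, sized so that a hypothetical `window` of packets fits in one roundtrip. The names `choose_pacing_gap`, `build_pacing_table`, and `send_times` are illustrative and do not come from the disclosure.

```python
from typing import Dict, List


def choose_pacing_gap(rtt_seconds: float, window: int = 8) -> float:
    """Assumed rule: spread `window` packets evenly over one roundtrip."""
    if rtt_seconds < 0 or window <= 0:
        raise ValueError("rtt_seconds must be >= 0 and window must be > 0")
    return rtt_seconds / window


def build_pacing_table(region_rtts: Dict[str, float], window: int = 8) -> Dict[str, float]:
    """Map each network region to the inter-packet gap chosen from its RTT."""
    return {region: choose_pacing_gap(rtt, window) for region, rtt in region_rtts.items()}


def send_times(gap: float, n_packets: int, start: float = 0.0) -> List[float]:
    """Departure times for n_packets paced `gap` seconds apart."""
    return [start + i * gap for i in range(n_packets)]
```

Under this assumed rule, a region with a longer roundtrip time gets a larger gap, so packets to distant regions are injected more slowly than packets to nearby ones.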
Abstract:
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
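The reduce-then-restore sequence can be sketched as a small single-process simulation. It assumes that "power" is a boolean flag per node and that arrival at the blocking operation is reported through a hypothetical `arrive` method. Real hardware control is outside the scope of this sketch.

```python
class Node:
    """Stand-in for a compute node; low_power models reduced component power."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.low_power = False


class PowerAwareBarrier:
    """Hypothetical helper: nodes power down on arrival, all power up at the end."""

    def __init__(self, nodes) -> None:
        self.nodes = list(nodes)
        self.arrived = set()

    def arrive(self, node: Node) -> bool:
        """Record that `node` began the blocking operation.

        Returns True when this arrival completes the barrier.
        """
        self.arrived.add(node.rank)
        node.low_power = True  # reduce power as soon as this node starts waiting
        if len(self.arrived) == len(self.nodes):
            for n in self.nodes:
                n.low_power = False  # everyone has arrived: restore power
            self.arrived.clear()
            return True
        return False
```

Nodes that arrive early spend the wait in the low-power state. The last node to arrive triggers restoration for every node.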
Abstract:
Embodiments of the invention include a method, system, and article of manufacture that provide an opportunistic queuing injection strategy used for data communication between nodes of a parallel computer system. A message may be encapsulated into a set of data packets. When the packets are sent, an opportunistic injection queue may be configured to transmit them to multiple hardware injection ports. This approach allows for complete network link saturation. In a parallel system with network links in multiple dimensions, sending message packets using more than one dimension may substantially increase network throughput.
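One possible reading of the opportunistic injection queue is sketched below: packets are drained from a single queue into whichever injection port currently has room. The round-robin sweep, the `free_slots` capacity model, and the function name are assumptions made for illustration, not details from the abstract.

```python
from collections import deque
from typing import Deque, List


def opportunistic_inject(queue: Deque, free_slots: List[int]) -> List[list]:
    """Move packets from `queue` into any port that still has a free slot.

    free_slots[i] is the assumed number of packets port i can accept now.
    Returns the packets placed in each port; leftovers stay in `queue`.
    """
    ports: List[list] = [[] for _ in free_slots]
    progress = True
    while queue and progress:
        progress = False
        for i in range(len(ports)):
            if not queue:
                break
            if len(ports[i]) < free_slots[i]:
                ports[i].append(queue.popleft())
                progress = True
    return ports
```

With seven queued packets and ports that can take 2, 1, and 3 packets, six packets are injected across all three ports and one remains queued for the next opportunity.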
Abstract:
Hardware fault location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
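The parent-versus-child comparison can be illustrated with a small simulation. It assumes that a test tree is the full subtree below its root and that the test suite fails exactly when that subtree contains a defective link. The `bad_links` set stands in for real hardware behavior, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]


def subtree_links(children: Dict[int, List[int]], root: int) -> List[Link]:
    """All (parent, child) links in the test tree rooted at `root`."""
    links, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            links.append((node, child))
            stack.append(child)
    return links


def run_suite(children: Dict[int, List[int]], root: int, bad_links: Set[Link]) -> bool:
    """Simulated test suite: passes only if no link in the test tree is bad."""
    return not any(link in bad_links for link in subtree_links(children, root))


def locate_faulty_parents(children: Dict[int, List[int]], root: int,
                          bad_links: Set[Link]) -> List[int]:
    """Parents whose own test tree fails while every child test tree passes."""
    faulty, stack = [], [root]
    while stack:
        parent = stack.pop()
        kids = children.get(parent, [])
        if (kids and not run_suite(children, parent, bad_links)
                and all(run_suite(children, k, bad_links) for k in kids)):
            faulty.append(parent)
        stack.extend(kids)
    return faulty
```

In a seven-node binary tree with a bad link between nodes 1 and 4, the suite fails at the root as well, but only node 1 is reported, because it is the only parent whose child test trees all pass.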
Abstract:
Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
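A minimal sketch of combining the two profiles follows. It assumes the application can be summarized as a count of each operation type and that the hardware profile gives an energy cost per operation. The operation names and the nanojoule unit in the usage note are invented for illustration.

```python
from typing import Dict


def profile_application(op_counts: Dict[str, int],
                        hardware_profile: Dict[str, float]) -> Dict[str, object]:
    """Estimate application energy from operation counts and per-operation cost.

    op_counts: how many times the application performs each operation (assumed input).
    hardware_profile: energy per operation for the compute node hardware.
    """
    per_operation = {op: count * hardware_profile[op] for op, count in op_counts.items()}
    return {"per_operation": per_operation, "total": sum(per_operation.values())}
```

For example, with assumed costs of 2, 5, and 20 nanojoules for `fp_add`, `mem_load`, and `net_send`, an application performing 100, 10, and 1 of those operations would be reported at 270 nanojoules in total.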
Abstract:
Methods, parallel computers, and computer program products are disclosed for effecting a broadcast with an allreduce operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, the compute nodes of the operational group coupled for data communications through a global combining network; and one compute node assigned to be a logical root. Embodiments include configuring, by the logical root node, a send buffer having a contribution to be broadcast to each ranked node in the operational group; configuring, by all ranked nodes other than the logical root, a receive buffer for receiving the contribution from the logical root; and repeatedly for each element of the contribution of the logical root in the send buffer: contributing, by the logical root, the element of the contribution in the send buffer; injecting, by all ranked nodes other than the logical root, one or more zeros corresponding to a size of the element; performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros, yielding a result for the allreduce operation; and storing in each receive buffer, by all ranked nodes other than the logical root, the result of the allreduce.
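The bitwise OR trick works because OR-ing any value with zeros returns that value unchanged. The sketch below simulates it in a single process, with a list standing in for the global combining network. The function names are illustrative, and elements are assumed to be non-negative integers.

```python
from functools import reduce
from operator import or_
from typing import Dict, List


def allreduce_or(contributions: List[int]) -> List[int]:
    """Simulated allreduce: every rank receives the OR of all contributions."""
    result = reduce(or_, contributions, 0)
    return [result] * len(contributions)


def broadcast_with_allreduce(send_buffer: List[int], root: int,
                             n_ranks: int) -> Dict[int, List[int]]:
    """Broadcast the root's send buffer element by element using allreduce OR."""
    receive_buffers: Dict[int, List[int]] = {
        rank: [] for rank in range(n_ranks) if rank != root
    }
    for element in send_buffer:
        # The root contributes the element; every other rank injects zero.
        contributions = [element if rank == root else 0 for rank in range(n_ranks)]
        results = allreduce_or(contributions)
        for rank, buf in receive_buffers.items():
            buf.append(results[rank])
    return receive_buffers
```

After the loop, each non-root receive buffer holds an exact copy of the root's send buffer.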
Abstract:
Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication.
Abstract:
A method, system, and article of manufacture that provide latency-hiding, high-bandwidth message passing protocols used for data communication between nodes of a parallel computer system are disclosed. A sending node transmits a request-to-send message to a receiving node. Prior to receiving a clear-to-send message, the sending node continues to send deterministically routed (or fully described) data packets to the receiving node, thereby hiding the latency inherent in the request-to-send/clear-to-send message exchange. Once the sending node receives the clear-to-send message, any remaining portion of the message may be sent using partially described packets which may be routed dynamically, thereby maximizing bandwidth.
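The switch between the two packet types can be sketched as a simple send plan. This assumes one packet is sent per time slot and that the clear-to-send message arrives at a known slot. Both are simplifications, and `plan_packets` is a hypothetical name.

```python
from typing import List, Tuple


def plan_packets(n_packets: int, cts_arrival_slot: int) -> List[Tuple[int, str]]:
    """Label each packet by routing mode, one packet per slot (assumed).

    Packets sent before the clear-to-send arrives are deterministically routed;
    packets sent from the arrival slot onward may be routed dynamically.
    """
    plan = []
    for slot in range(n_packets):
        mode = "deterministic" if slot < cts_arrival_slot else "dynamic"
        plan.append((slot, mode))
    return plan
```

If the clear-to-send arrives at slot 2 of a five-packet message, the first two packets go out deterministically during the handshake and the remaining three are free to use dynamic routing.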
Abstract:
Distributed hardware device simulation, including: identifying a plurality of hardware components of the hardware device; providing software components simulating the functionality of each hardware component, wherein the software components are installed on compute nodes of a distributed processing system; receiving, in at least one of the software components, one or more messages representing an input to the hardware component; simulating the operation of the hardware component with the software component, thereby generating an output of the software component representing the output of the hardware component; and sending, from the software component to at least one other software component, one or more messages representing the output of the hardware component.
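As a rough illustration, the sketch below models each hardware component as a software component that turns an input message into an output message and forwards it. A single in-process queue stands in for messaging between compute nodes. The class, the `bus` queue, and the NAND example are all assumptions made for this sketch.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


class SoftwareComponent:
    """Simulates one hardware component; `logic` maps an input to an output."""

    def __init__(self, name: str, logic: Callable,
                 downstream: Optional[List[str]] = None) -> None:
        self.name = name
        self.logic = logic
        self.downstream = downstream or []
        self.outputs: list = []

    def receive(self, message, bus: deque) -> None:
        """Simulate the component, then message its output to downstream components."""
        output = self.logic(message)
        self.outputs.append(output)
        for target in self.downstream:
            bus.append((target, output))


def run(components: Dict[str, SoftwareComponent], initial_messages) -> None:
    """Deliver messages in FIFO order until no messages remain."""
    bus = deque(initial_messages)
    while bus:
        target, message = bus.popleft()
        components[target].receive(message, bus)


def build_nand() -> Dict[str, SoftwareComponent]:
    """Example device: an AND gate feeding an inverter, which together form a NAND."""
    return {
        "and_gate": SoftwareComponent("and_gate", lambda m: m[0] & m[1], ["inverter"]),
        "inverter": SoftwareComponent("inverter", lambda m: 1 - m),
    }
```

Feeding the inputs `(1, 1)` and `(1, 0)` to the `and_gate` component leaves the `inverter` component with outputs `0` and `1`, the NAND of each input pair.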
Abstract:
Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.