摘要:
Methods, compute nodes, and computer program products are provided for heuristic status polling of a component in a computing system. Embodiments include receiving, by a polling module from a requesting application, a status request requesting status of a component; determining, by the polling module, whether an activity history for the component satisfies heuristic polling criteria; polling, by the polling module, the component for status if the activity history for the component satisfies the heuristic polling criteria; and not polling, by the polling module, the component for status if the activity history for the component does not satisfy the heuristic criteria.
摘要:
Methods, compute nodes, and computer program products are provided for heuristic status polling of a component in a computing system. Embodiments include receiving, by a polling module from a requesting application, a status request requesting status of a component; determining, by the polling module, whether an activity history for the component satisfies heuristic polling criteria; polling, by the polling module, the component for status if the activity history for the component satisfies the heuristic polling criteria; and not polling, by the polling module, the component for status if the activity history for the component does not satisfy the heuristic criteria.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
摘要:
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
摘要:
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
摘要:
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
摘要:
The present invention provides a system and method for extracting elements from distributed arrays on a parallel processing system. The system includes a module that populates a result array with globally largest elements from input arrays, a module that generates a partition element, a module that counts the number of local elements greater than the partition element, and a module that determines the globally largest elements. The method for extracting elements from distributed arrays on a parallel processing system includes populating a result array with globally largest elements from input arrays, generating a partition element, counting the number of local elements greater than the partition and determining the globally largest elements.
摘要:
Routing data communications packets in a parallel computer that includes compute nodes organized for collective operations. Each compute node including an operating system kernel and a system-level messaging module that is a module of automated computing machinery that exposes a messaging interface to applications. Each compute node including a routing table that specifies, for each of a multiplicity of route identifiers, a data communications path through the compute node. Including to carry out the steps of: receiving in a compute node a data communications packet that includes a route identifier value; retrieving from the routing table a specification of a data communications path through the compute node; and routing, by the compute node, the data communications packet according to the data communications path identified by the compute node's routing table entry for the data communications packet's route identifier value.