摘要:
In an embodiment, a send thread receives an identifier that identifies a destination node and a pointer to data. The send thread creates a first send request in response to the receipt of the identifier and the data pointer. The send thread selects a selected channel from among a plurality of channels. The selected channel comprises a selected hand-off queue and an identification of a selected message unit. Each of the channels identifies a different message unit. The selected hand-off queue is randomly accessible. If the selected hand-off queue contains an available entry, the send thread adds the first send request to the selected hand-off queue. If the selected hand-off queue does not contain an available entry, the send thread removes a second send request from the selected hand-off queue and sends the second send request to the selected message unit.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of the running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the received store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device.
摘要:
A method for route optimization with a dual mobile IPv4 node in an IPv6-only network is provided. The method includes the operations of: receiving a visited IPv6 address from a router when the dual mobile node is connected to the IPv6-only network; updating a home agent with the IPv6 address; deregistering a binding update with a correspondent node via the home agent; updating the correspondent node with an IPv6 address; checking the reachability of packets directly to the correspondent node using its IPv6 address; the mobile node starting sending, to the CN, data packets tunneled in an IPv6 packet once the reachability is verified; and the correspondent node sending tunneled data packets directly to an IPv6 address of the mobile node.
摘要:
The density of components in integrated circuits (ICs) is increasing with time. The density of heat generated by the components is similarly increasing. Maintaining the temperature of the components at reliable operating levels requires increased thermal transfer rates from the components to the IC package exterior. Dielectric materials used in interconnect regions have lower thermal conductivity than silicon dioxide. This invention comprises a heat pipe located in the interconnect region of an IC to transfer heat generated by components in the IC substrate to metal plugs located on the top surface of the IC, where the heat is easily conducted to the exterior of the IC package. Refinements such as a wicking liner or reticulated inner surface will increase the thermal transfer efficiency of the heat pipe. Strengthening elements in the interior of the heat pipe will provide robustness to mechanical stress during IC manufacture.
摘要:
A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network where each compute node includes a DMA engine but includes only a limited number of byte counters for tracking a number of bytes that are sent or received by the DMA engine, where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes using rendezvous protocol, a source compute node deterministically sending a request to send (RTS) message with a single RTS descriptor using an exclusive injection counter to track both the RTS message and message data to be sent in association with the RTS message, to a destination compute node such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering thereat. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear to send (CTS) message to the source node in a rendezvous protocol form of a remote get to accept the RTS message and message data and processing the remote get (CTS) by the source compute node DMA engine to provide the message data to be sent.
摘要:
Disclosed is a mechanism on receiving processors in a parallel computing system for providing order to data packets received from a broadcast call and to distinguish data packets received at nodes from several incoming asynchronous broadcast messages where header space is limited. In the present invention, processors at lower leafs of a tree do not need to obtain a broadcast message by directly accessing the data in a root processor's buffer. Instead, each subsequent intermediate node's rank id information is squeezed into the software header of packet headers. In turn, the entire broadcast message is not transferred from the root processor to each processor in a communicator but instead is replicated on several intermediate nodes which then replicated the message to nodes in lower leafs. Hence, the intermediate compute nodes become “virtual root compute nodes” for the purpose of replicating the broadcast message to lower levels of a tree.
摘要:
A device employs damascene layers with a pore sealing liner and includes a semiconductor body. A metal interconnect layer comprising a metal interconnect is formed over the semiconductor body. A dielectric layer is formed over the metal interconnect layer. A conductive trench feature and a conductive via feature are formed in the dielectric layer. A pore sealing liner is formed only along sidewall of the conductive via feature and along sidewalls and bottom surfaces of the conductive trench feature. The pore sealing liner is not substantially present along a bottom surface of the conductive via feature.
摘要:
A method of achieving route optimization (RO) when a dual capable mobile Internet protocol version 6 (MIPv6) mobile node is connected with an IPv4-only network allows RO of packets to traverse a shorter route than the default one through the home agent (HA) using bidirectional tunneling, and leads to better bandwidth utilization. The method of RO with a dual MIPV6 node in an IPV4-only network includes updating the HA with an IPv4 address of the MN and deregistering a binding update (BU) with a corresponding node (CN) via the HA; informing the CN about its IPv4 address and receiving the CN's IPv4 address in reply; checking reachability of the CN in its IPv4 address using an IPv6-in-IPv4 tunnel; and sending and receiving Ipv6 data packets to/from the CN using a v4 tunnel.
摘要:
A method for route optimization with a dual mobile IPv4 node in an IPv6-only network is provided. The method includes the operations of: receiving a visited IPv6 address from a router when the dual mobile node is connected to the IPv6-only network; updating a home agent with the IPv6 address; deregistering a binding update with a correspondent node via the home agent; updating the correspondent node with an IPv6 address; checking the reachability of packets directly to the correspondent node using its IPv6 address; the mobile node starting sending, to the CN, data packets tunneled in an IPv6 packet once the reachability is verified; and the correspondent node sending tunneled data packets directly to an IPv6 address of the mobile node.