摘要:
A parallel computer system includes a plurality of compute nodes. Each of the compute nodes includes at least one processor, at least one memory, and a direct memory address engine coupled to the at least one processor and the at least one memory. The system also includes a network interconnecting the plurality of compute nodes. The network operates a global message-passing application for performing communications across the network. Local instances of the global message-passing application operate at each of the compute nodes to carry out local processing operations independent of processing operations carried out at another one of the compute nodes. The direct memory address engines are configured to interact with the local instances of the global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of the memories. The local instances of the global message passing application are configured to record, in the injection FIFO in the corresponding one of the memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program. The local instances of the global message passing application are configured to replay the message descriptors during a subsequent iteration of the executing application program.
摘要:
The density of components in integrated circuits (ICs) is increasing with time. The density of heat generated by the components is similarly increasing. Maintaining the temperature of the components at reliable operating levels requires increased thermal transfer rates from the components to the IC package exterior. Dielectric materials used in interconnect regions have lower thermal conductivity than silicon dioxide. This invention comprises a heat pipe located in the interconnect region of an IC to transfer heat generated by components in the IC substrate to metal plugs located on the top surface of the IC, where the heat is easily conducted to the exterior of the IC package. Refinements such as a wicking liner or reticulated inner surface will increase the thermal transfer efficiency of the heat pipe. Strengthening elements in the interior of the heat pipe will provide robustness to mechanical stress during IC manufacture.
摘要:
A method of forming a semiconductor device including forming a low-k dielectric material over a substrate, depositing a liner on a portion of the low-k dielectric material, and exposing the liner to a plasma. The method also includes depositing a layer over the liner.
摘要:
Methods, compute nodes, and computer program products are provided for direct memory access (‘DMA’) transfer completion notification. Embodiments include determining, by an origin DMA engine on an origin compute node, whether a data descriptor for an application message to be sent to a target compute node is currently in an injection first-in-first-out (‘FIFO’) buffer in dependence upon a sequence number previously associated with the data descriptor, the total number of descriptors currently in the injection FIFO buffer, and the current sequence number for the newest data descriptor stored in the injection FIFO buffer; and notifying a processor core on the origin DMA engine that the message has been sent if the data descriptor for the message is not currently in the injection FIFO buffer.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
Methods, apparatus, and products are disclosed for replenishing data descriptors in a Direct Memory Access (‘DMA’) injection first-in-first-out (‘FIFO’) buffer that include: determining, by a messaging module on an origin compute node, whether a number of data descriptors in a DMA injection FIFO buffer exceeds a predetermined threshold, each data descriptor specifying an application message for transmission to a target compute node; queuing, by the messaging module, a plurality of new data descriptors in a pending descriptor queue if the number of the data descriptors in the DMA injection FIFO buffer exceeds the predetermined threshold; establishing, by the messaging module, interrupt criteria that specify when to replenish the injection FIFO buffer with the plurality of new data descriptors in the pending descriptor queue; and injecting, by the messaging module, the plurality of new data descriptors into the injection FIFO buffer in dependence upon the interrupt criteria.
摘要:
A method of forming a semiconductor device including forming a low-k dielectric material over a substrate, depositing a liner on a portion of the low-k dielectric material, and exposing the liner to a plasma. The method also includes depositing a layer over the liner.
摘要:
A printed circuit board assembly utilizing an elevated track to support signal lines between components is disclosed. The track rests on a plurality of vertical supports, placed amid the components, such that the signal lines can be routed after the components are configured on the board. The vertical supports can be installed at grounding holes already present on the printed circuit board assembly. The track is sufficiently rigid to support bundles of signal lines over long spans between vertical supports. The track can be constructed of the same material as the board, to provide the same ESD and conductivity characteristics as the board, as well as ensure that the track does not contribute to the EMI signature of the board.
摘要:
A method (10) of forming a MIM (metal insulator metal) capacitor is disclosed whereby adverse affects associated with copper diffusion are mitigated even as the capacitor is scaled down. A layer of bottom electrode/copper diffusion barrier material (136) is formed (16) within an aperture (128) wherein the capacitor (100) is to be defined. The bottom electrode layer (136) is formed via a directional process so that a horizontal aspect (138) of the layer (136) is formed over a metal (110) at a bottom of the aperture (128) to a thickness (142) that is greater than a thickness (144) of a sidewall aspect (148) of the layer (136) formed upon sidewalls (132) of the aperture (128). Accordingly, the thinner sidewall aspects (148) are removed during an etching act (18) while some of the thicker horizontal aspect (138) remains. A layer of capacitor dielectric material (150) is then conformally formed (20) into the aperture 128 and over the horizontal aspect (138). A layer of top electrode material (152) is then conformally formed (22) over the layer of capacitor dielectric material (150) to complete the capacitor stack (154).
摘要:
Methods and arrangements for performing synchronized collective operations. Communication calls are accepted from at least two distinct processor groups. Edge disjoint spanning paths are created over a collective comprising the processor groups, and the spanning paths are assigned to the processor groups to facilitate communication within each processor group.