Multi-input and binary reproducible, high bandwidth floating point adder in a collective network
    32.
    Granted invention patent (status: in force)

    Publication No.: US08977669B2

    Publication date: 2015-03-10

    Application No.: US12684776

    Filing date: 2010-01-08

    IPC classification: G06F7/38 G06F9/30 G06F9/38

    Abstract: To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generates a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting of the floating point numbers, the adding, the generating, and the converting of the summation in one pass. One pass means that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device.
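
The convert, add, convert-back sequence in this abstract can be illustrated with a minimal Python sketch. It is not the patented hardware design: the function name `reproducible_sum`, the 64-bit scale, and the truncation of low-order bits are assumptions made for illustration, and all inputs are assumed to be finite. Because integer addition is associative, the result does not depend on the order in which the inputs are added.

```python
import math


def reproducible_sum(values, frac_bits=64):
    """Sum floats by converting them to integers on one shared scale.

    Hypothetical illustration only; inputs are assumed to be finite.
    """
    nonzero = [v for v in values if v != 0.0]
    if not nonzero:
        return 0.0
    # Shared scale: derived from the largest binary exponent among the inputs.
    max_exp = max(math.frexp(v)[1] for v in nonzero)
    scale = frac_bits - max_exp
    # Float -> integer conversion; bits below the shared scale are truncated.
    ints = [int(math.ldexp(v, scale)) for v in nonzero]
    # Integer addition is exact and associative, so order does not matter.
    total = sum(ints)
    # Integer -> float conversion of the summation.
    return math.ldexp(float(total), -scale)
```

For example, `reproducible_sum([1e16, 1.0, -1e16, 3.5])` returns `4.5` for any ordering of the inputs, whereas a plain left-to-right floating point sum can lose the small terms depending on the order.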

    DMA engine for repeating communication patterns
    34.
    Granted invention patent (status: lapsed)

    Publication No.: US07802025B2

    Publication date: 2010-09-21

    Application No.: US11768795

    Filing date: 2007-06-26

    IPC classification: G06F13/28

    CPC classification: G06F15/163

    Abstract: A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs, where each Injection FIFO may contain an arbitrary number of message descriptors, in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.
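
One way to picture an Injection FIFO and its metadata is as a circular buffer of message descriptors tracked by head and tail indices. The sketch below is a guess at that general shape, not the design described in the patent: the class name `InjectionFifo`, its fields, and its methods are all hypothetical. It shows only that adding or consuming a descriptor touches a fixed amount of metadata, whatever the number of descriptors queued.

```python
class InjectionFifo:
    """Hypothetical circular buffer of message descriptors."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0    # next descriptor the DMA engine will process
        self.tail = 0    # next free slot for the application
        self.count = 0   # descriptors currently queued

    def inject(self, descriptor):
        """Application side: queue a descriptor; return False if full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = descriptor
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def process_one(self):
        """Engine side: take the oldest descriptor, or None if empty."""
        if self.count == 0:
            return None
        descriptor = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return descriptor
```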

    Method and apparatus for re-utilizing partially failed resources as network resources
    35.
    Invention patent application (status: lapsed)

    Publication No.: US20070168695A1

    Publication date: 2007-07-19

    Application No.: US11335784

    Filing date: 2006-01-19

    IPC classification: G06F11/00

    CPC classification: G06F11/0793 G06F11/0724

    Abstract: A method and apparatus for re-utilizing partially failed compute resources in a massively parallel super computer system. In the preferred embodiments the compute node comprises a number of clock domains that can be enabled separately. When an error in a compute node is detected, and the failure is not in the network communication blocks, a clock enable circuit enables only the clocks to the network communication blocks, so that the partially failed compute node can be re-utilized as a network resource. The computer system can then continue to operate with only slightly diminished performance and thereby improve performance and perceived overall reliability.

    Multidimensional switch network
    36.
    Invention patent application (status: lapsed)

    Publication No.: US20050195808A1

    Publication date: 2005-09-08

    Application No.: US10793068

    Filing date: 2004-03-04

    IPC classification: H04L12/26

    CPC classification: H04L49/1576 H04L45/06

    Abstract: Multidimensional switch data networks are disclosed, such as are used by a distributed-memory parallel computer, as applied for example to computations in the field of life sciences. A distributed memory parallel computing system comprises a number of parallel compute nodes and a message passing data network connecting the compute nodes together. The data network connecting the compute nodes comprises a multidimensional switch data network of compute nodes having N dimensions, and a number/array of compute nodes Ln in each of the N dimensions. Each compute node includes an N port routing element having a port for each of the N dimensions. Each compute node of an array of Ln compute nodes in each of the N dimensions connects through a port of its routing element to an Ln port crossbar switch having Ln ports. Several embodiments are disclosed of a 4 dimensional computing system having 65,536 compute nodes.
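
The arithmetic behind such a network can be checked with a short Python sketch. It assumes one reading of the abstract: every line of Ln nodes along dimension n shares one Ln-port crossbar, so two nodes need one crossbar traversal per dimension in which their coordinates differ. The 16 x 16 x 16 x 16 shape is only one possible way to reach 65,536 nodes, and the function names are hypothetical.

```python
from math import prod


def node_count(dims):
    """Total compute nodes in a network with dims[n] nodes per dimension."""
    return prod(dims)


def crossbar_count(dims):
    """Crossbars needed if each line of nodes along a dimension shares one."""
    total = prod(dims)
    # Dimension n has total / dims[n] distinct lines, one crossbar per line.
    return sum(total // length for length in dims)


def crossbar_hops(src, dst):
    """Crossbar traversals between two nodes: one per differing coordinate."""
    return sum(1 for a, b in zip(src, dst) if a != b)
```

Under these assumptions, `node_count((16, 16, 16, 16))` gives 65536, `crossbar_count((16, 16, 16, 16))` gives 16384, and no pair of nodes is more than four crossbar traversals apart.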

    REMOTE PROCESSING AND MEMORY UTILIZATION
    37.
    Invention patent application (status: under examination, published)

    Publication No.: US20130290473A1

    Publication date: 2013-10-31

    Application No.: US13584323

    Filing date: 2012-08-13

    IPC classification: G06F15/167

    Abstract: According to one embodiment of the present invention, a system for operating memory includes a first node coupled to a second node by a network, the system configured to perform a method including receiving a remote transaction message from the second node in a processing element in the first node via the network, wherein the remote transaction message bypasses a main processor in the first node as it is transmitted to the processing element. In addition, the method includes accessing, by the processing element, data from a location in a memory in the first node based on the remote transaction message, and performing, by the processing element, computations based on the data and the remote transaction message.

    Deadlock-free class routes for collective communications embedded in a multi-dimensional torus network
    38.
    Granted invention patent (status: lapsed)

    Publication No.: US08364844B2

    Publication date: 2013-01-29

    Application No.: US12697015

    Filing date: 2010-01-29

    IPC classification: G06F15/173

    CPC classification: G06F15/17381 G06F9/30072

    Abstract: A computer implemented method and a system for routing data packets in a multi-dimensional computer network. The method comprises routing a data packet among nodes along one dimension towards a root node, each node having input and output communication links, said root node not having any outgoing uplinks, and determining at each node if the data packet has reached a predefined coordinate for the dimension or an edge of the subrectangle for the dimension, and if the data packet has reached the predefined coordinate for the dimension or the edge of the subrectangle for the dimension, determining if the data packet has reached the root node, and if the data packet has not reached the root node, routing the data packet among nodes along another dimension towards the root node.
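
The dimension-by-dimension procedure in this abstract can be sketched in Python as follows. The sketch makes simplifying assumptions that are not in the source: the predefined coordinate for each dimension is taken to be the root node's coordinate in that dimension, dimensions are visited in index order, and torus wraparound links and subrectangle edges are ignored. The function name `route_to_root` is hypothetical.

```python
def route_to_root(start, root):
    """Return the nodes visited from start to root, one dimension at a time."""
    node = list(start)
    path = [tuple(node)]
    for dim in range(len(root)):
        # Route along this dimension until the predefined coordinate
        # (assumed here to be the root's coordinate) is reached.
        while node[dim] != root[dim]:
            node[dim] += 1 if root[dim] > node[dim] else -1
            path.append(tuple(node))
        # At the predefined coordinate: stop if this is the root,
        # otherwise continue along another dimension.
        if tuple(node) == tuple(root):
            break
    return path
```

For example, `route_to_root((0, 2), (2, 0))` returns `[(0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]`: the packet first travels along dimension 0, then along dimension 1.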

    Support for non-locking parallel reception of packets belonging to a single memory reception FIFO
    39.
    Granted invention patent (status: in force)

    Publication No.: US08086766B2

    Publication date: 2011-12-27

    Application No.: US12688747

    Filing date: 2010-01-15

    IPC classification: G06F13/28

    CPC classification: G06F13/28

    Abstract: A method and apparatus for distributed parallel messaging in a parallel computing system. A plurality of DMA engine units are configured in a multiprocessor system to operate in parallel, one DMA engine unit for transferring a current packet received at a network reception queue to a memory location in a memory FIFO (rmFIFO) region of a memory. A control unit implements logic to determine whether any prior received packet destined for that rmFIFO is still in a process of being stored in the associated memory by another DMA engine unit of the plurality, and prevent the one DMA engine unit from indicating completion of storing the current received packet in the reception memory FIFO (rmFIFO) until all prior received packets destined for that rmFIFO are completely stored by the other DMA engine units. Thus, there is provided non-locking support so that multiple packets destined for a single rmFIFO are transferred and stored in parallel to predetermined locations in a memory.
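
The ordering rule in this abstract, where a packet may be stored out of order but is not reported complete until every earlier packet for the same rmFIFO is stored, can be modelled with a small Python sketch. This is a single-threaded model under stated assumptions, not the control unit described in the patent: the class `ReceptionFifo`, its sequence numbers, and its `committed` counter are all hypothetical.

```python
class ReceptionFifo:
    """Hypothetical model of in-order completion for one rmFIFO."""

    def __init__(self):
        self.next_seq = 0     # sequence number for the next arriving packet
        self.stored = set()   # packets fully stored but not yet reported
        self.committed = 0    # packets with seq < committed are reported

    def reserve(self):
        """Assign a slot to an arriving packet before its store begins."""
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def store_finished(self, seq):
        """Record that a DMA unit finished storing packet seq.

        Completion advances only across a contiguous run of stored packets,
        so a later packet never appears complete before an earlier one.
        """
        self.stored.add(seq)
        while self.committed in self.stored:
            self.stored.remove(self.committed)
            self.committed += 1
        return self.committed
```

If three packets are reserved and the third finishes first, `committed` stays at 0. It moves to 1 when the first finishes and jumps to 3 once the second finishes.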

    ATOMICITY: A MULTI-PRONGED APPROACH
    40.
    Invention patent application (status: under examination, published)

    Publication No.: US20110219215A1

    Publication date: 2011-09-08

    Application No.: US13008546

    Filing date: 2011-01-18

    IPC classification: G06F9/30

    CPC classification: G06F9/524 G06F12/08

    Abstract: In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions.
