Synchronizing compute node time bases in a parallel computer
    71.
    Granted patent
    Status: Active

    Publication No.: US08924763B2

    Publication date: 2014-12-30

    Application No.: US13327107

    Filing date: 2011-12-15

    IPC classification: G06F1/12

    CPC classification: G06F1/12 H04L12/413

    Abstract: Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
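
    The sequence in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the tree, the per-link latencies (in cycles), and every function name below are hypothetical, and the barriers, pulse-waiter thread, and wakeup unit are not modelled. It shows why setting each node's time base to its latency from the root leaves all nodes reading the same value afterwards.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tree: node -> (parent, latency of the link from the parent, in cycles).
TREE: Dict[str, Tuple[Optional[str], int]] = {
    "root": (None, 0),
    "a": ("root", 3),
    "b": ("root", 5),
    "c": ("a", 2),
    "d": ("b", 4),
}


def latency_from_root(node: str) -> int:
    """Sum the link latencies on the path from the root down to `node`."""
    total = 0
    while True:
        parent, link = TREE[node]
        if parent is None:
            return total
        total += link
        node = parent


def synchronize(pulse_sent_at: int) -> Dict[str, Tuple[int, int]]:
    """Return, per node, (global time the pulse arrives, time base set on arrival)."""
    state = {}
    for node in TREE:
        latency = latency_from_root(node)
        arrival = pulse_sent_at + latency  # the pulse needs `latency` cycles to get here
        state[node] = (arrival, latency)   # time base := latency, as in the abstract
    return state


def time_base_at(node_state: Tuple[int, int], global_time: int) -> int:
    """Value of a node's time base at `global_time`, assuming one tick per cycle."""
    arrival, base = node_state
    return base + (global_time - arrival)


if __name__ == "__main__":
    state = synchronize(pulse_sent_at=100)
    for node, node_state in state.items():
        print(node, time_base_at(node_state, 120))
```

    In this model every node reports the number of cycles elapsed since the root sent the pulse, regardless of its depth in the tree.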

    Method and apparatus for efficiently tracking queue entries relative to a timestamp
    73.
    Granted patent
    Status: Lapsed

    Publication No.: US08756350B2

    Publication date: 2014-06-17

    Application No.: US11768800

    Filing date: 2007-06-26

    IPC classification: G06F3/00 G06F5/00

    CPC classification: G06F12/0835 G06F12/0831

    Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.
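
    One way to picture the counter described in the abstract is a FIFO queue that, on a timestamp, records how many entries it holds and counts down as they leave. The sketch below is a software analogy only; the class and method names are hypothetical, and the timing circuit and per-sender queue structures of the hardware are not modelled.

```python
from collections import deque


class TimestampedQueue:
    """FIFO queue that reports when all entries present at a timestamp have left."""

    def __init__(self) -> None:
        self._queue = deque()
        self._pending = 0    # entries present at the timestamp and still enqueued
        self._armed = False  # True once a timestamp has been received

    def enqueue(self, event) -> None:
        self._queue.append(event)

    def timestamp(self) -> None:
        """Snapshot the current occupancy; later arrivals are not counted."""
        self._pending = len(self._queue)
        self._armed = True

    def dequeue(self):
        event = self._queue.popleft()
        # FIFO order guarantees that counted entries leave before later arrivals.
        if self._pending > 0:
            self._pending -= 1
        return event

    @property
    def drained(self) -> bool:
        """Output signal: every entry present at the timestamp has been dequeued."""
        return self._armed and self._pending == 0


if __name__ == "__main__":
    q = TimestampedQueue()
    q.enqueue("e1")
    q.enqueue("e2")
    q.timestamp()
    q.enqueue("e3")  # arrives after the timestamp, so it is not tracked
    q.dequeue()
    q.dequeue()
    print(q.drained)
```

    A single counter suffices here because the queue is strictly first-in, first-out; no per-entry tag is needed.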

    T-STAR INTERCONNECTION NETWORK TOPOLOGY
    74.
    Patent application
    Status: Active

    Publication No.: US20140044006A1

    Publication date: 2014-02-13

    Application No.: US13569789

    Filing date: 2012-08-08

    IPC classification: H04L12/28

    Abstract: According to one embodiment of the present invention, a system for network communication includes an M-dimensional grid of node groups, each node group including N nodes, wherein M is greater than or equal to one, N is greater than one, and each node comprises a router; and intra-group links directly connecting each node in each node group to every other node in the node group. In addition, the system includes inter-group links directly connecting each node in each node group to a node in each neighboring node group in the M-dimensional grid.
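
    The link structure in the abstract can be enumerated with a short script. This is a sketch under two assumptions that the abstract does not state: the grid is a mesh without wraparound, and node i of a group connects to node i of each neighboring group. The function name and the node representation are hypothetical.

```python
from itertools import combinations, product
from typing import Set, Tuple

Node = Tuple[Tuple[int, ...], int]  # (group coordinates in the grid, index within group)


def build_t_star(dims: Tuple[int, ...], n: int) -> Set[Tuple[Node, Node]]:
    """Enumerate links for an M-dimensional grid of node groups with N nodes each."""
    if len(dims) < 1 or n < 2:
        raise ValueError("need M >= 1 grid dimensions and N > 1 nodes per group")
    links: Set[Tuple[Node, Node]] = set()
    for group in product(*(range(size) for size in dims)):
        # Intra-group links: every node is directly connected to every other node.
        for i, j in combinations(range(n), 2):
            links.add(((group, i), (group, j)))
        # Inter-group links: node i connects to node i of the next group on each axis
        # (assumed pairing; the abstract only requires "a node" in each neighbor).
        for axis, size in enumerate(dims):
            if group[axis] + 1 < size:
                neighbor = group[:axis] + (group[axis] + 1,) + group[axis + 1:]
                for i in range(n):
                    links.add(((group, i), (neighbor, i)))
    return links


if __name__ == "__main__":
    links = build_t_star(dims=(2, 3), n=4)
    print(len(links))
```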

    Synchronizing Compute Node Time Bases In A Parallel Computer
    76.
    Patent application
    Status: Active

    Publication No.: US20130159760A1

    Publication date: 2013-06-20

    Application No.: US13327107

    Filing date: 2011-12-15

    IPC classification: G06F1/12

    CPC classification: G06F1/12 H04L12/413

    Abstract: Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.

    Combined group ECC protection and subgroup parity protection
    77.
    Granted patent
    Status: Active

    Publication No.: US08468416B2

    Publication date: 2013-06-18

    Application No.: US11768527

    Filing date: 2007-06-26

    IPC classification: H03M13/00

    Abstract: A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.
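
    The row-generation step of the preferred embodiment can be sketched as follows. The sketch covers only two things: producing distinct m-bit rows with an odd number (three or more) of ones, and multiplying the data bits by P over GF(2). It does not reproduce the row assignment that makes two columns of P act as subgroup parity, which the abstract does not spell out. The function names and the example sizes (m = 8, n = 64) are assumptions.

```python
from itertools import combinations
from typing import List


def odd_weight_rows(m: int, n: int) -> List[List[int]]:
    """Return n distinct m-bit rows, each with an odd number (>= 3) of ones."""
    rows: List[List[int]] = []
    for weight in range(3, m + 1, 2):  # weights 3, 5, 7, ...
        for ones in combinations(range(m), weight):
            rows.append([1 if k in ones else 0 for k in range(m)])
            if len(rows) == n:
                return rows
    raise ValueError("m is too small to supply n odd-weight rows")


def check_bits(data: List[int], p: List[List[int]]) -> List[int]:
    """Multiply the n-bit row vector `data` by the n x m matrix `p` over GF(2)."""
    m = len(p[0])
    return [sum(d & row[k] for d, row in zip(data, p)) % 2 for k in range(m)]


if __name__ == "__main__":
    P = odd_weight_rows(m=8, n=64)
    data = [0] * 64
    data[5] = 1
    print(check_bits(data, P))
```

    With a single data bit set, the check bits equal the corresponding row of P, so a single-bit data error changes the check bits by exactly that row's nonzero pattern.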

    Massively parallel supercomputer
    78.
    Granted patent
    Status: Active

    Publication No.: US08250133B2

    Publication date: 2012-08-21

    Application No.: US12492799

    Filing date: 2009-06-26

    IPC classification: G06F15/16

    Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-on-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. The multiple networks include three high-speed networks for parallel algorithm message passing, including a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.

    MULTI-INPUT AND BINARY REPRODUCIBLE, HIGH BANDWIDTH FLOATING POINT ADDER IN A COLLECTIVE NETWORK
    80.
    Patent application
    Status: Active

    Publication No.: US20110173421A1

    Publication date: 2011-07-14

    Application No.: US12684776

    Filing date: 2010-01-08

    IPC classification: G06F9/302

    Abstract: To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generates a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting of the floating point numbers, the adding, the generating, and the converting of the summation in one pass. One pass indicates that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device.
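
    The convert, add, convert-back sequence can be sketched in software to show why it yields binary-reproducible results: integer addition is associative, so the sum does not depend on the order in which node contributions arrive. The fixed-point scaling below, the FRACTION_BITS value, and the function names are assumptions for illustration and are not taken from the patent.

```python
import math
from typing import Iterable

FRACTION_BITS = 64  # hypothetical fixed-point precision; smaller bits are truncated


def to_fixed(x: float) -> int:
    """Convert a float to an integer count of 2**-FRACTION_BITS units."""
    return int(math.ldexp(x, FRACTION_BITS))


def one_pass_sum(values: Iterable[float]) -> float:
    """Sum floats by converting to integers, adding, and converting back once."""
    total = sum(to_fixed(v) for v in values)  # exact, order-independent integer sum
    return math.ldexp(float(total), -FRACTION_BITS)


if __name__ == "__main__":
    inputs = [0.1, 1e10, -1e10, 0.2, 0.3]
    print(one_pass_sum(inputs) == one_pass_sum(reversed(inputs)))
    print(sum(inputs) == sum(reversed(inputs)))
```

    The second comparison uses ordinary floating point addition, whose result can differ with summation order because each intermediate sum is rounded.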
