Massively parallel supercomputer
    1.
    Type: Granted patent
    Status: In force

    Publication No.: US08250133B2

    Publication date: 2012-08-21

    Application No.: US12492799

    Filing date: 2009-06-26

    IPC classification: G06F15/16

    Abstract: A novel massively parallel supercomputer at the scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. The multiple networks include three high-speed networks for parallel algorithm message passing: a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.
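
    The torus network named in the abstract can be illustrated with a small sketch. The abstract does not state the number of dimensions or their sizes, so the three-dimensional 4 x 4 x 4 shape below, and the helper names `torus_neighbors` and `torus_hops`, are assumptions made purely for illustration. The sketch shows the defining property of a torus: every ring wraps around, so a message can travel the shorter way in each dimension.

```python
def torus_neighbors(coord, dims):
    """Return the neighbors of `coord` on a torus with side lengths `dims`.

    Each dimension contributes two neighbors (-1 and +1), with wraparound.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size  # wrap at the ring boundary
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hop count from a to b, taking the shorter way around each ring."""
    return sum(min((x - y) % n, (y - x) % n) for x, y, n in zip(a, b, dims))


# Hypothetical 4 x 4 x 4 torus; the shape is an assumption, not from the abstract.
DIMS = (4, 4, 4)

if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), DIMS))
    print(torus_hops((0, 0, 0), (3, 3, 3), DIMS))
```

    In this model the corner-to-corner distance on the 4 x 4 x 4 shape is three hops rather than nine, because each dimension needs only one wraparound step.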

    Massively parallel supercomputer
    2.
    Type: Granted patent
    Status: In force

    Publication No.: US07555566B2

    Publication date: 2009-06-30

    Application No.: US10468993

    Filing date: 2002-02-25

    IPC classification: G06F15/16

    Abstract: A novel massively parallel supercomputer at the scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. In the preferred embodiment, the multiple networks include three high-speed networks for parallel algorithm message passing: a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be utilized collaboratively or independently, according to the needs or phases of an algorithm, to optimize algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external connectivity and are used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitate partitioning of the supercomputer in multiple networks for optimizing supercomputing resources.

    NOVEL MASSIVELY PARALLEL SUPERCOMPUTER
    3.
    Type: Patent application
    Status: In force

    Publication No.: US20090259713A1

    Publication date: 2009-10-15

    Application No.: US12492799

    Filing date: 2009-06-26

    Abstract: A novel massively parallel supercomputer at the scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. In the preferred embodiment, the multiple networks include three high-speed networks for parallel algorithm message passing: a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be utilized collaboratively or independently, according to the needs or phases of an algorithm, to optimize algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external connectivity and are used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitate partitioning of the supercomputer in multiple networks for optimizing supercomputing resources.

    NOVEL MASSIVELY PARALLEL SUPERCOMPUTER
    4.
    Type: Patent application
    Status: In force

    Publication No.: US20120311299A1

    Publication date: 2012-12-06

    Application No.: US13566024

    Filing date: 2012-08-03

    IPC classification: G06F15/80

    Abstract: A novel massively parallel supercomputer at the scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. The multiple networks include three high-speed networks for parallel algorithm message passing: a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.

    Global tree network for computing structures enabling global processing operations
    9.
    Type: Granted patent
    Status: No longer in force

    Publication No.: US07650434B2

    Publication date: 2010-01-19

    Application No.: US10469000

    Filing date: 2002-02-25

    IPC classification: G06F15/16

    CPC classification: G06F15/17337

    Abstract: A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in an asynchronous or synchronized manner, and is physically and logically partitionable.
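
    The two collective operations described in the abstract, reduction upstream toward the root and broadcast downstream toward the leaves, can be sketched in software. This is a minimal model only: the binary tree in heap order and the function names `children`, `tree_reduce`, `tree_broadcast`, and `tree_depth` are assumptions for illustration, and the abstract does not specify the tree's fan-out or how the router hardware combines values.

```python
import operator


def children(node, n):
    """Children of `node` in a binary tree stored in heap order (assumed layout)."""
    return [c for c in (2 * node + 1, 2 * node + 2) if c < n]


def tree_reduce(values, op=operator.add, node=0):
    """Upstream reduction: each node merges its own value with its children's results."""
    result = values[node]
    for child in children(node, len(values)):
        result = op(result, tree_reduce(values, op, child))
    return result


def tree_broadcast(n, payload, node=0, received=None):
    """Downstream broadcast from the root; returns what each of the n nodes received."""
    if received is None:
        received = [None] * n
    received[node] = payload
    for child in children(node, n):
        tree_broadcast(n, payload, child, received)
    return received


def tree_depth(n):
    """Number of link levels between the root and the deepest node."""
    depth = 0
    while (1 << (depth + 1)) - 1 < n:
        depth += 1
    return depth


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7]          # one hypothetical value per node
    total = tree_reduce(values)              # reduce to the root
    print(tree_broadcast(len(values), total))  # then share the result with every node
    print(tree_depth(len(values)))
```

    Chaining the reduction and the broadcast gives every node the combined result, and in this model the number of link levels traversed grows with the tree depth rather than with the node count.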

    Class network routing
    10.
    Type: Granted patent
    Status: No longer in force

    Publication No.: US07587516B2

    Publication date: 2009-09-08

    Application No.: US10468999

    Filing date: 2002-02-25

    CPC classification: H04L45/16 H04L45/06

    Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency of a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
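
    The message-count saving claimed in the abstract can be illustrated with a simple counting model. The sketch below compares a row broadcast done with one message per destination against a single class-routed message that is copied at each node it passes. The grid coordinates and the function names `unicast_row_broadcast` and `class_row_broadcast` are hypothetical, and the model does not reproduce the patented routing mechanism itself.

```python
def unicast_row_broadcast(src, cols):
    """Without multicast: the source injects one message per destination in its row.

    Returns the list of (source, destination) messages.
    """
    row, col = src
    return [(src, (row, c)) for c in range(cols) if c != col]


def class_row_broadcast(src, cols):
    """With an assumed 'row' class: one injected message, copied at each node it passes.

    Returns (number of injected messages, list of nodes that received a copy).
    """
    row, col = src
    delivered = [(row, c) for c in range(cols) if c != col]
    injected = 1  # a single message serves the whole row
    return injected, delivered


if __name__ == "__main__":
    source = (2, 0)   # hypothetical processor at row 2, column 0
    width = 8         # hypothetical number of processors per row
    print(len(unicast_row_broadcast(source, width)))
    print(class_row_broadcast(source, width))
```

    Both approaches reach the same set of processors; the difference in this model is only how many messages the source has to inject.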
