专利检索 ap:("Wayne M. Barrett" OR "Dong Chen" OR "Paul W. Coteus" OR "Alan G. Gara" OR "Rory Jackson" OR "Gerard V. Kopcsay" OR "Ben J. Nathanson" OR "Pavlos M. Vranas") AND inv:"Dong Chen" 第 1 页

1.

发明授权
Data capture technique for high speed signaling 有权
标题翻译：高速信号数据采集技术

公开(公告)号：US07817585B2

公开(公告)日：2010-10-19

申请号：US12191893

申请日：2008-08-14

申请人： Wayne M. Barrett , Dong Chen , Paul W. Coteus , Alan G. Gara , Rory Jackson , Gerard V. Kopcsay , Ben J. Nathanson , Pavlos M. Vranas

发明人： Wayne M. Barrett , Dong Chen , Paul W. Coteus , Alan G. Gara , Rory Jackson , Gerard V. Kopcsay , Ben J. Nathanson , Pavlos M. Vranas

IPC分类号： H04L5/14

CPC分类号： H05K7/20836 , F24F11/77 , G06F9/52 , G06F9/526 , G06F15/17381 , G06F17/142 , G09G5/008 , H04L7/0338

摘要： A data capture technique for high speed signaling to allow for optimal sampling of an asynchronous data stream. This technique allows for extremely high data rates and does not require that a clock be sent with the data as is done in source synchronous systems. The present invention also provides a hardware mechanism for automatically adjusting transmission delays for optimal two-bit simultaneous bi-directional (SiBiDi) signaling.

摘要翻译： 用于高速信令的数据捕获技术，以允许异步数据流的最佳采样。这种技术允许极高的数据速率，并且不要求在源同步系统中进行数据发送时钟。本发明还提供了用于自动调整用于最佳两比特双向（SiBiDi）信令的传输延迟的硬件机制。

2.

发明授权
Global tree network for computing structures enabling global processing operations 失效
标题翻译：用于计算结构的全局树网络，实现全球处理操作

公开(公告)号：US07650434B2

公开(公告)日：2010-01-19

申请号：US10469000

申请日：2002-02-25

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F15/16

CPC分类号： G06F15/17337

摘要： A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in asynchronous or synchronized manner, and, is physically and logically partitionable.

摘要翻译： 一种用于根据树网络结构互连的处理节点之间实现高速，低延迟的全局树网络通信的系统和方法。全局树网络使得能够在具有多个互连的处理节点的计算机结构中执行并行算法操作期间执行集合缩减操作。包括通过链路互连树节点的路由器设备，以便于在虚拟树和子树结构的节点处执行低延迟全局处理操作。执行的全局操作包括以下一个或多个：从根节点向下游到虚拟树的叶节点的广播操作，从叶节点向上到叶节点到虚拟树中的根节点的减少操作，以及从任何节点到根节点。全局树网络可配置为以异步或同步方式提供全局屏障和中断功能，并且在物理和逻辑上可分区。

3.

发明授权
Class network routing 失效
标题翻译：类网络路由

公开(公告)号：US07587516B2

公开(公告)日：2009-09-08

申请号：US10468999

申请日：2002-02-25

申请人： Gyan Bhanot , Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Gyan Bhanot , Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F15/173 , H04L12/66 , H04L12/50

CPC分类号： H04L45/16 , H04L45/06

摘要： Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.

摘要翻译： 在诸如包括在其节点处的多个并行计算处理器的计算机网络的网络中实现类网络路由。类网络路由允许计算处理器将消息广播到计算机网络中的其他计算处理器的范围（一个或多个），例如列或行中的处理器。通常这种类型的操作需要单独的消息发送到每个处理器。根据本发明的类网络路由，单个消息是足够的，这通常减少了网络中的消息总数以及进行广播的延迟。类网络路由也适用于具有硬件类功能（组播）能力的分布式存储并行超级计算机上的密集矩阵求逆算法。这是通过利用密集矩阵反演的通信模式可以通过硬件类功能来实现的，这导致更快的执行时间。

4.

发明授权
Low latency memory access and synchronization 失效
标题翻译：低延迟内存访问和同步

公开(公告)号：US07174434B2

公开(公告)日：2007-02-06

申请号：US10468994

申请日：2002-02-25

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F12/12

CPC分类号： G06F9/52

摘要： A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

摘要翻译： 与弱有序的多处理器系统相关联地提供低延迟存储器系统访问。多处理器中的每个处理器共享资源，并且每个共享资源在锁定设备内具有关联的锁，其提供对多处理器中的多个处理器之间的同步的支持以及资源的有序共享。当处理器拥有与该资源相关联的锁定时，处理器仅具有访问资源的权限，并且处理器拥有锁的尝试仅需要单个加载操作，而不是传统的原子负载后跟存储，使得处理器只执行读取操作，并且硬件锁定装置执行后续的写入操作而不是处理器。还公开了用于非连续数据结构的简单预取。重新定义存储器线，使得除了正常的物理存储器数据之外，每行包括足够大的指针以指向存储器中的任何其他行，其中指针用于确定要预取的存储器行而不是一些其它预测算法。这使得硬件能够有效地预取不连续但重复的存储器访问模式。

5.

发明授权
Method for prefetching non-contiguous data structures 失效
标题翻译：预取非连续数据结构的方法

公开(公告)号：US07529895B2

公开(公告)日：2009-05-05

申请号：US11617276

申请日：2006-12-28

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F13/28

CPC分类号： G06F12/0862 , G06F9/52 , G06F2212/6028

摘要： A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.

摘要翻译： 与弱有序的多处理器系统相关联地提供低延迟存储器系统访问。多处理器中的每个处理器共享资源，并且每个共享资源在锁定设备内具有关联的锁，其提供对多处理器中的多个处理器之间的同步的支持以及资源的有序共享。当处理器拥有与该资源相关联的锁定时，处理器仅具有访问资源的权限，并且处理器拥有锁的尝试仅需要单个加载操作，而不是传统的原子负载后跟存储，使得处理器只执行读取操作，并且硬件锁定装置执行后续的写入操作而不是处理器。还公开了用于非连续数据结构的简单完善。存储器线被重新定义，使得除了正常的物理存储器数据之外，每行包括足够大的指针以指向存储器中的任何其他行，其中指针用于确定哪个存储器行被提供而不是一些其它预测算法。这使得硬件能够有效地预处理不连续但重复的存储器访问模式。

6.

发明授权
Optimized scalable network switch 失效
标题翻译：优化可扩展网络交换机

公开(公告)号：US07305487B2

公开(公告)日：2007-12-04

申请号：US10469001

申请日：2002-02-25

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F15/173

CPC分类号： H05K7/20836 , F24F11/77 , G06F9/52 , G06F9/526 , G06F15/17381 , G06F17/142 , G09G5/008 , H04L7/0338

摘要： In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.

摘要翻译： 在具有以m多维配置的多个节点的大规模并行计算系统中，每个节点包括计算设备，提供了用于向分组朝向其目的地节点路由分组的方法，其包括生成2m个多个紧凑比特向量中的至少一个包含从下游节点导出的信息。存储在紧凑向量中的下行信息（诸如链路状态信息和下游缓冲器的丰满度）的多级仲裁过程被用于确定分组传输的优选方向和虚拟信道。优选的方向范围被编码，并且通过检查多个紧凑比特向量来选择虚拟信道。这种动态路由方法消除了路由表的必要性，从而增强了交换机的可扩展性。

7.

发明授权
Low latency memory access and synchronization 失效
标题翻译：低延迟内存访问和同步

公开(公告)号：US07818514B2

公开(公告)日：2010-10-19

申请号：US12196796

申请日：2008-08-22

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F12/06

CPC分类号： G06F12/0862 , G06F9/52 , G06F2212/6028

摘要： A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

摘要翻译： 与弱有序的多处理器系统相关联地提供低延迟存储器系统访问。多处理器中的Bach处理器共享资源，并且每个共享资源在锁定设备内具有关联的锁，其提供对多处理器中的多个处理器之间的同步的支持以及资源的有序共享。当处理器拥有与该资源相关联的锁定时，处理器仅具有访问资源的权限，并且处理器拥有锁的尝试仅需要单个加载操作，而不是传统的原子负载后跟存储，使得处理器只执行读取操作，并且硬件锁定装置执行后续的写入操作而不是处理器。还公开了用于非连续数据结构的简单预取。重新定义存储器线，使得除了正常的物理存储器数据之外，每行包括足够大的指针以指向存储器中的任何其他行，其中指针用于确定要预取的存储器行而不是一些其它预测算法。这使得硬件能够有效地预取不连续但重复的存储器访问模式。

8.

发明申请
LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION 失效
标题翻译：低延迟存储器访问和同步

公开(公告)号：US20080313408A1

公开(公告)日：2008-12-18

申请号：US12196796

申请日：2008-08-22

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht , Burkhard D. Steinmacher-Burow , Todd E. Takken , Pavlos M. Vranas

IPC分类号： G06F12/08

CPC分类号： G06F12/0862 , G06F9/52 , G06F2212/6028

摘要： A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

摘要翻译： 与弱有序的多处理器系统相关联地提供低延迟存储器系统访问。多处理器中的Bach处理器共享资源，并且每个共享资源在锁定设备内具有关联的锁，其提供对多处理器中的多个处理器之间的同步的支持以及资源的有序共享。当处理器拥有与该资源相关联的锁定时，处理器仅具有访问资源的权限，并且处理器拥有锁的尝试仅需要单个加载操作，而不是传统的原子负载后跟存储，使得处理器只执行读取操作，并且硬件锁定装置执行后续的写入操作而不是处理器。还公开了用于非连续数据结构的简单预取。重新定义存储器线，使得除了正常的物理存储器数据之外，每行包括足够大的指针以指向存储器中的任何其他行，其中指针用于确定要预取的存储器行而不是一些其它预测算法。这使得硬件能够有效地预取不连续但重复的存储器访问模式。

9.

发明申请
NOVEL MASSIVELY PARALLEL SUPERCOMPUTER 有权
标题翻译：新的大型并行超级计算机

公开(公告)号：US20120311299A1

公开(公告)日：2012-12-06

申请号：US13566024

申请日：2012-08-03

申请人： Matthias A. Blumrich , Dong Chen , George L. Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidlberger , Gerard V. Kopcsay , Lawrence S. Mok , Todd E. Takken

发明人： Matthias A. Blumrich , Dong Chen , George L. Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidlberger , Gerard V. Kopcsay , Lawrence S. Mok , Todd E. Takken

IPC分类号： G06F15/80

CPC分类号： H05K7/20836 , F24F11/77 , G06F9/52 , G06F9/526 , G06F15/17381 , G06F17/142 , G09G5/008 , H04L7/0338

摘要： A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency. The multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.

摘要翻译： 数百个teraOPS级别的新型大规模并行超级计算机包括基于片上系统技术的节点架构，即每个处理节点包括单个专用集成电路（ASIC）。在每个ASIC节点内是多个处理元件，每个处理元件由中央处理单元（CPU）和多个浮点处理器组成，以实现计算性能，封装密度，低成本以及功率和冷却要求的最佳平衡。单个节点内的多个处理器单独或同时工作在要解决的特定算法所要求的计算或通信的任何组合上。片上系统ASIC节点通过多个独立网络进行互连，从而最大限度地最大限度地提高了分组通信吞吐量并最大限度地减少了延迟。多个网络包括用于并行算法消息传递的三个高速网络，包括Torus，全局树和提供全局障碍和通知功能的全球异步网络。

10.

发明授权
Massively parallel supercomputer 有权
标题翻译：大型并行超级计算机

公开(公告)号：US08250133B2

公开(公告)日：2012-08-21

申请号：US12492799

申请日：2009-06-26

申请人： Matthias A. Blumrich , Dong Chen , George L. Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Gerard V. Kopcsay , Lawrence S. Mok , Todd E. Takken

发明人： Matthias A. Blumrich , Dong Chen , George L. Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Gerard V. Kopcsay , Lawrence S. Mok , Todd E. Takken

IPC分类号： G06F15/16

CPC分类号： H05K7/20836 , F24F11/77 , G06F9/52 , G06F9/526 , G06F15/17381 , G06F17/142 , G09G5/008 , H04L7/0338

摘要： A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System- On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency. The multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.

摘要翻译： 数百个teraOPS级别的新型大规模并行超级计算机包括基于片上系统技术的节点架构，即每个处理节点包括单个专用集成电路（ASIC）。在每个ASIC节点内是多个处理元件，每个处理元件由中央处理单元（CPU）和多个浮点处理器组成，以实现计算性能，封装密度，低成本以及功率和冷却要求的最佳平衡。单个节点内的多个处理器单独或同时工作在要解决的特定算法所要求的计算或通信的任何组合上。片上系统ASIC节点通过多个独立网络互连，从而最大限度地最大限度地提高了分组通信吞吐量并最大限度地减少了延迟。多个网络包括用于并行算法消息传递的三个高速网络，包括Torus，全局树和提供全局障碍和通知功能的全球异步网络。

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类