Patent search: ap:("Matthias Blumrich" OR "Dong Chen" OR "Alan Gara" OR "Mark Giampapa" OR "Philip Heidelberger" OR "Dirk Hoenicke" OR "Martin Ohmacht" OR "Valentina Salapura" OR "Pavlos Vranas") AND inv:"Alan Gara" (page 2)

11.

Granted patent
Multi-petascale highly efficient parallel supercomputer [in force]

Publication No.: US09081501B2

Publication date: 2015-07-14

Application No.: US13004007

Filing date: 2011-01-10

Applicants and inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classes: G06F15/173, G06F9/06, G06F15/76

CPC classes: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
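
To make the five-dimensional torus mentioned in the abstract concrete, the following minimal sketch lists a node's nearest neighbours and counts minimal hops with wraparound. The dimension sizes and the function names `torus_neighbors` and `torus_hops` are illustrative assumptions, not values or interfaces taken from the patent.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]

def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nearest neighbours of `node` in a torus with the given sizes."""
    neighbors: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size  # wrap around the ring
            neighbor = tuple(coord)
            # A ring of size 2 reaches the same neighbour in both directions.
            if neighbor != node and neighbor not in neighbors:
                neighbors.append(neighbor)
    return neighbors

def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count, taking the shorter way round each ring."""
    return sum(min((x - y) % n, (y - x) % n) for x, y, n in zip(a, b, dims))

# Hypothetical 4 x 4 x 4 x 4 x 2 torus (sizes chosen only for illustration).
DIMS = (4, 4, 4, 4, 2)
origin = (0, 0, 0, 0, 0)
print(len(torus_neighbors(origin, DIMS)))
print(torus_hops(origin, (3, 2, 0, 0, 1), DIMS))
```

The wraparound links are what distinguish a torus from a plain mesh: the node at coordinate 0 on a ring of size 4 is one hop from coordinate 3, not three.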

12.

Patent application
MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER [in force]

Publication No.: US20110219208A1

Publication date: 2011-09-08

Application No.: US13004007

Filing date: 2011-01-10

Applicants and inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classes: G06F15/76, G06F9/06

CPC classes: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.

13.

Granted patent
Method and apparatus of prefetching streams of varying prefetch depth [lapsed]

Publication No.: US08103832B2

Publication date: 2012-01-24

Application No.: US11768697

Filing date: 2007-06-26

Applicants and inventors: Alan Gara, Martin Ohmacht, Valentina Salapura, Krishnan Sugavanam, Dirk Hoenicke

IPC classes: G06F13/00, G06F13/28

CPC classes: G06F12/0862, G06F12/0897, G06F2212/6026

Abstract: A method and apparatus for prefetching streams of varying prefetch depth dynamically change the depth of prefetching so that the number of streams as well as the hit rate of a single stream are optimized. In one aspect, the method and apparatus monitor a plurality of load requests from a processing unit for data in a prefetch buffer, determine an access pattern associated with the plurality of load requests, and adjust a prefetch depth according to the access pattern.
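
The monitor, classify, and adjust loop described in the abstract can be sketched roughly as follows. The doubling-and-reset policy, the depth limits, and the class name `AdaptivePrefetcher` are assumptions made for illustration; the abstract does not say which policy the patent uses.

```python
from typing import Optional, Set

class AdaptivePrefetcher:
    """Toy stream prefetcher whose depth follows the observed access pattern."""

    def __init__(self, min_depth: int = 1, max_depth: int = 8) -> None:
        self.min_depth = min_depth
        self.max_depth = max_depth
        self.depth = min_depth
        self.buffer: Set[int] = set()          # line addresses currently prefetched
        self.last_line: Optional[int] = None   # most recently loaded line

    def load(self, line: int) -> bool:
        """Record a load of cache line `line`; return True on a prefetch-buffer hit."""
        hit = line in self.buffer
        sequential = self.last_line is not None and line == self.last_line + 1
        if sequential:
            # Assumed policy: a sequential access doubles the depth, up to a cap.
            self.depth = min(self.depth * 2, self.max_depth)
        else:
            # Assumed policy: any other access resets to the shallowest depth.
            self.depth = self.min_depth
            self.buffer.clear()
        # Prefetch the next `depth` lines after the one just accessed.
        self.buffer.update(range(line + 1, line + 1 + self.depth))
        self.buffer.discard(line)
        self.last_line = line
        return hit

pf = AdaptivePrefetcher()
hits = [pf.load(line) for line in (100, 101, 102, 103, 500, 501)]
print(hits, pf.depth)
```

A deep prefetch helps a single long stream, while a shallow one leaves buffer space for more concurrent streams, which is the trade-off the abstract refers to.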

14.

Granted patent
Methods and apparatus using commutative error detection values for fault isolation in multiple node computers [lapsed]

Publication No.: US07383490B2

Publication date: 2008-06-03

Application No.: US11106069

Filing date: 2005-04-14

Applicants and inventors: Gheorghe Almasi, Matthias Augustin Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk I. Hoenicke, Sarabjeet Singh, Burkhard D. Steinmacher-Burow, Todd Takken, Pavlos Vranas

IPC classes: G06F11/00, H03M13/00

CPC classes: G06F11/1633

Abstract: Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values (for example, checksums) to identify and isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.
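
As a rough illustration of the comparison described in the abstract, the sketch below accumulates an order-independent checksum per node and flags nodes whose values differ between two runs. Summing per-packet CRC32 values modulo 2**32 is an assumed choice of commutative function, and the node ids and payloads are made up; the patent may use a different error detection value.

```python
import zlib
from typing import Dict, Iterable, List

MASK = 0xFFFFFFFF

def commutative_checksum(packets: Iterable[bytes]) -> int:
    """Order-independent checksum: sum of per-packet CRC32 values modulo 2**32."""
    total = 0
    for packet in packets:
        total = (total + zlib.crc32(packet)) & MASK
    return total

def suspect_nodes(run_a: Dict[int, int], run_b: Dict[int, int]) -> List[int]:
    """Return node ids whose checksums differ between two runs of the same program."""
    return sorted(node for node in run_a if run_a[node] != run_b.get(node))

# Hypothetical traffic injected by three nodes during a reproducible program phase.
first_run = {
    0: commutative_checksum([b"alpha", b"beta"]),
    1: commutative_checksum([b"gamma", b"delta"]),
    2: commutative_checksum([b"epsilon"]),
}
second_run = {
    0: commutative_checksum([b"beta", b"alpha"]),   # same packets, different order
    1: commutative_checksum([b"gamma", b"delta"]),
    2: commutative_checksum([b"epsiloN"]),          # one corrupted byte
}
print(suspect_nodes(first_run, second_run))
```

Commutativity matters because packets from a reproducible program phase may be injected in a different order on each run; an order-sensitive checksum would report false differences.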

15.

Granted patent
Bad data packet capture device [lapsed]

Publication No.: US07701846B2

Publication date: 2010-04-20

Application No.: US11768572

Filing date: 2007-06-26

Applicants and inventors: Dong Chen, Alan Gara, Philip Heidelberger, Pavlos Vranas

IPC classes: H04L1/00

CPC classes: H04L43/0847

Abstract: An apparatus and method for capturing data packets for analysis on a network computing system includes a sending node and a receiving node connected by a bi-directional communication link. The sending node sends a data transmission to the receiving node on the bi-directional communication link, and the receiving node receives the data transmission, verifies it to identify valid data and invalid data, and verifies retransmissions of invalid data as corresponding valid data. A memory device communicates with the receiving node for storing the invalid data and the corresponding valid data. A computing node communicates with the memory device and receives and performs an analysis of the invalid data and the corresponding valid data received from the memory device.
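
The capture flow in the abstract (verify, keep the invalid data, pair it with the valid retransmission, analyse the pair) might look like the sketch below. The CRC32 check, the sequence-number matching, and all names (`Packet`, `CapturingReceiver`, `differing_offsets`) are assumptions for illustration rather than details from the patent.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Packet:
    seq: int        # sequence number, used to match a retransmission to its original
    payload: bytes
    crc: int        # CRC32 of the payload as computed by the sender

def make_packet(seq: int, payload: bytes) -> Packet:
    return Packet(seq, payload, zlib.crc32(payload))

@dataclass
class CapturingReceiver:
    pending_bad: Dict[int, bytes] = field(default_factory=dict)
    captured: List[Tuple[bytes, bytes]] = field(default_factory=list)  # (invalid, valid)

    def receive(self, packet: Packet) -> bool:
        """Verify a packet; return True if valid, False to request retransmission."""
        if zlib.crc32(packet.payload) != packet.crc:
            self.pending_bad[packet.seq] = packet.payload  # keep the invalid data
            return False
        bad = self.pending_bad.pop(packet.seq, None)
        if bad is not None:
            # A valid retransmission arrived: store the pair for later analysis.
            self.captured.append((bad, packet.payload))
        return True

def differing_offsets(bad: bytes, good: bytes) -> List[int]:
    """Analysis step: byte offsets at which the invalid and valid payloads differ."""
    return [i for i, (x, y) in enumerate(zip(bad, good)) if x != y]

rx = CapturingReceiver()
good = make_packet(7, b"hello world")
corrupted = Packet(7, b"hellO world", good.crc)  # payload damaged in transit
rx.receive(corrupted)   # rejected, invalid payload retained
rx.receive(good)        # retransmission accepted and paired with the bad copy
print([differing_offsets(b, g) for b, g in rx.captured])
```

Keeping the invalid payload next to its valid retransmission is what allows the analysis step to locate the damaged bytes instead of merely counting errors.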

16.

Patent application
Method and apparatus for re-utilizing partially failed resources as network resources [lapsed]

Publication No.: US20070168695A1

Publication date: 2007-07-19

Application No.: US11335784

Filing date: 2006-01-19

Applicants and inventors: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Liebsch, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classes: G06F11/00

CPC classes: G06F11/0793, G06F11/0724

Abstract: A method and apparatus for re-utilizing partially failed compute resources in a massively parallel supercomputer system. In the preferred embodiments the compute node comprises a number of clock domains that can be enabled separately. When an error in a compute node is detected, and the failure is not in the network communication blocks, a clock enable circuit enables the clocks to the network communication blocks only, allowing the partially failed compute node to be re-utilized as a network resource. The computer system can then continue to operate with only slightly diminished performance and thereby improve performance and perceived overall reliability.
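
The clock-enable decision in the abstract reduces to a small piece of logic, sketched here in software form. The three domain names are hypothetical (the abstract only says there are several separately enabled clock domains), and the real mechanism is a hardware circuit, not Python.

```python
from enum import Enum
from typing import Dict, Set

class Domain(Enum):
    PROCESSOR = "processor"   # hypothetical domain names, for illustration only
    MEMORY = "memory"
    NETWORK = "network"

def clock_enables(failed: Set[Domain]) -> Dict[Domain, bool]:
    """Decide which clock domains stay enabled after a failure is detected."""
    if not failed:
        return {d: True for d in Domain}       # healthy node: everything runs
    if Domain.NETWORK in failed:
        return {d: False for d in Domain}      # cannot forward traffic either
    # Partial failure outside the network blocks: keep only the network clocked,
    # so the node can still pass packets along for its neighbours.
    return {d: d is Domain.NETWORK for d in Domain}

print(clock_enables({Domain.PROCESSOR}))
```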

17.

Granted patent
Embedding global barrier and collective in torus network with each node combining input from receivers according to class map for output to senders [in force]

Publication No.: US08521990B2

Publication date: 2013-08-27

Application No.: US12723277

Filing date: 2010-03-12

Applicants and inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken

IPC classes: G06F15/16

CPC classes: G06F9/30021, G06F9/3001, G06F9/30018, G06F9/30145, G06F11/3024, G06F11/3409, G06F11/348, G06F15/17362, G06F15/17381, G06F15/17393, G06F2201/88, H04L67/10

Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
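
The four steps named in the abstract (take receiver inputs, divide them into classes, combine per class, send to senders) can be sketched for a single node as follows. Combining with a logical AND models a barrier ("everyone has arrived"); the link labels such as "A-" and the `ClassEntry` layout are illustrative assumptions, not the patent's class map format.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class ClassEntry:
    inputs: FrozenSet[str]    # receivers (and optionally "local") feeding this class
    outputs: FrozenSet[str]   # senders that get the combined result

def combine(class_map: Dict[int, ClassEntry],
            signals: Dict[str, bool]) -> Dict[int, Dict[str, bool]]:
    """For each class, AND its input signals and fan the result out to its senders."""
    results: Dict[int, Dict[str, bool]] = {}
    for class_id, entry in class_map.items():
        value = all(signals[name] for name in entry.inputs)
        results[class_id] = {sender: value for sender in entry.outputs}
    return results

# Hypothetical class map for one node: class 0 is an up-tree step of a barrier,
# class 1 forwards a signal arriving on another link.
class_map = {
    0: ClassEntry(frozenset({"A-", "B-", "local"}), frozenset({"A+"})),
    1: ClassEntry(frozenset({"C+"}), frozenset({"C-", "local"})),
}
signals = {"A-": True, "B-": True, "C+": False, "local": True}
print(combine(class_map, signals))
```

Because each class has its own input and output sets, several independent barriers can share the same torus links without interfering with one another.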

18.

Patent application
COLLECTIVE NETWORK FOR COMPUTER STRUCTURES [in force]

Publication No.: US20110219280A1

Publication date: 2011-09-08

Application No.: US13101566

Filing date: 2011-05-05

Applicants and inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

IPC classes: H03M13/09, H04L1/08, G06F11/10, G06F11/14

CPC classes: H04L1/08, G06F9/46, G06F11/08, G06F11/1423, H03M13/09, H04L1/0061, H04L1/1607, H04L1/1867, H04L2001/0093, H04L2001/0097

Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
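
A collective reduction over a tree of nodes, as the abstract describes, can be illustrated with the short sketch below: each node merges its own value with its children's partial results, and the root's total is then broadcast back. The tree shape, the values, and the function names are assumptions; the hardware routers in the patent do this combining in the network rather than in software.

```python
import operator
from typing import Callable, Dict, List

def tree_reduce(children: Dict[int, List[int]], values: Dict[int, int],
                root: int, op: Callable[[int, int], int]) -> int:
    """Combine every node's value up the tree, merging partial results at each node."""
    result = values[root]
    for child in children.get(root, []):
        result = op(result, tree_reduce(children, values, child, op))
    return result

def allreduce(children: Dict[int, List[int]], values: Dict[int, int],
              root: int, op: Callable[[int, int], int]) -> Dict[int, int]:
    """Reduce to the root, then broadcast the result back to every node."""
    total = tree_reduce(children, values, root, op)
    return {node: total for node in values}

# Hypothetical 7-node binary tree rooted at node 0; node i contributes i + 1.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
values = {node: node + 1 for node in range(7)}
print(tree_reduce(children, values, 0, operator.add))
print(allreduce(children, values, 0, max))
```

With a tree, the number of combining steps grows with the tree depth rather than with the node count, which is where the low latency of such collective operations comes from.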

19.

Granted patent
Collective network for computer structures [in force]

Publication No.: US08626957B2

Publication date: 2014-01-07

Application No.: US13101566

Filing date: 2011-05-05

Applicants and inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

IPC classes: G06F15/16

CPC classes: H04L1/08, G06F9/46, G06F11/08, G06F11/1423, H03M13/09, H04L1/0061, H04L1/1607, H04L1/1867, H04L2001/0093, H04L2001/0097

Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.

20.

Granted patent
Collective network for computer structures [in force]

Publication No.: US08001280B2

Publication date: 2011-08-16

Application No.: US11572372

Filing date: 2005-07-18

Applicants and inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

IPC classes: G06F15/16

CPC classes: G06F15/17381, H04L1/1845, H04L12/4641

Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
