Patent search: ap:("Dong Chen" OR "Alan Gara" OR "Philip Heidelberger" OR "Thomas Alan Liebsch" OR "Burkhard Steinmacher-Burow" OR "Pavlos Michael Vranas") AND inv:"Dong Chen" (page 1)

1.

Granted invention patent
Re-utilizing partially failed resources as network resources [Lapsed]

Publication No.: US07620841B2

Publication date: 2009-11-17

Application No.: US11335784

Filing date: 2006-01-19

Applicants: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Alan Liebsch, Burkhard Steinmacher-Burow, Pavlos Michael Vranas

Inventors: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Alan Liebsch, Burkhard Steinmacher-Burow, Pavlos Michael Vranas

IPC classification: G06F11/00

CPC classification: G06F11/0793, G06F11/0724

Abstract: A method and apparatus for re-utilizing partially failed compute resources in a massively parallel super computer system. In the preferred embodiments the compute node comprises a number of clock domains that can be enabled separately. When an error in a compute node is detected, and the failure is not in the network communication blocks, a clock enable circuit enables only the clocks to the network communication blocks, allowing the partially failed compute node to be re-utilized as a network resource. The computer system can then continue to operate with only slightly diminished performance and thereby improve performance and perceived overall reliability.

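Note: the clock-enable decision described in this abstract can be sketched roughly as follows. This is an illustration only, not the patented circuit: the domain names (`PROCESSOR`, `MEMORY`, `NETWORK`) and the choice to switch everything off when the network blocks themselves fail are assumptions, since the abstract only says that several clock domains exist and can be enabled separately.

```python
from enum import Enum, auto


class ClockDomain(Enum):
    # Hypothetical domain names; the abstract does not list them.
    PROCESSOR = auto()
    MEMORY = auto()
    NETWORK = auto()


def domains_to_enable(failed_domains):
    """Return the clock domains to keep running, given the domains that failed."""
    failed = set(failed_domains)
    if not failed:
        return set(ClockDomain)       # healthy node: every domain stays on
    if ClockDomain.NETWORK in failed:
        return set()                  # assumption: node cannot forward traffic, so disable it
    return {ClockDomain.NETWORK}      # partial failure: keep only the network blocks clocked


print(domains_to_enable([ClockDomain.PROCESSOR]))
```

A node whose processor domain fails therefore keeps forwarding packets for its neighbours, which is the re-utilization the title refers to.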

2.

Granted invention patent
Reproducibility in a multiprocessor system [In force]

Publication No.: US08595554B2

Publication date: 2013-11-26

Application No.: US12774475

Filing date: 2010-05-05

Applicants: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

Inventors: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

IPC classification: G06F11/00

CPC classification: G06F1/10, G06F11/2242

Abstract: Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.


3.

Invention patent application
LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION [Lapsed]

Publication No.: US20070204112A1

Publication date: 2007-08-30

Application No.: US11617276

Filing date: 2006-12-28

Applicants: Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas

Inventors: Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas

IPC classification: G06F12/14

CPC classification: G06F12/0862, G06F9/52, G06F2212/6028

Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

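Note: the single-load lock acquisition described in this abstract could be modelled, very loosely, as below. The class and method names are hypothetical, and the `release` method is an assumption because the abstract does not say how a lock is given back. The point of the sketch is only that the processor issues one read and the device itself performs the follow-up write.

```python
class LockDevice:
    """Toy software model of a hardware locking device (names are hypothetical)."""

    def __init__(self, num_locks):
        self._held = [False] * num_locks

    def load(self, lock_id):
        """Model a single load: return True if the caller now owns the lock.

        The device, not the processor, performs the write that marks the
        lock as taken, so no separate store is needed to acquire it.
        """
        if self._held[lock_id]:
            return False              # someone else owns the lock
        self._held[lock_id] = True    # write performed by the device
        return True

    def release(self, lock_id):
        # Assumption: the abstract does not describe the release path.
        self._held[lock_id] = False


device = LockDevice(num_locks=2)
print(device.load(0), device.load(0))  # second attempt fails while the lock is held
```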

4.

Granted invention patent
Multi-petascale highly efficient parallel supercomputer [In force]

Publication No.: US09081501B2

Publication date: 2015-07-14

Application No.: US13004007

Filing date: 2011-01-10

Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classification: G06F15/173, G06F9/06, G06F15/76

CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five dimensional torus network with DMA that optimally maximizes the throughput of packet communications between nodes and minimizes latency.


5.

Granted invention patent
Optimizing TLB entries for mixed page size storage in contiguous memory [In force]

Publication No.: US08856490B2

Publication date: 2014-10-07

Application No.: US13618730

Filing date: 2012-09-14

Applicants: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow

Inventors: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow

IPC classification: G06F12/06, G06F12/10

CPC classification: G06F12/1027, G06F2212/652, G06F2212/654

Abstract: A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from said processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.

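Note: one way to picture a lookup with an excluded range is the sketch below. The encoding is an assumption made for illustration: here the exclusion is expressed as "the first `excluded_bytes` of the page do not match", which lets a small page sit inside the address range of a large one. The abstract only states that one or more bits in the entry exclude a memory range from a page; the field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    virtual_page: int        # virtual page number
    physical_page: int       # physical page number
    page_size: int           # page size in bytes
    excluded_bytes: int = 0  # hypothetical encoding: leading bytes of the page that do not match


def translate(entries: List[TlbEntry], vaddr: int) -> Optional[int]:
    """Return the physical address for vaddr, or None if no entry matches."""
    for entry in entries:
        offset = vaddr - entry.virtual_page * entry.page_size
        inside_page = 0 <= offset < entry.page_size
        if inside_page and offset >= entry.excluded_bytes:
            return entry.physical_page * entry.page_size + offset
    return None


# A 1 MiB page whose first 4 KiB are excluded, plus a 4 KiB page covering that hole.
entries = [
    TlbEntry(virtual_page=0, physical_page=16, page_size=1 << 20, excluded_bytes=4096),
    TlbEntry(virtual_page=0, physical_page=999, page_size=4096),
]
print(hex(translate(entries, 0x10)), hex(translate(entries, 0x2000)))
```

In this model an address in the excluded range falls through to the small-page entry, while the rest of the large page translates through the first entry.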

6.

Granted invention patent
Embedding global barrier and collective in torus network with each node combining input from receivers according to class map for output to senders [In force]

Publication No.: US08521990B2

Publication date: 2013-08-27

Application No.: US12723277

Filing date: 2010-03-12

Applicants: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken

Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken

IPC classification: G06F15/16

CPC classification: G06F9/30021, G06F9/3001, G06F9/30018, G06F9/30145, G06F11/3024, G06F11/3409, G06F11/348, G06F15/17362, G06F15/17381, G06F15/17393, G06F2201/88, H04L67/10

Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.

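Note: the per-class combining step in this abstract might look roughly like the sketch below for a single node. Several details are assumptions: the class map layout (a dictionary listing receivers and senders per class), the port names such as `"x+"`, and the use of a logical AND as the combining operation, which suits a barrier but is not stated in the abstract.

```python
def combine_by_class(receiver_inputs, class_map):
    """Combine receiver inputs per class and fan the result out to that class's senders.

    receiver_inputs: dict mapping receiver name -> bool (barrier signal seen on that link)
    class_map: dict mapping class id -> {"receivers": [...], "senders": [...]} (hypothetical layout)
    Returns a dict mapping (class id, sender name) -> combined bool.
    """
    outputs = {}
    for class_id, ports in class_map.items():
        # Assumption: a barrier class is satisfied only when every input has arrived.
        result = all(receiver_inputs[name] for name in ports["receivers"])
        for sender in ports["senders"]:
            outputs[(class_id, sender)] = result
    return outputs


inputs = {"x+": True, "x-": True, "y+": False}
class_map = {
    0: {"receivers": ["x+", "x-"], "senders": ["y+"]},
    1: {"receivers": ["y+"], "senders": ["x+", "x-"]},
}
print(combine_by_class(inputs, class_map))
```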

7.

Invention patent application
TLB EXCLUSION RANGE [In force]

Publication No.: US20110173411A1

Publication date: 2011-07-14

Application No.: US12684642

Filing date: 2010-01-08

Applicants: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow

Inventors: Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Jon K. Kriegel, Martin Ohmacht, Burkhard Steinmacher-Burow

IPC classification: G06F12/10, G06F12/00, G06F12/08

CPC classification: G06F12/1027, G06F2212/652, G06F2212/654

Abstract: A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from said processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.


8.

Invention patent application
Deterministic error recovery protocol [Lapsed]

Publication No.: US20050081078A1

Publication date: 2005-04-14

Application No.: US10674952

Filing date: 2003-09-30

Applicants: Matthias Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Dirk Hoenicke, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Matthias Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Dirk Hoenicke, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F11/00, G06F11/07, G06F11/14, H04L29/06, H04L29/14

CPC classification: G06F11/1443, G06F11/0709, G06F11/0793, H04L1/0052, H04L69/28, H04L69/40, H04L2001/0092

Abstract: Disclosed are an error recovery method and system for use with a communication system having first and second nodes, each of said nodes having a receiver and a sender, the sender of the first node being connected to the receiver of the second node by a first cable, and the sender of the second node being connected to the receiver of the first node by a second cable. The method comprises the step of, after one of the nodes detects an error, both of the nodes entering the same defined state. In particular, the receiver of the first node enters an error state, stays in the error state for a defined period of time T, and, after said defined period of time T, enters a wait state. Also, the sender of the first node sends to the receiver of the second node an error message for a defined period of time Te, and after the defined period of time Te, the sender of the first node enters an idle state.

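Note: the timing behaviour of the first node in this abstract can be sketched as a small timeline. The state names and the use of discrete ticks are assumptions made for illustration; the abstract specifies only the two periods, T for the receiver's error state and Te for the sender's error messages.

```python
def recovery_trace(T, Te, ticks):
    """Return (receiver_state, sender_state) for each tick after an error is detected.

    Assumption: time is modelled in discrete ticks starting at 0 when the
    first node detects the error.
    """
    trace = []
    for t in range(ticks):
        receiver = "ERROR" if t < T else "WAIT"       # error state for T, then wait
        sender = "SEND_ERROR" if t < Te else "IDLE"   # error messages for Te, then idle
        trace.append((receiver, sender))
    return trace


for step in recovery_trace(T=2, Te=3, ticks=4):
    print(step)
```

Because both transitions depend only on elapsed time, the node reaches the same defined state after every error, which is what makes the recovery deterministic.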

9.

Invention patent application
REPRODUCIBILITY IN A MULTIPROCESSOR SYSTEM [In force]

Publication No.: US20110119521A1

Publication date: 2011-05-19

Application No.: US12774475

Filing date: 2010-05-05

Applicants: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

Inventors: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

IPC classification: G06F1/04

CPC classification: G06F1/10, G06F11/2242

Abstract: Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.


10.

Granted invention patent
Local rollback for fault-tolerance in parallel computing systems [In force]

Publication No.: US08103910B2

Publication date: 2012-01-24

Application No.: US12696780

Filing date: 2010-01-29

Applicants: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam

Inventors: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam

IPC classification: G06F11/00

CPC classification: G06F15/17381, G06F9/30072

Abstract: A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

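Note: the restart rule in this abstract can be sketched as a simple control loop. This is a software illustration under stated assumptions, not the control logic device itself: `run_interval` is a hypothetical callable that runs one rollback interval and reports `(error, unrecoverable)`, the `max_restarts` cap is added so the sketch always terminates, and "escalate" stands in for whatever wider recovery the system uses when local rollback is not possible.

```python
def run_with_local_rollback(run_interval, max_restarts=3):
    """Run one rollback interval, restarting it while errors are recoverable.

    run_interval: hypothetical callable returning (error, unrecoverable) for one attempt.
    Returns (outcome, attempts), where outcome is "completed" or "escalate".
    """
    attempts = 0
    while True:
        attempts += 1
        error, unrecoverable = run_interval()
        if not error:
            return "completed", attempts
        if unrecoverable or attempts > max_restarts:
            # Assumption: local rollback cannot help, so hand off to a wider recovery.
            return "escalate", attempts
        # Error occurred and no unrecoverable condition: restart the interval.


outcomes = iter([(True, False), (False, False)])
print(run_with_local_rollback(lambda: next(outcomes)))
```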
