Patent search: ap:("Matthias Blumrich" OR "Dong Chen" OR "Paul Coteus" OR "Alan Gara" OR "Mark Giampapa" OR "Philip Heidelberger" OR "Dirk Hoenicke" OR "Martin Ohmacht" OR "Burkhard Steinmacher-Burow" OR "Todd Takken" OR "Pavlos Vranas") AND inv:"Burkhard Steinmacher-Burow", page 1

1.

Patent application
LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION (status: lapsed)

Publication number: US20070204112A1

Publication date: 2007-08-30

Application number: US11617276

Filing date: 2006-12-28

Applicants: Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas

Inventors: Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas

IPC classification: G06F12/14

CPC classification: G06F12/0862, G06F9/52, G06F2212/6028

Abstract: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers are used to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

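The single-load lock acquisition described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the `LockDevice` class, its `load` and `release` methods, and the number of locks are hypothetical names chosen for illustration, and the abstract does not say how a lock is released.

```python
class LockDevice:
    """Toy software model of a hardware locking device (hypothetical interface)."""

    def __init__(self, num_locks):
        # One flag per lock: True means some processor currently owns it.
        self._held = [False] * num_locks

    def load(self, lock_id):
        """Model a single load: return True if the lock was free.

        The device, not the processor, performs the follow-up write that
        marks the lock as owned.
        """
        was_free = not self._held[lock_id]
        self._held[lock_id] = True  # write performed by the device
        return was_free

    def release(self, lock_id):
        # Assumption: release is modelled as a plain reset of the flag.
        self._held[lock_id] = False


device = LockDevice(num_locks=4)
first = device.load(2)    # first reader finds the lock free and now owns it
second = device.load(2)   # second reader finds the lock already owned
device.release(2)
third = device.load(2)    # after release the lock can be acquired again
print(first, second, third)  # True False True
```

In this model the processor never issues a store to take the lock; the only state change happens inside `load`, which stands in for the device-side write.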

2.

Patent application
Methods and apparatus using commutative error detection values for fault isolation in multiple node computers (status: lapsed)

Publication number: US20060248370A1

Publication date: 2006-11-02

Application number: US11106069

Filing date: 2005-04-14

Applicants: Gheorghe Almasi, Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Sarabjeet Singh, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas

Inventors: Gheorghe Almasi, Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Sarabjeet Singh, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas

IPC classification: G06F11/00

CPC classification: G06F11/1633

Abstract: The present invention concerns methods and apparatus for performing fault isolation in multiple node computing systems using commutative error detection values (for example, checksums) to identify and to isolate faulty nodes. In the present invention nodes forming the multiple node computing system are networked together and during program execution communicate with one another by transmitting information through the network. When information associated with a reproducible portion of a computer program is injected into the network by a node, a commutative error detection value is calculated and stored in commutative error detection apparatus associated with the node. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves commutative error detection values saved in the commutative error detection apparatus associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created; the node fault detection apparatus retrieves them and stores them in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in commutative error detection values indicate that the node may be faulty.

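The idea of comparing commutative error detection values across runs can be sketched as follows. The abstract names checksums only as an example; the additive checksum, the 32-bit modulus, and the names `commutative_checksum` and `suspect_nodes` are assumptions made for this illustration, not details taken from the patent.

```python
MODULUS = 2**32  # assumed checksum width


def commutative_checksum(packets):
    """Sum all packet bytes modulo MODULUS.

    Addition is commutative, so the result does not depend on the order in
    which the packets were injected into the network.
    """
    total = 0
    for packet in packets:
        total = (total + sum(packet)) % MODULUS
    return total


def suspect_nodes(run_a, run_b):
    """Return the ids of nodes whose checksums differ between two runs."""
    return sorted(node for node in run_a if run_a[node] != run_b[node])


packets = [b"alpha", b"beta", b"gamma"]
reordered = list(reversed(packets))             # same data, different order
corrupted = [b"alpha", b"betb", b"gamma"]       # one byte changed

run_1 = {0: commutative_checksum(packets), 1: commutative_checksum(packets)}
run_2 = {0: commutative_checksum(reordered), 1: commutative_checksum(corrupted)}

print(suspect_nodes(run_1, run_2))  # [1]
```

Node 0 sends the same packets in a different order and is not flagged, while node 1 sends different data in the second run and is reported as a node that may be faulty.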

3.

Patent application
Multidimensional switch network (status: lapsed)

Publication number: US20050195808A1

Publication date: 2005-09-08

Application number: US10793068

Filing date: 2004-03-04

Applicants: Dong Chen, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard Steinmacher-Burow, Pavlos Vranas, Matthias Blumrich

Inventors: Dong Chen, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard Steinmacher-Burow, Pavlos Vranas, Matthias Blumrich

IPC classification: H04L12/26

CPC classification: H04L49/1576, H04L45/06

Abstract: Multidimensional switch data networks are disclosed, such as are used by a distributed-memory parallel computer, as applied for example to computations in the field of life sciences. A distributed memory parallel computing system comprises a number of parallel compute nodes and a message passing data network connecting the compute nodes together. The data network connecting the compute nodes comprises a multidimensional switch data network of compute nodes having N dimensions, and a number/array of compute nodes Ln in each of the N dimensions. Each compute node includes an N port routing element having a port for each of the N dimensions. Each compute node of an array of Ln compute nodes in each of the N dimensions connects through a port of its routing element to an Ln port crossbar switch having Ln ports. Several embodiments are disclosed of a 4 dimensional computing system having 65,536 compute nodes.

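The sizing arithmetic implied by this abstract can be worked through in a few lines. This is a sketch under assumptions: the extents 16 x 16 x 16 x 16 are one possible way to reach 65,536 nodes in four dimensions (the abstract does not list the extents), and `crossbar_count` reflects one reading of the text, in which every line of Ln nodes along dimension n shares a single Ln-port crossbar.

```python
from math import prod


def crossbar_count(dims):
    """Return the number of crossbar switches needed in each dimension.

    Along dimension n, each line of dims[n] nodes shares one dims[n]-port
    crossbar, so that dimension needs total_nodes // dims[n] crossbars.
    """
    total = prod(dims)
    return [total // extent for extent in dims]


dims = (16, 16, 16, 16)          # assumed extents L1..L4
total_nodes = prod(dims)         # number of compute nodes
ports_per_node = len(dims)       # one routing-element port per dimension
switches = crossbar_count(dims)  # crossbars per dimension

print(total_nodes, ports_per_node, switches)
# 65536 4 [4096, 4096, 4096, 4096]
```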

4.

Patent application
Deterministic error recovery protocol (status: lapsed)

Publication number: US20050081078A1

Publication date: 2005-04-14

Application number: US10674952

Filing date: 2003-09-30

Applicants: Matthias Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Dirk Hoenicke, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Matthias Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Dirk Hoenicke, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F11/00, G06F11/07, G06F11/14, H04L29/06, H04L29/14

CPC classification: G06F11/1443, G06F11/0709, G06F11/0793, H04L1/0052, H04L69/28, H04L69/40, H04L2001/0092

Abstract: Disclosed are an error recovery method and system for use with a communication system having first and second nodes, each of said nodes having a receiver and a sender, the sender of the first node being connected to the receiver of the second node by a first cable, and the sender of the second node being connected to the receiver of the first node by a second cable. The method comprises the step of both nodes entering the same defined state after one of the nodes detects an error. In particular, the receiver of the first node enters an error state, stays in the error state for a defined period of time T, and, after said defined period of time T, enters a wait state. Also, the sender of the first node sends to the receiver of the second node an error message for a defined period of time Te, and after the defined period of time Te, the sender of the first node enters an idle state.

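The timed state transitions of the first node can be sketched as two small functions of the time elapsed since the error was detected. The discrete tick model, the state names, and the values chosen for T and Te are assumptions for illustration; the abstract does not give concrete durations or say how T and Te relate.

```python
def receiver_state(ticks_since_error, t):
    """Receiver stays in the error state for t ticks, then waits."""
    return "error" if ticks_since_error < t else "wait"


def sender_state(ticks_since_error, te):
    """Sender transmits the error message for te ticks, then goes idle."""
    return "sending_error" if ticks_since_error < te else "idle"


T = 3   # assumed length of the receiver's error period, in ticks
TE = 5  # assumed length of the sender's error-message period, in ticks

timeline = [(tick, receiver_state(tick, T), sender_state(tick, TE))
            for tick in range(7)]
for entry in timeline:
    print(entry)
```

Because both transitions depend only on fixed durations measured from the moment of detection, the node ends up in the same (wait, idle) state on every run, which is the deterministic behaviour the title refers to.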

5.

Patent application
MULTIPLE NODE REMOTE MESSAGING (status: in force)

Publication number: US20090006546A1

Publication date: 2009-01-01

Application number: US11768784

Filing date: 2007-06-26

Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F15/16

CPC classification: G06F15/16

Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes a first compute node (A) sending a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).

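The flow described in this abstract, where one message from node A causes node B to send a message of its own, can be modelled in a few lines. Everything here is a simplified sketch: the `Descriptor` and `Node` classes, the `run_dma` method, the third node C, and the payload strings are hypothetical and stand in for hardware behaviour that the abstract only outlines.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical message descriptor: destination, payload, and any
    remote descriptors to be executed by the receiving node."""
    dest: str
    payload: str
    remote_descriptors: list = field(default_factory=list)


class Node:
    """Toy compute node with an injection FIFO drained by a 'DMA engine'."""

    def __init__(self, name, network):
        self.name = name
        self.network = network
        self.injection_fifo = deque()
        self.received = []
        network[name] = self

    def run_dma(self):
        """Send the message for every queued descriptor."""
        while self.injection_fifo:
            desc = self.injection_fifo.popleft()
            target = self.network[desc.dest]
            target.received.append((self.name, desc.payload))
            # Remote descriptors carried by the message are queued at the
            # receiver, which makes the receiver send messages of its own.
            target.injection_fifo.extend(desc.remote_descriptors)


network = {}
a, b, c = (Node(name, network) for name in "ABC")

# A prepares one message for B that also tells B to send a message to C.
a.injection_fifo.append(
    Descriptor(dest="B", payload="control",
               remote_descriptors=[Descriptor(dest="C", payload="data")])
)
a.run_dma()
b.run_dma()
print(b.received, c.received)
# [('A', 'control')] [('B', 'data')]
```

Node B's processor never builds a descriptor in this model; the descriptor it executes arrived inside A's message.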

6.

Patent application
Method and apparatus for re-utilizing partially failed resources as network resources (status: lapsed)

Publication number: US20070168695A1

Publication date: 2007-07-19

Application number: US11335784

Filing date: 2006-01-19

Applicants: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Liebsch, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Dong Chen, Alan Gara, Philip Heidelberger, Thomas Liebsch, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F11/00

CPC classification: G06F11/0793, G06F11/0724

Abstract: A method and apparatus are disclosed for re-utilizing partially failed compute resources in a massively parallel supercomputer system. In the preferred embodiments the compute node comprises a number of clock domains that can be enabled separately. When an error in a compute node is detected, and the failure is not in network communication blocks, a clock enable circuit enables only the clocks to the network communication blocks, allowing the partially failed compute node to be re-utilized as a network resource. The computer system can then continue to operate with only slightly diminished performance and thereby improve performance and perceived overall reliability.

7.

Granted patent
Multiple node remote messaging (status: in force)

Publication number: US07788334B2

Publication date: 2010-08-31

Application number: US11768784

Filing date: 2007-06-26

Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F15/167, G06F13/28

CPC classification: G06F15/16

Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes a first compute node (A) sending a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).

8.

Patent application
One-bounce network (status: lapsed)

Publication number: US20050002387A1

Publication date: 2005-01-06

Application number: US10675129

Filing date: 2003-09-30

Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: H04L12/56, H04Q11/00

CPC classification: H04L45/06

Abstract: A one-bounce data network comprises a plurality of nodes interconnected to each other via communication links, the network including a plurality of interconnected switch devices, said switch devices interconnected such that a message communicated between any two switches passes over a single link from a source switch to a destination switch; and the source switch concurrently sends a message to an arbitrary bounce switch, which then sends the message to the destination switch.

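The routes available between two switches in such a network can be enumerated directly. This is a minimal sketch that assumes every pair of switches shares a direct link, as the abstract describes; the function name `one_bounce_routes` and the integer switch ids are illustrative choices.

```python
def one_bounce_routes(switches, src, dst):
    """List the direct route plus every route through one bounce switch.

    Assumes every pair of switches is joined by a single link, so any
    other switch can serve as the intermediate (bounce) switch.
    """
    direct = [(src, dst)]
    bounced = [(src, mid, dst) for mid in switches if mid not in (src, dst)]
    return direct + bounced


routes = one_bounce_routes(range(4), src=0, dst=3)
print(routes)  # [(0, 3), (0, 1, 3), (0, 2, 3)]
```

With S switches there is one direct route and S - 2 one-bounce routes between any pair, which is the extra path capacity a source switch can use when it sends through a bounce switch in addition to the direct link.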

9.

Patent application
REPRODUCIBILITY IN A MULTIPROCESSOR SYSTEM (status: in force)

Publication number: US20110119521A1

Publication date: 2011-05-19

Application number: US12774475

Filing date: 2010-05-05

Applicants: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

Inventors: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

IPC classification: G06F1/04

CPC classification: G06F1/10, G06F11/2242

Abstract: Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.

10.

Granted patent
Reproducibility in a multiprocessor system (status: in force)

Publication number: US08595554B2

Publication date: 2013-11-26

Application number: US12774475

Filing date: 2010-05-05

Applicants: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

Inventors: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara

IPC classification: G06F11/00

CPC classification: G06F1/10, G06F11/2242

Abstract: Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.

