专利检索 ap:("Dong Chen" OR "Noel A. Eisley" OR "Philip Heidelberger" OR "Burkhard Steinmacher-Burow") AND inv:"Dong Chen" 第 8 页

71.

发明授权
Snoop filter for filtering snoop requests 失效
标题翻译：用于过滤窥探请求的Snoop过滤器

公开(公告)号：US08255638B2

公开(公告)日：2012-08-28

申请号：US12113262

申请日：2008-05-01

申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Martin Ohmacht , Valentina Salapura , Pavlos M. Vranas

IPC分类号： G06F12/00 , G06F13/00

CPC分类号： G06F12/0822 , G06F12/0831 , G06F2212/507 , Y02D10/13

摘要： A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

摘要翻译： 一种用于在具有多个处理单元的多处理器计算环境中支持高速缓存一致性的方法和装置，每个处理单元具有与其相关联并与之可操作地相连的一个或多个本地高速缓冲存储器。该方法包括提供与每个处理单元相关联的窥探过滤器设备，每个窥探过滤器设备具有多个专用输入端口，用于从多处理器计算环境中的专用存储器写入源接收窥探请求。每个窥探过滤器装置包括与多个专用输入端口相对应的多个并行操作端口窥探滤波器，每个端口窥探滤波器实现一个或多个并行操作子滤波器元件，其适于同时滤除从相应专用存储器接收的窥探请求写入源并将这些请求的子集转发到其相关联的处理单元。

72.

发明授权
Managing coherence via put/get windows 失效
标题翻译：通过put / get窗口管理一致性

公开(公告)号：US08122197B2

公开(公告)日：2012-02-21

申请号：US12543890

申请日：2009-08-19

申请人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht

发明人： Matthias A. Blumrich , Dong Chen , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Martin Ohmacht

IPC分类号： G06F12/00 , G06F13/00 , G06F13/28

CPC分类号： H05K7/20836 , F24F11/77 , G06F9/52 , G06F9/526 , G06F15/17381 , G06F17/142 , G09G5/008 , H04L7/0338

摘要： A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

摘要翻译： 一种用于管理多处理器计算机系统的两个处理器节点的两个处理器之间的相干性的方法和装置。通常，本发明涉及一种软件算法，其简化并显着加速了传送并行计算机的消息中的高速缓存一致性的管理以及辅助该高速缓存一致性算法的硬件设备。软件算法使用put / get窗口的打开和关闭来协调激活的所需要的，以实现缓存一致性。硬件设备可以是硬件地址解码的扩展，其在节点的物理存储器地址空间中创建（a）实际不存在的虚拟存储器的区域，并且（b）因此能够立即响应从处理元素读取和写入请求。

73.

发明授权
Increasing available FIFO space to prevent messaging queue deadlocks in a DMA environment 有权
标题翻译：增加可用的FIFO空间，以防止DMA环境中的消息队列死锁

公开(公告)号：US08112559B2

公开(公告)日：2012-02-07

申请号：US12241634

申请日：2008-09-30

申请人： Michael A. Blocksome , Dong Chen , Thomas Gooding , Philip Heidelberger , Jeff Parker

发明人： Michael A. Blocksome , Dong Chen , Thomas Gooding , Philip Heidelberger , Jeff Parker

IPC分类号： G06F13/28 , G06F15/167

CPC分类号： G06F13/28

摘要： Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.

摘要翻译： 本发明的实施例可以用于在并行计算环境中管理消息队列以防止消息队列死锁。计算节点的直接存储器访问控制器可以确定消息队列何时已满。作为响应，DMA可能会产生中断。中断处理程序可能会停止DMA，并将所有描述符从完整消息队列交换到更大的队列（或放大原始队列）。然后中断处理程序重新启动DMA。或者，中断处理程序停止DMA，分配存储块来保存队列数据，然后将描述符从完整消息队列移动到分配的内存块中。然后中断处理程序重新启动DMA。在正常消息传递提前周期期间，消息收发管理器尝试将描述符注入到其他消息队列中，直到描述符全部被处理。

74.

发明授权
Collective network for computer structures 有权
标题翻译：计算机结构集体网络

公开(公告)号：US08001280B2

公开(公告)日：2011-08-16

申请号：US11572372

申请日：2005-07-18

申请人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas

发明人： Matthias A. Blumrich , Paul W. Coteus , Dong Chen , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk Hoenicke , Todd E. Takken , Burkhard D. Steinmacher-Burow , Pavlos M. Vranas

IPC分类号： G06F15/16

CPC分类号： G06F15/17381 , H04L1/1845 , H04L12/4641

摘要： A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.

摘要翻译： 一种用于实现互连处理节点之间的高速，低延迟全局集体通信的系统和方法。全局集体网络最优地使得能够在具有多个互连处理节点的计算机结构中执行并行算法操作期间执行集体缩减操作。路由器设备包括通过链路互连网络的节点，以便于在虚拟网络和类结构的节点处执行低延迟全局处理操作。全局集体网络可以被配置为以异步或同步方式提供全局屏障和中断功能。当在大规模并行超级计算结构中实现时，全局集体网络根据处理算法的需要在物理上和逻辑上可分割。

75.

发明申请
ZONE ROUTING IN A TORUS NETWORK 失效
标题翻译：多功能网络中的区域路由

公开(公告)号：US20110173343A1

公开(公告)日：2011-07-14

申请号：US12684184

申请日：2010-01-08

申请人： Dong Chen , Philip Heidelberger , Sameer Kumar

发明人： Dong Chen , Philip Heidelberger , Sameer Kumar

IPC分类号： G06F15/173

CPC分类号： G06F15/17381

摘要： A system for routing data in a network comprising a network logic device at a sending node for determining a path between the sending node and a receiving node, wherein the network logic device sets one or more selection bits and one or more hint bits within the data packet, a control register for storing one or more masks, wherein the network logic device uses the one or more selection bits to select a mask from the control register and the network logic device applies the selected mask to the hint bits to restrict routing of the data packet to one or more routing directions for the data packet within the network and selects one of the restricted routing directions from the one or more routing directions and sends the data packet along a link in the selected routing direction toward the receiving node.

摘要翻译： 一种用于在网络中路由数据的系统，包括在发送节点处的网络逻辑设备，用于确定发送节点和接收节点之间的路径，其中网络逻辑设备设置数据内的一个或多个选择位和一个或多个提示位分组，用于存储一个或多个掩码的控制寄存器，其中所述网络逻辑设备使用所述一个或多个选择位从所述控制寄存器中选择掩码，并且所述网络逻辑设备将所选择的掩码应用于所述提示位以限制数据分组发送到网络内的数据分组的一个或多个路由方向，并且从一个或多个路由选择中选择一个受限制的路由方向，并沿所选路由方向的链路向接收节点发送数据分组。

76.

发明申请
NETWORK SUPPORT FOR SYSTEM INITIATED CHECKPOINTS 失效
标题翻译：网络支持系统启动检查

公开(公告)号：US20110173289A1

公开(公告)日：2011-07-14

申请号：US12731796

申请日：2010-03-25

申请人： Dong Chen , Philip Heidelberger

发明人： Dong Chen , Philip Heidelberger

IPC分类号： G06F15/173 , G06F15/167

CPC分类号： G06F15/167 , G06F11/141

摘要： A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.

摘要翻译： 一种用于在并行计算系统中支持系统启动的检查点的系统，方法和计算机程序产品。系统和方法产生选择性控制信号，以在存在与在节点处运行的用户应用程序相关联的消息传递活动的情况下执行系统相关数据的检查点。检查点由系统启动，使得即使在存在包括正在进行的用户消息活动的高度并行计算机上的用户应用的情况下，也可以获得多个网络节点的检查点数据。

77.

发明申请
ULTRASCALABLE PETAFLOP PARALLEL SUPERCOMPUTER 失效
标题翻译：超声波PETAFLOP并行超级计算机

公开(公告)号：US20090006808A1

公开(公告)日：2009-01-01

申请号：US11768905

申请日：2007-06-26

申请人： Matthias A. Blumrich , Dong Chen , George Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Shawn Hall , Rudolf A. Haring , Philip Heidelberger , Gerard V. Kopcsay , Martin Ohmacht , Valentina Salapura , Krishnan Sugavanam , Todd Takken

发明人： Matthias A. Blumrich , Dong Chen , George Chiu , Thomas M. Cipolla , Paul W. Coteus , Alan G. Gara , Mark E. Giampapa , Shawn Hall , Rudolf A. Haring , Philip Heidelberger , Gerard V. Kopcsay , Martin Ohmacht , Valentina Salapura , Krishnan Sugavanam , Todd Takken

IPC分类号： G06F15/80 , G06F9/06

CPC分类号： G06F15/17337

摘要： A novel massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. Novel use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

摘要翻译： petaOPS规模的一种新型大规模并行超级计算机包括基于片上系统技术的节点架构，其中每个处理节点包括具有多达四个处理元件的单个专用集成电路（ASIC）。 ASIC节点通过多个独立网络互连，以最小的延迟最大化节点之间的数据包通信的吞吐量。多个网络可以包括用于并行算法消息传递的三个高速网络，包括Torus，集合网络和提供全局障碍和通知功能的全球异步网络。这些多个独立网络可以根据用于优化算法处理性能的算法的需求或阶段来协同或独立地利用。提供了新的使用DMA引擎来促进节点之间的消息传递，而不需要节点处理资源。

78.

发明申请
METHOD AND APPARATUS FOR EFFICIENTLY TRACKING QUEUE ENTRIES RELATIVE TO A TIMESTAMP 失效
标题翻译：有效跟踪与TIMESTAMP相关的队列的方法和设备

公开(公告)号：US20090006672A1

公开(公告)日：2009-01-01

申请号：US11768800

申请日：2007-06-26

申请人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Martin Ohmacht , Valentina Salapura , Pavlos Vranas

发明人： Matthias A. Blumrich , Dong Chen , Alan G. Gara , Mark E. Giampapa , Philip Heidelberger , Martin Ohmacht , Valentina Salapura , Pavlos Vranas

IPC分类号： G06F3/00 , G06F1/04

CPC分类号： G06F12/0835 , G06F12/0831

摘要： An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and, a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation and, the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.

摘要翻译： 一种用于跟踪在多处理器系统中发送的相干事件信号的装置和方法。该装置包括相干逻辑单元，每个单元具有多个队列结构，每个队列结构与在系统中传输的事件信号的相应发送者相关联。与队列结构相关联的定时电路控制接收的相干事件信号的排队和出队，并且计数器跟踪队列结构中剩余入队的多个相干事件信号，并且从接收到时间戳信号起出队。计数器机构产生一个输出信号，指示在接收时间戳信号时存在于队列结构中的所有相干事件信号已经出队。在一个实施例中，时间戳信号在存储器同步操作的开始被断言，并且输出信号指示当时间戳信号被断言时存在的所有相干事件已经完成。然后可以将该信号用作存储器同步操作的完成条件的一部分。

79.

发明申请
DMA SHARED BYTE COUNTERS IN A PARALLEL COMPUTER 失效
标题翻译： DMA并发计算机中的共享字节计数器

公开(公告)号：US20090006666A1

公开(公告)日：2009-01-01

申请号：US11768781

申请日：2007-06-26

申请人： Dong Chen , Alan G. Gara , Philip Heidelberger , Pavlos Vranas

发明人： Dong Chen , Alan G. Gara , Philip Heidelberger , Pavlos Vranas

IPC分类号： G06F13/28

CPC分类号： G06F13/28 , Y02D10/14

摘要： A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.

摘要翻译： 并行计算机系统被构造为互连计算节点的网络。每个计算节点包括至少一个处理器，存储器和DMA引擎。 DMA引擎包括用于与至少一个处理器连接的处理器接口，DMA逻辑，用于与存储器连接的存储器接口，用于与网络接口的DMA网络接口，注入和接收字节计数器，注入和接收FIFO元数据，和状态寄存器和控制寄存器。注入FIFO保持注入FIFO元数据存储器位置的存储器位置，包括其当前头部和尾部，并且接收FIFO保持包括其当前头部和尾部的接收FIFO元数据存储器位置。注入字节计数器和接收字节计数器可以在消息之间共享。

80.

发明授权
Methods and apparatus using commutative error detection values for fault isolation in multiple node computers 失效
标题翻译：使用多节点计算机故障隔离交换误差检测值的方法和装置

公开(公告)号：US07383490B2

公开(公告)日：2008-06-03

申请号：US11106069

申请日：2005-04-14

申请人： Gheorghe Almasi , Matthias Augustin Blumrich , Dong Chen , Paul Coteus , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Sarabjeet Singh , Burkhard D. Steinmacher-Burow , Todd Takken , Pavlos Vranas

发明人： Gheorghe Almasi , Matthias Augustin Blumrich , Dong Chen , Paul Coteus , Alan Gara , Mark E. Giampapa , Philip Heidelberger , Dirk I. Hoenicke , Sarabjeet Singh , Burkhard D. Steinmacher-Burow , Todd Takken , Pavlos Vranas

IPC分类号： G06F11/00 , H03M13/00

CPC分类号： G06F11/1633

摘要： Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values for—example, checksums—to identify and to isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieve commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.

摘要翻译： 方法和设备使用交换性错误检测值（例如，校验和）识别和隔离故障节点，在多个节点计算系统中执行故障隔离。当与计算机程序的可再现部分相关联的信息被节点注入到网络中时，计算交换性错误检测值。每隔一段时间，与多节点计算机系统相关联的节点故障检测装置检索与节点相关联的交换错误检测值并将其存储在存储器中。当多节点计算机系统再次执行计算机程序时，创建新的交换错误检测值并将其存储在存储器中。节点故障检测装置通过比较与来自应用程序的不同运行的特定节点生成的应用程序的可再现部分相关联的交换错误检测值来识别故障节点。值的差异表示可能的故障节点。

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类