Patent search: ap:("Dong Chen" OR "Mark E. Giampapa" OR "Philip Heidelberger" OR "Sameer Kumar" OR "Jeffrey J. Parker" OR "Burkhard D. Steinmacher-Burow" OR "Pavlos Vranas") AND inv:"Dong Chen", page 3

21.

Patent application
Collective Network For Computer Structures (status: in force)

Publication No.: US20080104367A1

Publication date: 2008-05-01

Application No.: US11572372

Filing date: 2005-07-18

Applicants: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

Inventors: Matthias A. Blumrich, Paul W. Coteus, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Dirk Hoenicke, Todd E. Takken, Burkhard D. Steinmacher-Burow, Pavlos M. Vranas

IPC classification: G06F15/80, G06F9/30

CPC classification: G06F15/17381, H04L1/1845, H04L12/4641

Abstract: A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. When implemented in a massively parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.


22.

Granted patent
Multi-petascale highly efficient parallel supercomputer (status: in force)

Publication No.: US09081501B2

Publication date: 2015-07-14

Application No.: US13004007

Filing date: 2011-01-10

Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classification: G06F15/173, G06F9/06, G06F15/76

CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.


23.

Granted patent
Message passing with a limited number of DMA byte counters (status: not in force)

Publication No.: US08032892B2

Publication date: 2011-10-04

Application No.: US11768813

Filing date: 2007-06-26

Applicants: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker

Inventors: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker

IPC classification: G06F9/44, G06F9/46, G06F13/00, G06F15/167

CPC classification: G06F15/17356, G06F9/546

Abstract: A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network, where each compute node includes a DMA engine but only a limited number of byte counters for tracking the number of bytes sent or received by the DMA engine, and where the byte counters may be used in shared-counter or exclusive-counter modes of operation. Using a rendezvous protocol, a source compute node deterministically sends a request-to-send (RTS) message to a destination compute node with a single RTS descriptor, using an exclusive injection counter to track both the RTS message and the message data to be sent in association with it; the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering there. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear-to-send (CTS) message to the source node, in the rendezvous-protocol form of a remote get, to accept the RTS message and message data. The source compute node DMA engine then processes the remote get (CTS) to provide the message data to be sent.


24.

Patent application
MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER (status: in force)

Publication No.: US20110219208A1

Publication date: 2011-09-08

Application No.: US13004007

Filing date: 2011-01-10

Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classification: G06F15/76, G06F9/06

CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.


25.

Patent application
MESSAGE PASSING WITH A LIMITED NUMBER OF DMA BYTE COUNTERS (status: not in force)

Publication No.: US20090007141A1

Publication date: 2009-01-01

Application No.: US11768813

Filing date: 2007-06-26

Applicants: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker

Inventors: Michael Blocksome, Dong Chen, Mark E. Giampapa, Philip Heidelberger, Sameer Kumar, Jeffrey J. Parker

IPC classification: G06F9/44

CPC classification: G06F15/17356, G06F9/546

Abstract: A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network, where each compute node includes a DMA engine but only a limited number of byte counters for tracking the number of bytes sent or received by the DMA engine, and where the byte counters may be used in shared-counter or exclusive-counter modes of operation. Using a rendezvous protocol, a source compute node deterministically sends a request-to-send (RTS) message to a destination compute node with a single RTS descriptor, using an exclusive injection counter to track both the RTS message and the message data to be sent in association with it; the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering there. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear-to-send (CTS) message to the source node, in the rendezvous-protocol form of a remote get, to accept the RTS message and message data. The source compute node DMA engine then processes the remote get (CTS) to provide the message data to be sent.


26.

Patent application
METHOD AND APPARATUS FOR EFFICIENTLY TRACKING QUEUE ENTRIES RELATIVE TO A TIMESTAMP (status: not in force)

Publication No.: US20090006672A1

Publication date: 2009-01-01

Application No.: US11768800

Filing date: 2007-06-26

Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Pavlos Vranas

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Pavlos Vranas

IPC classification: G06F3/00, G06F1/04

CPC classification: G06F12/0835, G06F12/0831

Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.


27.

Patent application
DMA ENGINE FOR REPEATING COMMUNICATION PATTERNS (status: not in force)

Publication No.: US20090006296A1

Publication date: 2009-01-01

Application No.: US11768795

Filing date: 2007-06-26

Applicants: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F15/18

CPC classification: G06F15/163

Abstract: A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs, where each Injection FIFO may contain an arbitrary number of message descriptors, in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.


28.

Granted patent
Method and apparatus for efficiently tracking queue entries relative to a timestamp (status: not in force)

Publication No.: US08756350B2

Publication date: 2014-06-17

Application No.: US11768800

Filing date: 2007-06-26

Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Pavlos Vranas

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Pavlos Vranas

IPC classification: G06F3/00, G06F5/00

CPC classification: G06F12/0835, G06F12/0831

Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.


29.

Patent application
MULTIPLE NODE REMOTE MESSAGING (status: in force)

Publication No.: US20090006546A1

Publication date: 2009-01-01

Application No.: US11768784

Filing date: 2007-06-26

Applicants: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Matthias A. Blumrich, Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Valentina Salapura, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F15/16

CPC classification: G06F15/16

Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes, in which a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes controlling a DMA engine at the first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message. This includes putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).


30.

Granted patent
DMA engine for repeating communication patterns (status: not in force)

Publication No.: US07802025B2

Publication date: 2010-09-21

Application No.: US11768795

Filing date: 2007-06-26

Applicants: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas

Inventors: Dong Chen, Alan G. Gara, Mark E. Giampapa, Philip Heidelberger, Burkhard Steinmacher-Burow, Pavlos Vranas

IPC classification: G06F13/28

CPC classification: G06F15/163

Abstract: A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs, where each Injection FIFO may contain an arbitrary number of message descriptors, in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.

