-
Publication No.: US09507647B2
Publication date: 2016-11-29
Application No.: US13008531
Filing date: 2011-01-18
Applicants: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Xiaotong Zhuang
Inventors: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Xiaotong Zhuang
Abstract: In a multiprocessor system, a conflict-checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
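The mechanism in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the patented hardware design: the class and method names (`SpeculativeL2`, `DirectoryEntry`, `read`, `write`, `aggregate`), the use of integer speculation IDs, and the exact conflict rule are all hypothetical. Each speculative write is kept in its own slot (standing in for a cache way), the directory records who read and wrote each line, and the conflict check happens as part of the lookup.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Per-line record of which speculation IDs read or wrote the line."""
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)


class SpeculativeL2:
    """Toy model: each speculative version lives in its own slot (a 'way')."""

    def __init__(self):
        self.committed = {}   # address -> non-speculative value
        self.ways = {}        # (address, spec_id) -> speculative value
        self.directory = {}   # address -> DirectoryEntry

    def _entry(self, address):
        return self.directory.setdefault(address, DirectoryEntry())

    def read(self, address, spec_id):
        """Return (value, conflict); the check is part of the directory lookup."""
        entry = self._entry(address)
        # Assumed rule: a read conflicts if another speculation wrote the line.
        conflict = bool(entry.writers - {spec_id})
        entry.readers.add(spec_id)
        value = self.ways.get((address, spec_id), self.committed.get(address))
        return value, conflict

    def write(self, address, spec_id, value):
        """Store a speculative version; return True if it conflicts."""
        entry = self._entry(address)
        # Assumed rule: a write conflicts with any other reader or writer.
        conflict = bool((entry.readers | entry.writers) - {spec_id})
        entry.writers.add(spec_id)
        self.ways[(address, spec_id)] = value
        return conflict

    def aggregate(self, spec_ids):
        """Merge versions the caller knows to be non-conflicting into one map."""
        return {
            address: value
            for (address, sid), value in self.ways.items()
            if sid in spec_ids
        }
```

In this model nothing speculative ever reaches `committed`, which mirrors the statement that speculative requests do not go to main memory; `aggregate` leaves it to the caller to pass only speculation IDs that did not conflict.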
-
Publication No.: US20110219188A1
Publication date: 2011-09-08
Application No.: US13008531
Filing date: 2011-01-18
Applicants: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Zhuang Xiaotong
Inventors: Matthias A. Blumrich, Luis H. Ceze, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Zhuang Xiaotong
IPC classification: G06F12/08
Abstract: In a multiprocessor system, a conflict-checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
-
Publication No.: US09081501B2
Publication date: 2015-07-14
Application No.: US13004007
Filing date: 2011-01-10
Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
IPC classification: G06F15/173, G06F9/06, G06F15/76
CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14
Abstract: A multi-petascale, highly efficient parallel supercomputer of 100 petaOPS-scale computing, at decreased cost, power, and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model in which many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with the various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
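As a rough illustration of what a five-dimensional torus interconnect implies for addressing, the sketch below computes a node's neighbors and the minimal hop count between two nodes. The function names and the example dimensions are assumptions chosen for illustration; the abstract does not give the torus extents or any routing algorithm.

```python
def torus_neighbors(coord, dims):
    """Return the +1 and -1 neighbor along every axis, with wraparound."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (1, -1):
            neighbor = list(coord)
            neighbor[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(neighbor))
    return neighbors


def torus_distance(a, b, dims):
    """Minimal hop count: per axis, take the shorter way around the ring."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


# Hypothetical extents for a small 5D torus (not taken from the patent).
DIMS = (4, 4, 4, 4, 2)
origin = (0, 0, 0, 0, 0)
links = torus_neighbors(origin, DIMS)
hops = torus_distance(origin, (3, 2, 0, 0, 1), DIMS)
```

Every node has ten links, two per dimension. On an axis of extent 2 the two directions reach the same neighbor, which is why the last two entries of `links` coincide in this example.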
-
Publication No.: US08103910B2
Publication date: 2012-01-24
Application No.: US12696780
Filing date: 2010-01-29
Applicants: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam
Inventors: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam
IPC classification: G06F11/00
CPC classification: G06F15/17381, G06F9/30072
Abstract: A control logic device performs a local rollback in a parallel supercomputing system. The supercomputing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
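The control flow described in this abstract can be sketched in software as a retry loop around a snapshot. This is a minimal sketch, not the patented control logic: the function name `run_with_local_rollback`, the `UnrecoverableError` exception, the `max_retries` budget, and the use of a deep-copied dictionary as the saved state are all assumptions. What happens when an unrecoverable condition does occur is not stated in the abstract, so the sketch simply raises.

```python
import copy


class UnrecoverableError(Exception):
    """Raised when local rollback cannot be used (assumed escalation path)."""


def run_with_local_rollback(state, interval, max_retries=3):
    """Run one rollback interval, restarting it after recoverable errors.

    `interval(state)` mutates `state` and returns a pair
    (error_detected, unrecoverable_condition). Returns the number of restarts.
    """
    for restarts in range(max_retries + 1):
        snapshot = copy.deepcopy(state)        # state at the interval start
        error, unrecoverable = interval(state)
        if not error:
            return restarts                    # interval completed cleanly
        if unrecoverable:
            raise UnrecoverableError("error during a non-restartable interval")
        state.clear()                          # roll back to the interval start
        state.update(snapshot)
    raise UnrecoverableError("retry budget exhausted")
```

A caller would wrap each interval of work in this function, so that a transient error costs only a re-execution of the current interval rather than a restart from a global checkpoint.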
-
Publication No.: US20110119526A1
Publication date: 2011-05-19
Application No.: US12696780
Filing date: 2010-01-29
Applicants: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam
Inventors: Matthias A. Blumrich, Dong Chen, Alan Gara, Mark E. Giampapa, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow, Krishnan Sugavanam
CPC classification: G06F15/17381, G06F9/30072
Abstract: A control logic device performs a local rollback in a parallel supercomputing system. The supercomputing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
-
Publication No.: US20110219208A1
Publication date: 2011-09-08
Application No.: US13004007
Filing date: 2011-01-10
Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14
Abstract: A multi-petascale, highly efficient parallel supercomputer of 100 petaOPS-scale computing, at decreased cost, power, and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model in which many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with the various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
-
Publication No.: US20110219215A1
Publication date: 2011-09-08
Application No.: US13008546
Filing date: 2011-01-18
Applicants: Matthias A. Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow
Inventors: Matthias A. Blumrich, Dong Chen, Alan Gara, Philip Heidelberger, Martin Ohmacht, Burkhard Steinmacher-Burow
IPC classification: G06F9/30
Abstract: In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity-related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions.
-
Publication No.: US08595554B2
Publication date: 2013-11-26
Application No.: US12774475
Filing date: 2010-05-05
Applicants: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara
Inventors: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara
IPC classification: G06F11/00
CPC classification: G06F1/10, G06F11/2242
Abstract: Fixing a problem is usually much easier if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, a precise stop of the system, and a scan of the system state.
-
Publication No.: US20070204112A1
Publication date: 2007-08-30
Application No.: US11617276
Filing date: 2006-12-28
Applicants: Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas
Inventors: Matthias Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark Giampapa, Philip Heidelberger, Dirk Hoenicke, Martin Ohmacht, Burkhard Steinmacher-Burow, Todd Takken, Pavlos Vranas
IPC classification: G06F12/14
CPC classification: G06F12/0862, G06F9/52, G06F2212/6028
Abstract: Low-latency memory system access is provided in association with a weakly ordered multiprocessor system. The processors in the multiprocessor share resources, and each shared resource has an associated lock within a locking device that supports synchronization between the processors and the orderly sharing of the resources. A processor has permission to access a resource only when it owns the lock associated with that resource. An attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by a store: the processor performs only a read operation, and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer large enough to point to any other line in the memory; these pointers, rather than some other predictive algorithm, are used to determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
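The two ideas in this abstract can be modeled in a few lines each. This is only a toy sketch under stated assumptions: the class names (`LockDevice`, `PointerPrefetcher`), the method names, and in particular the rule for how a line's pointer gets set (here, by remembering whichever line was accessed next) are hypothetical, since the abstract does not say how the pointers are populated.

```python
class LockDevice:
    """Toy lock device: a single load both tests and acquires the lock."""

    def __init__(self, num_locks):
        self.owner = [None] * num_locks

    def load(self, lock_id, cpu):
        """The processor only reads; the device performs the follow-up write."""
        if self.owner[lock_id] is None:
            self.owner[lock_id] = cpu      # written by the device, not the CPU
            return True                    # the load reports "lock acquired"
        return False

    def release(self, lock_id, cpu):
        """Free the lock, but only if the caller is its current owner."""
        if self.owner[lock_id] == cpu:
            self.owner[lock_id] = None


class PointerPrefetcher:
    """Each memory line carries a pointer naming the line to prefetch next."""

    def __init__(self):
        self.next_line = {}      # line -> line its pointer refers to
        self.prefetched = set()  # lines currently sitting in the prefetch buffer
        self.last = None

    def access(self, line):
        """Access a line; return True if it had already been prefetched."""
        hit = line in self.prefetched
        self.prefetched.discard(line)
        if self.last is not None:
            # Assumed learning rule: point the previous line at this one.
            self.next_line[self.last] = line
        self.last = line
        target = self.next_line.get(line)
        if target is not None:
            self.prefetched.add(target)    # follow the pointer, not a stride
        return hit
```

Replaying a non-contiguous but repetitive pattern such as lines 7, 42, 3 twice through `PointerPrefetcher.access` misses on the first pass while the pointers are being recorded, and on the second pass hits on 42 and 3 because each access prefetches the line its pointer names.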
-
Publication No.: US20110119521A1
Publication date: 2011-05-19
Application No.: US12774475
Filing date: 2010-05-05
Applicants: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara
Inventors: Ralph A. Bellofatto, Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Thomas M. Gooding, Rudolf A. Haring, Philip Heidelberger, Gerard V. Kopcsay, Thomas A. Liebsch, Martin Ohmacht, Don D. Reed, Robert M. Senger, Burkhard Steinmacher-Burow, Yutaka Sugawara
IPC classification: G06F1/04
CPC classification: G06F1/10, G06F11/2242
Abstract: Fixing a problem is usually much easier if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, a precise stop of the system, and a scan of the system state.