Patent search: ap:("Amith R. Mamidala" OR "Valentina Salapura" OR "Robert W. Wisniewski") AND inv:"Valentina Salapura" (page 1)

1.

Granted invention patent
Mechanisms for efficient intra-die/intra-chip collective messaging [In force]

Publication No.: US08904118B2

Publication date: 2014-12-02

Application No.: US12986528

Filing date: 2011-01-07

Applicants: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

Inventors: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F12/10, G06F12/08, G06F15/167

CPC classification: G06F12/0831, G06F15/167

Abstract: Mechanism of efficient intra-die collective processing across the nodelets with separate shared memory coherency domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.
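
Illustrative sketch (not part of the patent record): the abstract describes coherence domains that each connect to a single collective unit. The Python model below is one plausible software reading of that arrangement, assuming an all-reduce style operation. The function name `collective_allreduce`, the per-domain local reduction step, and the choice of operators are all assumptions made for illustration.

```python
from functools import reduce
import operator


def collective_allreduce(domains, op=operator.add):
    """Software model of a collective across coherence domains.

    `domains` is a list of lists: one list of per-core values for each
    shared-memory coherence domain (hypothetical layout).
    """
    # Step 1 (assumed): each domain combines its own cores' values,
    # which it can do inside its own coherent shared memory.
    partials = [reduce(op, cores) for cores in domains]
    # Step 2: the modeled collective unit combines one partial per domain.
    total = reduce(op, partials)
    # Step 3: every domain receives the same combined result.
    return [total for _ in domains]


if __name__ == "__main__":
    print(collective_allreduce([[1, 2], [3, 4], [5]]))
```

In real hardware the combination step would happen in the collective unit rather than in software; the model only shows the data flow between domains.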

2.

Granted invention patent
Multi-petascale highly efficient parallel supercomputer [In force]

Publication No.: US09081501B2

Publication date: 2015-07-14

Application No.: US13004007

Filing date: 2011-01-10

Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classification: G06F15/173, G06F9/06, G06F15/76

CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
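
Illustrative sketch (not part of the patent record): the abstract states that nodes are interconnected by a five-dimensional torus. The Python snippet below shows how direct neighbors are usually computed in a torus with wraparound links. The function name `torus_neighbors` and the dimension sizes in the example are hypothetical and are not taken from the patent.

```python
def torus_neighbors(coord, dims):
    """Return the direct neighbors of `coord` in a torus of shape `dims`.

    Each axis contributes two links (one step down, one step up), and the
    modulo gives the wraparound that distinguishes a torus from a mesh.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors


if __name__ == "__main__":
    # Hypothetical 4 x 4 x 4 x 4 x 2 torus; a node has 2 links per axis.
    for n in torus_neighbors((0, 0, 0, 0, 0), (4, 4, 4, 4, 2)):
        print(n)
```

Along an axis of size 2, both links lead to the same node, so the list can contain a repeated coordinate.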

3.

Invention patent application
MECHANISM FOR OPTIMIZED INTRA-DIE INTER-NODELET MESSAGING COMMUNICATION [In force]

Publication No.: US20130326180A1

Publication date: 2013-12-05

Application No.: US13485074

Filing date: 2012-05-31

Applicants: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

Inventors: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F12/14

CPC classification: G06F9/544, G06F15/167

Abstract: Point-to-point intra-nodelet messaging support for nodelets on a single chip that obey MPI semantics may be provided. In one aspect, a local buffering mechanism is employed that obeys standard communication protocols for the network communications between the nodelets integrated in a single chip. Sending messages from one nodelet to another nodelet on the same chip may be performed not via the network, but by exchanging messages in the point-to-point messaging buckets between the nodelets. The messaging buckets need not be part of the memory system of the nodelets. Specialized hardware controllers may be used for moving data between the nodelets and each messaging bucket, and ensuring correct operation of the network protocol.
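
Illustrative sketch (not part of the patent record): the abstract describes point-to-point messaging buckets between nodelets that preserve MPI semantics. The Python model below assumes one FIFO bucket per ordered (source, destination) pair and tag-based matching in send order. The class name `BucketFabric` and its methods are hypothetical, and the hardware controllers mentioned in the abstract are not modeled.

```python
from collections import deque


class BucketFabric:
    """Software model of per-pair messaging buckets on one chip."""

    def __init__(self, num_nodelets):
        # One bucket per ordered (source, destination) pair (an assumption).
        self.buckets = {
            (s, d): deque()
            for s in range(num_nodelets)
            for d in range(num_nodelets)
            if s != d
        }

    def send(self, src, dst, tag, payload):
        # The message goes into the bucket instead of onto the network.
        self.buckets[(src, dst)].append((tag, payload))

    def recv(self, dst, src, tag):
        # Match the oldest message with this tag, so two messages with the
        # same source, destination, and tag are received in send order.
        bucket = self.buckets[(src, dst)]
        for i, (t, payload) in enumerate(bucket):
            if t == tag:
                del bucket[i]
                return payload
        return None  # nothing matching has arrived yet


if __name__ == "__main__":
    fabric = BucketFabric(2)
    fabric.send(0, 1, tag=7, payload="first")
    fabric.send(0, 1, tag=7, payload="second")
    print(fabric.recv(1, 0, tag=7), fabric.recv(1, 0, tag=7))
```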

4.

Invention patent application
MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER [In force]

Publication No.: US20110219208A1

Publication date: 2011-09-08

Application No.: US13004007

Filing date: 2011-01-10

Applicants: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu

IPC classification: G06F15/76, G06F9/06

CPC classification: G06F13/287, G06F9/06, G06F9/3004, G06F9/30047, G06F9/3885, G06F12/0811, G06F12/0831, G06F12/0862, G06F12/0864, G06F12/1027, G06F15/17381, G06F15/17387, G06F15/76, G06F15/8069, G06F2212/1016, G06F2212/602, G06F2212/6022, G06F2212/6024, G06F2212/6032, Y02D10/13, Y02D10/14

Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

5.

Granted invention patent
Mechanism for optimized intra-die inter-nodelet messaging communication [In force]

Publication No.: US08943516B2

Publication date: 2015-01-27

Application No.: US13485074

Filing date: 2012-05-31

Applicants: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

Inventors: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F3/00, G06F9/44, G06F9/46, G06F13/00

CPC classification: G06F9/544, G06F15/167

Abstract: Point-to-point intra-nodelet messaging support for nodelets on a single chip that obey MPI semantics may be provided. In one aspect, a local buffering mechanism is employed that obeys standard communication protocols for the network communications between the nodelets integrated in a single chip. Sending messages from one nodelet to another nodelet on the same chip may be performed not via the network, but by exchanging messages in the point-to-point messaging buckets between the nodelets. The messaging buckets need not be part of the memory system of the nodelets. Specialized hardware controllers may be used for moving data between the nodelets and each messaging bucket, and ensuring correct operation of the network protocol.

6.

Invention patent application
MECHANISMS FOR EFFICIENT INTRA-DIE/INTRA-CHIP COLLECTIVE MESSAGING [Under examination, published]

Publication No.: US20130007378A1

Publication date: 2013-01-03

Application No.: US13611985

Filing date: 2012-09-12

Applicants: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

Inventors: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F12/00

CPC classification: G06F12/0831, G06F15/167

Abstract: Mechanism of efficient intra-die collective processing across the nodelets with separate shared memory coherency domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.

7.

Granted invention patent
Mechanisms for efficient intra-die/intra-chip collective messaging [In force]

Publication No.: US08990514B2

Publication date: 2015-03-24

Application No.: US13611985

Filing date: 2012-09-12

Applicants: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

Inventors: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F12/10, G06F12/08, G06F15/167

CPC classification: G06F12/0831, G06F15/167

Abstract: Mechanism of efficient intra-die collective processing across the nodelets with separate shared memory coherency domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.

8.

Invention patent application
MECHANISMS FOR EFFICIENT INTRA-DIE/INTRA-CHIP COLLECTIVE MESSAGING [In force]

Publication No.: US20120179879A1

Publication date: 2012-07-12

Application No.: US12986528

Filing date: 2011-01-07

Applicants: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

Inventors: Amith R. Mamidala, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F12/00

CPC classification: G06F12/0831, G06F15/167

Abstract: Mechanism of efficient intra-die collective processing across the nodelets with separate shared memory coherency domains is provided. An integrated circuit die may include a hardware collective unit implemented on the integrated circuit die. A plurality of cores on the integrated circuit die is grouped into a plurality of shared memory coherence domains. Each of the plurality of shared memory coherence domains is connected to the collective unit for performing collective operations between the plurality of shared memory coherence domains.

9.

Granted invention patent
Using DMA for copying performance counter data to memory [Lapsed]

Publication No.: US08621167B2

Publication date: 2013-12-31

Application No.: US13446467

Filing date: 2012-04-13

Applicants: Alan Gara, Valentina Salapura, Robert W. Wisniewski

Inventors: Alan Gara, Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F12/00

CPC classification: G06F13/28, G06F11/34, G06F2201/88

Abstract: A device for copying performance counter data includes hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance counter data.
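
Illustrative sketch (not part of the patent record): the abstract says software prepares an injection packet and the DMA unit then copies performance counter data to memory while the software does other work. The Python model below separates those two steps. The names `InjectionPacket` and `DmaUnit`, the packet fields, and the deferred `run` step are assumptions for illustration; the patent abstract does not specify a packet layout.

```python
from dataclasses import dataclass


@dataclass
class InjectionPacket:
    """Hypothetical descriptor telling the DMA unit what to copy."""
    first_counter: int  # index of the first performance counter to read
    count: int          # number of consecutive counters to copy
    dest_offset: int    # where in memory the values should land


class DmaUnit:
    """Software model: copies counters to memory when `run` is called."""

    def __init__(self, counters, memory):
        self.counters = counters
        self.memory = memory
        self.queue = []

    def inject(self, packet):
        # Software only queues the descriptor, then is free to do other work.
        self.queue.append(packet)

    def run(self):
        # Stands in for the hardware performing the queued copies.
        for p in self.queue:
            src = self.counters[p.first_counter:p.first_counter + p.count]
            self.memory[p.dest_offset:p.dest_offset + len(src)] = src
        self.queue.clear()


if __name__ == "__main__":
    memory = [0] * 8
    dma = DmaUnit(counters=[10, 20, 30, 40], memory=memory)
    dma.inject(InjectionPacket(first_counter=1, count=2, dest_offset=4))
    dma.run()
    print(memory)
```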

10.

Invention patent application
METHOD AND APPARATUS FOR A HIERARCHICAL SYNCHRONIZATION BARRIER IN A MULTI-NODE SYSTEM [Under examination, published]

Publication No.: US20120179896A1

Publication date: 2012-07-12

Application No.: US12987523

Filing date: 2011-01-10

Applicants: Valentina Salapura, Robert W. Wisniewski

Inventors: Valentina Salapura, Robert W. Wisniewski

IPC classification: G06F9/30

CPC classification: G06F9/522, G06F9/30087, G06F9/3851

Abstract: A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing by each of a plurality of threads on a chip, input bit signal to a respective bit in a register, in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip, if it is determined that only on-chip synchronization is needed; and after all of the plurality of threads on the chip reached the barrier, communicating the synchronization signal to outside of the chip, if it is determined that inter-node synchronization is needed.
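
Illustrative sketch (not part of the patent record): the abstract describes a two-level barrier in which each thread sets a bit in a register, the bits are ANDed to detect on-chip completion, and the result is optionally forwarded off chip for inter-node synchronization. The Python model below follows that logic. The function name `hierarchical_barrier` is hypothetical, and combining the per-chip signals with a second AND is an assumption, since the abstract only says the signal is communicated outside the chip.

```python
def hierarchical_barrier(chips, inter_node):
    """Return, per chip, whether its threads may be released.

    `chips` is a list of per-chip lists of arrival bits (1 = the thread
    has reached the barrier). `inter_node` selects the second level.
    """
    # Level 1: AND all arrival bits of a chip's register together.
    chip_done = [all(bits) for bits in chips]
    if not inter_node:
        # Only on-chip synchronization: each chip releases on its own.
        return chip_done
    # Level 2 (assumed): every chip's signal must be set before any
    # chip is released.
    everyone_done = all(chip_done)
    return [everyone_done for _ in chips]


if __name__ == "__main__":
    arrivals = [[1, 1, 1, 1], [1, 0, 1, 1]]
    print(hierarchical_barrier(arrivals, inter_node=False))
    print(hierarchical_barrier(arrivals, inter_node=True))
```

With on-chip synchronization only, the first chip in the example is released while the second still waits; with inter-node synchronization, neither is released until the missing thread arrives.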
