专利检索 ap:("Alexandre E. Eichenberger" OR "Michael K. Gschwind" OR "Valentina Salapura") AND inv:"Alexandre E. Eichenberger" 第 1 页

1.

发明申请
Vector Loads from Scattered Memory Locations 审中-公开
标题翻译：矢量负载从分散的内存位置

公开(公告)号：US20120060016A1

公开(公告)日：2012-03-08

申请号：US12876432

申请日：2010-09-07

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , Valentina Salapura

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , Valentina Salapura

IPC分类号： G06F15/76 , G06F9/02

CPC分类号： G06F9/30032 , G06F9/30018 , G06F9/30036 , G06F9/30043

摘要： Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is receive in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.

摘要翻译： 提供了执行分散加载操作的机构。利用这些机制，在处理器的逻辑单元中接收收集指令，所述收集指令指定要从中将数据加载到处理器的目标向量寄存器的存储器中的多个地址。在逻辑单元内自动生成用于从存储器中的多个地址加载数据的多个单独的加载指令。多个单独的加载指令从逻辑单元发送到处理器的一个或多个加载/存储单元。对应于多个地址的数据被收集在处理器的缓冲器中。然后，逻辑单元将存储在缓冲器中的数据写入目标向量寄存器。

2.

发明授权
Vector loads with multiple vector elements from a same cache line in a scattered load operation 有权
标题翻译：在分散加载操作中，来自相同高速缓存行的多个向量元素的向量加载

公开(公告)号：US08904153B2

公开(公告)日：2014-12-02

申请号：US12876321

申请日：2010-09-07

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , Valentina Salapura

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , Valentina Salapura

IPC分类号： G06F9/345 , G06F9/30 , G06F9/38 , G06F15/80

CPC分类号： G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30043 , G06F9/3857 , G06F15/8069

摘要： Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.

摘要翻译： 提供了执行分散加载操作的机构。利用这些机制，扩展地址被接收在处理器的高速缓冲存储器中。扩展地址具有多个数据元素地址部分，其指定使用单个扩展地址来访问的多个数据元素。多个数据元素地址部分中的每一个被提供给高速缓冲存储器的相应数据元素选择器逻辑单元。高速缓冲存储器中的每个数据元素选择器逻辑单元基于提供给数据元素选择器逻辑单元的相应数据元素地址部分从高速缓存行缓冲器中选择相应的数据元素。每个数据元素选择器逻辑单元输出相应的数据元素供处理器使用。

3.

发明授权
Multi-petascale highly efficient parallel supercomputer 有权
标题翻译：多千兆高效并行超级计算机

公开(公告)号：US09081501B2

公开(公告)日：2015-07-14

申请号：US13004007

申请日：2011-01-10

申请人： Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu

发明人： Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu

IPC分类号： G06F15/173 , G06F9/06 , G06F15/76

CPC分类号： G06F13/287 , G06F9/06 , G06F9/3004 , G06F9/30047 , G06F9/3885 , G06F12/0811 , G06F12/0831 , G06F12/0862 , G06F12/0864 , G06F12/1027 , G06F15/17381 , G06F15/17387 , G06F15/76 , G06F15/8069 , G06F2212/1016 , G06F2212/602 , G06F2212/6022 , G06F2212/6024 , G06F2212/6032 , Y02D10/13 , Y02D10/14

摘要： A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

摘要翻译： 具有100 petaOPS规模计算的多Petascale高效并行超级计算机，其成本，功耗和占地面积都在降低，并且允许从互连角度来看处理节点的最大封装密度。超级计算机利用了VLSI的技术进步，实现了许多处理器可以集成到单个专用集成电路（ASIC）中的计算模型。每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC，每个处理器具有对所有系统资源的完全访问，并且使得处理器能够对诸如计算或消息传递I / O 并且优选地，根据应用内的各种算法阶段实现功能的自适应分割，或者如果I / O或其他处理器未被充分利用，则可以参与计算或通信节点通过五维环面网络互连使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。

4.

发明申请
Vector Loads with Multiple Vector Elements from a Same Cache Line in a Scattered Load Operation 有权
标题翻译：在分散负载操作中，来自相同高速缓存行的多个向量元素的向量负载

公开(公告)号：US20120060015A1

公开(公告)日：2012-03-08

申请号：US12876321

申请日：2010-09-07

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , Valentina Salapura

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , Valentina Salapura

IPC分类号： G06F9/30 , G06F12/08

CPC分类号： G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30043 , G06F9/3857 , G06F15/8069

摘要： Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.

摘要翻译： 提供了执行分散加载操作的机构。利用这些机制，扩展地址被接收在处理器的高速缓冲存储器中。扩展地址具有多个数据元素地址部分，其指定使用单个扩展地址来访问的多个数据元素。多个数据元素地址部分中的每一个被提供给高速缓冲存储器的相应数据元素选择器逻辑单元。高速缓冲存储器中的每个数据元素选择器逻辑单元基于提供给数据元素选择器逻辑单元的相应数据元素地址部分从高速缓存线缓冲器中选择相应的数据元素。每个数据元素选择器逻辑单元输出相应的数据元素供处理器使用。

5.

发明授权
Matrix multiplication operations using pair-wise load and splat operations 有权

公开(公告)号：US09600281B2

公开(公告)日：2017-03-21

申请号：US12834464

申请日：2010-07-12

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , John A. Gunnels , Valentina Salapura

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , John A. Gunnels , Valentina Salapura

IPC分类号： G06F9/30 , G06F9/312 , G06F9/38

CPC分类号： G06F9/30043 , G06F9/30014 , G06F9/30032 , G06F9/30036 , G06F9/30109 , G06F9/30112 , G06F9/30145 , G06F9/3887

摘要： Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.

6.

发明申请
MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER 有权
标题翻译：多层高效平行超级计算机

公开(公告)号：US20110219208A1

公开(公告)日：2011-09-08

申请号：US13004007

申请日：2011-01-10

申请人： Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu

发明人： Sameh Asaad , Ralph E. Bellofatto , Michael A. Blocksome , Matthias A. Blumrich , Peter Boyle , Jose R. Brunheroto , Dong Chen , Chen-Yong Cher , George L. Chiu , Norman Christ , Paul W. Coteus , Kristan D. Davis , Gabor J. Dozsa , Alexandre E. Eichenberger , Noel A. Eisley , Matthew R. Ellavsky , Kahn C. Evans , Bruce M. Fleischer , Thomas W. Fox , Alan Gara , Mark E. Giampapa , Thomas M. Gooding , Michael K. Gschwind , John A. Gunnels , Shawn A. Hall , Rudolf A. Haring , Philip Heidelberger , Todd A. Inglett , Brant L. Knudson , Gerard V. Kopcsay , Sameer Kumar , Amith R. Mamidala , James A. Marcella , Mark G. Megerian , Douglas R. Miller , Samuel J. Miller , Adam J. Muff , Michael B. Mundy , John K. O'Brien , Kathryn M. O'Brien , Martin Ohmacht , Jeffrey J. Parker , Ruth J. Poole , Joseph D. Ratterman , Valentina Salapura , David L. Satterfield , Robert M. Senger , Brian Smith , Burkhard Steinmacher-Burow , William M. Stockdell , Craig B. Stunkel , Krishnan Sugavanam , Yutaka Sugawara , Todd E. Takken , Barry M. Trager , James L. Van Oosten , Charles D. Wait , Robert E. Walkup , Alfred T. Watson , Robert W. Wisniewski , Peng Wu

IPC分类号： G06F15/76 , G06F9/06

CPC分类号： G06F13/287 , G06F9/06 , G06F9/3004 , G06F9/30047 , G06F9/3885 , G06F12/0811 , G06F12/0831 , G06F12/0862 , G06F12/0864 , G06F12/1027 , G06F15/17381 , G06F15/17387 , G06F15/76 , G06F15/8069 , G06F2212/1016 , G06F2212/602 , G06F2212/6022 , G06F2212/6024 , G06F2212/6032 , Y02D10/13 , Y02D10/14

摘要： A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

摘要翻译： 具有100 petaOPS规模计算的多Petascale高效并行超级计算机，其成本，功耗和占地面积都在降低，并且允许从互连角度来看处理节点的最大封装密度。超级计算机利用了VLSI的技术进步，实现了许多处理器可以集成到单个专用集成电路（ASIC）中的计算模型。每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC，每个处理器具有对所有系统资源的完全访问，并且使得处理器能够对诸如计算或消息传递I / O 并且优选地，根据应用内的各种算法阶段实现功能的自适应分割，或者如果I / O或其他处理器未被充分利用，则可以参与计算或通信节点通过五维环面网络互连使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。

7.

发明申请
Matrix Multiplication Operations Using Pair-Wise Load and Splat Operations 有权
标题翻译：使用配对加载和Splat操作的矩阵乘法运算

公开(公告)号：US20120011348A1

公开(公告)日：2012-01-12

申请号：US12834464

申请日：2010-07-12

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , John A. Gunnels , Valentina Salapura

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , John A. Gunnels , Valentina Salapura

IPC分类号： G06F9/302

CPC分类号： G06F9/30043 , G06F9/30014 , G06F9/30032 , G06F9/30036 , G06F9/30109 , G06F9/30112 , G06F9/30145 , G06F9/3887

摘要： Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.

摘要翻译： 提供了执行矩阵乘法运算的机构。执行向量加载操作以将矩阵乘法运算的第一向量操作数加载到第一目标向量寄存器。执行成对的加载和拼接操作以加载第二向量操作数的一对标量值，并在第二目标向量寄存器内复制一对标量值。对第一目标向量寄存器的元素和第二目标向量寄存器的元素执行操作，以生成矩阵乘法运算的部分乘积。部分产品与其他部分产品一起积累，并存储所得累积的部分产品。对于第二向量操作数的第二对标量值可以重复该操作。

8.

发明授权
Systems, methods and computer products for cross-thread scheduling 有权
标题翻译：用于跨线程调度的系统，方法和计算机产品

公开(公告)号：US09223580B2

公开(公告)日：2015-12-29

申请号：US11847556

申请日：2007-08-30

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , John A Gunnels , James L. McInnes , Mark P. Mendell

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , John A Gunnels , James L. McInnes , Mark P. Mendell

IPC分类号： G06F9/455 , G06F9/46 , G06F9/38 , G06F9/45

CPC分类号： G06F9/3851 , G06F8/445 , G06F9/3885

摘要： Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.

摘要翻译： 用于跨线程调度的系统，方法和计算机产品。示例性实施例包括用于编译代码的交叉线程调度方法，所述方法包括：响应于所述调度单元处于所述代码的非多线程部分中的调度器子操作来调度调度单元，并且调度所述调度单元，响应于调度单元处于代码的多线程部分中的线程调度器子操作。

9.

发明申请
Checkpointing in Speculative Versioning Caches 失效
标题翻译：推测版本控制缓存中的检查点

公开(公告)号：US20110047334A1

公开(公告)日：2011-02-24

申请号：US12544704

申请日：2009-08-20

申请人： Alexandre E. Eichenberger , Alan Gara , Michael K. Gschwind , Martin Ohmacht

发明人： Alexandre E. Eichenberger , Alan Gara , Michael K. Gschwind , Martin Ohmacht

IPC分类号： G06F12/08 , G06F12/00

CPC分类号： G06F12/0842 , G06F11/1405

摘要： Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.

摘要翻译： 提供了用于在数据处理系统的推测版本缓存中生成检查点的机制。这些机制在数据处理系统内执行代码，其中代码访问推测版本控制缓存中的高速缓存行。这些机制进一步确定是否出现指示在推测版本控制高速缓存中生成检查点的需要的第一条件。检查点是推测性高速缓存行，其响应于需要向对应于推测性高速缓存行的高速缓存行的回滚而返回的第二条件而变得不推测。这些机制还响应于确定第一个条件已经发生，在推测版本控制缓存中生成检查点。

10.

发明申请
Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture 失效
标题翻译：在高性能计算架构中使用数据预处理的复杂矩阵乘法运算

公开(公告)号：US20110040822A1

公开(公告)日：2011-02-17

申请号：US12542324

申请日：2009-08-17

申请人： Alexandre E. Eichenberger , Michael K. Gschwind , John A. Gunnels

发明人： Alexandre E. Eichenberger , Michael K. Gschwind , John A. Gunnels

IPC分类号： G06F17/16 , G06F7/52

CPC分类号： G06F17/16 , G06F9/30014 , G06F9/30032 , G06F9/30036 , G06F9/30043 , G06F9/30109

摘要： Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.

摘要翻译： 提供了执行复矩阵乘法运算的机制。执行矢量加载操作以将复矩阵乘法运算的第一向量操作数加载到第一目标向量寄存器。第一矢量操作数包括第一复矢量值的实部和虚部。执行复杂的加载和拼接操作以加载第二向量操作数的第二复数向量值，并在第二目标向量寄存器内复制第二复数向量值。第二个复矢量值具有实部和虚部。对第一目标向量寄存器的元素和第二目标向量寄存器的元素执行交叉乘法运算，以生成复矩阵乘法运算的部分乘积。部分产品与其他部分产品一起累积，并将结果积累的部分产品存储在结果向量寄存器中。

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类