Patent search: ap:("Alexandre E. Eichenberger" OR "Michael K. Gschwind" OR "John A. Gunnels" OR "Valentina Salapura") AND inv:"John A. Gunnels" (page 2)

11.

Patent application
SYSTEMS, METHODS AND COMPUTER PRODUCTS FOR CROSS-THREAD SCHEDULING (status: in force)

Publication number: US20090064152A1

Publication date: 2009-03-05

Application number: US11847556

Filing date: 2007-08-30

Applicants: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, James L. McInnes, Mark P. Mendell

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, James L. McInnes, Mark P. Mendell

IPC classification: G06F9/46

CPC classification: G06F9/3851, G06F8/445, G06F9/3885

Abstract: Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.

12.

Granted patent
Optimized scalar promotion with load and splat SIMD instructions (status: lapsed)

Publication number: US08572586B2

Publication date: 2013-10-29

Application number: US13555435

Filing date: 2012-07-23

Applicants: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels

IPC classification: G06F9/30

CPC classification: G06F8/45

Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

13.

Granted patent
Optimized scalar promotion with load and splat SIMD instructions (status: lapsed)

Publication number: US08255884B2

Publication date: 2012-08-28

Application number: US12134495

Filing date: 2008-06-06

Applicants: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels

IPC classification: G06F9/45, G06F9/44

CPC classification: G06F8/45

Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

14.

Patent application
METHOD AND APPARATUS FOR ALLOCATING ARCHITECTURAL REGISTER RESOURCES AMONG THREADS IN A MULTI-THREADED MICROPROCESSOR CORE (status: under examination, published)

Publication number: US20090100249A1

Publication date: 2009-04-16

Application number: US11869838

Filing date: 2007-10-10

Applicants: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels

IPC classification: G06F9/38, G06F9/02

CPC classification: G06F9/3013, G06F9/3012, G06F9/30123, G06F9/3851

Abstract: One embodiment of a microprocessor core capable of executing a plurality of threads substantially simultaneously includes a plurality of register resources available for use by the threads, where the register resources are fewer in number than the number threads multiplied by a number of architectural register resources required per thread, and a supervisor for allocating the register resources among the plurality of threads.

15.

Patent application
Shared Prefetching to Reduce Execution Skew in Multi-Threaded Systems (status: lapsed)

Publication number: US20110276786A1

Publication date: 2011-11-10

Application number: US12773454

Filing date: 2010-05-04

Applicants: Alexandre E. Eichenberger, John A. Gunnels

Inventors: Alexandre E. Eichenberger, John A. Gunnels

IPC classification: G06F9/30, G06F12/08, G06F12/00

CPC classification: G06F12/0862, G06F8/4442, G06F9/30047, G06F9/383, G06F9/3851, G06F2212/6028

Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
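
Illustrative note (not part of the patent record): the abstract above describes splitting one set of prefetch instructions into separate per-thread sub-portions. A minimal Python sketch of that partitioning idea follows. The function name `distribute_prefetches`, the round-robin assignment, and the 128-byte line stride are assumptions made for illustration; the abstract only states that each thread receives a separate sub-portion of the set.

```python
def distribute_prefetches(prefetch_targets, num_threads):
    """Split one shared list of prefetch targets into disjoint per-thread lists.

    Round-robin assignment is an assumption for this sketch; the abstract
    only requires that each thread gets a separate sub-portion of the set.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Thread t takes targets t, t + num_threads, t + 2 * num_threads, ...
    return [prefetch_targets[t::num_threads] for t in range(num_threads)]


# Hypothetical shared memory stream: eight consecutive 128-byte lines.
stream = [0x1000 + 128 * i for i in range(8)]
per_thread = distribute_prefetches(stream, 4)
for thread_id, targets in enumerate(per_thread):
    print(thread_id, [hex(addr) for addr in targets])
```

In this sketch every line of the stream is prefetched exactly once, by exactly one thread, so the threads cover the shared stream jointly instead of each issuing the full set.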

16.

Granted patent
Method and structure of using SIMD vector architectures to implement matrix multiplication (status: lapsed)

Publication number: US08458442B2

Publication date: 2013-06-04

Application number: US12548129

Filing date: 2009-08-26

Applicants: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson

IPC classification: G06F15/00, G06F15/76

CPC classification: G06F9/3881, G06F9/3001, G06F9/30032, G06F9/30036, G06F9/3877, G06F17/16

Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=−1.
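
Illustrative note (not part of the patent record): the abstract above refers to "the effect of multiplying i·i=−1". The Python sketch below shows only the underlying arithmetic: a complex product computed from separate real and imaginary parts, where the imaginary-times-imaginary term enters with a negative sign. The function name `complex_mul_split` is hypothetical, and the sketch does not model the patented compound loading instruction.

```python
def complex_mul_split(a_re, a_im, b_re, b_im):
    """Multiply (a_re + a_im*i) by (b_re + b_im*i) using real arithmetic only."""
    # The a_im * b_im term carries i*i = -1, hence the subtraction.
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag


# Cross-check against Python's built-in complex type.
re_part, im_part = complex_mul_split(1.0, 2.0, 3.0, 4.0)
reference = complex(1.0, 2.0) * complex(3.0, 4.0)
print((re_part, im_part), reference)
```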

17.

Granted patent
Shared prefetching to reduce execution skew in multi-threaded systems (status: lapsed)

Publication number: US08490071B2

Publication date: 2013-07-16

Application number: US12773454

Filing date: 2010-05-04

Applicants: Alexandre E. Eichenberger, John A. Gunnels

Inventors: Alexandre E. Eichenberger, John A. Gunnels

IPC classification: G06F9/45

CPC classification: G06F12/0862, G06F8/4442, G06F9/30047, G06F9/383, G06F9/3851, G06F2212/6028

Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.

18.

Patent application
METHOD AND STRUCTURE OF USING SIMD VECTOR ARCHITECTURES TO IMPLEMENT MATRIX MULTIPLICATION (status: lapsed)

Publication number: US20110055517A1

Publication date: 2011-03-03

Application number: US12548129

Filing date: 2009-08-26

Applicants: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson

IPC classification: G06F15/76, G06F9/06

CPC classification: G06F9/3881, G06F9/3001, G06F9/30032, G06F9/30036, G06F9/3877, G06F17/16

Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=−1.

19.

Granted patent
Method and structure for producing high performance linear algebra routines using composite blocking based on L1 cache size (status: lapsed)

Publication number: US08527571B2

Publication date: 2013-09-03

Application number: US12341718

Filing date: 2008-12-22

Applicants: Fred Gehrung Gustavson, John A. Gunnels

Inventors: Fred Gehrung Gustavson, John A. Gunnels

IPC classification: G06F7/38

CPC classification: G06F17/16

Abstract: A method (and structure) for performing a matrix subroutine, includes storing data for a matrix subroutine call in a computer memory in an increment block size that is based on a cache size.
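
Illustrative note (not part of the patent record): the abstract above ties a block size to a cache size. The Python sketch below shows one common way to do that, under stated assumptions: the block edge is chosen so that three square blocks of 8-byte elements fit in L1, and a matrix multiply is then tiled with that edge. The function names, the three-block rule, and the 32 KiB cache figure are all assumptions for illustration. The sketch tiles loops rather than changing the storage layout, so it is related to, but not the same as, the blocked storage the abstract describes.

```python
import math


def block_size(l1_bytes, elem_bytes=8, resident_blocks=3):
    """Largest square block edge so `resident_blocks` blocks fit in L1 (assumed rule)."""
    return max(1, math.isqrt(l1_bytes // (resident_blocks * elem_bytes)))


def blocked_matmul(a, b, nb):
    """Compute a @ b for list-of-lists matrices, visiting nb x nb tiles."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, nb):
        for p0 in range(0, k, nb):
            for j0 in range(0, m, nb):
                # Work on one tile of each operand at a time.
                for i in range(i0, min(i0 + nb, n)):
                    for p in range(p0, min(p0 + nb, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + nb, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


nb = block_size(32 * 1024)  # hypothetical 32 KiB L1 data cache
product = blocked_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], nb)
print(nb, product)
```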

20.

Patent application
Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems (status: under examination, published)

Publication number: US20090144745A1

Publication date: 2009-06-04

Application number: US11947185

Filing date: 2007-11-29

Applicants: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton

Inventors: John A. Gunnels, Shakti Kapoor, Ravi Kothari, Yogish Sabharwal, James C. Sexton

IPC classification: G06F9/50, G06F9/44

CPC classification: G06F11/3404, G06F11/3428, G06F11/3433, G06F11/3447

Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.