Method and structure for algorithmic overlap in parallel processing for exploitation when load imbalance is dynamic and predictable
    3.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060167836A1

    Publication date: 2006-07-27

    Application No.: US11039907

    Filing date: 2005-01-24

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F9/5083

    Abstract: A method (and structure) of processing, on a computer having a plurality of processors, includes executing a set of tasks that includes a computational bottleneck in a repetitive procedure on a first subset of the plurality of processors. A set of non-bottleneck tasks of the repetitive procedure is executed on a second subset of the plurality of processors. In steady-state processing of the repetitive procedure, the first subset of processors and the second subset of processors together process the repetitive procedure in such a manner that each subset operates substantially full-time.
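    The overlap described in this abstract can be pictured as a two-stage pipeline. The following is only a minimal sketch under stated assumptions, not the patented method: one thread stands in for each processor subset, and the names `run_pipeline`, `bottleneck`, `follow_up` and `depth` are hypothetical.

```python
import queue
import threading


def run_pipeline(items, bottleneck, follow_up, depth=2):
    """Overlap bottleneck(item i+1) with follow_up(item i).

    One thread plays the "first subset" (bottleneck tasks) and another
    plays the "second subset" (non-bottleneck tasks). Both names are
    illustrative assumptions, not terms defined by the source.
    """
    handoff = queue.Queue(maxsize=depth)  # bounded buffer between the stages
    results = []
    done = object()  # sentinel marking the end of the stream

    def first_subset():
        for item in items:
            handoff.put(bottleneck(item))  # blocks only if the buffer is full
        handoff.put(done)

    def second_subset():
        while True:
            part = handoff.get()
            if part is done:
                break
            results.append(follow_up(part))

    workers = [
        threading.Thread(target=first_subset),
        threading.Thread(target=second_subset),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

    Once the buffer holds work, neither stage waits for the other to finish a whole iteration, which is the steady-state behaviour the abstract refers to. Results keep their input order because there is a single producer and a single consumer.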

    Method and structure for improving processing efficiency in parallel processing machines for rectangular and triangular matrix routines
    5.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060265445A1

    Publication date: 2006-11-23

    Application No.: US11133254

    Filing date: 2005-05-20

    IPC classification: G06F7/32

    CPC classification: G06F17/16

    Abstract: A computerized method (and structure) of linear algebra processing on a computer having a plurality of processors for parallel processing includes, for a matrix having elements originally stored in a memory in a rectangular matrix AR format or, especially, in one of a triangular matrix AT format and a symmetric matrix AS format, distributing data of the rectangular matrix AR or of the triangular or symmetric matrix (AT, AS) from the memory to the plurality of processors in such a manner that all submatrices of AR, or substantially only the essential data of the triangular matrix AT or symmetric matrix AS, are represented in the distributed memories of the processors as contiguous atomic units for the processing. The linear algebra processing done on the processors with distributed memories requires that submatrices be sent and received as contiguous atomic units based on the prescribed block cyclic data layouts of the linear algebra processing. This computerized method (and structure) defines all of its submatrices as these contiguous atomic units, thereby avoiding extra data preparation before each send and after each receive. The essential data of AT or AS is the data of the triangular or symmetric matrix that is minimally necessary for maintaining the full information content of the triangular matrix AT or symmetric matrix AS.
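    To make the idea of contiguous atomic units in a block cyclic layout concrete, here is a rough sketch. It assumes a lower-triangular input held as a list of rows, a hypothetical `prow`-by-`pcol` process grid, and square blocks of size `nb`; the function name and the dictionary layout are illustrative and not taken from the application.

```python
def distribute_lower_blocks(a, nb, prow, pcol):
    """Assign nb-by-nb blocks of the lower triangle of `a` to a process grid.

    Blocks are dealt out in 2-D block cyclic fashion. Each block is copied
    into one flat, column-major list, so it could be sent or received as a
    single contiguous unit. Blocks strictly above the diagonal are skipped.
    Diagonal blocks are stored whole, so a little non-essential data remains.
    """
    n = len(a)
    nblocks = (n + nb - 1) // nb
    local = {(r, c): {} for r in range(prow) for c in range(pcol)}
    for bi in range(nblocks):
        for bj in range(bi + 1):  # only blocks on or below the diagonal
            rows = range(bi * nb, min((bi + 1) * nb, n))
            cols = range(bj * nb, min((bj + 1) * nb, n))
            block = [a[i][j] for j in cols for i in rows]  # contiguous copy
            owner = (bi % prow, bj % pcol)  # block cyclic owner
            local[owner][(bi, bj)] = block
    return local
```

    Because every block is already one flat list, no gather or scatter step is needed around a send or receive, which is the saving the abstract points to.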

    Method and structure for a generalized cache-register file interface with data restructuring methods for multiple cache levels and hardware pre-fetching
    6.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060161612A1

    Publication date: 2006-07-20

    Application No.: US11035902

    Filing date: 2005-01-14

    IPC classification: G06F7/38

    Abstract: A method and structure for executing, on a computer, a matrix algorithm requiring an order of N^3 operations including data reformatting operations, where N is a dimension of an operand of said algorithm, includes initially reformatting data for at least one matrix used in the matrix algorithm into a data structure stored in a memory, such that stride-one data is presented for all submatrices used as operands involved in the matrix algorithm, in a format required by the matrix algorithm, with substantially no further data re-formatting beyond an order N data re-formatting required for executing the algorithm.
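    As an illustration of reformatting once and then computing only on stride-one operands, the sketch below copies two square matrices into flat blocks a single time and runs a blocked multiply on them. It assumes N is divisible by the block size `nb`; the helper names `to_blocks`, `block_matmul` and `from_blocks` are hypothetical and do not come from the application.

```python
def to_blocks(a, nb):
    """One-time copy of square matrix `a` into a grid of flat row-major blocks."""
    g = len(a) // nb
    return [
        [
            [a[bi * nb + i][bj * nb + j] for i in range(nb) for j in range(nb)]
            for bj in range(g)
        ]
        for bi in range(g)
    ]


def block_matmul(ab, bb, nb):
    """Multiply two blocked matrices; every inner loop walks contiguous data."""
    g = len(ab)
    cb = [[[0] * (nb * nb) for _ in range(g)] for _ in range(g)]
    for bi in range(g):
        for bj in range(g):
            c = cb[bi][bj]
            for bk in range(g):
                x, y = ab[bi][bk], bb[bk][bj]
                for i in range(nb):
                    for k in range(nb):
                        xik = x[i * nb + k]
                        for j in range(nb):  # stride-one over y and c
                            c[i * nb + j] += xik * y[k * nb + j]
    return cb


def from_blocks(cb, nb):
    """Copy a blocked matrix back into a plain list of rows."""
    n = len(cb) * nb
    return [
        [cb[i // nb][j // nb][(i % nb) * nb + j % nb] for j in range(n)]
        for i in range(n)
    ]
```

    The two copies in `to_blocks` and `from_blocks` touch each element once, while the multiply performs on the order of N^3 operations without any further rearrangement of the data.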

    System and method for algorithmic cache-bypass
    7.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060179240A1

    Publication date: 2006-08-10

    Application No.: US11052877

    Filing date: 2005-02-09

    IPC classification: G06F13/28

    CPC classification: G06F12/0897 G06F12/0888

    Abstract: A system for (and method of) algorithmic cache-bypass which includes acting on at least one level of cache to at least one of bypass the at least one level of cache, stream through the at least one level of cache, force utilization of at least one other level of cache, bypass at least one level of cache, bypass all levels of cache, force utilization of a main memory, and force utilization of an out-of-core memory.

    Method and structure for producing high performance linear algebra routines using register block data format routines
    9.
    Invention application
    Status: Lapsed

    Publication No.: US20050071409A1

    Publication date: 2005-03-31

    Application No.: US10671888

    Filing date: 2003-09-29

    IPC classification: G06F12/00 G06F12/08 G06F17/16

    CPC classification: G06F12/0875 G06F17/16

    Abstract: A method (and structure) of executing a matrix operation includes, for a matrix A, separating the matrix A into blocks, each block having a size p-by-q. The blocks of size p-by-q are then stored in a cache or memory in at least one of the two following ways. The elements in at least one of the blocks are stored in a format in which elements of the block occupy a location different from an original location in the block, and/or the blocks of size p-by-q are stored in a format in which at least one block occupies a position different relative to its original position in the matrix A.
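    Both storage options in this abstract can be shown in a few lines. The sketch below is an assumption-laden illustration rather than the application's format: it takes a row-major matrix whose dimensions are divisible by p and q, stores each block column-major (so elements move within the block), and emits the blocks in block-column order (so blocks move relative to the matrix). The name `to_register_blocks` is hypothetical.

```python
def to_register_blocks(a, p, q):
    """Split row-major matrix `a` into p-by-q blocks, each stored as a flat list.

    Within a block, elements are laid out column-major, so they no longer
    sit at their original row-major offsets. The blocks themselves are
    listed by block column first, so their order also differs from the
    row-major block order of `a`.
    """
    m, n = len(a), len(a[0])
    if m % p or n % q:
        raise ValueError("matrix dimensions must be multiples of p and q")
    blocks = []
    for bj in range(n // q):  # block columns first
        for bi in range(m // p):
            blocks.append(
                [a[bi * p + i][bj * q + j] for j in range(q) for i in range(p)]
            )
    return blocks
```

    For a 4-by-4 matrix with p = q = 2, the block that was second in row-major block order (top right) comes out third, after the bottom-left block.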

    Method and structure for a hybrid full-packed storage format as a single rectangular format data structure
    10.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060173947A1

    Publication date: 2006-08-03

    Application No.: US11045354

    Filing date: 2005-01-31

    IPC classification: G06F7/52

    CPC classification: G06F17/16

    Abstract: A method (and structure) of linear algebra processing includes processing (real or complex) matrix data having elements originally stored in one of a triangular format and a symmetric matrix format in a subroutine designed to process matrix data in a full format. The processing uses a hybrid full packed data structure, which provides a rectangular space characteristic of the full format. The rectangular space is defined by a leading dimension (LD). Inside the rectangular space are stored a plurality of entities that include all elements of the matrix data originally stored in the triangular or symmetric format.
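    One way to see how a triangle can fill a rectangle exactly is the following sketch. It is a hypothetical layout chosen for illustration, not necessarily the one claimed: for an even order n, the lower triangle has n(n+1)/2 elements, which is exactly (n+1) times n/2, so it fits a rectangle with leading dimension n+1 and n/2 columns. The function name `pack_lower_even` is an assumption.

```python
def pack_lower_even(a):
    """Pack the lower triangle of an n-by-n matrix (n even) into a rectangle.

    The result has n + 1 rows (the leading dimension) and n // 2 columns.
    Column j holds column j of the lower triangle, shifted down one row,
    and the freed cells above it hold the transposed trailing triangle.
    """
    n = len(a)
    if n % 2:
        raise ValueError("this sketch only handles even n")
    k = n // 2
    ld = n + 1  # leading dimension of the rectangular space
    r = [[None] * k for _ in range(ld)]
    for j in range(k):
        for i in range(j, n):
            r[i + 1][j] = a[i][j]  # leading triangle and the square below it
        for i in range(j + 1):
            r[i][j] = a[k + j][k + i]  # trailing triangle, transposed
    return r
```

    Every cell of the rectangle is used and every element of the triangle appears once, so the three pieces (two triangles and one square) can each be addressed as an ordinary full-format submatrix with the same leading dimension.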
