System and method for re-factorizing a square matrix into lower and upper triangular matrices on a parallel processor
    1.
    Granted invention patent
    System and method for re-factorizing a square matrix into lower and upper triangular matrices on a parallel processor (in force)

    Publication number: US09170836B2

    Publication date: 2015-10-27

    Application number: US13737287

    Filing date: 2013-01-09

    CPC classification number: G06F9/46 G06F17/16

    Abstract: A system and method for re-factorizing a square input matrix on a parallel processor. In one embodiment, the system includes: (1) a matrix generator operable to generate an intermediate matrix by embedding a permuted form of the input matrix in a zeroed-out sparsity pattern of a combination of lower and upper triangular matrices resulting from a prior LU factorization of a previous matrix having a same sparsity pattern, reordering to minimize fill-in and pivoting strategy as the input matrix and (2) a re-factorizer associated with the matrix generator and operable to use parallel threads to apply an incomplete-LU factorization with zero fill-in on the intermediate matrix.
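
The two steps named in the abstract can be sketched as follows. This is a minimal sequential illustration, not the patented implementation: it uses dense lists and a set of `(row, column)` positions to stand in for the stored sparsity pattern of the earlier L and U factors, and the names `embed`, `ilu0` and `multiply_lu` are hypothetical. The parallel-thread scheduling the abstract mentions is not modeled. Because the pattern already contains every fill-in position from the earlier factorization, the zero-fill-in elimination restricted to that pattern yields the complete L and U factors of the permuted matrix, as long as the earlier pivot order still gives nonzero pivots.

```python
def embed(a, pattern, row_perm, col_perm):
    """Build the intermediate matrix: zero everywhere, then copy the
    permuted input matrix into the positions of the stored L+U pattern."""
    n = len(a)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = a[row_perm[i]][col_perm[j]]
            if value != 0.0:
                if (i, j) not in pattern:
                    raise ValueError("input does not match the stored pattern")
                m[i][j] = value
    return m


def ilu0(m, pattern):
    """Incomplete LU with zero fill-in: only positions in `pattern` are
    updated. Returns L (strictly lower part, unit diagonal implied) and
    U (upper part) packed into one matrix."""
    n = len(m)
    lu = [row[:] for row in m]  # work on a copy
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in pattern:
                continue
            lu[i][k] /= lu[k][k]  # multiplier; assumes a nonzero pivot
            for j in range(k + 1, n):
                if (i, j) in pattern and (k, j) in pattern:
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def multiply_lu(lu):
    """Multiply the packed factors back together (for checking only)."""
    n = len(lu)
    prod = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                prod[i][j] += l_ik * lu[k][j]
    return prod
```

A caller would keep `pattern`, `row_perm` and `col_perm` from the first full factorization and reuse them for each later matrix with the same nonzero structure, calling `ilu0(embed(a, pattern, row_perm, col_perm), pattern)`.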

    SYSTEM AND METHOD FOR RE-FACTORIZING A SQUARE MATRIX INTO LOWER AND UPPER TRIANGULAR MATRICES ON A PARALLEL PROCESSOR
    2.
    Invention application
    SYSTEM AND METHOD FOR RE-FACTORIZING A SQUARE MATRIX INTO LOWER AND UPPER TRIANGULAR MATRICES ON A PARALLEL PROCESSOR (in force)

    Publication number: US20140196043A1

    Publication date: 2014-07-10

    Application number: US13737287

    Filing date: 2013-01-09

    CPC classification number: G06F9/46 G06F17/16

    Abstract: A system and method for re-factorizing a square input matrix on a parallel processor. In one embodiment, the system includes: (1) a matrix generator operable to generate an intermediate matrix by embedding a permuted form of the input matrix in a zeroed-out sparsity pattern of a combination of lower and upper triangular matrices resulting from a prior LU factorization of a previous matrix having a same sparsity pattern, reordering to minimize fill-in and pivoting strategy as the input matrix and (2) a re-factorizer associated with the matrix generator and operable to use parallel threads to apply an incomplete-LU factorization with zero fill-in on the intermediate matrix.

    Performing multi-convolution operations in a parallel processing system

    Publication number: US10223333B2

    Publication date: 2019-03-05

    Application number: US14838291

    Filing date: 2015-08-27

    Abstract: In one embodiment of the present invention a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image batch. Notably, the source locations reflect the contribution of the image tile to an output tile of an output matrix—the result of the multi-convolution operation. Subsequently, the pipeline copies data from the source locations to the image tile. Similarly, the pipeline copies data from a filter stack to a filter tile. The pipeline then performs matrix multiplication operations between the image tile and the filter tile to generate data included in the corresponding output tile. To optimize both on-chip memory usage and execution time, the pipeline creates each image tile in on-chip memory as-needed.
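
The tile-by-tile flow in this abstract can be illustrated with a small sequential sketch. It rests on several assumptions that are not in the abstract: stride 1, no padding, nested lists indexed as `images[n][c][y][x]` and `filters[k][c][r][s]`, a square tile size, and the hypothetical names `source_location` and `multi_convolution`. Each image tile is built only when it is needed, by computing where its entries live in the input batch, and is then multiplied with the matching filter tile to accumulate one output tile. The full expanded image matrix is never stored.

```python
def source_location(row, col, R, S, P, Q):
    """Map entry (row, col) of the virtual image matrix to the
    (image, channel, y, x) it comes from in the input batch."""
    c, rs = divmod(row, R * S)   # channel and position inside the filter window
    r, s = divmod(rs, S)
    n, pq = divmod(col, P * Q)   # image index and output pixel
    p, q = divmod(pq, Q)
    return n, c, p + r, q + s    # stride 1, no padding (assumption)


def multi_convolution(images, filters, tile=4):
    """Return the output matrix: one row per filter, one column per
    (image, output pixel) pair."""
    N, C = len(images), len(images[0])
    H, W = len(images[0][0]), len(images[0][0][0])
    K, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    P, Q = H - R + 1, W - S + 1
    rows, cols = C * R * S, N * P * Q
    out = [[0.0] * cols for _ in range(K)]

    for c0 in range(0, cols, tile):            # one output tile per column block
        c1 = min(c0 + tile, cols)
        for r0 in range(0, rows, tile):        # walk the reduction dimension
            r1 = min(r0 + tile, rows)
            # Build the image tile on demand from the computed source locations.
            image_tile = []
            for row in range(r0, r1):
                line = []
                for col in range(c0, c1):
                    n, c, y, x = source_location(row, col, R, S, P, Q)
                    line.append(images[n][c][y][x])
                image_tile.append(line)
            # Copy the matching slice of the filter stack into a filter tile.
            filter_tile = []
            for k in range(K):
                line = []
                for row in range(r0, r1):
                    c, rs = divmod(row, R * S)
                    r, s = divmod(rs, S)
                    line.append(filters[k][c][r][s])
                filter_tile.append(line)
            # Tile-level matrix multiply, accumulated into the output tile.
            for k in range(K):
                for j in range(c1 - c0):
                    out[k][c0 + j] += sum(
                        filter_tile[k][i] * image_tile[i][j]
                        for i in range(r1 - r0)
                    )
    return out
```

The sketch computes cross-correlation, the form of convolution commonly used in neural networks. In a parallel setting the loop over column blocks is the natural unit of independent work, since each block writes a disjoint set of output columns.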
