SYSTOLIC ARRAY ACCELERATOR SYSTEMS AND METHODS

    Publication No.: US20200272596A1

    Publication Date: 2020-08-27

    Application No.: US16283795

    Application Date: 2019-02-24

    Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
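
    The apportioning step in the abstract above can be illustrated with a small software model. This is a minimal sketch, not the patented circuitry: it assumes the first array is cut into blocks of N rows (each N×M) and the second into blocks of N columns (each M×N), and that each block pair is handed to one N×N sub-array. The function names (`split_rows`, `split_cols`, `sub_array_matmul`, `tiled_matmul`) are hypothetical, and the model covers only the data partitioning, not clock-level timing or systolic dataflow.

```python
def split_rows(a, n):
    """Cut matrix a (a list of rows) into blocks of n rows: each block is N x M."""
    return [a[i:i + n] for i in range(0, len(a), n)]


def split_cols(b, n):
    """Cut matrix b into blocks of n columns: each block is M x N."""
    width = len(b[0])
    return [[row[j:j + n] for row in b] for j in range(0, width, n)]


def sub_array_matmul(a_blk, b_blk):
    """Software stand-in for one N x N sub-array (assumption: plain
    multiply-accumulate over the shared dimension M)."""
    m = len(b_blk)
    return [[sum(a_row[k] * b_blk[k][c] for k in range(m))
             for c in range(len(b_blk[0]))]
            for a_row in a_blk]


def tiled_matmul(a, b, n):
    """Multiply a by b by dispatching each (row block, column block) pair
    to its own modeled sub-array and stitching the output tiles together."""
    out = [[0] * len(b[0]) for _ in range(len(a))]
    for bi, a_blk in enumerate(split_rows(a, n)):
        for bj, b_blk in enumerate(split_cols(b, n)):
            tile = sub_array_matmul(a_blk, b_blk)
            # Place the tile at its offset in the full output matrix.
            for r, tile_row in enumerate(tile):
                for c, value in enumerate(tile_row):
                    out[bi * n + r][bj * n + c] = value
    return out
```

    In this model every block pair is independent, so in hardware the pairs could be processed by different sub-arrays at the same time; the loop here simply visits them one after another.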

    Systolic array accelerator systems and methods

    Publication No.: US11003619B2

    Publication Date: 2021-05-11

    Application No.: US16283795

    Application Date: 2019-02-24

    Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.

    Accelerator for sparse-dense matrix multiplication

    Publication No.: US10984074B2

    Publication Date: 2021-04-20

    Application No.: US16799586

    Application Date: 2020-02-24

    Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
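
    The multiply-accumulate rule in the abstract above can be written out in a few lines. This is a minimal sketch under stated assumptions, not the claimed instruction or its hardware: the sparse source is held as an ordinary nested list with explicit zeros, and `sparse_dense_mac` is a hypothetical name. For each non-zero element at row M, column K of the sparse source, it multiplies that element by every dense element in row K and adds each product to the previous value at row M, column N of the output.

```python
def sparse_dense_mac(sparse, dense, out):
    """Accumulate sparse x dense into out in place, skipping zero elements.

    sparse: rows of the sparse source (zeros stored explicitly; an assumption)
    dense:  rows of the dense source
    out:    rows of the dense output, holding the previous values
    """
    for m, sparse_row in enumerate(sparse):
        for k, element in enumerate(sparse_row):
            if element == 0:
                continue  # zero elements generate no products
            for n, dense_value in enumerate(dense[k]):
                out[m][n] += element * dense_value
    return out


sparse = [[0, 2, 0], [0, 0, 0], [3, 0, 1]]
dense = [[1, 2], [3, 4], [5, 6]]
out = [[1, 1], [0, 0], [10, 10]]
print(sparse_dense_mac(sparse, dense, out))  # [[7, 9], [0, 0], [18, 22]]
```

    Only three of the nine sparse elements are non-zero, so the inner loop runs three times instead of nine; that skipped work is the saving the sparsity makes possible.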

    Accelerator for sparse-dense matrix multiplication

    Publication No.: US10572568B2

    Publication Date: 2020-02-25

    Application No.: US15938924

    Application Date: 2018-03-28

    Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
