-
Publication number: US20200272596A1
Publication date: 2020-08-27
Application number: US16283795
Filing date: 2019-02-24
Applicant: INTEL CORPORATION
Inventor: Srinivasan Narayanamoorthy, Jayaram Bobba, Ankit More
IPC: G06F15/80
Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
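A minimal software sketch of the apportioning described in this abstract, not the patented circuitry: the first matrix is cut into N×M row blocks, the second into M×N column blocks, and each pair is handed to a stand-in for one N×N sub-array. The names `matmul` and `tiled_matmul`, the list-of-lists representation, and the requirement that the dimensions divide evenly by N are assumptions made for illustration only.

```python
def matmul(a, b):
    """Plain reference product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def tiled_matmul(a, b, n):
    """Compute a @ b from N x M row blocks of `a` and M x N column blocks of `b`.

    Each (row block, column block) pair stands in for the work given to one
    hypothetical N x N sub-array; here that "sub-array" is simply matmul().
    """
    rows, cols = len(a), len(b[0])
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    if rows % n or cols % n:
        raise ValueError("this sketch assumes dimensions divisible by n")
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, n):
        a_block = a[r0:r0 + n]                       # N x M first input array
        for c0 in range(0, cols, n):
            b_block = [row[c0:c0 + n] for row in b]  # M x N second input array
            tile = matmul(a_block, b_block)          # N x N result tile
            for i in range(n):
                out[r0 + i][c0:c0 + n] = tile[i]
    return out


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]  # 4 x 3
b = [[1, 0, 2, 1], [0, 1, 1, 0], [3, 1, 0, 2]]    # 3 x 4
result = tiled_matmul(a, b, 2)
```

Because every tile depends only on its own pair of input blocks, the tiles could in principle be computed by separate sub-arrays at the same time; the loop above runs them one after another purely for clarity.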
-
Publication number: US11829440B2
Publication date: 2023-11-28
Application number: US17229550
Filing date: 2021-04-13
Applicant: Intel Corporation
Inventor: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
CPC classification number: G06F17/16, G06F7/5443, G06F9/3001, G06F9/3016, G06F9/30036, G06F9/30145, G06F9/383, G06F9/3887, G06N3/00
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
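The arithmetic this abstract attributes to the execution circuitry can be sketched in software as follows. This is an illustrative model only, not the instruction's actual encoding or hardware behavior: the function name `sparse_dense_accumulate` and the coordinate-list representation of the sparse source (tuples of row M, column K, value) are assumptions, since the abstract does not specify a storage format.

```python
def sparse_dense_accumulate(sparse_entries, dense, out):
    """Accumulate sparse @ dense into `out` in place and return it.

    sparse_entries: iterable of (m, k, value) tuples (assumed format).
    dense:          K x N list-of-lists dense source matrix.
    out:            M x N list-of-lists dense output matrix holding previous values.
    """
    for m, k, value in sparse_entries:
        if value == 0:
            continue  # only non-zero elements contribute
        dense_row = dense[k]
        out_row = out[m]
        for n, d in enumerate(dense_row):
            # product of the non-zero element at (m, k) and the dense element
            # at (k, n), added to the previous output value at (m, n)
            out_row[n] += value * d
    return out


sparse_entries = [(0, 1, 2), (1, 0, 3)]  # two non-zero elements of a 2 x 2 sparse matrix
dense = [[1, 2], [3, 4]]
out = [[1, 1], [0, 0]]                   # previous output values
result = sparse_dense_accumulate(sparse_entries, dense, out)
```

The amount of work scales with the number of non-zero elements rather than with the full M×K size of the sparse source, which is the motivation for treating the sparse operand specially.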
-
Publication number: US11003619B2
Publication date: 2021-05-11
Application number: US16283795
Filing date: 2019-02-24
Applicant: INTEL CORPORATION
Inventor: Srinivasan Narayanamoorthy, Jayaram Bobba, Ankit More
Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
-
Publication number: US10984074B2
Publication date: 2021-04-20
Application number: US16799586
Filing date: 2020-02-24
Applicant: Intel Corporation
Inventor: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
-
Publication number: US10572568B2
Publication date: 2020-02-25
Application number: US15938924
Filing date: 2018-03-28
Applicant: Intel Corporation
Inventor: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.