-
Publication No.: US11829440B2
Publication date: 2023-11-28
Application No.: US17229550
Filing date: 2021-04-13
Applicant: Intel Corporation
Inventor: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
CPC classification number: G06F17/16, G06F7/5443, G06F9/3001, G06F9/3016, G06F9/30036, G06F9/30145, G06F9/383, G06F9/3887, G06N3/00
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
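The multiply-accumulate behavior described in this abstract can be illustrated in software. The following is only a minimal sketch, not the patented circuitry: the function name `sparse_dense_matmul_accumulate` and the storage of the sparse matrix as (row, column, value) triples are assumptions made for illustration, since the abstract does not specify a storage format.

```python
def sparse_dense_matmul_accumulate(sparse_nonzeros, dense_b, dense_c):
    """Accumulate A @ B into C, visiting only the non-zero elements of A.

    sparse_nonzeros: iterable of (m, k, value) triples, one per non-zero
        element of the sparse source matrix A (hypothetical format).
    dense_b: dense source matrix B as a list of rows.
    dense_c: dense output matrix C as a list of rows, updated in place.
    """
    for m, k, value in sparse_nonzeros:
        row_b = dense_b[k]   # dense elements at row K, every column N
        row_c = dense_c[m]   # output elements at row M, every column N
        for n in range(len(row_b)):
            # New value = previous output value + generated product
            row_c[n] += value * row_b[n]
    return dense_c


# A = [[0, 2], [3, 0]] stored as its two non-zero elements
a_nonzeros = [(0, 1, 2.0), (1, 0, 3.0)]
b = [[1.0, 2.0], [3.0, 4.0]]
c = [[10.0, 0.0], [0.0, 1.0]]   # previous contents of the output matrix

sparse_dense_matmul_accumulate(a_nonzeros, b, c)
print(c)
```

Because the loop iterates only over stored non-zero elements, the work scales with the number of non-zeros in the sparse matrix rather than with its full dimensions, which is the property the instruction is meant to exploit.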
-
Publication No.: US11037050B2
Publication date: 2021-06-15
Application No.: US16458020
Filing date: 2019-06-29
Applicant: Intel Corporation
Inventor: Krishna N. Vinod, Sujoyita Kaushikkar, Aniket S. Kakade, Kermin ChoFleming, Ping Zou, Alexey Suprun, Bhavya K. Daya
IPC: G06N3/04, G06F7/53, G06F1/3234, G06F9/22
Abstract: Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
-
Publication No.: US10984074B2
Publication date: 2021-04-20
Application No.: US16799586
Filing date: 2020-02-24
Applicant: Intel Corporation
Inventor: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.
-
Publication No.: US10572568B2
Publication date: 2020-02-25
Application No.: US15938924
Filing date: 2018-03-28
Applicant: Intel Corporation
Inventor: Srinivasan Narayanamoorthy, Nadathur Rajagopalan Satish, Alexey Suprun, Kenneth J. Janik
Abstract: Disclosed embodiments relate to an accelerator for sparse-dense matrix instructions. In one example, a processor to execute a sparse-dense matrix multiplication instruction, includes fetch circuitry to fetch the sparse-dense matrix multiplication instruction having fields to specify an opcode, a dense output matrix, a dense source matrix, and a sparse source matrix having a sparsity of non-zero elements, the sparsity being less than one, decode circuitry to decode the fetched sparse-dense matrix multiplication instruction, execution circuitry to execute the decoded sparse-dense matrix multiplication instruction to, for each non-zero element at row M and column K of the specified sparse source matrix generate a product of the non-zero element and each corresponding dense element at row K and column N of the specified dense source matrix, and generate an accumulated sum of each generated product and a previous value of a corresponding output element at row M and column N of the specified dense output matrix.