-
公开(公告)号:US20240053985A1
公开(公告)日:2024-02-15
申请号:US18485089
申请日:2023-10-11
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , VARGHESE GEORGE , JOYDEEP RAY , ASHUTOSH GARG , JORGE PARRA , SHUBH SHAH , SHUBRA MARWAHA
CPC classification number: G06F9/3001 , G06F9/5011 , G06F17/16 , G06F9/3013 , G06F9/30036
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
-
公开(公告)号:US20220156343A1
公开(公告)日:2022-05-19
申请号:US17527882
申请日:2021-11-16
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE
Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
-
公开(公告)号:US20190324746A1
公开(公告)日:2019-10-24
申请号:US15957728
申请日:2018-04-19
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , GUEI-YUAN LUEH , SUPRATIM PAL , ASHUTOSH GARG , CHANDRA S. GURRAM , JORGE E. PARRA , JUNJIE GU , KONRAD TRIFUNOVIC , HONG BIN LIAO , MIKE B. MACPHERSON , SHUBH B. SHAH , SHUBRA MARWAHA , STEPHEN JUNKINS , TIMOTHY R. BAUER , VARGHESE GEORGE , WEIYU CHEN
Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
-
公开(公告)号:US20240427847A1
公开(公告)日:2024-12-26
申请号:US18757003
申请日:2024-06-27
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
-
5.
公开(公告)号:US20230281272A1
公开(公告)日:2023-09-07
申请号:US18301386
申请日:2023-04-17
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30145 , G06F15/8046
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
-
公开(公告)号:US20220206795A1
公开(公告)日:2022-06-30
申请号:US17569229
申请日:2022-01-05
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , VARGHESE GEORGE , JOYDEEP RAY , ASHUTOSH GARG , JORGE PARRA , SHUBH SHAH , SHUBRA MARWAHA
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
-
公开(公告)号:US20230297373A1
公开(公告)日:2023-09-21
申请号:US18307088
申请日:2023-04-26
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , GUEI-YUAN LUEH , SUPRATIM PAL , ASHUTOSH GARG , CHANDRA S. GURRAM , JORGE E. PARRA , JUNJIE GU , KONRAD TRIFUNOVIC , HONG BIN LIAO , MIKE B. MACPHERSON , SHUBH B. SHAH , SHUBRA MARWAHA , STEPHEN JUNKINS , TIMOTHY R. BAUER , VARGHESE GEORGE , WEIYU CHEN
CPC classification number: G06F9/3001 , G06F9/30145 , G06T1/20 , G06F9/3887 , G06F9/3802
Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch a single instruction for execution, a decode unit to decode the single instruction into a decoded instruction, wherein the decoded instruction is to cause the graphics processing unit to perform a set of parallel dot product operations on elements of input matrices, and a systolic dot product unit to execute the decoded instruction across one or more parallel processor lanes using multiple systolic layers associated with multiple pipeline stages. The multiple pipeline stages include one or more sets of interconnected multipliers and adders to compute multiple concurrent dot products.
-
公开(公告)号:US20210349966A1
公开(公告)日:2021-11-11
申请号:US16913800
申请日:2020-06-26
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE
Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
-
公开(公告)号:US20210303299A1
公开(公告)日:2021-09-30
申请号:US17304153
申请日:2021-06-15
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , GUEI-YUAN LUEH , SUPRATIM PAL , ASHUTOSH GARG , CHANDRA S. GURRAM , JORGE E. PARRA , JUNJIE GU , KONRAD TRIFUNOVIC , HONG BIN LIAO , MIKE B. MACPHERSON , SHUBH B. SHAH , SHUBRA MARWAHA , STEPHEN JUNKINS , TIMOTHY R. BAUER , VARGHESE GEORGE , WEIYU CHEN
Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
-
公开(公告)号:US20220171827A1
公开(公告)日:2022-06-02
申请号:US17527324
申请日:2021-11-16
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , MATHEW NEVIN , JORGE PARRA , ASHUTOSH GARG , SHUBRA MARWAHA , SHUBH SHAH
Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
-
-
-
-
-
-
-
-
-