-
Publication No.: US20210089301A1
Publication Date: 2021-03-25
Application No.: US16582406
Application Date: 2019-09-25
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, JOYDEEP RAY, ASHUTOSH GARG, JORGE PARRA, SHUBH SHAH, SHUBRA MARWAHA
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
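The data-sharing step in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that "sharing" means two processing resources read one register copy of matrix A while each computes a different column block of C = A × B. The functions `matmul`, `column_slice`, and `shared_matmul` are hypothetical and do not describe Intel's hardware.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (m x k) @ (k x n) matrix multiply."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def column_slice(b: Matrix, start: int, stop: int) -> Matrix:
    """Columns [start, stop) of b."""
    return [row[start:stop] for row in b]


def shared_matmul(a: Matrix, b: Matrix) -> Matrix:
    # Write the first set of matrix data (A) into one shared register set.
    shared_registers = [row[:] for row in a]
    n = len(b[0])
    half = n // 2
    # Both processing resources read the shared copy of A. Each one
    # multiplies it by its own column block of B (hypothetical split).
    first_resource = matmul(shared_registers, column_slice(b, 0, half))
    second_resource = matmul(shared_registers, column_slice(b, half, n))
    # Concatenate the two column blocks into the full product.
    return [r1 + r2 for r1, r2 in zip(first_resource, second_resource)]


A = [[1, 2], [3, 4]]
B = [[5, 6, 7, 8], [9, 10, 11, 12]]
print(shared_matmul(A, B))  # [[23, 26, 29, 32], [51, 58, 65, 72]]
```

In this sketch, A is copied once and reused by both resources. Only the B column blocks differ between them.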
-
Publication No.: US20240427847A1
Publication Date: 2024-12-26
Application No.: US18757003
Application Date: 2024-06-27
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
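One possible reading of "a systolic processing array with feedback inputs" is sketched below, as an assumption rather than the patented design. The model is a chain of `physical_depth` multiply-add stages whose output accumulator is fed back into the first stage. A dot product longer than the array is therefore computed in several passes, and stages holding a zero operand skip their multiply to model sparsity. The function name `systolic_dot_with_feedback` is hypothetical.

```python
from typing import List, Tuple


def systolic_dot_with_feedback(a: List[float], b: List[float],
                               physical_depth: int) -> Tuple[float, int]:
    """Dot product on a short multiply-add chain reused via feedback.

    Returns (result, multiplies_issued). Stages whose b operand is zero skip
    the multiply (a simple model of sparsity).
    """
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    if physical_depth <= 0:
        raise ValueError("physical_depth must be positive")
    acc = 0.0        # value on the (assumed) feedback path
    multiplies = 0
    for start in range(0, len(a), physical_depth):
        # One pass through the physical stages of the array.
        for stage in range(physical_depth):
            i = start + stage
            if i >= len(a):
                break            # final pass: remaining stages stay idle
            if b[i] == 0:
                continue         # zero operand: stage forwards acc unchanged
            acc += a[i] * b[i]
            multiplies += 1
        # acc leaves the last stage and is fed back into the first stage.
    return acc, multiplies


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 0, 2, 0, 0, 3, 0, 1]
print(systolic_dot_with_feedback(a, b, physical_depth=4))  # (33.0, 4)
```

Here an 8-element dot product runs on a 4-stage chain in two passes. Only 4 of the 8 multiplies are issued because half the weights are zero.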
-
Publication No.: US20230281272A1
Publication Date: 2023-09-07
Application No.: US18301386
Application Date: 2023-04-17
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
CPC classification number: G06F17/16, G06F9/3001, G06F9/30145, G06F15/8046
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
-
Publication No.: US20220206795A1
Publication Date: 2022-06-30
Application No.: US17569229
Application Date: 2022-01-05
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, JOYDEEP RAY, ASHUTOSH GARG, JORGE PARRA, SHUBH SHAH, SHUBRA MARWAHA
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
-
Publication No.: US20170154012A1
Publication Date: 2017-06-01
Application No.: US15431527
Application Date: 2017-02-13
Applicant: Intel Corporation
Inventor: VARGHESE GEORGE, SANJEEV S. JAHAGIRDAR, DEBORAH T. MARR
CPC classification number: G06F15/80, G06F1/3206, G06F1/3293, G06F1/3296, G06F9/5094, G06F13/4022, Y02D10/122, Y02D10/151, Y02D10/22
Abstract: A method is described that entails operating enabled cores of a multi-core processor such that both cores support respective software routines with a same instruction set, a first core being higher performance and consuming more power than a second core under a same set of applied supply voltage and operating frequency.
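The abstract states that both cores run the same instruction set but differ in performance and power at the same voltage and frequency. That property lets a scheduler move a routine freely between them. The Python sketch below illustrates this with a hypothetical selection policy: pick the lowest-power core that meets demand, else the fastest. The `Core` type, the `select_core` function, and the numeric ratings are all illustrative assumptions, not the method claimed in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Core:
    name: str
    performance: float  # relative throughput at a shared voltage/frequency
    power: float        # relative power at that same voltage/frequency


# Hypothetical pair: same instruction set, "big" is faster and draws more power.
BIG = Core("big", performance=2.0, power=3.0)
LITTLE = Core("little", performance=1.0, power=1.0)


def select_core(demand: float, cores: List[Core]) -> Core:
    """Return the lowest-power core that meets `demand`, else the fastest."""
    adequate = [c for c in cores if c.performance >= demand]
    if adequate:
        return min(adequate, key=lambda c: c.power)
    return max(cores, key=lambda c: c.performance)


for demand in (0.5, 1.5, 4.0):
    print(demand, select_core(demand, [BIG, LITTLE]).name)
# 0.5 little
# 1.5 big
# 4.0 big
```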
-
Publication No.: US20230297373A1
Publication Date: 2023-09-21
Application No.: US18307088
Application Date: 2023-04-26
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
CPC classification number: G06F9/3001, G06F9/30145, G06T1/20, G06F9/3887, G06F9/3802
Abstract: Embodiments described herein provide an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch a single instruction for execution, a decode unit to decode the single instruction into a decoded instruction, wherein the decoded instruction is to cause the graphics processing unit to perform a set of parallel dot product operations on elements of input matrices, and a systolic dot product unit to execute the decoded instruction across one or more parallel processor lanes using multiple systolic layers associated with multiple pipeline stages. The multiple pipeline stages include one or more sets of interconnected multipliers and adders to compute multiple concurrent dot products.
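The systolic dot product described here can be approximated in software as a chain of layers, one per pipeline stage. In each layer, every lane computes a short dot product with its multipliers and adders and adds it to the partial sum handed over by the previous layer. The Python sketch below makes two assumptions: the operand layout (one operand group broadcast to all lanes, the other per lane) and the name `systolic_dot_product`. It shows only the arithmetic, not Intel's instruction semantics.

```python
from typing import List


def systolic_dot_product(acc: List[float], a: List[List[float]],
                         b: List[List[List[float]]]) -> List[float]:
    """Per-lane result: acc[lane] + sum over layers of dot(a[layer], b[layer][lane]).

    a[layer][k]       : K packed elements per layer, broadcast to every lane
    b[layer][lane][k] : K packed elements per layer, one group per lane
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of layers")
    out = list(acc)
    for layer_a, layer_b in zip(a, b):   # one systolic layer = one pipeline stage
        for lane in range(len(out)):     # lanes run in parallel in hardware
            # This layer's multipliers and adders form a K-wide dot product,
            # which is added to the partial sum passed from the previous layer.
            out[lane] += sum(x * y for x, y in zip(layer_a, layer_b[lane]))
    return out


acc = [0, 10]
a = [[1, 2], [3, 4]]                       # 2 layers, K = 2
b = [[[1, 1], [2, 0]], [[0, 1], [1, 1]]]   # per layer: 2 lanes x K = 2
print(systolic_dot_product(acc, a, b))     # [7, 19]
```

Passing `out` from one layer to the next mirrors the abstract's description of dot products flowing between systolic layers. The loop order, however, is a software convenience.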
-
Publication No.: US20230029176A1
Publication Date: 2023-01-26
Application No.: US17868448
Application Date: 2022-07-19
Applicant: Intel Corporation
Inventor: JOYDEEP RAY, ARAVINDH ANANTARAMAN, ABHISHEK R. APPU, ALTUG KOKER, ELMOUSTAPHA OULD-AHMED-VALL, VALENTIN ANDREI, SUBRAMANIAM MAIYURAN, NICOLAS GALOPPO VON BORRIES, VARGHESE GEORGE, MIKE MACPHERSON, BEN ASHBAUGH, MURALI RAMADOSS, VIKRANTH VEMULAPALLI, WILLIAM SADLER, JONATHAN PEARCE, SUNGYE KIM
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
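The abstract does not say how operations are judged "suitable" for the scalar or vector complex. The Python sketch below therefore uses a hypothetical rule based on data width: operations touching fewer than `vector_threshold` elements go to the scalar complex, and wider, data-parallel operations go to the vector complex. The two complexes are stubbed out as functions that tag their outputs. `Operation`, `partition`, and `dispatch` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Operation:
    name: str
    width: int  # number of data elements the operation processes (hypothetical)


def partition(ops: List[Operation], vector_threshold: int = 2
              ) -> Tuple[List[Operation], List[Operation]]:
    """Split ops into (scalar_subset, vector_subset) by an assumed width rule."""
    scalar = [op for op in ops if op.width < vector_threshold]
    vector = [op for op in ops if op.width >= vector_threshold]
    return scalar, vector


def dispatch(ops: List[Operation]) -> Tuple[List[str], List[str]]:
    """Stand-ins for the two complexes; each returns a tagged output per op."""
    scalar, vector = partition(ops)
    first_outputs = [f"scalar:{op.name}" for op in scalar]
    second_outputs = [f"vector:{op.name}" for op in vector]
    return first_outputs, second_outputs


workload = [Operation("branch", 1), Operation("shade_pixels", 64),
            Operation("address_calc", 1)]
print(dispatch(workload))
# (['scalar:branch', 'scalar:address_calc'], ['vector:shade_pixels'])
```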
-
Publication No.: US20220058853A1
Publication Date: 2022-02-24
Application No.: US17500631
Application Date: 2021-10-13
Applicant: Intel Corporation
Inventor: HUGUES LABBE, DARREL PALKE, SHERINE ABDELHAK, JILL BOYCE, VARGHESE GEORGE, SCOTT JANUS, ADAM LAKE, ZHIJUN LEI, ZHENGMIN LI, MIKE MACPHERSON, CARL MARSHALL, SELVAKUMAR PANNEER, PRASOONKUMAR SURTI, KARTHIK VEERAMANI, DEEPAK VEMBAR, VALLABHAJOSYULA SRINIVASA SOMAYAZULU
Abstract: One embodiment provides for a graphics processor comprising a block of graphics compute units, a graphics processor pipeline coupled to the block of graphics compute units, and a programmable neural network unit including one or more neural network hardware blocks. The programmable neural network unit is coupled with the block of graphics compute units and the graphics processor pipeline. The one or more neural network hardware blocks include hardware to perform neural network operations and activation operations for a layer of a neural network. The programmable neural network unit can configure settings of one or more hardware blocks within the graphics processor pipeline based on a machine learning model trained to optimize performance of a set of workloads.
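As a rough illustration of "neural network operations and activation operations for a layer", the Python sketch below computes one fully connected layer followed by a ReLU activation. The choice of a dense layer and ReLU is an assumption made for the example. The abstract does not name the layer type or activation, and the function `dense_relu` is hypothetical.

```python
from typing import List


def dense_relu(x: List[float], weights: List[List[float]],
               bias: List[float]) -> List[float]:
    """Fully connected layer followed by ReLU; weights[j][i] links input i to output j."""
    if len(bias) != len(weights) or any(len(row) != len(x) for row in weights):
        raise ValueError("shape mismatch")
    # Neural-network operation: weighted sum plus bias for each output.
    pre_activation = [sum(w * xi for w, xi in zip(row, x)) + b
                      for row, b in zip(weights, bias)]
    # Activation operation, applied to every output of the layer.
    return [max(0.0, z) for z in pre_activation]


print(dense_relu([1.0, 2.0], [[1.0, 1.0], [-1.0, 0.0]], [0.0, 0.5]))  # [3.0, 0.0]
```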
-
Publication No.: US20210349966A1
Publication Date: 2021-11-11
Application No.: US16913800
Application Date: 2020-06-26
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
-
Publication No.: US20210303299A1
Publication Date: 2021-09-30
Application No.: US17304153
Application Date: 2021-06-15
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
Abstract: Embodiments described herein provide an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.