Patent search: ap:("Intel Corporation") AND inv:"VARGHESE GEORGE" (results page 1)

1.

Invention application

SHARING REGISTER FILE USAGE BETWEEN FUSED PROCESSING RESOURCES (Status: Valid)

Publication No.: US20210089301A1

Publication date: 2021-03-25

Application No.: US16582406

Application date: 2019-09-25

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , VARGHESE GEORGE , JOYDEEP RAY , ASHUTOSH GARG , JORGE PARRA , SHUBH SHAH , SHUBRA MARWAHA

IPC: G06F9/30 , G06F17/16 , G06F9/50

Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
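As a rough illustration of the idea in this abstract, the sketch below models two processing resources that each compute half the rows of a matrix product while reading one shared copy of the second matrix, instead of each holding a private copy in its own registers. `SharedRegisterBlock`, `resource_matmul`, and `fused_matmul` are hypothetical names chosen for this example; it is a software analogy under those assumptions, not the hardware mechanism claimed in the application.

```python
from typing import List

Matrix = List[List[float]]


class SharedRegisterBlock:
    """Hypothetical stand-in for registers whose contents both resources can read."""

    def __init__(self, data: Matrix) -> None:
        self.data = data
        self.reads = 0  # counts how many resources read the shared data

    def read(self) -> Matrix:
        self.reads += 1
        return self.data


def resource_matmul(a_rows: Matrix, shared_b: SharedRegisterBlock) -> Matrix:
    """One processing resource: multiply its slice of A by the shared B."""
    b = shared_b.read()
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
        for row in a_rows
    ]


def fused_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Split A's rows across two resources that share one copy of B."""
    shared_b = SharedRegisterBlock(b)             # B is written once...
    half = len(a) // 2
    top = resource_matmul(a[:half], shared_b)     # ...read by resource 0
    bottom = resource_matmul(a[half:], shared_b)  # ...and by resource 1
    return top + bottom
```

For example, `fused_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, the same result as an ordinary matrix product; only the sharing of `B` differs.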

2.

Invention application

SCALABLE SPARSE MATRIX MULTIPLY ACCELERATION USING SYSTOLIC ARRAYS WITH FEEDBACK INPUTS (Status: Valid)

Publication No.: US20240427847A1

Publication date: 2024-12-26

Application No.: US18757003

Application date: 2024-06-27

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE

IPC: G06F17/16 , G06F9/30 , G06F15/80

Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
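A simplified model of a systolic chain with a feedback input, under two loudly stated assumptions: that "feedback" means routing the chain's output back into its accumulator input when the vectors are longer than the chain is deep, and that sparsity support means skipping the multiply for a zero operand. `systolic_dot` and its `depth` parameter are hypothetical; this is a behavioural sketch, not the circuit described in the application.

```python
from typing import List, Tuple


def systolic_dot(a: List[float], b: List[float], depth: int) -> Tuple[float, int, int]:
    """Dot product on a hypothetical systolic chain of `depth` multiply-add stages.

    If the vectors are longer than the chain, the chain's output is fed back
    into its accumulator input for another pass (the assumed "feedback input").
    Zero elements of `a` are skipped, as a stand-in for sparsity support.
    Returns (result, passes, skipped_products).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    acc = 0.0
    passes = 0
    skipped = 0
    for start in range(0, len(a), depth):
        passes += 1
        # One pass through the chain: each stage handles one element pair
        # and forwards its running sum to the next stage.
        for k in range(start, min(start + depth, len(a))):
            if a[k] == 0:
                skipped += 1  # no multiply issued for a zero operand
                continue
            acc += a[k] * b[k]
        # acc leaves the last stage here; the next loop iteration feeds it
        # back into the first stage's accumulator input.
    return acc, passes, skipped
```

For example, `systolic_dot([1, 0, 2, 0, 3, 0, 4, 0], [1] * 8, depth=4)` returns `(10.0, 2, 4)`: two passes through a four-stage chain, with four of the eight multiplies skipped.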

3.

Invention publication

SCALABLE SPARSE MATRIX MULTIPLY ACCELERATION USING SYSTOLIC ARRAYS WITH FEEDBACK INPUTS (Status: Pending, published)

Publication No.: US20230281272A1

Publication date: 2023-09-07

Application No.: US18301386

Application date: 2023-04-17

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE

IPC: G06F17/16 , G06F9/30 , G06F15/80

CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30145 , G06F15/8046

Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.

4.

Invention application

SHARING REGISTER FILE USAGE BETWEEN FUSED PROCESSING RESOURCES (Status: Valid)

Publication No.: US20220206795A1

Publication date: 2022-06-30

Application No.: US17569229

Application date: 2022-01-05

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , VARGHESE GEORGE , JOYDEEP RAY , ASHUTOSH GARG , JORGE PARRA , SHUBH SHAH , SHUBRA MARWAHA

IPC: G06F9/30 , G06F9/50 , G06F17/16

Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.

5.

Invention application

ASYMMETRIC PERFORMANCE MULTICORE ARCHITECTURE WITH SAME INSTRUCTION SET ARCHITECTURE (Status: Pending, published)

Publication No.: US20170154012A1

Publication date: 2017-06-01

Application No.: US15431527

Application date: 2017-02-13

Applicant: Intel Corporation

Inventors: VARGHESE GEORGE , SANJEEV S. JAHAGIRDAR , DEBORAH T. MARR

IPC: G06F15/80 , G06F1/32 , G06F13/40

CPC classification number: G06F15/80 , G06F1/3206 , G06F1/3293 , G06F1/3296 , G06F9/5094 , G06F13/4022 , Y02D10/122 , Y02D10/151 , Y02D10/22

Abstract: A method is described that entails operating enabled cores of a multi-core processor such that both cores support respective software routines with a same instruction set, a first core being higher performance and consuming more power than a second core under a same set of applied supply voltage and operating frequency.
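The key property in this abstract is that both cores execute the same instruction set, so any software routine can be scheduled on either core and only performance and power differ. A minimal sketch, assuming a hypothetical demand threshold as the selection policy (the abstract does not specify one); `Core`, `pick_core`, and `run` are illustrative names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Core:
    name: str
    relative_perf: float   # throughput relative to the low-power core
    relative_power: float  # power at the same supply voltage and frequency


def pick_core(big: Core, little: Core, demand: float, threshold: float = 0.5) -> Core:
    """Hypothetical policy: use the low-power core unless demand exceeds the threshold."""
    return big if demand > threshold else little


def run(routine: Callable[[int], int], arg: int, core: Core) -> str:
    # Both cores implement the same instruction set, so the same routine
    # runs unchanged on either one; only speed and power consumption differ.
    result = routine(arg)
    return f"{core.name}: {result}"
```

With `big = Core("big", 2.0, 3.0)` and `little = Core("little", 1.0, 1.0)`, `run(lambda x: x * x, 7, pick_core(big, little, demand=0.9))` returns `"big: 49"`, while a demand of `0.2` selects `little` for the identical routine.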

6.

Invention publication

INSTRUCTION AND LOGIC FOR SYSTOLIC DOT PRODUCT WITH ACCUMULATE (Status: Pending, published)

Publication No.: US20230297373A1

Publication date: 2023-09-21

Application No.: US18307088

Application date: 2023-04-26

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , GUEI-YUAN LUEH , SUPRATIM PAL , ASHUTOSH GARG , CHANDRA S. GURRAM , JORGE E. PARRA , JUNJIE GU , KONRAD TRIFUNOVIC , HONG BIN LIAO , MIKE B. MACPHERSON , SHUBH B. SHAH , SHUBRA MARWAHA , STEPHEN JUNKINS , TIMOTHY R. BAUER , VARGHESE GEORGE , WEIYU CHEN

IPC: G06F9/30 , G06T1/20 , G06F9/38

CPC classification number: G06F9/3001 , G06F9/30145 , G06T1/20 , G06F9/3887 , G06F9/3802

Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch a single instruction for execution, a decode unit to decode the single instruction into a decoded instruction, wherein the decoded instruction is to cause the graphics processing unit to perform a set of parallel dot product operations on elements of input matrices, and a systolic dot product unit to execute the decoded instruction across one or more parallel processor lanes using multiple systolic layers associated with multiple pipeline stages. The multiple pipeline stages include one or more sets of interconnected multipliers and adders to compute multiple concurrent dot products.
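The systolic dot product with accumulate can be modelled, very loosely, as a chain of layers per parallel lane: the first layer starts from an accumulator input, each layer multiplies a slice of the inputs and adds the products to the running sum, and that sum passes to the next layer. The function `systolic_dot_accumulate` and its `width` parameter are assumptions made for illustration; they do not reproduce the actual instruction's operand layout or semantics.

```python
from typing import List


def systolic_dot_accumulate(
    acc: List[float], a: List[List[float]], b: List[float], width: int
) -> List[float]:
    """Hypothetical model of a systolic dot product with accumulate.

    acc[lane]: accumulator input for each parallel lane
    a[lane]:   that lane's row of input elements
    b:         elements shared by all lanes
    width:     elements each systolic layer multiplies per lane
    Layer d multiplies a[lane][d*width:(d+1)*width] by the matching slice of b,
    adds the products, and passes the running sum to layer d + 1.
    """
    if width < 1 or len(b) % width != 0:
        raise ValueError("len(b) must be a positive multiple of width")
    depth = len(b) // width  # number of systolic layers
    out = []
    for lane, row in enumerate(a):
        if len(row) != len(b):
            raise ValueError("each row of a must match len(b)")
        running = acc[lane]  # the first layer starts from the accumulator input
        for d in range(depth):
            lo, hi = d * width, (d + 1) * width
            # One layer: a small set of multipliers feeding an adder.
            running += sum(x * y for x, y in zip(row[lo:hi], b[lo:hi]))
        out.append(running)
    return out
```

For example, `systolic_dot_accumulate([10, 0], [[1, 2, 3, 4], [1, 1, 1, 1]], [1, 1, 2, 2], width=2)` returns `[27, 6]`: lane 0 adds 3 and then 14 to its accumulator input of 10.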

7.

Invention application

SCALAR CORE INTEGRATION (Status: Valid)

Publication No.: US20230029176A1

Publication date: 2023-01-26

Application No.: US17868448

Application date: 2022-07-19

Applicant: Intel Corporation

Inventors: JOYDEEP RAY , ARAVINDH ANANTARAMAN , ABHISHEK R. APPU , ALTUG KOKER , ELMOUSTAPHA OULD-AHMED-VALL , VALENTIN ANDREI , SUBRAMANIAM MAIYURAN , NICOLAS GALOPPO VON BORRIES , VARGHESE GEORGE , MIKE MACPHERSON , BEN ASHBAUGH , MURALI RAMADOSS , VIKRANTH VEMULAPALLI , WILLIAM SADLER , JONATHAN PEARCE , SUNGYE KIM

IPC: G06F15/80 , G06F9/30 , G06F9/38 , G06T15/00

Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
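A minimal sketch of the scalar/vector split described in this abstract, assuming a hypothetical rule that operations on a single data element go to the scalar processor complex and wider operations go to the vector processor complex. `Op` and `partition` are illustrative names; the abstract does not say how suitability is decided.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Op:
    name: str
    simd_width: int  # number of data elements the operation works on at once


def partition(ops: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Hypothetical rule: width-1 ops go to the scalar complex, wider ops to the vector complex."""
    scalar = [op for op in ops if op.simd_width == 1]
    vector = [op for op in ops if op.simd_width > 1]
    return scalar, vector
```

Given `[Op("branch", 1), Op("fma", 16), Op("addr_calc", 1), Op("sample", 8)]`, `partition` assigns `branch` and `addr_calc` to the scalar complex and `fma` and `sample` to the vector complex, preserving the original order within each subset.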

8.

Invention application

GRAPHICS ARCHITECTURE INCLUDING A NEURAL NETWORK PIPELINE (Status: Valid)

Publication No.: US20220058853A1

Publication date: 2022-02-24

Application No.: US17500631

Application date: 2021-10-13

Applicant: Intel Corporation

Inventors: HUGUES LABBE , DARREL PALKE , SHERINE ABDELHAK , JILL BOYCE , VARGHESE GEORGE , SCOTT JANUS , ADAM LAKE , ZHIJUN LEI , ZHENGMIN LI , MIKE MACPHERSON , CARL MARSHALL , SELVAKUMAR PANNEER , PRASOONKUMAR SURTI , KARTHIK VEERAMANI , DEEPAK VEMBAR , VALLABHAJOSYULA SRINIVASA SOMAYAZULU

IPC: G06T15/00 , G06N3/08 , G06T17/20 , G06T1/60 , G06T15/40 , G06T1/20

Abstract: One embodiment provides for a graphics processor comprising a block of graphics compute units, a graphics processor pipeline coupled to the block of graphics compute units, and a programmable neural network unit including one or more neural network hardware blocks. The programmable neural network unit is coupled with the block of graphics compute units and the graphics processor pipeline. The one or more neural network hardware blocks include hardware to perform neural network operations and activation operations for a layer of a neural network. The programmable neural network unit can configure settings of one or more hardware blocks within the graphics processor pipeline based on a machine learning model trained to optimize performance of a set of workloads.

9.

Invention application

SCALABLE SPARSE MATRIX MULTIPLY ACCELERATION USING SYSTOLIC ARRAYS WITH FEEDBACK INPUTS (Status: Valid)

Publication No.: US20210349966A1

Publication date: 2021-11-11

Application No.: US16913800

Application date: 2020-06-26

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , JORGE PARRA , SUPRATIM PAL , ASHUTOSH GARG , SHUBRA MARWAHA , CHANDRA GURRAM , DARIN STARKEY , DURGESH BORKAR , VARGHESE GEORGE

IPC: G06F17/16 , G06F9/30 , G06F15/80

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

10.

Invention application

INSTRUCTION AND LOGIC FOR SYSTOLIC DOT PRODUCT WITH ACCUMULATE (Status: Valid)

Publication No.: US20210303299A1

Publication date: 2021-09-30

Application No.: US17304153

Application date: 2021-06-15

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN , GUEI-YUAN LUEH , SUPRATIM PAL , ASHUTOSH GARG , CHANDRA S. GURRAM , JORGE E. PARRA , JUNJIE GU , KONRAD TRIFUNOVIC , HONG BIN LIAO , MIKE B. MACPHERSON , SHUBH B. SHAH , SHUBRA MARWAHA , STEPHEN JUNKINS , TIMOTHY R. BAUER , VARGHESE GEORGE , WEIYU CHEN

IPC: G06F9/30 , G06T1/20 , G06F9/38

Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
