Patent search ap:("Intel Corporation") AND inv:"Jorge E. Parra" Page 2

11.

Granted invention patent
Instructions and logic for vector multiply add with zero skipping (Active)

Publication (announcement) number: US11314515B2

Publication (announcement) date: 2022-04-26

Application number: US16724831

Application date: 2019-12-23

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
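Illustrative note (not part of the patent record): the idea of a multiply/add sequence that skips zero inputs can be sketched in software as below. This is a minimal model under stated assumptions, not the patented hardware design. The function name `macc_zero_skip`, the per-lane reading of the predicate mask, and the use of the scalar list length as the repeat count are all hypothetical choices made for illustration.

```python
def macc_zero_skip(dst, src_vectors, scalars, predicate_mask):
    """Accumulate scalars[i] * src_vectors[i] into dst, skipping zero scalars.

    Hypothetical model: len(scalars) plays the role of the repeat count,
    and predicate_mask[lane] enables or disables each destination lane.
    """
    acc = list(dst)
    issued = 0  # number of multiply-add steps actually performed
    for scalar, vec in zip(scalars, src_vectors):
        if scalar == 0:
            continue  # zero skipping: this step cannot change the result
        issued += 1
        for lane, enabled in enumerate(predicate_mask):
            if enabled:
                acc[lane] += scalar * vec[lane]
    return acc, issued


result, issued = macc_zero_skip(
    dst=[0, 0, 0, 0],
    src_vectors=[[1, 2, 3, 4], [5, 6, 7, 8], [1, 1, 1, 1]],
    scalars=[2, 0, 3],          # the middle step is sparse (zero) and is skipped
    predicate_mask=[True, True, False, True],
)
```

In this sketch, sparse input reduces the number of multiply-add steps performed without changing the accumulated result.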

12.

Granted invention patent
Instruction and logic for systolic dot product with accumulate (Active)

Publication (announcement) number: US11042370B2

Publication (announcement) date: 2021-06-22

Application number: US15957728

Application date: 2018-04-19

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen

IPC: G06F9/30, G06T1/20, G06F9/38

Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.

13.

Granted invention patent
Register sharing mechanism (Active)

Publication (announcement) number: US10983794B2

Publication (announcement) date: 2021-04-20

Application number: US16443285

Application date: 2019-06-17

Applicant: Intel Corporation

Inventor: Guei-Yuan Lueh, Subramaniam Maiyuran, Weiyu Chen, Konrad Trifunovic, Supratim Pal, Chandra S. Gurram, Jorge E. Parra, Pratik J. Ashar, Tomasz Bujewski

IPC: G06F9/30, G06F9/54, G06F9/48, G06F12/1009, G06F9/50

Abstract: A processor to facilitate register sharing is disclosed. The processor includes a plurality of execution units (EUs), each including a General Purpose Register File (GRF) having a plurality of registers; and register sharing hardware to divide the plurality of registers into a first set of registers dedicated for execution of a first set of threads and a second set of registers shared for execution of a second set of threads.

14.

Granted invention patent
Compiler assisted register file write reduction (Active)

Publication (announcement) number: US11900502B2

Publication (announcement) date: 2024-02-13

Application number: US17734983

Application date: 2022-05-02

Applicant: Intel Corporation

Inventor: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen

IPC: G06T1/20, G06T1/60

CPC classification number: G06T1/20, G06T1/60

Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.
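Illustrative note (not part of the patent record): the grouping of partial writes described in the abstract can be modeled roughly as follows. This is a simplified sketch, not the patented compiler or execution unit. The names `group_partial_writes` and `apply_grouped_writes`, the `(register, offset, data)` tuple encoding, and the dictionary-based register file are hypothetical, and the sketch ignores instruction dependencies that a real compiler would have to respect when regrouping.

```python
from collections import defaultdict


def group_partial_writes(instructions):
    """Group (register, offset, data) partial writes by destination register."""
    groups = defaultdict(list)
    for reg, offset, data in instructions:
        groups[reg].append((offset, data))
    return groups


def apply_grouped_writes(register_file, groups):
    """Merge each group's data, then update each register with one write."""
    writes = 0
    for reg, parts in groups.items():
        merged = list(register_file[reg])  # start from the current contents
        for offset, data in parts:
            merged[offset:offset + len(data)] = data
        register_file[reg] = merged  # single write per destination register
        writes += 1
    return writes


register_file = {"r0": [0] * 8, "r1": [0] * 8}
instructions = [
    ("r0", 0, [1, 2]),
    ("r1", 4, [9]),
    ("r0", 2, [3, 4]),
    ("r0", 6, [5, 6]),
]
writes = apply_grouped_writes(register_file, group_partial_writes(instructions))
```

Here four partial writes collapse into two register updates, one per destination register.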

15.

Invention application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Active)

Publication (announcement) number: US20220326953A1

Publication (announcement) date: 2022-10-13

Application number: US17723312

Application date: 2022-04-18

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

16.

Granted invention patent
Compiler assisted register file write reduction (Active)

Publication (announcement) number: US11321799B2

Publication (announcement) date: 2022-05-03

Application number: US16726659

Application date: 2019-12-24

Applicant: Intel Corporation

Inventor: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen

IPC: G06T1/20, G06T1/60

Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.

17.

Invention application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Active)

Publication (announcement) number: US20210191724A1

Publication (announcement) date: 2021-06-24

Application number: US16724831

Application date: 2019-12-23

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

18.

Granted invention patent
Software scoreboard information and synchronization (Active)

Publication (announcement) number: US10692170B2

Publication (announcement) date: 2020-06-23

Application number: US16437961

Application date: 2019-06-11

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen

IPC: G06F9/38, G06F8/41, G06T1/20, G06F9/30, G06T1/60, G09G5/36, G06T15/00

Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler-provided software scoreboard information. In one embodiment, the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.

19.

Invention application
SOFTWARE SCOREBOARD INFORMATION AND SYNCHRONIZATION (Under examination, published)

Publication (announcement) number: US20190362460A1

Publication (announcement) date: 2019-11-28

Application number: US16437961

Application date: 2019-06-11

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen

IPC: G06T1/20, G06F9/30, G06F9/38, G06F8/41

Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler-provided software scoreboard information. In one embodiment, the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
