Patent search ap:("Intel Corporation") AND inv:"Jiasheng Chen" Page 1

1.

Invention application
DISTRIBUTED REGISTER FILE CACHE TO REDUCE L1 BANDWIDTH REQUIREMENTS (In force)

Publication (announcement) No.: US20250068473A1

Publication (announcement) date: 2025-02-27

Application No.: US18453867

Application date: 2023-08-22

Applicant: Intel Corporation

Inventor: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, James Valerio

IPC: G06F9/50, G06F9/30

Abstract: Described herein is a graphics processor comprising a graphics processing cluster coupled with a memory interface, the graphics processing cluster including a plurality of processing resources, a processing resource of the plurality of processing resources including: a register file including a first plurality of registers associated with a first hardware thread of a plurality of hardware threads of the processing resource and a second plurality of registers associated with a second hardware thread of the plurality of hardware threads of the processing resource; and first circuitry configured to facilitate access to memory on behalf of the plurality of hardware threads and to store metadata for memory access requests from the plurality of hardware threads.

2.

Granted invention patent
Dual pipeline parallel systolic array (In force)

Publication (announcement) No.: US12189571B2

Publication (announcement) date: 2025-01-07

Application No.: US17304797

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi

IPC: G06F15/80, G06F9/30, G06F9/38

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
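The shared-input arrangement can be pictured with a minimal Python sketch. The abstract does not say what each pipeline stage computes, so this assumes every stage performs one multiply-add on a running sum; the names `run_pipeline` and `dual_pipeline` are hypothetical and not taken from the patent.

```python
def run_pipeline(common_input, private_input):
    """Model one pipeline: stage i multiplies element i of the common input by
    element i of the pipeline's own input and adds it to the running sum.
    (Assumed stage behavior; the abstract does not specify it.)"""
    acc = 0
    for shared, private in zip(common_input, private_input):
        acc += shared * private
    return acc


def dual_pipeline(common_input, first_input, second_input):
    """Feed the same common input to both pipelines, each with its own operand."""
    first_result = run_pipeline(common_input, first_input)
    second_result = run_pipeline(common_input, second_input)
    return first_result, second_result


if __name__ == "__main__":
    print(dual_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

In this model the common input is read once and consumed by both pipelines, which is the property the abstract highlights.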

3.

Invention publication
ENHANCEMENTS FOR ACCUMULATOR USAGE AND INSTRUCTION FORWARDING IN MATRIX MULTIPLY PIPELINE IN GRAPHICS ENVIRONMENT (Under examination - Published)

Publication (announcement) No.: US20240169021A1

Publication (announcement) date: 2024-05-23

Application No.: US18056930

Application date: 2022-11-18

Applicant: Intel Corporation

Inventor: Jorge Eduardo Parra Osorio, Supratim Pal, Fangwen Fu, Guei-Yuan Lueh, Po-Yu Chen, Jiasheng Chen

IPC: G06F17/16, G06F7/544

CPC classification number: G06F17/16, G06F7/5443

Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in matrix multiply pipeline in graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units comprise: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.
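The data path named in the abstract (multiply-accumulate, intermediate accumulation, operand bypass, final adder) can be sketched in Python as follows. This is an illustrative software model only: the function names, the chunked input format, and the choice to apply the bypassed operand only at the final step are assumptions, not details from the patent.

```python
def mac(a_vals, b_vals):
    """Multiply-accumulate hardware: dot product of one chunk of operands."""
    return sum(a * b for a, b in zip(a_vals, b_vals))


def dpu_final_output(chunks, src_operand=None):
    """Model one data processing unit over a sequence of (a_vals, b_vals) chunks.

    All chunks except the last are folded into the intermediate accumulator.
    The final adder then combines the last MAC output with the accumulator and,
    if given, a source operand that bypassed the MAC hardware (assumed ordering).
    """
    *earlier, last = chunks
    intermediate = 0
    for a_vals, b_vals in earlier:
        intermediate += mac(a_vals, b_vals)   # intermediate accumulation
    out = mac(*last) + intermediate           # adder: MAC output + accumulator
    if src_operand is not None:
        out += src_operand                    # adder: bypassed source operand
    return out


if __name__ == "__main__":
    chunks = [([1, 2], [3, 4]), ([5, 6], [7, 8])]
    print(dpu_final_output(chunks, src_operand=10))
```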

4.

Invention publication
SUPPORTING AND LOAD BALANCING MULTIPLE DOUBLE PRECISION PIPELINES IN A GRAPHICS ENVIRONMENT (Under examination - Published)

Publication (announcement) No.: US20240168764A1

Publication (announcement) date: 2024-05-23

Application No.: US18056820

Application date: 2022-11-18

Applicant: Intel Corporation

Inventor: Supratim Pal, Jiasheng Chen, Vikranth Vemulapalli, Subramaniam Maiyuran

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/30014, G06F9/3867

Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, wherein the second DP pipeline comprises a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the second DP pipeline before previous instructions are complete.

5.

Invention publication
MATRIX TRANSPOSITION IN MATRIX MULTIPLICATION ARRAY CIRCUITRY (Under examination - Published)

Publication (announcement) No.: US20240168723A1

Publication (announcement) date: 2024-05-23

Application No.: US18056822

Application date: 2022-11-18

Applicant: Intel Corporation

Inventor: Jorge Eduardo Parra Osorio, Supratim Pal, Jiasheng Chen

IPC: G06F7/78, G06F17/16

CPC classification number: G06F7/78, G06F17/16

Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.
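The three steps in the abstract (load, transpose, multiply) can be sketched in plain Python. This is a functional model under stated assumptions: lists of lists stand in for the storage buffers and the DPU array, and the function names are hypothetical.

```python
def transpose(matrix):
    """Return a new matrix whose rows are the columns of the input."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Reference matrix product standing in for the array of DPUs."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matmul_with_transposed_source(source, other):
    # Step 1: load the source matrix into (modeled) storage buffers.
    buffers = [row[:] for row in source]
    # Step 2: generate the transposed matrix from the buffered data.
    transposed = transpose(buffers)
    # Step 3: feed the transposed matrix to the multiplication step.
    return matmul(transposed, other)


if __name__ == "__main__":
    source = [[1, 2], [3, 4], [5, 6]]   # 3x2, becomes 2x3 after transposition
    other = [[1, 0], [0, 1], [1, 1]]    # 3x2
    print(matmul_with_transposed_source(source, other))
```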

6.

Invention application
RANDOM SPARSITY HANDLING IN A SYSTOLIC ARRAY (In force)

Publication (announcement) No.: US20220309124A1

Publication (announcement) date: 2022-09-29

Application No.: US17211627

Application date: 2021-03-24

Applicant: Intel Corporation

Inventor: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li

IPC: G06F17/16, G06F17/11, G06F15/80, G06F7/544, G06F9/30

Abstract: Matrix multiply units can take advantage of input sparsity by zero-gating ALUs, which saves power but does not increase compute throughput. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computations that involve zeros in the input or output. If zeros in the input can be skipped, the processing units can focus calculations on generating meaningful non-zero output.
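The difference between zero gating and zero skipping can be illustrated with a small Python model. It assumes a simplified cost model in which each multiply slot takes one cycle; the function names and the cycle accounting are illustrative assumptions, not the patent's mechanism.

```python
def dot_zero_gated(a, b):
    """Zero gating: a slot with a zero operand does no arithmetic (saving power),
    but it still occupies a cycle, so throughput is unchanged."""
    total, cycles = 0, 0
    for x, y in zip(a, b):
        cycles += 1                # the slot is consumed either way
        if x != 0 and y != 0:
            total += x * y
    return total, cycles


def dot_zero_skipped(a, b):
    """Zero skipping: pairs with a zero operand are dropped before issue,
    so only useful multiplies consume cycles."""
    useful = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    total = sum(x * y for x, y in useful)
    return total, len(useful)


if __name__ == "__main__":
    a, b = [0, 2, 0, 3], [5, 0, 7, 4]
    print(dot_zero_gated(a, b), dot_zero_skipped(a, b))
```

Both functions return the same dot product; only the modeled cycle count differs.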

7.

Invention application
ARCHITECTURE FOR BLOCK SPARSE OPERATIONS ON A SYSTOLIC ARRAY (In force)

Publication (announcement) No.: US20210103550A1

Publication (announcement) date: 2021-04-08

Application No.: US17122905

Application date: 2020-12-15

Applicant: Intel Corporation

Inventor: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray

IPC: G06F15/80, G06F7/544, G06F9/50, G06F17/16, G06N3/08, G06N3/04

Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
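One common way to represent sparse data as a compressed stream is a bitmask plus a packed list of non-zero values. The Python sketch below uses that layout to compute a dot product. The abstract does not describe the actual bitstream format, so the layout and the names `compress` and `sparse_dot` are assumptions for illustration.

```python
def compress(values):
    """Encode a vector as (mask, packed): bit i of mask is set when values[i]
    is non-zero, and packed holds the non-zero values in order.
    (Assumed format, not the patent's bitstream layout.)"""
    mask = 0
    packed = []
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
            packed.append(v)
    return mask, packed


def sparse_dot(mask, packed, dense):
    """Dot product of a compressed vector with a dense vector, touching only
    the positions flagged in the mask."""
    total = 0
    next_packed = 0
    for i, d in enumerate(dense):
        if (mask >> i) & 1:
            total += packed[next_packed] * d
            next_packed += 1
    return total


if __name__ == "__main__":
    mask, packed = compress([0, 3, 0, 0, 5, 0, 0, 1])
    print(bin(mask), packed, sparse_dot(mask, packed, [1, 2, 3, 4, 5, 6, 7, 8]))
```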

8.

Invention application
INSTRUCTION ENCODING TO IMPLEMENT INCREASED REGISTER CAPACITY PER THREAD (In force)

Publication (announcement) No.: US20250068423A1

Publication (announcement) date: 2025-02-27

Application No.: US18453861

Application date: 2023-08-22

Applicant: Intel Corporation

Inventor: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya

IPC: G06F9/30, G06F9/38

Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.

9.

Invention application
32-BIT CHANNEL-ALIGNED INTEGER MULTIPLICATION VIA MULTIPLE MULTIPLIERS PER-CHANNEL (In force)

Publication (announcement) No.: US20250037347A1

Publication (announcement) date: 2025-01-30

Application No.: US18358297

Application date: 2023-07-25

Applicant: Intel Corporation

Inventor: Jiasheng Chen, Supratim Pal, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Takashi Nakagawa, Guei-Yuan Lueh, Pradeep K. Golconda, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Clifford Gibson, Li-An Tang, Fangwen Fu, Kaiyu Chen, Buqi Cheng

IPC: G06T15/00, G06F9/30

Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline includes a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
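One way two multipliers can cooperate on a single 32-bit multiply is to split one operand into 16-bit halves and combine the partial products. The Python sketch below shows that arithmetic. The abstract does not state how the work is divided between the multipliers, so the 32x16 split and the name `mul32_two_multipliers` are assumptions.

```python
MASK32 = 0xFFFFFFFF


def mul32_two_multipliers(a, b):
    """Return the low 32 bits of a * b using two narrower partial products.

    Assumed split: the first multiplier handles a * (low 16 bits of b), the
    second handles a * (high 16 bits of b); the results are then combined.
    """
    a &= MASK32
    b &= MASK32
    b_lo = b & 0xFFFF
    b_hi = b >> 16
    partial_lo = a * b_lo            # first multiplier
    partial_hi = a * b_hi            # second multiplier
    return (partial_lo + (partial_hi << 16)) & MASK32


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    print(hex(mul32_two_multipliers(a, b)))
```

The result matches `(a * b) & 0xFFFFFFFF` because a * b = a * b_lo + (a * b_hi) * 2^16.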

10.

Granted invention patent
Architecture for block sparse operations on a systolic array (In force)

Publication (announcement) No.: US12198222B2

Publication (announcement) date: 2025-01-14

Application No.: US18532245

Application date: 2023-12-07

Applicant: Intel Corporation

Inventor: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray

IPC: G06F17/16, G06F7/544, G06F9/30, G06F9/38, G06F9/50, G06F12/0806, G06F15/80, G06N3/048, G06N3/08, G06N3/084, G06T1/20

Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
