-
Publication No.: US20250068473A1
Publication date: 2025-02-27
Application No.: US18453867
Application date: 2023-08-22
Applicant: Intel Corporation
Inventor: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, James Valerio
Abstract: Described herein is a graphics processor comprising a graphics processing cluster coupled with a memory interface, the graphics processing cluster including a plurality of processing resources, a processing resource of the plurality of processing resources including a register file including a first plurality of registers associated with a first hardware thread of a plurality of hardware threads of the processing resource and a second plurality of registers associated with a second hardware thread of the plurality of hardware threads of the processing resource, and first circuitry configured to facilitate access to memory on behalf of the plurality of hardware threads and to store metadata for memory access requests from the plurality of hardware threads.
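A minimal behavioral sketch of the arrangement described in US20250068473A1, assuming equal-sized per-thread register partitions and a single shared unit that tags each outstanding load with the requesting thread and destination register. The class names `RegisterFile` and `MemoryAccessUnit` and the metadata layout are hypothetical; the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterFile:
    """Hypothetical register file split into equal per-thread partitions."""
    num_threads: int
    regs_per_thread: int
    regs: list = field(init=False)

    def __post_init__(self) -> None:
        self.regs = [0] * (self.num_threads * self.regs_per_thread)

    def index(self, thread: int, reg: int) -> int:
        # Map (thread, register) to a flat slot inside that thread's partition.
        if not (0 <= thread < self.num_threads and 0 <= reg < self.regs_per_thread):
            raise IndexError("register outside the thread's partition")
        return thread * self.regs_per_thread + reg


@dataclass
class MemoryAccessUnit:
    """Hypothetical shared unit that accesses memory on behalf of all threads."""
    rf: RegisterFile
    memory: dict
    pending: dict = field(default_factory=dict)  # request id -> (thread, dest reg, address)
    next_id: int = 0

    def issue_load(self, thread: int, dest_reg: int, addr: int) -> int:
        # Record metadata so the response can be routed back later.
        req_id = self.next_id
        self.next_id += 1
        self.pending[req_id] = (thread, dest_reg, addr)
        return req_id

    def complete(self, req_id: int) -> None:
        # Responses may return out of order; the stored metadata picks the target register.
        thread, dest_reg, addr = self.pending.pop(req_id)
        self.rf.regs[self.rf.index(thread, dest_reg)] = self.memory.get(addr, 0)
```

For example, with `RegisterFile(2, 4)`, a load issued by thread 1 into its register 1 lands in flat slot 5, even if it completes before an earlier load from thread 0.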
-
Publication No.: US12189571B2
Publication date: 2025-01-07
Application No.: US17304797
Application date: 2021-06-25
Applicant: Intel Corporation
Inventor: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
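A behavioral sketch (not cycle-accurate) of two systolic pipelines that share one input, in the spirit of US12189571B2. It assumes the common input is a vector broadcast to both pipelines while each pipeline holds its own weights; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def pipeline(shared: Sequence[int], weights: Sequence[int], acc: int = 0) -> int:
    # One pipeline: stage k multiplies shared[k] by its own weight and passes
    # the running sum on to stage k + 1, as in a systolic chain.
    for x, w in zip(shared, weights):
        acc += x * w
    return acc


def two_pipelines(shared: Sequence[int], w_first: Sequence[int],
                  w_second: Sequence[int]) -> Tuple[int, int]:
    # The common input is read once and fanned out to both pipelines,
    # so two output channels are produced from a single operand fetch.
    return pipeline(shared, w_first), pipeline(shared, w_second)
```

Sharing the input in this way halves the operand reads for that input compared with feeding each pipeline its own copy.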
-
Publication No.: US20240169021A1
Publication date: 2024-05-23
Application No.: US18056930
Application date: 2022-11-18
Applicant: Intel Corporation
Inventor: Jorge Eduardo Parra Osorio, Supratim Pal, Fangwen Fu, Guei-Yuan Lueh, Po-Yu Chen, Jiasheng Chen
CPC classification number: G06F17/16, G06F7/5443
Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in a matrix multiply pipeline in a graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein each of the data processing units comprises: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate them with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.
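A minimal model of one data processing unit from US20240169021A1, assuming integer operands and a single scalar intermediate accumulator. The class and method names are hypothetical; the point is the data flow: partial MAC results are held locally, and the final adder combines the MAC output with a bypassed source operand and/or the accumulator.

```python
from typing import Optional, Sequence


class DataProcessingUnit:
    """Hypothetical DPU with MAC, intermediate accumulation, bypass, and final adder."""

    def __init__(self) -> None:
        self.intermediate = 0  # intermediate accumulation storage

    @staticmethod
    def mac(a: Sequence[int], b: Sequence[int]) -> int:
        # Multiply-accumulate hardware: sum of products for one chunk of operands.
        return sum(x * y for x, y in zip(a, b))

    def accumulate_partial(self, a: Sequence[int], b: Sequence[int]) -> None:
        # Keep the partial result locally and add it to earlier partial results.
        self.intermediate += self.mac(a, b)

    def finish(self, a: Sequence[int], b: Sequence[int],
               src: Optional[int] = None, use_intermediate: bool = True) -> int:
        # Final adder: MAC output plus the bypassed source operand and/or the
        # intermediate accumulator, which is then cleared for the next tile.
        out = self.mac(a, b)
        if src is not None:
            out += src  # source operand that skipped the MAC stage
        if use_intermediate:
            out += self.intermediate
            self.intermediate = 0
        return out
```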
-
Publication No.: US20240168764A1
Publication date: 2024-05-23
Application No.: US18056820
Application date: 2022-11-18
Applicant: Intel Corporation
Inventor: Supratim Pal, Jiasheng Chen, Vikranth Vemulapalli, Subramaniam Maiyuran
CPC classification number: G06F9/30014, G06F9/3867
Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, the second DP pipeline comprising a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the second DP pipeline before previous instructions are complete.
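A toy scheduler illustrating why two pipelined DP units help, loosely following US20240168764A1. It assumes a fixed instruction latency, issue logic that can dispatch to both pipelines in the same cycle, and a "soonest available pipeline" balancing policy; the abstract does not state the actual policy, and `schedule_dp` is a hypothetical name.

```python
from typing import List, Tuple


def schedule_dp(num_instrs: int, latency: int = 4,
                num_pipes: int = 2) -> List[Tuple[int, int, int]]:
    """Return (pipeline, issue cycle, completion cycle) for each DP instruction."""
    # Each pipeline is fully pipelined: it accepts a new instruction every cycle
    # even though each instruction needs `latency` cycles to finish.
    next_free = [0] * num_pipes
    plan = []
    for _ in range(num_instrs):
        # Load balancing: send the instruction to the pipeline that can take it
        # soonest (ties go to the lower-numbered pipeline).
        pipe = min(range(num_pipes), key=lambda p: next_free[p])
        issue = next_free[pipe]
        next_free[pipe] = issue + 1
        plan.append((pipe, issue, issue + latency))
    return plan
```

With the defaults, four independent instructions alternate between pipelines 0 and 1 and all complete by cycle 5, whereas a single pipeline finishes the last one at cycle 7.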
-
Publication No.: US20240168723A1
Publication date: 2024-05-23
Application No.: US18056822
Application date: 2022-11-18
Applicant: Intel Corporation
Inventor: Jorge Eduardo Parra Osorio, Supratim Pal, Jiasheng Chen
Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.
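A sketch of the load, transpose, multiply sequence from US20240168723A1, assuming a flat row-major storage buffer and using a plain triple-loop multiply as a stand-in for the DPU array. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def load_to_buffer(src: Matrix) -> Tuple[List[int], int, int]:
    # Load the source matrix into a flat, row-major storage buffer.
    rows, cols = len(src), len(src[0])
    return [x for row in src for x in row], rows, cols


def transpose_from_buffer(buf: List[int], rows: int, cols: int) -> Matrix:
    # Build the transposed matrix by reading the buffer column by column.
    return [[buf[r * cols + c] for r in range(rows)] for c in range(cols)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Stand-in for the DPU array: a straightforward matrix multiply.
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def matmul_transposed_operand(a: Matrix, src: Matrix) -> Matrix:
    # Computes a @ src^T, with the transpose done on the way into the array.
    buf, rows, cols = load_to_buffer(src)
    return matmul(a, transpose_from_buffer(buf, rows, cols))
```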
-
Publication No.: US20220309124A1
Publication date: 2022-09-29
Application No.: US17211627
Application date: 2021-03-24
Applicant: Intel Corporation
Inventor: Chunhui Mei, Hong Jiang, Jiasheng Chen, Yongsheng Liu, Yan Li
Abstract: Matrix multiply units can take advantage of input sparsity by zero-gating ALUs, which reduces power consumption but does not increase compute throughput. To improve compute throughput from sparsity, processing resources in a matrix accelerator can skip computations that involve zeros in the input or output. If zeros in the input can be skipped, the processing units can focus on calculations that generate meaningful non-zero output.
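The contrast between zero gating and zero skipping can be shown with a cycle count, assuming one multiply slot per cycle. This sketch models only zeros in the inputs, not the output-side skipping the abstract also mentions; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def dot_zero_gated(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    # Zero gating: a slot with a zero operand is idled (saving power), but every
    # element still occupies a slot, so cycles == vector length.
    result = sum(x * y for x, y in zip(a, b) if x != 0 and y != 0)
    return result, len(a)


def dot_zero_skipping(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    # Zero skipping: only pairs where both operands are non-zero are issued,
    # so cycles == number of useful multiplies.
    pairs = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    return sum(x * y for x, y in pairs), len(pairs)
```

Both functions return (result, cycles); the results always match, and only the cycle count differs.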
-
Publication No.: US20210103550A1
Publication date: 2021-04-08
Application No.: US17122905
Application date: 2020-12-15
Applicant: Intel Corporation
Inventor: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
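One common way to realize data-aware sparsity is to pair a bitmask with the packed non-zero values, so that only set bits reach the multiplier. The sketch below assumes that representation for the "compressed bitstream"; the abstract does not define the actual format, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def compress(values: Sequence[int]) -> Tuple[int, List[int]]:
    # Bit i of the mask is set when values[i] != 0; non-zeros are packed in order.
    mask, packed = 0, []
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
            packed.append(v)
    return mask, packed


def decompress(mask: int, packed: List[int], length: int) -> List[int]:
    it = iter(packed)
    return [next(it) if (mask >> i) & 1 else 0 for i in range(length)]


def sparse_dot(mask: int, packed: List[int], dense: Sequence[int]) -> int:
    # Walk the set bits; only the packed non-zero elements are multiplied.
    total, k = 0, 0
    for i, d in enumerate(dense):
        if (mask >> i) & 1:
            total += packed[k] * d
            k += 1
    return total
```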
-
Publication No.: US20250068423A1
Publication date: 2025-02-27
Application No.: US18453861
Application date: 2023-08-22
Applicant: Intel Corporation
Inventor: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya
Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.
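To illustrate what "decode the instruction based on that number of registers" could mean, the sketch below assumes a hypothetical encoding: an 8-bit opcode followed by three register fields whose width is just large enough to name every register allocated to the thread. The encoding, `OPCODE_BITS`, and all function names are assumptions, not the patent's format.

```python
from dataclasses import dataclass

OPCODE_BITS = 8  # hypothetical fixed-width opcode field


@dataclass(frozen=True)
class Decoded:
    opcode: int
    dst: int
    src0: int
    src1: int


def reg_field_bits(num_regs: int) -> int:
    # Bits needed to name any register available to the thread.
    return max(1, (num_regs - 1).bit_length())


def encode(opcode: int, dst: int, src0: int, src1: int, num_regs: int) -> int:
    w = reg_field_bits(num_regs)
    return (opcode | (dst << OPCODE_BITS) | (src0 << (OPCODE_BITS + w))
            | (src1 << (OPCODE_BITS + 2 * w)))


def decode(word: int, num_regs: int) -> Decoded:
    # The register-field width, and therefore the field boundaries, depend on
    # how many registers the thread has been allocated.
    w = reg_field_bits(num_regs)
    mask = (1 << w) - 1
    fields = [(word >> (OPCODE_BITS + i * w)) & mask for i in range(3)]
    if any(r >= num_regs for r in fields):
        raise ValueError("register index outside the thread's allocation")
    return Decoded(word & ((1 << OPCODE_BITS) - 1), *fields)
```

The same instruction word decodes differently for a thread with 128 registers (7-bit fields) than for one with 256 registers (8-bit fields), which is why the decoder must know the allocation.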
-
Publication No.: US20250037347A1
Publication date: 2025-01-30
Application No.: US18358297
Application date: 2023-07-25
Applicant: Intel Corporation
Inventor: Jiasheng Chen, Supratim Pal, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Takashi Nakagawa, Guei-Yuan Lueh, Pradeep K. Golconda, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Clifford Gibson, Li-An Tang, Fangwen Fu, Kaiyu Chen, Buqi Cheng
Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline includes a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
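One plausible way two multipliers can cooperate on a single instruction is to split a wide multiply into partial products. The sketch below assumes an unsigned 32x32 to 64-bit multiply split across two hypothetical 32x16-bit multipliers; the abstract does not give the multiplier widths or the split, so both are assumptions.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def narrow_multiplier(a: int, b16: int) -> int:
    # Model of one 32x16-bit hardware multiplier (hypothetical width).
    if not (0 <= a <= MASK32 and 0 <= b16 <= MASK16):
        raise ValueError("operand too wide for a 32x16 multiplier")
    return a * b16


def mul32(a: int, b: int) -> int:
    # One unsigned 32x32 -> 64-bit multiply split across two multipliers:
    # each handles one 16-bit half of b, then the partial products are
    # shifted and summed.
    if not 0 <= b <= MASK32:
        raise ValueError("b must be an unsigned 32-bit value")
    lo = narrow_multiplier(a, b & MASK16)
    hi = narrow_multiplier(a, (b >> 16) & MASK16)
    return lo + (hi << 16)
```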
-
Publication No.: US12198222B2
Publication date: 2025-01-14
Application No.: US18532245
Application date: 2023-12-07
Applicant: Intel Corporation
Inventor: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
IPC: G06F17/16, G06F7/544, G06F9/30, G06F9/38, G06F9/50, G06F12/0806, G06F15/80, G06N3/048, G06N3/08, G06N3/084, G06T1/20
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.