-
Publication No.: US20200285471A1
Publication date: 2020-09-10
Application No.: US16881920
Application date: 2020-05-22
Applicant: Intel Corporation
Inventor: PRATIK J. ASHAR, SUPRATIM PAL, SUBRAMANIAM MAIYURAN, WEI-YU CHEN, GUEI-YUAN LUEH
Abstract: An apparatus to facilitate register sharing is disclosed. The apparatus includes one or more processors to generate first machine code having a first General Purpose Register (GRF) per thread ratio, detect an occurrence of one or more spill/fill instructions in the first machine code, and generate second machine code having a second GRF per thread ratio upon a detection of one or more spill/fill instructions in the first machine code, wherein the second GRF per thread ratio is based on a disabling of a first of a plurality of hardware threads.
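The two-pass flow in this abstract can be illustrated with a minimal sketch. It is not the patented compiler: the register-file size (1024), the thread count (8), the even split of registers across active threads, and every function name below are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompileResult:
    active_threads: int   # hardware threads left enabled
    grf_per_thread: int   # registers available to each active thread
    spill_fill: int       # spill/fill instructions in the generated code


def compile_kernel(live_values, total_grf, threads):
    """Toy code generator (hypothetical): each live value beyond the
    per-thread register budget costs one spill and one fill."""
    budget = total_grf // threads
    overflow = max(0, live_values - budget)
    opcodes = ["op"] * live_values + ["spill", "fill"] * overflow
    return opcodes, budget


def count_spill_fill(opcodes):
    return sum(1 for op in opcodes if op in ("spill", "fill"))


def compile_with_register_sharing(live_values, total_grf=1024, threads=8):
    # First pass: generate code at the default GRF-per-thread ratio.
    opcodes, budget = compile_kernel(live_values, total_grf, threads)
    spills = count_spill_fill(opcodes)
    if spills == 0:
        return CompileResult(threads, budget, 0)
    # Spill/fill detected: assume one hardware thread is disabled and its
    # registers are shared among the rest, then generate code again.
    reduced = threads - 1
    opcodes, budget = compile_kernel(live_values, total_grf, reduced)
    return CompileResult(reduced, budget, count_spill_fill(opcodes))
```

With these assumed sizes, a kernel with 140 live values spills at 128 registers per thread but fits once one thread is disabled and each remaining thread gets 146.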
-
Publication No.: US20220156343A1
Publication date: 2022-05-19
Application No.: US17527882
Application date: 2021-11-16
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
-
Publication No.: US20190324746A1
Publication date: 2019-10-24
Application No.: US15957728
Application date: 2018-04-19
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
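The layered dataflow in this abstract, where the dot product from one systolic layer feeds the next, can be modeled in software as a chain of partial dot products. The sketch below is a hedged illustration only: the function name `systolic_dot`, the `layers` parameter, and the even split of the operands across layers are assumptions, not details taken from the patent.

```python
def systolic_dot(a, b, layers=4):
    """Software model of a layered dot product (hypothetical).

    Each layer multiplies its own slice of the operands, sums the
    products, and adds the accumulator handed over by the previous layer.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    if len(a) % layers != 0:
        raise ValueError("operand length must be divisible by layers")
    width = len(a) // layers
    acc = 0  # value passed from one layer to the next
    for layer in range(layers):
        lo = layer * width
        partial = sum(x * y for x, y in zip(a[lo:lo + width], b[lo:lo + width]))
        acc = acc + partial  # output of this layer, input to the next
    return acc
```

For example, `systolic_dot([1, 2, 3, 4], [5, 6, 7, 8], layers=2)` computes 17 in the first layer and adds 53 in the second, returning 70.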
-
Publication No.: US20200073664A1
Publication date: 2020-03-05
Application No.: US16120226
Application date: 2018-09-01
Applicant: Intel Corporation
Inventor: PRATIK J. ASHAR, SUPRATIM PAL, SUBRAMANIAM MAIYURAN, WEI-YU CHEN, GUEI-YUAN LUEH
Abstract: An apparatus to facilitate register sharing is disclosed. The apparatus includes one or more processors to generate first machine code having a first General Purpose Register (GRF) per thread ratio, detect an occurrence of one or more spill/fill instructions in the first machine code, and generate second machine code having a second GRF per thread ratio upon a detection of one or more spill/fill instructions in the first machine code, wherein the second GRF per thread ratio is based on a disabling of a first of a plurality of hardware threads.
-
Publication No.: US20160350112A1
Publication date: 2016-12-01
Application No.: US14726349
Application date: 2015-05-29
Applicant: Intel Corporation
Inventor: SUPRATIM PAL, SUBRAMANIAM MAIYURAN, MARK C. DAVIS
CPC classification number: G06F15/82, G06F9/30141, G06F9/345, G06F9/3824, G06F9/3851, G06F9/3887
Abstract: Techniques to suppress redundant reads to register addresses and to replicate read data are disclosed. The redundant reads are suppressed when multiple source operands specify the same register address to read. Additionally, the read data is replicated to a data stream or data location corresponding to the source operands where the data read was suppressed.
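A minimal sketch of the read-suppression idea in this abstract: when several source operands name the same register address, only the first triggers a read, and the value is replicated to the others. The dictionary-based register file and the name `read_operands` are illustrative assumptions, not the hardware mechanism the patent claims.

```python
def read_operands(register_file, source_addrs):
    """Return one value per source operand plus the number of real reads.

    register_file: mapping from register address to value (assumed model).
    source_addrs:  register addresses named by the instruction's sources.
    """
    fetched = {}   # addresses already read for this instruction
    reads = 0      # reads actually issued to the register file
    values = []
    for addr in source_addrs:
        if addr not in fetched:
            fetched[addr] = register_file[addr]  # first use: real read
            reads += 1
        # repeated address: the read is suppressed, the data is replicated
        values.append(fetched[addr])
    return values, reads
```

For an instruction whose sources are `r1`, `r2`, `r1`, this model issues two reads instead of three and still delivers three operand values.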
-
Publication No.: US20240427847A1
Publication date: 2024-12-26
Application No.: US18757003
Application date: 2024-06-27
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
-
Publication No.: US20230281272A1
Publication date: 2023-09-07
Application No.: US18301386
Application date: 2023-04-17
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
CPC classification number: G06F17/16, G06F9/3001, G06F9/30145, G06F15/8046
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
-
Publication No.: US20150309800A1
Publication date: 2015-10-29
Application No.: US14261097
Application date: 2014-04-24
Applicant: Intel Corporation
Inventor: WEI-YU CHEN, GUEI-YUAN LUEH, SUBRAMANIAM MAIYURAN, SUPRATIM PAL
CPC classification number: G06F9/30032, G06F9/30036, G06F9/30043
Abstract: A processor is described having an instruction execution pipeline. The instruction execution pipeline has an instruction fetch stage to fetch an instruction specifying multiple target resultant registers. The instruction execution pipeline has an instruction decode stage to decode the instruction. The instruction execution pipeline has a functional unit to prepare resultant content specific to each of the multiple target resultant registers. The instruction execution pipeline has a write-back stage to write back said resultant content specific to each of said multiple target resultant registers.
-
Publication No.: US20230297373A1
Publication date: 2023-09-21
Application No.: US18307088
Application date: 2023-04-26
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
CPC classification number: G06F9/3001, G06F9/30145, G06T1/20, G06F9/3887, G06F9/3802
Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch a single instruction for execution, a decode unit to decode the single instruction into a decoded instruction, wherein the decoded instruction is to cause the graphics processing unit to perform a set of parallel dot product operations on elements of input matrices, and a systolic dot product unit to execute the decoded instruction across one or more parallel processor lanes using multiple systolic layers associated with multiple pipeline stages. The multiple pipeline stages include one or more sets of interconnected multipliers and adders to compute multiple concurrent dot products.
-
Publication No.: US20210349966A1
Publication date: 2021-11-11
Application No.: US16913800
Application date: 2020-06-26
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.