-
公开(公告)号:US20210349848A1
公开(公告)日:2021-11-11
申请号:US17321885
申请日:2021-05-17
Applicant: Intel Corporation
Inventor: JOYDEEP RAY , ARAVINDH ANANTARAMAN , ABHISHEK R. APPU , ALTUG KOKER , ELMOUSTAPHA OULD-AHMED-VALL , VALENTIN ANDREI , SUBRAMANIAM MAIYURAN , NICOLAS GALOPPO VON BORRIES , VARGHESE GEORGE , MIKE MACPHERSON , BEN ASHBAUGH , MURALI RAMADOSS , VIKRANTH VEMULAPALLI , WILLIAM SADLER , JONATHAN PEARCE , SUNGYE KIM
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20210272349A1
公开(公告)日:2021-09-02
申请号:US17306769
申请日:2021-05-03
Applicant: Intel Corporation
Inventor: TRAVIS SCHLUESSLER , ZACK WATERS , MICHAEL APODACA , DANIEL JOHNSTON , JASON SURPRISE , PRASOONKUMAR SURTI , SUBRAMANIAM MAIYURAN , PETER DOYLE , SAURABH SHARMA , ANKUR SHAH , MURALI RAMADOSS
Abstract: Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.
-
公开(公告)号:US20200043124A1
公开(公告)日:2020-02-06
申请号:US16543849
申请日:2019-08-19
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , JORGE F. GARCIA PABON , VIKRANTH VEMULAPALLI , CHANDRA S. GURRAM , ADITYA NAVALE , SAURABH SHARMA
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a register file having a plurality of channels to store data and an execution unit to examine data at each of the plurality of channels, read a data value from a first of the plurality of channels upon a determination that each of the plurality of channels has the same data and execute a single input multi data (SIMD) instruction based on the data value.
-
公开(公告)号:US20190324746A1
公开(公告)日:2019-10-24
申请号:US15957728
申请日:2018-04-19
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , GUEI-YUAN LUEH , SUPRATIM PAL , ASHUTOSH GARG , CHANDRA S. GURRAM , JORGE E. PARRA , JUNJIE GU , KONRAD TRIFUNOVIC , HONG BIN LIAO , MIKE B. MACPHERSON , SHUBH B. SHAH , SHUBRA MARWAHA , STEPHEN JUNKINS , TIMOTHY R. BAUER , VARGHESE GEORGE , WEIYU CHEN
Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
-
25.
公开(公告)号:US20160180585A1
公开(公告)日:2016-06-23
申请号:US14581701
申请日:2014-12-23
Applicant: INTEL CORPORATION
Inventor: SUBRAMANIAM MAIYURAN , THOMAS A. PIAZZA , JORGE F. GARCIA PABON , SHUBH B. SHAH
CPC classification number: G06K9/4604 , G06T15/005 , G06T2210/12
Abstract: An apparatus and method are described for a high throughput rasterizer. For example, one embodiment of an apparatus comprises: block selection logic to select a plurality of pixel blocks associated with edges of a primitive, the plurality of pixel blocks selected based on the pixel blocks having samples which are both inside and outside of the primitive; and edge determination logic to analyze samples of the plurality of pixel blocks selected by the block selection logic and responsively generate data identifying each edge of the primitive; and final mask determination logic to combine the data identifying each edge and generate a final mask representing the primitive.
Abstract translation: 为高通量光栅化器描述了一种装置和方法。 例如,装置的一个实施例包括:块选择逻辑,用于选择与图元的边缘相关联的多个像素块,所述多个像素块基于具有在图元内部和外部的样本的像素块来选择; 以及边缘确定逻辑来分析由块选择逻辑选择的多个像素块的样本,并且响应地生成识别图元的每个边缘的数据; 以及最终掩模确定逻辑,以组合识别每个边缘的数据,并生成表示原始图案的最终掩模。
-
26.
公开(公告)号:US20220179655A1
公开(公告)日:2022-06-09
申请号:US17502492
申请日:2021-10-15
Applicant: Intel Corporation
Inventor: BUQI CHENG , WEI-YU CHEN , GUEI-YUAN LUEH , CHANDRA GURRAM , SUBRAMANIAM MAIYURAN
Abstract: Mechanisms for reducing register bank conflicts based on software hint and hardware thread switch are disclosed. In some embodiments, an apparatus for thread switching includes a graphics processing unit (GPU) that includes a plurality of register banks to store operands that are assigned at least partially to avoid register bank conflicts. A decoding circuitry checks a thread switching field of a first instruction to be executed by a first thread. The GPU performs a thread switch mechanism to cause a second instruction to be executed by a second thread when the thread switching field of the first instruction is set.
-
公开(公告)号:US20220171827A1
公开(公告)日:2022-06-02
申请号:US17527324
申请日:2021-11-16
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , MATHEW NEVIN , JORGE PARRA , ASHUTOSH GARG , SHUBRA MARWAHA , SHUBH SHAH
Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
-
公开(公告)号:US20210089301A1
公开(公告)日:2021-03-25
申请号:US16582406
申请日:2019-09-25
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , VARGHESE GEORGE , JOYDEEP RAY , ASHUTOSH GARG , JORGE PARRA , SHUBH SHAH , SHUBRA MARWAHA
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
-
公开(公告)号:US20210073318A1
公开(公告)日:2021-03-11
申请号:US16561715
申请日:2019-09-05
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , MATHEW NEVIN , JORGE PARRA , ASHUTOSH GARG , SHUBRA MARWAHA , SHUBH SHAH
Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
-
公开(公告)号:US20200285471A1
公开(公告)日:2020-09-10
申请号:US16881920
申请日:2020-05-22
Applicant: Intel Corporation
Inventor: PRATIK J. ASHAR , SUPRATIM PAL , SUBRAMANIAM MAIYURAN , WEI-YU CHEN , GUEI-YUAN LUEH
Abstract: An apparatus to facilitate register sharing is disclosed. The apparatus includes one or more processors to generate first machine code having a first General Purpose Register (GRF) per thread ratio, detect an occurrence of one or more spill/fill instructions in the first machine code, and generate second machine code having a second GRF per thread ratio upon a detection of one or more spill/fill instructions in the first machine code, wherein the second GRF per thread ratio is based on a disabling of a first of a plurality of hardware threads
-
-
-
-
-
-
-
-
-