-
Publication (Announcement) No.: US11204801B2
Publication (Announcement) Date: 2021-12-21
Application No.: US16684077
Application Date: 2019-11-14
Applicant: Intel Corporation
Inventor: Justin DeCell, Saurabh Sharma
Abstract: Systems and methods for scheduling thread order to improve cache efficiency are disclosed. In one embodiment, a graphics processor includes processing resources and schedule and dispatch logic to schedule and dispatch threads to the processing resources. The schedule and dispatch logic is configured to receive threads, to schedule and dispatch the threads based on a forward thread dispatch having a forward thread order, and to determine whether to disable reversal of the thread order upon completion of at least a portion of the forward thread dispatch, including completion or ending of a draw call or a dispatch.
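A minimal Python sketch of the idea, assuming a "serpentine" policy: each draw is dispatched forward or in reverse so that the threads dispatched first touch the data the previous draw left in the cache. The rule for disabling reversal (a per-draw `shares_with_previous` flag) and the names `dispatch_order`, `next_reverse_flag`, and `schedule_draws` are hypothetical, not taken from the patent.

```python
from typing import List, Sequence


def dispatch_order(threads: Sequence[int], reverse: bool) -> List[int]:
    """Return thread IDs in the order they would be dispatched."""
    return list(reversed(threads)) if reverse else list(threads)


def next_reverse_flag(current_reverse: bool, next_draw_shares_data: bool) -> bool:
    """Decide the next pass's direction at a draw-call boundary.

    Hypothetical policy: reversing only pays off when the next draw touches the
    data the previous draw left in the cache; otherwise reversal is disabled
    and dispatch falls back to forward order.
    """
    if not next_draw_shares_data:
        return False
    return not current_reverse  # serpentine: forward, reverse, forward, ...


def schedule_draws(draws: Sequence[Sequence[int]],
                   shares_with_previous: Sequence[bool]) -> List[List[int]]:
    """Dispatch each draw's threads, choosing a direction per draw."""
    reverse = False
    result: List[List[int]] = []
    for i, threads in enumerate(draws):
        if i > 0:  # the decision is made when the previous draw completes
            reverse = next_reverse_flag(reverse, shares_with_previous[i])
        result.append(dispatch_order(threads, reverse))
    return result
```

For three identical draws where only the second shares data with its predecessor, `schedule_draws([[0, 1, 2]] * 3, [False, True, False])` dispatches forward, reversed, then forward again.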
-
Publication (Announcement) No.: US20210256653A1
Publication (Announcement) Date: 2021-08-19
Application No.: US17115555
Application Date: 2020-12-08
Applicant: Intel Corporation
Inventor: Saurabh Sharma, Michael Apodaca, Aditya Navale, Travis Schluessler, Vamsee Vardhan Chivukula, Abhishek Venkatesh, Subramaniam Maiyuran
IPC: G06T1/20
Abstract: An apparatus to facilitate asynchronous execution at a processing unit. The apparatus includes one or more processors to detect independent task passes that may be executed out of order in a pipeline of the processing unit, schedule a first set of processing tasks to be executed at a first set of processing elements at the processing unit, and schedule a second set of tasks to be executed at a second set of processing elements, wherein execution of the first set of tasks at the first set of processing elements is to be performed simultaneously and in parallel with execution of the second set of tasks at the second set of processing elements.
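A minimal sketch of the scheduling decision, assuming each pass declares the resources it reads and writes. Two passes are treated as independent when there is no read-after-write, write-after-read, or write-after-write hazard between them; independent passes then receive disjoint halves of the processing elements within one "wave" that may run concurrently. The `Pass` type, the hazard test, and the even split are illustrative assumptions, not the patent's method.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Wave = List[Tuple[str, List[int]]]  # (pass name, processing elements) running together


@dataclass(frozen=True)
class Pass:
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def independent(a: Pass, b: Pass) -> bool:
    """True when neither pass reads or writes what the other writes."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def schedule(a: Pass, b: Pass, elements: List[int]) -> List[Wave]:
    """Return the waves of work in order.

    Independent passes share one wave on disjoint element sets (in parallel);
    dependent passes get all elements each, in two sequential waves.
    """
    if independent(a, b):
        half = len(elements) // 2
        return [[(a.name, elements[:half]), (b.name, elements[half:])]]
    return [[(a.name, list(elements))], [(b.name, list(elements))]]
```

With hypothetical passes such as a shadow-map pass and an ambient-occlusion pass that touch unrelated buffers, both land in a single wave; a lighting pass that reads the shadow map is forced into a later wave.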
-
Publication (Announcement) No.: US11080925B2
Publication (Announcement) Date: 2021-08-03
Application No.: US16456645
Application Date: 2019-06-28
Applicant: Intel Corporation
Inventor: Vasanth Ranganathan, Saikat Mandal, Saurabh Sharma, Vamsee Vardhan Chivukula, Karol A. Szerszen, Aleksander Olek Neyman, Altug Koker, Prasoonkumar Surti, Abhishek Appu, Joydeep Ray, Art Hunter, Luis F. Cruz Camacho, Akshay R. Chada
Abstract: Briefly, in accordance with one or more embodiments, a processor performs a coarse depth test on pixel data, and performs a final depth test on the pixel data. Coarse depth data is stored in a coarse depth cache, and per pixel depth data is stored in a per pixel depth cache. If a result of the coarse depth test is ambiguous, the processor is to read the per pixel depth data from the per pixel depth cache, and to update the coarse depth data with the per pixel depth data if the per pixel depth data has a smaller depth range than the coarse depth data.
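A minimal sketch of the two-level test, assuming a LESS depth comparison and a coarse entry that stores a conservative `[zmin, zmax]` range per tile. A fragment closer than `zmin` passes outright, one at or beyond `zmax` fails outright, and only the ambiguous middle case reads the per-pixel depths, which are also used to tighten the coarse range when their exact range is narrower. `CoarseTile`, `coarse_test`, and `depth_test` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoarseTile:
    """Conservative depth range for one tile (a 'coarse depth cache' entry)."""
    zmin: float
    zmax: float


def coarse_test(tile: CoarseTile, z: float) -> str:
    """Classify a fragment against the tile's range for a LESS depth test."""
    if z < tile.zmin:
        return "pass"  # closer than every stored pixel in the tile
    if z >= tile.zmax:
        return "fail"  # no closer than any stored pixel in the tile
    return "ambiguous"


def depth_test(tile: CoarseTile, pixel_depths: List[float], idx: int, z: float) -> bool:
    """Coarse test first; fall back to per-pixel depths only when ambiguous."""
    verdict = coarse_test(tile, z)
    if verdict == "ambiguous":
        # Read the per-pixel depth cache; adopt its range if it is narrower.
        lo, hi = min(pixel_depths), max(pixel_depths)
        if hi - lo < tile.zmax - tile.zmin:
            tile.zmin, tile.zmax = lo, hi
        passed = z < pixel_depths[idx]  # final per-pixel test
    else:
        passed = verdict == "pass"
    if passed:
        pixel_depths[idx] = z
        tile.zmin = min(tile.zmin, z)  # keep the coarse range conservative
    return passed
```

Only `zmin` needs updating after a passing write: the new depth is smaller than the value it replaces, so the old `zmax` is still a valid upper bound.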
-
Publication (Announcement) No.: US10861126B1
Publication (Announcement) Date: 2020-12-08
Application No.: US16449034
Application Date: 2019-06-21
Applicant: Intel Corporation
Inventor: Saurabh Sharma, Michael Apodaca, Aditya Navale, Travis Schluessler, Vamsee Vardhan Chivukula, Abhishek Venkatesh, Subramaniam Maiyuran
IPC: G06T1/20
Abstract: An apparatus to facilitate asynchronous execution at a processing unit. The apparatus includes one or more processors to detect independent task passes that may be executed out of order in a pipeline of the processing unit, schedule a first set of processing tasks to be executed at a first set of processing elements at the processing unit, and schedule a second set of tasks to be executed at a second set of processing elements, wherein execution of the first set of tasks at the first set of processing elements is to be performed simultaneously and in parallel with execution of the second set of tasks at the second set of processing elements.
-
Publication (Announcement) No.: US10748238B2
Publication (Announcement) Date: 2020-08-18
Application No.: US16279270
Application Date: 2019-02-19
Applicant: Intel Corporation
Inventor: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Prasoonkumar Surti, Altug Koker, Aravindh V. Anantaraman, Pattabhiraman P. K., Abhishek R. Appu, Joydeep Ray, Kamal Sinha, Vasanth Ranganathan, Bhushan M. Borole, Wenyin Fu, Eric J. Hoekstra, Linda L. Hurd
Abstract: A control surface tracks whether an individual cacheline in the original surface holds a frequent data value. If so, control surface bits are set. When reading a cacheline from memory, the control surface bits are read first. If they are set, the original memory read is skipped altogether and the bits from the control surface instead provide the value for the entire cacheline.
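A minimal sketch of the read path, assuming a 64-byte cacheline, a small table of frequent values (here all-zero and all-`0xFF` lines), and a control code per cacheline where 0 means "not uniform" and `k` means "every byte equals `frequent[k - 1]`". The encoding, the class name `ControlSurfaceMemory`, and skipping the data write for uniform lines are all assumptions made for illustration.

```python
from typing import Dict, Tuple

LINE_SIZE = 64  # bytes per cacheline (assumed)


class ControlSurfaceMemory:
    """Toy memory with a per-cacheline control surface (hypothetical encoding)."""

    def __init__(self, frequent_values: Tuple[int, ...] = (0x00, 0xFF)) -> None:
        self.frequent = frequent_values
        self.lines: Dict[int, bytes] = {}  # the original surface
        self.control: Dict[int, int] = {}  # 0 = not uniform, k = frequent[k - 1]
        self.memory_reads = 0              # reads that reached the original surface

    def write_line(self, line: int, data: bytes) -> None:
        if len(data) != LINE_SIZE:
            raise ValueError("expected one full cacheline")
        first = data[0]
        if first in self.frequent and data.count(first) == LINE_SIZE:
            # Whole line is one frequent value: record it in the control bits.
            self.control[line] = self.frequent.index(first) + 1
        else:
            self.control[line] = 0
            self.lines[line] = data

    def read_line(self, line: int) -> bytes:
        code = self.control.get(line, 0)  # control bits are read first
        if code:
            # Skip the memory read; the control bits describe the whole line.
            return bytes([self.frequent[code - 1]]) * LINE_SIZE
        self.memory_reads += 1
        return self.lines[line]
```

Reading back an all-zero line returns 64 zero bytes without incrementing `memory_reads`; only non-uniform lines cost a trip to the original surface.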
-
Publication (Announcement) No.: US10140678B2
Publication (Announcement) Date: 2018-11-27
Application No.: US15089270
Application Date: 2016-04-01
Applicant: INTEL CORPORATION
Inventor: Saurabh Sharma, Abhishek Ventakesh, Travis T. Schluessler, Thomas F. Raoux, Rahul P. Sathe, Jon Hasselgren
Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value, reducing the overall computational requirements to execute the shader.
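A minimal sketch of what such a transformation could look like at the source level, assuming a shader whose output is `shadow * lighting(...)` and a profile showing that `shadow` is frequently `0.0`. The specialized version adds a branch that returns the common result without evaluating the expensive lighting terms. The Blinn-Phong-style `lighting` function and both `shade_*` names are hypothetical stand-ins, not the patent's compiler output.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def lighting(n: Vec, l: Vec, v: Vec, shininess: float) -> float:
    """Stand-in for the expensive part of the shader (diffuse + specular)."""
    h = normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))
    return max(dot(n, l), 0.0) + max(dot(n, h), 0.0) ** shininess


def shade_original(shadow: float, n: Vec, l: Vec, v: Vec, shininess: float) -> float:
    return shadow * lighting(n, l, v, shininess)


def shade_specialized(shadow: float, n: Vec, l: Vec, v: Vec, shininess: float) -> float:
    # Branch a compiler could insert after finding that `shadow` is often 0.0.
    # Equivalent only while lighting() is finite (0 * inf would give nan).
    if shadow == 0.0:
        return 0.0
    return shadow * lighting(n, l, v, shininess)
```

The branch costs one comparison on every invocation, so it only pays off when the common value really is frequent; that trade-off is why the abstract ties the branch to values the compiler expects to be common.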
-
Publication (Announcement) No.: US20180174350A1
Publication (Announcement) Date: 2018-06-21
Application No.: US15386111
Application Date: 2016-12-21
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran, Jorge F. Garcia Pabon, Vikranth Vemulapalli, Chandra S. Gurram, Aditya Navale, Saurabh Sharma
IPC: G06T15/00
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a register file having a plurality of channels to store data, and an execution unit to examine the data at each of the plurality of channels, read a data value from a first of the plurality of channels upon a determination that all of the plurality of channels hold the same data, and execute a single instruction, multiple data (SIMD) instruction based on that data value.
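A minimal sketch of the uniform-operand shortcut, assuming channels hold raw register bits modeled as Python ints (so equality means bitwise equality). When every channel matches, the operation is evaluated once on channel 0 and the result is broadcast; otherwise it runs per channel. The function name `execute_simd` and the returned evaluation count are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def execute_simd(op: Callable[[int], int], channels: List[int]) -> Tuple[List[int], int]:
    """Apply `op` across all SIMD channels.

    Returns the per-channel results and how many times `op` was evaluated.
    Channels hold raw register bits (ints), so equality is bitwise equality.
    """
    if channels and all(c == channels[0] for c in channels):
        # Uniform register: read channel 0 once, compute once, broadcast.
        result = op(channels[0])
        return [result] * len(channels), 1
    return [op(c) for c in channels], len(channels)
```

Comparing raw bits rather than floating-point values avoids treating `-0.0` and `0.0` as the same operand, since an operation like `1/x` gives different results for them.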
-
Publication (Announcement) No.: US11710269B2
Publication (Announcement) Date: 2023-07-25
Application No.: US17876358
Application Date: 2022-07-28
Applicant: Intel Corporation
Inventor: Travis Schluessler, Zack Waters, Michael Apodaca, Daniel Johnston, Jason Surprise, Prasoonkumar Surti, Subramaniam Maiyuran, Peter Doyle, Saurabh Sharma, Ankur Shah, Murali Ramadoss
CPC classification number: G06T15/005 , G06T15/40 , G06T15/80 , G06T2210/52
Abstract: Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.
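A minimal sketch of the visibility and distribution steps, assuming the position-only pass has already produced screen-space triangle positions. Visibility is approximated conservatively by each triangle's bounding box (no depth culling), tiles are assigned round-robin to processors, and each processor's geometry work is limited to the triangles visible in its own tiles. `visibility`, `distribute`, and `geometry_work` are hypothetical names; compositing the rendered tiles into a frame is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Tri = Tuple[Point, Point, Point]  # screen-space positions from position-only shading
Tile = Tuple[int, int]


def visibility(tris: Sequence[Tri], tile_size: int,
               tiles_x: int, tiles_y: int) -> Dict[Tile, List[int]]:
    """Per-tile list of triangle indices whose bounding box overlaps the tile."""
    vis: Dict[Tile, List[int]] = {(tx, ty): [] for tx in range(tiles_x) for ty in range(tiles_y)}
    for i, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        x0 = max(int(min(xs) // tile_size), 0)
        x1 = min(int(max(xs) // tile_size), tiles_x - 1)
        y0 = max(int(min(ys) // tile_size), 0)
        y1 = min(int(max(ys) // tile_size), tiles_y - 1)
        for tx in range(x0, x1 + 1):  # empty range if fully off-screen
            for ty in range(y0, y1 + 1):
                vis[(tx, ty)].append(i)
    return vis


def distribute(vis: Dict[Tile, List[int]], num_gpus: int) -> List[Dict[Tile, List[int]]]:
    """Round-robin tiles (and their visibility subsets) across graphics processors."""
    parts: List[Dict[Tile, List[int]]] = [{} for _ in range(num_gpus)]
    for k, tile in enumerate(sorted(vis)):
        parts[k % num_gpus][tile] = vis[tile]
    return parts


def geometry_work(part: Dict[Tile, List[int]]) -> List[int]:
    """Triangles a processor must fully shade: only those visible in its tiles."""
    return sorted({i for tris in part.values() for i in tris})
```

With one small triangle in the top-left tile and another in the bottom-right tile of a 2x2 grid split across two processors, each processor ends up shading exactly one triangle instead of both.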
-
Publication (Announcement) No.: US11640693B2
Publication (Announcement) Date: 2023-05-02
Application No.: US17526462
Application Date: 2021-11-15
Applicant: Intel Corporation
Inventor: Justin DeCell, Saurabh Sharma, Subramaniam Maiyuran, Raghavendra Miyar, Jorge Garcia Pabon
Abstract: Methods, systems and apparatuses may provide for technology that determines the size of a graphics primitive, renders pixels associated with the graphics primitive on a per tile basis if the size exceeds a threshold, and renders the pixels associated with the graphics primitive in a mesh order if the size does not exceed the threshold. In one example, the technology discards state data associated with the graphics primitive in response to a completion of rendering the pixels associated with the graphics primitive in the mesh order.
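A minimal sketch of the size-based split, assuming "size" means screen-space triangle area and that a large primitive is binned into every tile its bounding box touches (non-negative coordinates assumed). Small primitives are recorded in mesh (submission) order and keep no state afterwards; large ones keep their state until all of their tiles are rendered. The threshold, the area metric, and the names `plan_rendering` and `tiles_touched` are assumptions, and API ordering guarantees between the two paths are ignored here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Tri = Tuple[Point, Point, Point]
Tile = Tuple[int, int]


def triangle_area(tri: Tri) -> float:
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0


def tiles_touched(tri: Tri, tile_size: int) -> List[Tile]:
    """Tiles overlapped by the triangle's bounding box (non-negative coords assumed)."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return [(tx, ty)
            for tx in range(int(min(xs)) // tile_size, int(max(xs)) // tile_size + 1)
            for ty in range(int(min(ys)) // tile_size, int(max(ys)) // tile_size + 1)]


def plan_rendering(prims: Sequence[Tri], threshold: float, tile_size: int):
    """Split primitives into mesh-order work and per-tile bins by size."""
    mesh_order: List[int] = []
    tile_bins: Dict[Tile, List[int]] = defaultdict(list)
    retained_state: Dict[int, Tri] = {}
    for pid, tri in enumerate(prims):
        if triangle_area(tri) > threshold:
            # Large primitive: rendered tile by tile, so its state must be
            # kept until every tile it touches has been rendered.
            retained_state[pid] = tri
            for tile in tiles_touched(tri, tile_size):
                tile_bins[tile].append(pid)
        else:
            # Small primitive: rendered immediately in mesh order; its state
            # can be discarded as soon as its pixels are done.
            mesh_order.append(pid)
    return mesh_order, dict(tile_bins), retained_state
```

The design choice the abstract points at is memory: only primitives above the threshold pay for state that outlives their own rendering.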
-
Publication (Announcement) No.: US11176736B2
Publication (Announcement) Date: 2021-11-16
Application No.: US17154174
Application Date: 2021-01-21
Applicant: Intel Corporation
Inventor: Justin DeCell, Saurabh Sharma, Subramaniam Maiyuran, Raghavendra Miyar, Jorge Garcia Pabon
Abstract: Methods, systems and apparatuses may provide for technology that determines the size of a graphics primitive, renders pixels associated with the graphics primitive on a per tile basis if the size exceeds a threshold, and renders the pixels associated with the graphics primitive in a mesh order if the size does not exceed the threshold. In one example, the technology discards state data associated with the graphics primitive in response to a completion of rendering the pixels associated with the graphics primitive in the mesh order.
-