-
Publication No.: US11269643B2
Publication Date: 2022-03-08
Application No.: US15482798
Filing Date: 2017-04-09
Applicant: Intel Corporation
Inventor: Liwei Ma, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Eriko Nurvitadhi, Abhishek R. Appu, Altug Koker, Kamal Sinha, Joydeep Ray, Balaji Vembu, Vasanth Ranganathan, Sanjeev Jahagirdar
Abstract: A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
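The frequent-data-value idea in this abstract can be illustrated with a minimal Python sketch. It assumes that "bypassing" means computing the result for each frequent value once and reusing it instead of re-running the computation; the function names `find_frequent_values` and `compute_with_bypass` and the `top_n` parameter are hypothetical and do not come from the patent.

```python
from collections import Counter


def find_frequent_values(data, top_n=1):
    """Return the top_n most frequent values in data (the assumed FDVs)."""
    return [value for value, _ in Counter(data).most_common(top_n)]


def compute_with_bypass(data, expensive_fn, top_n=1):
    """Apply expensive_fn to every element, bypassing it for frequent values.

    Returns (results, calls), where calls counts how often expensive_fn ran.
    """
    fdvs = find_frequent_values(data, top_n)
    # Evaluate each frequent value once, up front.
    shortcut = {value: expensive_fn(value) for value in fdvs}
    calls = len(shortcut)

    results = []
    for value in data:
        if value in shortcut:
            # Frequent value: reuse the stored result, skip the computation.
            results.append(shortcut[value])
        else:
            results.append(expensive_fn(value))
            calls += 1
    return results, calls
```

In this sketch, a list dominated by one value (for example, zeros in a sparse activation tensor) needs far fewer calls to `expensive_fn` than it has elements.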
-
Publication No.: US11263141B2
Publication Date: 2022-03-01
Application No.: US17026264
Filing Date: 2020-09-20
Applicant: Intel Corporation
Inventor: Abhishek R. Appu, Altug Koker, Joydeep Ray, David Puffer, Prasoonkumar Surti, Lakshminarayanan Striramassarma, Vasanth Ranganathan, Kiran C. Veernapu, Balaji Vembu, Pattabhiraman K
IPC: G06F12/0877, G06F12/0802, G06F12/0855, G06F12/0806, G06F12/0846, G06F12/0868, G06T1/60, G06F12/126, G06F12/0893
Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
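A cache organized into sectors of at least two lines each can be modeled with a short Python sketch. The abstract states only the sector structure; the one-tag-per-sector design with per-line valid bits, the direct-mapped indexing, and the class name `SectoredCache` are assumptions borrowed from the textbook sector-cache design, not details from the patent.

```python
class SectoredCache:
    """Toy direct-mapped cache with one tag per sector and a valid bit per line."""

    def __init__(self, num_sectors=4, lines_per_sector=2, line_size=64):
        if lines_per_sector < 2:
            raise ValueError("each sector must hold at least two cache lines")
        self.num_sectors = num_sectors
        self.lines_per_sector = lines_per_sector
        self.line_size = line_size
        self.tags = [None] * num_sectors
        self.valid = [[False] * lines_per_sector for _ in range(num_sectors)]

    def access(self, address):
        """Return 'hit', 'line miss', or 'sector miss' for a byte address."""
        line_address = address // self.line_size
        line_in_sector = line_address % self.lines_per_sector
        sector_address = line_address // self.lines_per_sector
        index = sector_address % self.num_sectors
        tag = sector_address // self.num_sectors

        if self.tags[index] != tag:
            # A different sector owns this slot: replace it and fill one line.
            self.tags[index] = tag
            self.valid[index] = [False] * self.lines_per_sector
            self.valid[index][line_in_sector] = True
            return "sector miss"
        if not self.valid[index][line_in_sector]:
            # Sector is resident but this line has not been fetched yet.
            self.valid[index][line_in_sector] = True
            return "line miss"
        return "hit"
```

Sharing one tag across several lines reduces tag storage, at the cost of invalidating every line in a sector when the sector is replaced.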
-
Publication No.: US11244420B2
Publication Date: 2022-02-08
Application No.: US17197126
Filing Date: 2021-03-10
Applicant: Intel Corporation
Inventor: Balaji Vembu, Altug Koker, Joydeep Ray
Abstract: One embodiment provides an apparatus comprising an interconnect fabric comprising one or more fabric switches, a plurality of memory interfaces coupled to the interconnect fabric to provide access to a plurality of memory devices, an input/output (IO) interface coupled to the interconnect fabric to provide access to IO devices, an array of multiprocessors coupled to the interconnect fabric, scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors, and a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits.
-
Publication No.: US11232531B2
Publication Date: 2022-01-25
Application No.: US15690201
Filing Date: 2017-08-29
Applicant: Intel Corporation
Inventor: Hema Chand Nalluri, Balaji Vembu, Peter Doyle, Michael Apodaca
Abstract: Various embodiments enable loop processing in a command processing block of the graphics hardware. Such hardware may include a processor including a command buffer, and a graphics command parser. The graphics command parser to load graphics commands from the command buffer, parse a first graphics command, store a loop count value associated with the first graphics command, parse a second graphics command and store a loop wrap address based on the second graphics command. The graphics command parser may execute a command sequence identified by the second graphics command, parse a third graphics command, the third graphics command identifying an end of the command sequence, set a new loop count value, and iteratively execute the command sequence using the loop wrap address based on the new loop count value.
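The loop mechanism described in this abstract (a stored loop count, a loop wrap address, and an end-of-sequence command that triggers re-execution) can be sketched as a small command interpreter in Python. The opcode names and the tuple-based command format are hypothetical; the abstract does not name the actual commands or their encoding.

```python
# Hypothetical opcodes standing in for the three commands in the abstract.
SET_LOOP_COUNT = "SET_LOOP_COUNT"  # first command: carries the loop count
LOOP_BEGIN = "LOOP_BEGIN"          # second command: defines the loop wrap address
LOOP_END = "LOOP_END"              # third command: marks the end of the sequence
DRAW = "DRAW"                      # stand-in for any ordinary graphics command


def run_command_buffer(commands):
    """Parse (opcode, operand) tuples and return the operands of executed commands."""
    executed = []
    loop_count = 0
    wrap_address = None
    pc = 0  # index of the command being parsed
    while pc < len(commands):
        opcode, operand = commands[pc]
        if opcode == SET_LOOP_COUNT:
            loop_count = operand
        elif opcode == LOOP_BEGIN:
            # The loop body starts at the command after this one.
            wrap_address = pc + 1
        elif opcode == LOOP_END:
            # Set the new loop count and wrap while iterations remain.
            loop_count -= 1
            if loop_count > 0:
                pc = wrap_address
                continue
        else:
            executed.append(operand)
        pc += 1
    return executed
```

As a simplification, the body always runs at least once, and nested loops are not handled. A buffer with a loop count of 3 around two draw commands executes those two commands three times before continuing past the loop end.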
-
Publication No.: US20210398250A1
Publication Date: 2021-12-23
Application No.: US17239800
Filing Date: 2021-04-26
Applicant: Intel Corporation
Inventor: Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Abhishek R. Appu, Balaji Vembu, Prasoonkumar Surti
Abstract: Systems, apparatuses and methods may provide a way to blend two or more of the scene surfaces based on the focus area and an offload threshold. More particularly, systems, apparatuses and methods may provide a way to blend, by a display engine, two or more of the focus area scene surfaces and blended non-focus area scene surfaces. The systems, apparatuses and methods may include a graphics engine to render the focus area surfaces at a higher sample rate than the non-focus area scene surfaces.
-
Publication No.: US11169850B2
Publication Date: 2021-11-09
Application No.: US16726341
Filing Date: 2019-12-24
Applicant: Intel Corporation
Inventor: Abhishek R Appu, Altug Koker, Balaji Vembu, Joydeep Ray, Kamal Sinha, Prasoonkumar Surti, Kiran C. Veernapu, Subramaniam Maiyuran, Sanjeev S. Jahagirdar, Eric J. Asperheim, Guei-Yuan Lueh, David Puffer, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Josh B. Mastronarde, Linda L. Hurd, Travis T. Schluessler, Tomasz Janczak, Abhishek Venkatesh, Kai Xiao, Slawomir Grajewski
Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
-
Publication No.: US11145106B2
Publication Date: 2021-10-12
Application No.: US16381646
Filing Date: 2019-04-11
Applicant: Intel Corporation
Inventor: Jonathan Kennedy, Gabor Liktor, Jeffery S. Boles, Slawomir Grajewski, Balaji Vembu, Travis T. Schluessler, Abhishek R. Appu, Ankur N. Shah, Joydeep Ray, Altug Koker, Jacek Kwiatkowski
IPC: G06T15/00, A63F13/53, A63F13/355
Abstract: Systems, apparatuses, and methods may provide for technology to process graphics data in a virtual gaming environment. The technology may identify, from graphics data in a graphics application, redundant graphics calculations relating to common frame characteristics of one or more graphical scenes to be shared between client game devices of a plurality of users and calculate, in response to the identified redundant graphics calculations, frame characteristics relating to the one or more graphical scenes. Additionally, the technology may send, over a computer network, the calculation of the frame characteristics to the client game devices.
-
Publication No.: US20210255857A1
Publication Date: 2021-08-19
Application No.: US17128972
Filing Date: 2020-12-21
Applicant: Intel Corporation
Inventor: Feng Chen, Narayan Srinivasa, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Joydeep Ray, Nicolas C. Galoppo Von Borries, Prasoonkumar Surti, Ben J. Ashbaugh, Sanjeev Jahagirdar, Vasanth Ranganathan
Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.
-
Publication No.: US20210217130A1
Publication Date: 2021-07-15
Application No.: US17193658
Filing Date: 2021-03-05
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
IPC: G06T1/20, G06F9/30, G06F9/38, G06F12/0811, G06F12/0815, G06F12/0831, G06F12/0888, H03M7/30, G06K9/62, G06N20/00, G06F9/48, G06F17/16, G06N3/04, G06N3/08, G06T1/60, G06T15/00
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
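Applying an element-wise operation to a matrix-multiply result before it leaves local storage can be sketched in plain Python as follows. The tiling scheme, the function name `matmul_with_epilogue`, and the `epilogue` callable are illustrative assumptions; the local `acc` buffer merely stands in for registers or a low level of the memory hierarchy.

```python
def matmul_with_epilogue(a, b, epilogue=None, tile=2):
    """Multiply list-of-lists matrices a (M x K) and b (K x N), tile by tile.

    If epilogue is given, it is applied to each output element while the
    tile is still in the local accumulator, before the write to `out`.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            tile_rows = min(tile, rows - i0)
            tile_cols = min(tile, cols - j0)
            # Local accumulator for one output tile.
            acc = [[0.0] * tile_cols for _ in range(tile_rows)]
            for k in range(inner):
                for i in range(tile_rows):
                    for j in range(tile_cols):
                        acc[i][j] += a[i0 + i][k] * b[k][j0 + j]
            # Fused element-wise step, then a single write of the tile.
            for i in range(tile_rows):
                for j in range(tile_cols):
                    value = acc[i][j]
                    out[i0 + i][j0 + j] = epilogue(value) if epilogue else value
    return out
```

Passing `epilogue=lambda v: max(0.0, v)` fuses a ReLU into the multiply, so the output matrix is written once rather than written, re-read, and rewritten by a separate element-wise pass.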
-
Publication No.: US11055248B2
Publication Date: 2021-07-06
Application No.: US16599261
Filing Date: 2019-10-11
Applicant: Intel Corporation
Inventor: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kiran C. Veernapu, Balaji Vembu, Vasanth Ranganathan, Prasoonkumar Surti
IPC: G06F13/40, G06F9/54, G06F13/42, G06T1/60, G06F12/084, G06F12/0811, G06F12/0846, G06F12/0831
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to monitor a thread switching overhead parameter for an application executing in a processing system and in response to a determination that the thread switching overhead parameter exceeds a threshold, to activate a thread management algorithm to reduce thread switching in the processing system. Other embodiments are also disclosed and claimed.