-
251.
公开(公告)号:US20190139182A1
公开(公告)日:2019-05-09
申请号:US16197783
申请日:2018-11-21
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3017 , G06F9/3851 , G06F9/3887 , G06F9/3895 , G06N3/04 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/08 , G06N3/084
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
-
公开(公告)号:US10282811B2
公开(公告)日:2019-05-07
申请号:US15482685
申请日:2017-04-07
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Altug Koker , Balaji Vembu
IPC: G09G5/36 , G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F12/0875 , G06T1/60
Abstract: An apparatus and method are described for managing data which is biased towards a processor or a GPU. For example, one embodiment of an apparatus comprises: a processor comprising one or more cores to execute instructions and process data, one or more cache levels, and cache coherence controllers to maintain coherent data in the one or more cache levels; a graphics processing unit (GPU) to execute graphics instructions and process graphics data, wherein the GPU and processor cores are to share a virtual address space for accessing a system memory; a GPU memory coupled to the GPU, the GPU memory addressable through the virtual address space shared by the processor cores and GPU; and bias management circuitry to store an indication, for each of a plurality of blocks of data, whether the data has a processor bias or a GPU bias, wherein if the data has a GPU bias, then the data is to be accessed by the GPU from the GPU memory without necessarily accessing the processor's cache coherence controllers and wherein requests for the data from the processor cores are processed as uncached requests, preventing the data from being cached in the one or more cache levels of the processor.
-
公开(公告)号:US10255109B2
公开(公告)日:2019-04-09
申请号:US15489062
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Altug Koker , Abhishek R. Appu , Kiran C. Veernapu , Joydeep Ray , Balaji Vembu
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive a completion acknowledgment from the plurality of graphics processing units and in response to a determination that the workload is finished, to terminate one or more communication connections on the interconnect bridge. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10210655B2
公开(公告)日:2019-02-19
申请号:US14865933
申请日:2015-09-25
Applicant: Intel Corporation
Inventor: Murali Ramadoss , Balaji Vembu , Hema C. Nalluri , Michael Apodaca , Jeffery S. Boles
Abstract: By scheduling/managing workload submission to a position only shading pipe one can exploit parallelism with minimum impact to the software scheduler in some embodiments. An interface submits workloads to a slave engine running in one parallel pipe to assist a main engine running in another parallel pipe. Command sequences for each parallel pipe are separated to enable the slave engine to run ahead of the main engine. The slave engine is a position only shader and the main engine is a render engine.
-
公开(公告)号:US20180308209A1
公开(公告)日:2018-10-25
申请号:US16010692
申请日:2018-06-18
Applicant: Intel Corporation
Inventor: Murali Ramadoss , Balaji Vembu , Eric C. Samson , Kun Tian , David J. Cowperthwaite , Altug Koker , Zhi Wang , Joydeep Ray , Subramaniam M. Maiyuran , Abhishek R. Appu
Abstract: Embodiments described herein provide techniques enable a compute unit to continue processing operations when all dispatched threads are blocked. One embodiment provides for an apparatus comprising a thread dispatcher to dispatch a thread for execution; a compute unit having a single instruction, multiple thread architecture, the compute unit to execute multiple concurrent threads; and a memory coupled with the compute unit, the memory to store thread state for a suspended thread, wherein the compute unit is to: detect that all threads on the compute unit are blocked from execution, select a victim thread from the multiple concurrent threads, suspend the victim thread, store thread state of the victim thread to the memory, and replace the victim thread with an additional thread to be executed.
-
公开(公告)号:US20180308208A1
公开(公告)日:2018-10-25
申请号:US15819093
申请日:2017-11-21
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC classification number: G06T1/20 , G06F8/41 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06F17/16 , G06F2009/45583 , G06T1/60
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type
-
公开(公告)号:US20180308205A1
公开(公告)日:2018-10-25
申请号:US15495956
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Balaji Vembu , Nikos Kaburlasos , Josh B. Mastronarde
CPC classification number: G06T1/20 , G06F1/3203 , G06F1/3231 , G06F1/3234 , G06F1/3265 , G06T1/60 , G09G2330/021 , G09G2340/0435 , G09G2354/00
Abstract: In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an input from one or more detectors proximate a display to present an output from a graphics pipeline, determine that a user is not interacting with the display, and in response to a determination that the user is not interacting with the display, to reduce a frame rendering rate of the graphics pipeline. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20180307971A1
公开(公告)日:2018-10-25
申请号:US15495020
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar
CPC classification number: G06N3/063 , G06F1/3287 , G06F1/3293 , G06F9/30014 , G06F9/30036 , G06F15/78 , G06N3/04 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/005
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20180307950A1
公开(公告)日:2018-10-25
申请号:US15494710
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha
CPC classification number: G06K9/66 , G06F9/3001 , G06F9/3851 , G06F9/3887 , G06F9/3893 , G06F2207/4824 , G06K9/00973 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/084 , G06T1/20
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including an input value and a quantized weight value associated with a neural network and an arithmetic logic unit including a barrel shifter, an adder, and an accumulator register, wherein to execute the decoded instruction, the barrel shifter is to shift the input value by the quantized weight value to generate a shifted input value and the adder is to add the shifted input value to a value stored in the accumulator register and update the value stored in the accumulator register.
-
公开(公告)号:US20180307529A1
公开(公告)日:2018-10-25
申请号:US15493670
申请日:2017-04-21
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Balaji Vembu , James A. Valerio , Abhishek R. Appu
CPC classification number: G06F9/4881 , G06T1/60 , G06T2210/52
Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.
-
-
-
-
-
-
-
-
-