-
公开(公告)号:US11748841B2
公开(公告)日:2023-09-05
申请号:US17871781
申请日:2022-07-22
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
CPC classification number: G06T1/20 , G06F9/46 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/044 , G06N3/084
Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning. A method of embodiments, as described herein, includes limiting execution of workloads for the respective contexts of a plurality of contexts to a specified subset of a plurality of processing resources of a processing system according to physical resource slices of the processing system that are associated with the respective contexts of the plurality of contexts.
-
公开(公告)号:US11748606B2
公开(公告)日:2023-09-05
申请号:US17317857
申请日:2021-05-11
Applicant: INTEL CORPORATION
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar
IPC: G06F7/50 , G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045 , G06T1/60
CPC classification number: G06N3/063 , G06F1/3287 , G06F1/3293 , G06F9/30014 , G06F9/30036 , G06F15/76 , G06F15/78 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/084 , G06T1/20 , G06T15/005 , G06T1/60
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US11620256B2
公开(公告)日:2023-04-04
申请号:US17732308
申请日:2022-04-28
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter, Jr. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F12/08 , G06F15/78 , G06F9/30 , G06F9/38 , G06F17/18 , G06F12/0802 , G06F7/544 , G06F7/575 , G06F12/02 , G06F12/0866 , G06F12/0875 , G06F12/0895 , G06F12/128 , G06F12/06 , G06F12/1009 , G06T1/20 , G06T1/60 , H03M7/46 , G06F12/0811 , G06F15/80 , G06F17/16 , G06F7/58 , G06F12/0871 , G06F12/0862 , G06F12/0897 , G06F9/50 , G06F12/0804 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06T15/06 , G06N3/08
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US20230040631A1
公开(公告)日:2023-02-09
申请号:US17881720
申请日:2022-08-05
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06T1/20 , G06F9/30 , G06F9/38 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , H03M7/30 , G06K9/62 , G06N20/00 , G06F12/02 , G06F9/48 , G06F17/16 , G06N3/04 , G06N3/08 , G06T1/60 , G06T15/00
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
-
95.
公开(公告)号:US20220357945A1
公开(公告)日:2022-11-10
申请号:US17834482
申请日:2022-06-07
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
-
公开(公告)号:US11430082B2
公开(公告)日:2022-08-30
申请号:US17143805
申请日:2021-01-07
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.
-
公开(公告)号:US11409658B2
公开(公告)日:2022-08-09
申请号:US17161465
申请日:2021-01-28
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G09G5/36 , G06F12/0862 , G06T1/20 , G06T1/60
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
-
公开(公告)号:US11379235B2
公开(公告)日:2022-07-05
申请号:US17128972
申请日:2020-12-21
Applicant: Intel Corporation
Inventor: Feng Chen , Narayan Srinivasa , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Joydeep Ray , Nicolas C. Galoppo Von Borries , Prasoonkumar Surti , Ben J. Ashbaugh , Sanjeev Jahagirdar , Vasanth Ranganathan
Abstract: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.
-
公开(公告)号:US20220164916A1
公开(公告)日:2022-05-26
申请号:US17541413
申请日:2021-12-03
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.
-
公开(公告)号:US20220113783A1
公开(公告)日:2022-04-14
申请号:US17411878
申请日:2021-08-25
Applicant: INTEL CORPORATION
Inventor: Altug Koker , Abhishek R. Appu , Kiran C. Veernapu , Joydeep Ray , Balaji Vembu , Prasoonkumar Surti , Kamal Sinha , Eric J. Hoekstra , Wenyin Fu , Nikos Kaburlasos , Bhushan M. Borole , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy
IPC: G06F1/3209 , H04W52/02 , G06F1/324 , G06F1/3203 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F3/01 , G06F11/07 , G06F11/30
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.
-
-
-
-
-
-
-
-
-