-
Publication No.: US20190026856A1
Publication Date: 2019-01-24
Application No.: US16120591
Application Date: 2018-09-04
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Joydeep Ray , Altug Koker , Balaji Vembu , Prasoonkumar P. Surti , Kamal Sinha , Vasanth Ranganathan , Kiran C. Veernapu , Bhushan M. Borole , Wenyin Fu
Abstract: In accordance with one embodiment, each page table entry maps a variable page size (per entry) if multiple contiguous virtual pages map to contiguous physical pages. This may drastically reduce the number of translation lookaside buffer (TLB) entries needed, since each entry can potentially map a larger chunk of memory, in some embodiments.
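A minimal sketch of the idea in the abstract, not the patented design: a TLB whose entries each cover a variable-length run of contiguous pages, so a contiguous mapping collapses into one entry. The class and constant names (CoalescedTLB, PAGE_SIZE) and the linear search are illustrative assumptions.

```python
PAGE_SIZE = 4096

class CoalescedTLB:
    def __init__(self):
        # Each entry: (virt_base_page, phys_base_page, page_count)
        self.entries = []

    def insert(self, virt_page, phys_page):
        # Extend an existing entry when the new mapping continues a
        # contiguous virtual-to-physical run; otherwise add a new entry.
        for i, (vb, pb, n) in enumerate(self.entries):
            if virt_page == vb + n and phys_page == pb + n:
                self.entries[i] = (vb, pb, n + 1)
                return
        self.entries.append((virt_page, phys_page, 1))

    def translate(self, virt_addr):
        vp, offset = divmod(virt_addr, PAGE_SIZE)
        for vb, pb, n in self.entries:
            if vb <= vp < vb + n:            # hit anywhere inside the run
                return (pb + (vp - vb)) * PAGE_SIZE + offset
        return None                          # TLB miss

tlb = CoalescedTLB()
for i in range(8):                           # 8 contiguous pages -> 1 entry
    tlb.insert(0x100 + i, 0x800 + i)
assert len(tlb.entries) == 1
assert tlb.translate((0x105 * PAGE_SIZE) + 12) == (0x805 * PAGE_SIZE) + 12
```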
-
Publication No.: US10186011B2
Publication Date: 2019-01-22
Application No.: US15581182
Application Date: 2017-04-28
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
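An illustrative sketch only: a toy decode step that expands one "fused" machine learning instruction into the micro-operations a compute unit would execute, echoing the single-instruction-to-complex-operation idea in the abstract. The opcode name (FMA_MATRIX) and the micro-op list are assumptions for illustration, not the claimed instruction set.

```python
def decode(instruction):
    opcode, operands = instruction
    if opcode == "FMA_MATRIX":               # single instruction ...
        a, b, c = operands
        return [("LOAD", a), ("LOAD", b),    # ... decoded into a sequence of
                ("LOAD", c),                 # micro-ops implementing the
                ("MUL_ACC", a, b, c),        # complex ML compute operation
                ("STORE", c)]
    raise ValueError(f"unknown opcode {opcode!r}")

print(decode(("FMA_MATRIX", ("A", "B", "C"))))
```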
-
Publication No.: US10158346B2
Publication Date: 2018-12-18
Application No.: US15488628
Application Date: 2017-04-17
Applicant: Intel Corporation
Inventor: Bhushan M. Borole , Anupama A. Thaploo , Altug Koker , Abhishek R. Appu , Kamal Sinha , Wenyin Fu
Abstract: A pulse-triggered flip-flop circuit includes an exclusive-OR clock-generating stage that receives an input clock and data and produces an output clock pulse. The stage disables the output clock pulse only when the data is fully captured. Moreover, the circuit only toggles when the input data changes, reducing power consumption in some embodiments.
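A behavioral software sketch, not a circuit netlist: it models the exclusive-OR stage requesting an internal clock pulse only while the incoming data differs from the stored value, and releasing the pulse once the data is captured, so the flop toggles only when the input changes. The class name and the sampling loop are illustrative assumptions.

```python
class PulseTriggeredFlop:
    def __init__(self):
        self.q = 0           # captured (stored) data
        self.pulse = 0       # internal clock pulse from the XOR stage

    def tick(self, clk, d):
        # XOR stage: a pulse is requested only while input differs from output.
        self.pulse = clk & (d ^ self.q)
        if self.pulse:
            self.q = d       # capture the data ...
            self.pulse = 0   # ... and disable the pulse once capture completes
        return self.q

flop = PulseTriggeredFlop()
trace = [flop.tick(clk=1, d=d) for d in (0, 1, 1, 0, 0)]
print(trace)                 # output toggles only on samples where d changes
```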
-
Publication No.: US20180314249A1
Publication Date: 2018-11-01
Application No.: US15581124
Application Date: 2017-04-28
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Altug Koker , Farshad Akhbari , Feng Chen , Dukhwan Kim , Narayan Srinivasa , Nadathur Rajagopalan Satish , Kamal Sinha , Joydeep Ray , Balaji Vembu , Mike B. Macpherson , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan
CPC classification number: G06F9/5016 , G06F9/5061
Abstract: A mechanism is described for facilitating storage management for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting one or more components associated with machine learning, where the one or more components include memory and a processor coupled to the memory, and where the processor includes a graphics processor. The method may further include allocating a storage portion of the memory and a hardware portion of the processor to a machine learning training set, where the storage and hardware portions are sized precisely for implementation and processing of the training set.
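A minimal sketch under stated assumptions: sizing a memory portion and a processor (execution-unit) portion to a training set so the allocation is just large enough for that set. The dataclass, the 64 MiB-per-EU heuristic, and all names are illustrative, not the claimed mechanism.

```python
from dataclasses import dataclass

@dataclass
class TrainingSet:
    samples: int
    bytes_per_sample: int

def allocate_for_training(ts: TrainingSet, total_mem: int, total_eus: int):
    # Storage portion: exactly the bytes the training set occupies.
    mem_needed = ts.samples * ts.bytes_per_sample
    if mem_needed > total_mem:
        raise MemoryError("training set does not fit in the memory portion")
    # Hardware portion: assume one EU handles ~64 MiB per pass (illustrative).
    eus_needed = min(total_eus, max(1, mem_needed // (64 << 20)))
    return {"memory_bytes": mem_needed, "execution_units": eus_needed}

print(allocate_for_training(TrainingSet(100_000, 4096),
                            total_mem=8 << 30, total_eus=96))
```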
-
Publication No.: US20180308256A1
Publication Date: 2018-10-25
Application No.: US15494812
Application Date: 2017-04-24
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , Prasoonkumar Surti , Kamal Sinha , Nadathur Rajagopalan Satish , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Farshad Akhbari
CPC classification number: G06T9/00 , G06N3/0445 , G06N3/0454 , G06N3/0481 , G06N3/063 , G06N3/084
Abstract: An apparatus to facilitate compute compression is disclosed. The apparatus includes a graphics processing unit including mapping logic to map a first block of integer pixel data to a compression block and compression logic to compress the compression block.
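An illustrative sketch only: mapping a small block of integer pixel data to a "compression block" and compressing it with a simple base-plus-delta scheme. The 4x4 block size and the delta encoding are assumptions for illustration, not the patented compression format.

```python
def compress_block(pixels):                  # pixels: flat list of 16 ints
    base = min(pixels)
    deltas = [p - base for p in pixels]      # small residuals vs. the base
    bits = max(1, max(deltas).bit_length())  # bits needed per residual
    return {"base": base, "bits": bits, "deltas": deltas}

def decompress_block(block):
    return [block["base"] + d for d in block["deltas"]]

block = [118, 120, 119, 121, 120, 122, 118, 119,
         121, 120, 119, 118, 122, 121, 120, 119]
compressed = compress_block(block)
assert decompress_block(compressed) == block
print(f"16 pixels stored as base + {compressed['bits']}-bit deltas")
```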
-
Publication No.: US20180300260A1
Publication Date: 2018-10-18
Application No.: US15488840
Application Date: 2017-04-17
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Joydeep Ray , James A. Valerio , Altug Koker , Prasoonkumar P. Surti , Balaji Vembu , Wenyin Fu , Bhushan M. Borole , Kamal Sinha
IPC: G06F12/128 , G06F12/0811 , G06F13/40 , G06T1/20
Abstract: A hybrid hierarchical cache is implemented at the same level in the access pipeline, to get the faster access behavior of a smaller cache and, at the same time, a higher hit rate at lower power for a larger cache, in some embodiments. A split cache at the same level in the access pipeline includes two caches that work together. In the hybrid, split, low-level cache (e.g., L1), evictions are coordinated locally between the two L1 portions, and on a miss to both L1 portions, a line is allocated from a larger L2 cache to the smaller L1 portion.
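A minimal sketch, assuming a trivially simplified model: two L1 portions looked up at the same pipeline level, with a miss in both filled from L2 into the smaller portion. The portion sizes, the eviction choice, and the fill callback are illustrative assumptions, not the claimed design.

```python
class SplitL1:
    def __init__(self, small_lines=4, large_lines=16):
        self.small = {}                      # smaller, faster portion
        self.large = {}                      # larger, higher-hit-rate portion
        self.small_lines, self.large_lines = small_lines, large_lines

    def lookup(self, line_addr, l2_fill):
        if line_addr in self.small or line_addr in self.large:
            return "L1 hit"
        # Miss in both portions: allocate from L2 into the smaller portion,
        # evicting locally if it is full (coordination kept deliberately simple).
        if len(self.small) >= self.small_lines:
            self.small.pop(next(iter(self.small)))
        self.small[line_addr] = l2_fill(line_addr)
        return "L1 miss, filled from L2"

l1 = SplitL1()
print(l1.lookup(0x40, l2_fill=lambda addr: b"\x00" * 64))   # miss, fill
print(l1.lookup(0x40, l2_fill=lambda addr: b"\x00" * 64))   # hit
```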
-
Publication No.: US20180300246A1
Publication Date: 2018-10-18
Application No.: US15489149
Application Date: 2017-04-17
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel , Prasoonkumar Surti , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Abhishek R. Appu , Nicolas C. Galoppo Von Borries , Joydeep Ray , Narayan Srinivasa , Feng Chen , Ben J. Ashbaugh , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Eriko Nurvitadhi , Balaji Vembu , Altug Koker
IPC: G06F12/0837 , G06N3/08 , G06N99/00
CPC classification number: G06F12/0837 , G06F2212/62 , G06N3/08 , G06N99/005 , G06T1/20
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
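An illustrative sketch only: cache modules forwarding coherency events to a stub "machine learning model," which could then reason about sharing patterns. The event fields, the last-sharer predictor, and all names are assumptions standing in for a learned model.

```python
class CoherencyModel:
    def __init__(self):
        self.history = []

    def observe(self, core_id, line_addr, state):   # e.g. a MESI state change
        # Cache modules share their coherency data with the model here.
        self.history.append((core_id, line_addr, state))

    def predict_next_sharer(self, line_addr):
        # Trivial stand-in for a learned model: last core that touched the line.
        for core_id, addr, _ in reversed(self.history):
            if addr == line_addr:
                return core_id
        return None

model = CoherencyModel()
for core, addr, state in [(0, 0x80, "M"), (1, 0x80, "S"), (2, 0xC0, "E")]:
    model.observe(core, addr, state)
print(model.predict_next_sharer(0x80))              # -> 1
```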
-
Publication No.: US20180293695A1
Publication Date: 2018-10-11
Application No.: US15483236
Application Date: 2017-04-10
Applicant: Intel Corporation
Inventor: Saurabh Sharma , Abhishek Venkatesh , Travis T. Schluessler , Prasoonkumar Surti , Altug Koker , Aravindh V. Anantaraman , Pattabhiraman P. K. , Abhishek R. Appu , Joydeep Ray , Kamal Sinha , Vasanth Ranganathan , Bhushan M. Borole , Wenyin Fu , Eric J. Hoekstra , Linda L. Hurd
CPC classification number: G06T1/20 , G06T1/60 , G06T15/005
Abstract: A control surface tracks each individual cacheline in the original surface for frequent data values; if a cacheline holds a frequent data value, control surface bits are set. When reading a cacheline from memory, the control surface bits are read first. If they are set, the original memory read is skipped altogether and the bits from the control surface instead provide the value for the entire cacheline.
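A minimal sketch under stated assumptions: a per-cacheline control surface that records when a line holds a single frequent value (all zeros is assumed here), so the read path can skip the memory access and synthesize the line instead. The dictionaries and names are illustrative, not the claimed hardware structures.

```python
LINE_BYTES = 64
FREQUENT_VALUE = 0                                   # assumed frequent value

control_surface = {}                                 # line_addr -> bit
memory = {}                                          # line_addr -> bytes

def write_line(line_addr, data):
    memory[line_addr] = data
    control_surface[line_addr] = all(b == FREQUENT_VALUE for b in data)

def read_line(line_addr):
    if control_surface.get(line_addr):               # control bits are set:
        return bytes([FREQUENT_VALUE]) * LINE_BYTES  # skip the memory read
    return memory[line_addr]                         # otherwise read memory

write_line(0x000, bytes(LINE_BYTES))                 # all-zero line
write_line(0x040, bytes(range(LINE_BYTES)))          # arbitrary data
assert read_line(0x000) == bytes(LINE_BYTES)
assert read_line(0x040) == bytes(range(LINE_BYTES))
```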
-
Publication No.: US12198221B2
Publication Date: 2025-01-14
Application No.: US18436494
Application Date: 2024-02-08
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
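A structural sketch only: a multiprocessor with a shared register file and two sets of processing cores, each set bound to its own memory channel, with work routed by core type. The core counts, type names, and dispatch rule are illustrative assumptions, not the claimed architecture.

```python
from dataclasses import dataclass, field

@dataclass
class CoreSet:
    core_type: str
    memory_channel: int
    cores: list = field(default_factory=list)

multiprocessor = {
    "register_file": {},                         # shared operand storage
    "core_sets": [CoreSet("tensor", memory_channel=0, cores=list(range(8))),
                  CoreSet("fp32",   memory_channel=1, cores=list(range(8)))],
}

def dispatch(op_type):
    # Route work to the core set of the matching type (and thus its channel).
    for cs in multiprocessor["core_sets"]:
        if cs.core_type == op_type:
            return f"{op_type} op -> memory channel {cs.memory_channel}"
    raise ValueError(op_type)

print(dispatch("tensor"))
print(dispatch("fp32"))
```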
-
Publication No.: US20240256456A1
Publication Date: 2024-08-01
Application No.: US18391346
Application Date: 2023-12-20
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/0862 , G06T1/20 , G06T1/60
CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
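A minimal sketch of the decision described in the abstract, with assumed names and an assumed threshold: when the L1 hit rate meets the threshold, prefetched data is held back to L3 only; below the threshold, prefetches are allowed to fill L1 as well.

```python
HIT_RATE_THRESHOLD = 0.90                      # illustrative threshold value

def prefetch_target(l1_hits, l1_accesses):
    hit_rate = l1_hits / l1_accesses if l1_accesses else 0.0
    if hit_rate >= HIT_RATE_THRESHOLD:
        return "L3"                            # limit prefetch to L3 storage
    return "L1"                                # allow prefetch into L1

print(prefetch_target(950, 1000))              # high hit rate -> "L3"
print(prefetch_target(700, 1000))              # low hit rate  -> "L1"
```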