-
公开(公告)号:US11948224B2
公开(公告)日:2024-04-02
申请号:US17978573
申请日:2022-11-01
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F3/14 , G06F7/483 , G06F9/30 , G06F9/38 , G06F9/50 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084 , G06N20/00 , G06T1/60 , G06T15/00
CPC classification number: G06T1/20 , G06F7/483 , G06F9/30014 , G06F9/30185 , G06F9/3863 , G06F9/5044 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06F3/14 , G06T1/60 , G06T15/005
Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
-
公开(公告)号:US20240086356A1
公开(公告)日:2024-03-14
申请号:US18491474
申请日:2023-10-20
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Varghese George , Mike Macpherson , Aravindh Anantaraman , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Nicolas Galoppo von Borries , Ben J. Ashbaugh
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06T15/06
Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.
-
公开(公告)号:US20240086138A1
公开(公告)日:2024-03-14
申请号:US18474361
申请日:2023-09-26
Applicant: Intel Corporation
Inventor: Eric J. Asperheim , Subramaniam Maiyuran , Kiran C. Veernapu , Sanjeev S. Jahagirdar , Balaji Vembu , Devan Burke , Philip R. Laws , Kamal Sinha , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Peter L. Doyle , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Altug Koker
IPC: G06F3/14 , G06F3/01 , G06F3/0484 , G09G5/391
CPC classification number: G06F3/1438 , G06F3/013 , G06F3/0484 , G09G5/391 , G09G5/001 , G09G2340/0435 , G09G2352/00 , G09G2354/00 , G09G2360/08 , G09G2360/121
Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
-
公开(公告)号:US11908542B2
公开(公告)日:2024-02-20
申请号:US16725747
申请日:2019-12-23
Applicant: Intel Corporation
Inventor: Charles Augustine , Somnath Paul , Turbo Majumder , Iqbal Rajwani , Andrew Lines , Altug Koker , Lakshminarayanan Striramassarma , Muhammad Khellah
CPC classification number: G11C7/1048 , G11C7/1006 , G11C7/12 , G11C7/22
Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead in time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either upper or lower half of the cache line, thus saving dynamic capacitance.
-
公开(公告)号:US20240013337A1
公开(公告)日:2024-01-11
申请号:US18351898
申请日:2023-07-13
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
Abstract: A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
-
公开(公告)号:US20240012767A1
公开(公告)日:2024-01-11
申请号:US18358550
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Michael Macpherson , Aravindh V. Anantaraman , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Varghese George , Abhishek Appu , Prasoonkumar Surti
CPC classification number: G06T15/005 , G06F9/3013 , G06F9/38873
Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
-
公开(公告)号:US20240005136A1
公开(公告)日:2024-01-04
申请号:US18351124
申请日:2023-07-12
Applicant: Intel Corporation
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045
CPC classification number: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30014 , G06T15/005 , G06F15/78 , G06F15/76 , G06F9/30036 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045 , G06T1/60
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20230418355A1
公开(公告)日:2023-12-28
申请号:US18339827
申请日:2023-06-22
Applicant: INTEL CORPORATION
Inventor: Altug Koker , Abhishek R. Appu , Kiran C. Veernapu , Joydeep Ray , Balaji Vembu , Prasoonkumar Surti , Kamal Sinha , Eric J. Hoekstra , Wenyin Fu , Nikos Kaburlasos , Bhushan M. Borole , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy
IPC: G06F1/3209 , H04W52/02 , G06F1/324 , G06F1/3203 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F3/01 , G06F11/07 , G06F11/30
CPC classification number: G06F1/3209 , H04W52/0258 , G06F1/324 , G06F1/3203 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F3/01 , G06F11/0781 , G06F11/3062 , Y02D10/00 , Y02D30/70 , H04M1/72448
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20230368516A1
公开(公告)日:2023-11-16
申请号:US18358067
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Barnan Das , Mayuresh M. Varerkar , Narayan Biswal , Stanley J. Baran , Gokcen Cilingir , Nilesh V. Shah , Archie Sharma , Sherine Abdelhak , Praneetha Kotha , Neelay Pandit , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Abhishek R. Appu , Altug Koker , Joydeep Ray
IPC: G06V10/82 , G06F16/783 , G06F16/583 , G06V10/94 , G06V40/10 , G06V40/20 , G06F18/2413 , G06V10/764
CPC classification number: G06V10/82 , G06F16/784 , G06F16/5838 , G06V10/955 , G06V40/10 , G06V40/23 , G06V40/103 , G06F18/24143 , G06V10/764
Abstract: A graphics processor can include a processing cluster array including a plurality of processing clusters coupled with the plurality of memory controllers, each processing cluster of the plurality of processing clusters including a plurality of streaming multiprocessors, the processing cluster array configured for partitioning into a plurality of partitions. The plurality of partitions include a first partition including a first plurality of streaming multiprocessors configured to perform operations for a first neural network, The operations for the first neural network are isolated to the first partition. The plurality of partitions also include a second partition including a second plurality of streaming multiprocessors configured to perform operations for a second neural network. The operations for the second neural network are isolated to the second partition and protected from operations performed for the first neural network.
-
公开(公告)号:US20230351543A1
公开(公告)日:2023-11-02
申请号:US18310688
申请日:2023-05-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Valentin Andrei , Ashutosh Garg , Yoav Harel , Arthur Hunter, JR. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli
IPC: G06N3/084 , G06F15/80 , G06F17/16 , G06N3/048 , G06T1/20 , G06F9/50 , G06F12/0806 , G06F7/544 , G06N3/08
CPC classification number: G06T1/20 , G06F7/5443 , G06F9/5027 , G06F12/0806 , G06F15/8046 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
-
-
-
-
-
-
-
-
-