-
公开(公告)号:US20200294181A1
公开(公告)日:2020-09-17
申请号:US16355377
申请日:2019-03-15
申请人: Intel Corporation
发明人: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
摘要: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US12124383B2
公开(公告)日:2024-10-22
申请号:US17862739
申请日:2022-07-12
申请人: Intel Corporation
发明人: Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian
IPC分类号: G06F12/00 , G06F12/0875 , G06F12/0891 , G06F12/123 , G06T1/60
CPC分类号: G06F12/123 , G06F12/0875 , G06F12/0891 , G06T1/60 , G06F2212/302
摘要: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
-
公开(公告)号:US20240280987A1
公开(公告)日:2024-08-22
申请号:US18595649
申请日:2024-03-05
申请人: Intel Corporation
发明人: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan
IPC分类号: G05D1/00 , G06F9/46 , G06F9/48 , G06F9/52 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06T1/20
CPC分类号: G05D1/0088 , G06F9/4881 , G06F9/522 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F9/46 , G06T1/20
摘要: A mechanism is described for facilitating barriers and synchronization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
-
14.
公开(公告)号:US20240184572A1
公开(公告)日:2024-06-06
申请号:US18528340
申请日:2023-12-04
申请人: Intel Corporation
发明人: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC分类号: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N20/00 , G06T15/00 , G09G5/393
CPC分类号: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/30025 , G06F9/3013 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
摘要: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US20240013337A1
公开(公告)日:2024-01-11
申请号:US18351898
申请日:2023-07-13
申请人: Intel Corporation
发明人: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
摘要: A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
-
公开(公告)号:US20240005136A1
公开(公告)日:2024-01-04
申请号:US18351124
申请日:2023-07-12
申请人: Intel Corporation
发明人: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC分类号: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045
CPC分类号: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30014 , G06T15/005 , G06F15/78 , G06F15/76 , G06F9/30036 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045 , G06T1/60
摘要: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
17.
公开(公告)号:US11360767B2
公开(公告)日:2022-06-14
申请号:US17305355
申请日:2021-07-06
申请人: Intel Corporation
发明人: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC分类号: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G06N20/00 , G06F17/16
摘要: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
-
公开(公告)号:US20220180468A1
公开(公告)日:2022-06-09
申请号:US17674781
申请日:2022-02-17
申请人: Intel Corporation
发明人: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
摘要: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US11353868B2
公开(公告)日:2022-06-07
申请号:US15495112
申请日:2017-04-24
申请人: Intel Corporation
发明人: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan
摘要: One or more examples include an apparatus having a hardware barrier logic to detect thread groups relating to machine learning operations and facilitate barrier synchronization of the thread groups across multiple dies representing multiple processors, such that data processing using the threads groups across the multiple processors is synchronized and stall-free.
-
20.
公开(公告)号:US20220019431A1
公开(公告)日:2022-01-20
申请号:US17305355
申请日:2021-07-06
申请人: Intel Corporation
发明人: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC分类号: G06F9/30 , G06N3/04 , G06F9/38 , G06F7/544 , G06N3/08 , G06N3/063 , G06F7/483 , G09G5/393 , G06T15/00 , G06F17/16 , G06N20/00
摘要: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
-
-
-
-
-
-
-
-
-