-
公开(公告)号:US20240005443A1
公开(公告)日:2024-01-04
申请号:US18455128
申请日:2023-08-24
申请人: Intel Corporation
发明人: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
CPC分类号: G06T1/20 , G06F13/4027
摘要: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US11861761B2
公开(公告)日:2024-01-02
申请号:US17095590
申请日:2020-11-11
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang
IPC分类号: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04
CPC分类号: G06T1/60 , G06F12/0802 , G06N5/04 , G06T1/20 , G06F2212/251
摘要: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
-
公开(公告)号:US20230260075A1
公开(公告)日:2023-08-17
申请号:US18305904
申请日:2023-04-24
申请人: Intel Corporation
发明人: Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang
IPC分类号: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04
CPC分类号: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04 , G06F2212/251
摘要: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a state of multiple intellectual property (IP) cores that have access to a common cache via a central fabric is observed. Responsive to the observed state being indicative of performance of a standalone workload by a first IP core of the multiple IP cores, the common cache is treated as a local cache of the first IP core by powering off the central fabric and causing the first IP core to access the common cache via a low power access path between the first IP core and the common cache that is outside of the central fabric.
-
公开(公告)号:US11493974B2
公开(公告)日:2022-11-08
申请号:US16994073
申请日:2020-08-14
申请人: Intel Corporation
摘要: Dynamic power budget allocation in a multi-processor system is described. In an example, an apparatus includes a plurality of processor units; and a power control component, the power control component to monitor power utilization of each of the plurality of processor units, wherein power consumed by the plurality of processor units is limited by a global power budget. The apparatus is to assign a workload to each of the processor units and is to establish an initial power budget for operation of each of the processor units, and, upon the apparatus determining that one or more processor units require an increased power budget based on one or more criteria, the apparatus is to dynamically reallocate an amount of the global power budget to the one or more processor units.
-
公开(公告)号:US20220253317A1
公开(公告)日:2022-08-11
申请号:US17683564
申请日:2022-03-01
申请人: Intel Corporation
发明人: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Abhishek R. Appu , Altug Koker , Kamal Sinha , Joydeep Ray , Balaji Vembu , Vasanth Ranganathan , Sanjeev Jahagirdar
摘要: A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
-
公开(公告)号:US11410266B2
公开(公告)日:2022-08-09
申请号:US17069188
申请日:2020-10-13
申请人: Intel Corporation
发明人: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
摘要: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US11386521B2
公开(公告)日:2022-07-12
申请号:US17161941
申请日:2021-01-29
申请人: Intel Corporation
发明人: Altug Koker , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Josh Mastronarde , Naveen Matam , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
摘要: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnects logic on the chiplet can then be configured to confirm with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enable interchangeability of the individual chiplets. With such an interchangeable design, higher or lower density memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
-
公开(公告)号:US11269643B2
公开(公告)日:2022-03-08
申请号:US15482798
申请日:2017-04-09
申请人: Intel Corporation
发明人: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Abhishek R. Appu , Altug Koker , Kamal Sinha , Joydeep Ray , Balaji Vembu , Vasanth Ranganathan , Sanjeev Jahagirdar
摘要: A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
-
公开(公告)号:US20220036500A1
公开(公告)日:2022-02-03
申请号:US17500375
申请日:2021-10-13
申请人: Intel Corporation
发明人: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
摘要: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US20210255857A1
公开(公告)日:2021-08-19
申请号:US17128972
申请日:2020-12-21
申请人: Intel Corporation
发明人: Feng Chen , Narayan Srinivasa , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Joydeep Ray , Nicolas C. Galoppo Von Borries , Prasoonkumar Surti , Ben J. Ashbaugh , Sanjeev Jahagirdar , Vasanth Ranganathan
摘要: A mechanism is described for facilitating intelligent dispatching and vectorizing at autonomous machines. A method of embodiments, as described herein, includes detecting a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a graphics processor. The method may further include determining a first set of threads of the plurality of threads that are similar to each other or have adjacent surfaces, and physically clustering the first set of threads close together using a first set of adjacent compute blocks.
-
-
-
-
-
-
-
-
-