-
公开(公告)号:US10909652B2
公开(公告)日:2021-02-02
申请号:US16355303
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Altug Koker , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Josh Mastronarde , Naveen Matam , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnects logic on the chiplet can then be configured to confirm with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enable interchangeability of the individual chiplets. With such an interchangeable design, higher or lower density memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
-
公开(公告)号:US10803548B2
公开(公告)日:2020-10-13
申请号:US16355377
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US10769751B2
公开(公告)日:2020-09-08
申请号:US16543849
申请日:2019-08-19
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge F. Garcia Pabon , Vikranth Vemulapalli , Chandra S. Gurram , Aditya Navale , Saurabh Sharma
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a register file having a plurality of channels to store data and an execution unit to examine data at each of the plurality of channels, read a data value from a first of the plurality of channels upon a determination that each of the plurality of channels has the same data and execute a single input multi data (SIMD) instruction based on the data value.
-
公开(公告)号:US12293431B2
公开(公告)日:2025-05-06
申请号:US18310688
申请日:2023-05-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Valentin Andrei , Ashutosh Garg , Yoav Harel , Arthur Hunter, Jr. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli
IPC: G06T1/20 , G06F7/544 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/0806 , G06F15/80 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
-
公开(公告)号:US20250103548A1
公开(公告)日:2025-03-27
申请号:US18948174
申请日:2024-11-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter, JR. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US12210905B2
公开(公告)日:2025-01-28
申请号:US17358650
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Chandra Gurram , Wei-Yu Chen , Vikranth Vemulapalli , Subramaniam Maiyuran , Jorge Eduardo Parra Osorio , Shuai Mu , Guei-Yuan Lueh , Supratim Pal
Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
-
公开(公告)号:US20240160478A1
公开(公告)日:2024-05-16
申请号:US17987185
申请日:2022-11-15
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Chunhui Mei , Ben J. Ashbaugh , Naveen Matam , Joydeep Ray , Timothy Bauer , Guei-Yuan Lueh , Vasanth Ranganathan , Prashant Chaudhari , Vikranth Vemulapalli , Nishanth Reddy Pendluru , Piotr Reiter , Jain Philip , Marek Rudniewski , Christopher Spencer , Parth Damani , Prathamesh Raghunath Shinde , John Wiegert , Fataneh Ghodrat
IPC: G06F9/50 , G06F12/0875
CPC classification number: G06F9/5016 , G06F12/0875 , G06F2212/452
Abstract: An apparatus to facilitate increasing processing resources in processing cores of a graphics environment is disclosed. The apparatus includes a plurality of processing resources to execute one or more execution threads; a plurality of message arbiter-processing resource (MA-PR) routers, wherein a respective MA-PR router of the plurality of MA-PR routers corresponds to a pair of processing resources of the plurality of processing resources and is to arbitrate routing of a thread control message from a message arbiter between the pair of processing resources; a plurality of local shared cache (LSC) sequencers to provide an interface between at least one LSC of the processing core and the plurality of processing resources; and a plurality of instruction caches (ICs) to store instructions of the one or more execution threads, wherein a respective IC of the plurality of ICs interfaces with a portion of the plurality of processing resources.
-
公开(公告)号:US20240028404A1
公开(公告)日:2024-01-25
申请号:US18328591
申请日:2023-06-02
Applicant: Intel Corporation
Inventor: Ben Ashbaugh , Jonathan Pearce , Murali Ramadoss , Vikranth Vemulapalli , William B. Sadler , Sungye Kim , Marian Alin Petre
CPC classification number: G06F9/5027 , G06F9/3851 , G06F9/3877 , G06F9/5033 , G06F9/545 , G06F12/0837 , G06F9/4881 , G06F9/5066 , G06F9/3885 , G06F9/3455 , G06F9/3887 , G06T1/60
Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
-
公开(公告)号:US20240013338A1
公开(公告)日:2024-01-11
申请号:US18470652
申请日:2023-09-20
Applicant: Intel Corporation
Inventor: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
CPC classification number: G06T1/20 , G06F13/4027
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US11762804B2
公开(公告)日:2023-09-19
申请号:US17868448
申请日:2022-07-19
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike MacPherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
CPC classification number: G06F15/8069 , G06F9/30163 , G06F9/3877 , G06T15/005 , G06F9/3836
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
-
-
-
-
-
-
-
-