-
Publication No.: US20220083339A1
Publication Date: 2022-03-17
Application No.: US17509726
Filing Date: 2021-10-25
Applicant: Intel Corporation
Inventor: JAMES VALERIO , VASANTH RANGANATHAN , JOYDEEP RAY , PRADEEP RAMANI
Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use the thread dispatch command that dispatches threads to execute a kernel to prefetch the instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
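The dispatch-driven prefetch idea can be pictured in software. The sketch below is a hedged Python analogy, not the patented hardware: the DispatchCommand fields, the Cache model, and the interleaving of prefetch with dispatch are all assumptions made for illustration.

```python
# Hypothetical model: the same thread dispatch command that launches a kernel
# also tells a prefetcher which instruction and constant ranges to warm, so the
# cache fills while thread groups are being dispatched.
from dataclasses import dataclass, field

@dataclass
class DispatchCommand:
    kernel_base: int      # first instruction address of the kernel
    kernel_size: int      # bytes of kernel code to prefetch
    constant_base: int    # base of the kernel's constant buffer
    constant_size: int
    thread_groups: int    # number of thread groups to dispatch

@dataclass
class Cache:
    line_size: int = 64
    lines: set = field(default_factory=set)

    def prefetch_range(self, base, size):
        for addr in range(base, base + size, self.line_size):
            self.lines.add(addr // self.line_size)

def process_dispatch(cmd: DispatchCommand, cache: Cache):
    # Prefetch and dispatch are driven by the same command, so cache warming
    # overlaps (here: is interleaved) with handing out thread groups.
    cache.prefetch_range(cmd.kernel_base, cmd.kernel_size)
    cache.prefetch_range(cmd.constant_base, cmd.constant_size)
    return [f"thread_group_{i}" for i in range(cmd.thread_groups)]

if __name__ == "__main__":
    cache = Cache()
    cmd = DispatchCommand(kernel_base=0x1000, kernel_size=4096,
                          constant_base=0x8000, constant_size=256,
                          thread_groups=8)
    groups = process_dispatch(cmd, cache)
    print(len(groups), "thread groups dispatched,", len(cache.lines), "cache lines prefetched")
```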
-
Publication No.: US20210349848A1
Publication Date: 2021-11-11
Application No.: US17321885
Filing Date: 2021-05-17
Applicant: Intel Corporation
Inventor: JOYDEEP RAY , ARAVINDH ANANTARAMAN , ABHISHEK R. APPU , ALTUG KOKER , ELMOUSTAPHA OULD-AHMED-VALL , VALENTIN ANDREI , SUBRAMANIAM MAIYURAN , NICOLAS GALOPPO VON BORRIES , VARGHESE GEORGE , MIKE MACPHERSON , BEN ASHBAUGH , MURALI RAMADOSS , VIKRANTH VEMULAPALLI , WILLIAM SADLER , JONATHAN PEARCE , SUNGYE KIM
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, and assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
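As a rough illustration of the scalar/vector split, the Python sketch below partitions a workload on an invented is_uniform property; the operation names and the classification rule are assumptions, not the claimed method.

```python
# Toy model: operations that produce one value for the whole workload go to a
# scalar complex; per-element operations go to a vector complex.
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    is_uniform: bool   # True if the op yields a single value for the whole workload

def partition_workload(ops):
    scalar_subset = [op for op in ops if op.is_uniform]
    vector_subset = [op for op in ops if not op.is_uniform]
    return scalar_subset, vector_subset

if __name__ == "__main__":
    workload = [
        Op("compute_loop_bound", is_uniform=True),
        Op("load_per_pixel_color", is_uniform=False),
        Op("scale_per_pixel_color", is_uniform=False),
        Op("update_draw_counter", is_uniform=True),
    ]
    scalar_ops, vector_ops = partition_workload(workload)
    print("scalar complex:", [op.name for op in scalar_ops])
    print("vector complex:", [op.name for op in vector_ops])
```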
-
Publication No.: US20210191868A1
Publication Date: 2021-06-24
Application No.: US16724813
Filing Date: 2019-12-23
Applicant: Intel Corporation
Inventor: JOYDEEP RAY , VASANTH RANGANATHAN , BEN ASHBAUGH , JAMES VALERIO
IPC: G06F12/0846 , G06F12/0837 , G06F12/084 , G06F9/50 , G06F9/38 , G06F9/30
Abstract: An apparatus to facilitate partitioning of local memory is disclosed. The apparatus includes a plurality of execution units to execute a plurality of execution threads, a memory coupled to share access between the plurality of execution units, and partitioning hardware to partition the memory to be used as a cache and as shared local memory (SLM), wherein the partitioning hardware partitions the memory based on the quantity of execution threads that are active on the execution units.
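A thread-count-driven split between cache and SLM can be sketched as below; the per-thread SLM budget and the capacities are made-up numbers, not the hardware policy.

```python
# Toy policy: give active threads the shared local memory they need and leave
# the remainder of the on-chip RAM to be used as cache.
def partition_memory(total_kib, active_threads, slm_per_thread_kib=1):
    slm_kib = min(total_kib, active_threads * slm_per_thread_kib)
    return {"slm_kib": slm_kib, "cache_kib": total_kib - slm_kib}

if __name__ == "__main__":
    for threads in (0, 32, 96, 512):
        print(threads, "active threads ->", partition_memory(128, threads))
```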
-
Publication No.: US20190041961A1
Publication Date: 2019-02-07
Application No.: US16144538
Filing Date: 2018-09-27
Applicant: Intel Corporation
Inventor: KINCHIT DESAI , SANJEEV JAHAGIRDAR , PRASOONKUMAR SURTI , JOYDEEP RAY
Abstract: Embodiments are generally directed to providing power savings for a neural network architecture with zero activations during inference. An embodiment of an apparatus includes one or more processors including one or more processor cores; and a memory to store data for processing including neural network processing, wherein the apparatus is to perform a fast clear operation to initialize activation buffers for a neural network by updating metadata to indicate zero values, the neural network including a plurality of layers, wherein the apparatus is to compare outputs of the neural network to the metadata values and to write an output to memory only if the output is non-zero.
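The fast-clear and skip-zero-writes behavior has a natural software analogy, shown below as a hedged sketch: the ActivationBuffer class and its cleared flag are stand-ins for the per-buffer metadata, not the claimed implementation.

```python
# Instead of zeroing every activation buffer, flip metadata to "cleared" in
# O(1); reads of a cleared buffer return zeros, and outputs are written back
# only when they are non-zero.
import numpy as np

class ActivationBuffer:
    def __init__(self, shape):
        self.data = np.empty(shape, dtype=np.float32)  # left uninitialized
        self.cleared = True                            # metadata: treat as all zeros

    def fast_clear(self):
        self.cleared = True                            # no memory traffic

    def read(self):
        return np.zeros_like(self.data) if self.cleared else self.data

    def write(self, values):
        if np.any(values != 0):                        # skip all-zero outputs
            self.data = values
            self.cleared = False

if __name__ == "__main__":
    buf = ActivationBuffer((4,))
    buf.fast_clear()
    buf.write(np.zeros(4, dtype=np.float32))           # all zeros: nothing written
    print(buf.read())                                  # zeros served via metadata
    buf.write(np.array([0, 1, 0, 2], dtype=np.float32))
    print(buf.read())
```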
-
Publication No.: US20180293183A1
Publication Date: 2018-10-11
Application No.: US15482690
Filing Date: 2017-04-07
Applicant: Intel Corporation
Inventor: NIRANJAN L. COORAY , ABHISHEK R. APPU , ALTUG KOKER , JOYDEEP RAY , BALAJI VEMBU , PATTABHIRAMAN K , DAVID PUFFER , DAVID J. COWPERTHWAITE , RAJESH M. SANKARAN , SATYESHWAR SINGH , SAMEER KP , ANKUR N. SHAH , KUN TIAN
IPC: G06F13/16 , G06F13/40 , G06F12/1027 , G06F12/0802
CPC classification number: G06F13/16 , G06F12/0802 , G06F12/1009 , G06F12/1027 , G06F12/1036 , G06F13/4068 , G06F2212/1024 , G06F2212/302 , G06F2212/60 , G06F2212/68
Abstract: An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
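The master/slave MMU routing can be modeled in a few lines; the sketch below is an assumption-heavy analogy (flat dictionary page tables, no TLBs or fault handling), not the described hardware.

```python
# A slave MMU forwards all transactions to a master MMU, which either services
# them from its own mappings or forwards them to the system-level MMU.
class SystemMMU:
    def __init__(self, page_table):
        self.page_table = page_table

    def translate(self, virtual_addr):
        return self.page_table[virtual_addr]

class MasterMMU:
    def __init__(self, page_table, system_mmu):
        self.page_table = page_table
        self.system_mmu = system_mmu          # direct connection to the system-level MMU

    def translate(self, virtual_addr):
        if virtual_addr in self.page_table:   # service the transaction itself
            return self.page_table[virtual_addr]
        return self.system_mmu.translate(virtual_addr)   # forward on behalf of the requester

class SlaveMMU:
    def __init__(self, master_mmu):
        self.master_mmu = master_mmu          # no direct path to the system-level MMU

    def translate(self, virtual_addr):
        return self.master_mmu.translate(virtual_addr)

if __name__ == "__main__":
    system = SystemMMU({0x2000: 0xA2000})
    master = MasterMMU({0x1000: 0xB1000}, system)
    slave = SlaveMMU(master)
    print(hex(slave.translate(0x1000)))   # serviced by the master MMU
    print(hex(slave.translate(0x2000)))   # forwarded to the system-level MMU
```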
-
Publication No.: US20250077232A1
Publication Date: 2025-03-06
Application No.: US18882364
Filing Date: 2024-09-11
Applicant: Intel Corporation
Inventor: JAMES VALERIO , VASANTH RANGANATHAN , JOYDEEP RAY , PRADEEP RAMANI
Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss on a read from a first cache by a first thread, broadcast an event within the graphics processing device to identify the instruction or data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the identified instruction or data into a second cache that is local to the second compute unit before a second thread attempts to read that instruction or data.
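The broadcast-on-miss idea is sketched below in Python; the Fabric/ComputeUnit classes and the subscribe/broadcast flow are invented for illustration and are not the claimed circuitry.

```python
# When one compute unit misses in its local cache, it broadcasts the missing
# address; peer units prefetch the same line into their own local caches before
# their threads issue the read.
class Fabric:
    def __init__(self):
        self.units = []

    def subscribe(self, unit):
        self.units.append(unit)

    def broadcast(self, sender, addr):
        for unit in self.units:
            if unit is not sender:
                unit.on_miss_event(addr)

class ComputeUnit:
    def __init__(self, name, fabric):
        self.name = name
        self.local_cache = set()
        self.fabric = fabric
        fabric.subscribe(self)

    def read(self, addr):
        if addr in self.local_cache:
            return f"{self.name}: hit {hex(addr)}"
        self.local_cache.add(addr)                     # fetch the missing line
        self.fabric.broadcast(sender=self, addr=addr)  # tell the peers about it
        return f"{self.name}: miss {hex(addr)}, event broadcast"

    def on_miss_event(self, addr):
        self.local_cache.add(addr)                     # prefetch into this unit's cache

if __name__ == "__main__":
    fabric = Fabric()
    cu0, cu1 = ComputeUnit("cu0", fabric), ComputeUnit("cu1", fabric)
    print(cu0.read(0x4000))   # miss on cu0, event broadcast
    print(cu1.read(0x4000))   # hit on cu1: the line arrived via the event
```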
-
Publication No.: US20220206795A1
Publication Date: 2022-06-30
Application No.: US17569229
Filing Date: 2022-01-05
Applicant: Intel Corporation
Inventor: SUBRAMANIAM MAIYURAN , VARGHESE GEORGE , JOYDEEP RAY , ASHUTOSH GARG , JORGE PARRA , SHUBH SHAH , SHUBRA MARWAHA
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
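A software analogy of sharing one matrix operand between two processing resources is sketched below; splitting the output rows between two "workers" and reusing a single shared operand are assumptions for illustration, not the register-sharing scheme itself.

```python
# One operand tile is loaded once and reused by both halves of the computation,
# standing in for two processing resources sharing a set of registers.
import numpy as np

def shared_operand_matmul(a, b):
    shared_b = b                          # loaded once, reused by both workers
    half = a.shape[0] // 2
    out = np.zeros((a.shape[0], b.shape[1]), dtype=a.dtype)
    out[:half] = a[:half] @ shared_b      # "worker" 1
    out[half:] = a[half:] @ shared_b      # "worker" 2
    return out

if __name__ == "__main__":
    a = np.arange(16, dtype=np.float32).reshape(4, 4)
    b = np.eye(4, dtype=np.float32)
    assert np.allclose(shared_operand_matmul(a, b), a @ b)
    print(shared_operand_matmul(a, b))
```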
-
Publication No.: US20210056051A1
Publication Date: 2021-02-25
Application No.: US17008991
Filing Date: 2020-09-01
Applicant: Intel Corporation
Inventor: NIRANJAN L. COORAY , ABHISHEK R. APPU , ALTUG KOKER , JOYDEEP RAY , BALAJI VEMBU , PATTABHIRAMAN K , DAVID PUFFER , DAVID J. COWPERTHWAITE , RAJESH M. SANKARAN , SATYESHWAR SINGH , SAMEER KP , ANKUR N. SHAH , KUN TIAN
IPC: G06F13/16 , G06F12/1009 , G06F12/1027 , G06F12/1036 , G06F12/0802 , G06F13/40
Abstract: An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
-
Publication No.: US20200210246A1
Publication Date: 2020-07-02
Application No.: US16696848
Filing Date: 2019-11-26
Applicant: Intel Corporation
Inventor: PRASOONKUMAR SURTI , DAVID COWPERTHWAITE , ABHISHEK R. APPU , JOYDEEP RAY , VASANTH RANGANATHAN , ALTUG KOKER , BALAJI VEMBU
IPC: G06F9/50
Abstract: A mechanism is described for facilitating localized load-balancing for processors in computing devices. A method of embodiments, as described herein, includes hosting a local load-balancing mechanism at a processor of a computing device. The method may further include monitoring balancing of loads at the processor and serving as a local scheduler to maintain de-centralized load-balancing at the processor and between the processor and one or more other processors.
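A minimal sketch of de-centralized load balancing follows; the queue-depth metric and the least-loaded-peer policy are invented stand-ins for whatever policy the local scheduler actually applies.

```python
# Each processor hosts its own scheduler: it watches its queue depth and places
# incoming work on itself or on the least-loaded peer, with no central scheduler.
class Processor:
    def __init__(self, name):
        self.name = name
        self.queue = []
        self.peers = []

    def load(self):
        return len(self.queue)

    def submit(self, task):
        # Local scheduling decision: keep the task or push it to a lighter peer.
        target = min([self] + self.peers, key=lambda p: p.load())
        target.queue.append(task)
        return target.name

if __name__ == "__main__":
    p0, p1 = Processor("p0"), Processor("p1")
    p0.peers, p1.peers = [p1], [p0]
    placements = [p0.submit(f"task{i}") for i in range(6)]
    print(placements)            # work alternates between p0 and p1
    print(p0.load(), p1.load())  # 3 3
```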
-
Publication No.: US20180308195A1
Publication Date: 2018-10-25
Application No.: US15493233
Filing Date: 2017-04-21
Applicant: Intel Corporation
Inventor: BALAJI VEMBU , ALTUG KOKER , JOYDEEP RAY
CPC classification number: G06T1/20 , G06T15/005 , G06T2200/04
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing units and a pipeline manager to distribute a thread group to the multiple processing units, wherein the pipeline manager is to distribute the thread group as multiple thread sub-groups.
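The sub-group distribution can be pictured with the short sketch below; the sub-group size of 32 and the round-robin assignment are assumptions, not the claimed pipeline manager behavior.

```python
# Split a thread group into fixed-size sub-groups and hand them out to the
# processing units round-robin.
def distribute_thread_group(group_size, num_units, subgroup_size=32):
    subgroups = [(start, min(start + subgroup_size, group_size))
                 for start in range(0, group_size, subgroup_size)]
    assignment = {unit: [] for unit in range(num_units)}
    for i, subgroup in enumerate(subgroups):
        assignment[i % num_units].append(subgroup)
    return assignment

if __name__ == "__main__":
    for unit, subgroups in distribute_thread_group(group_size=200, num_units=3).items():
        print(f"unit {unit}: thread ranges {subgroups}")
```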