-
公开(公告)号:US20180173563A1
公开(公告)日:2018-06-21
申请号:US15383738
申请日:2016-12-19
Applicant: Intel Corporation
Inventor: Chunling Hu , Tatiana Shpeisman , Rajkishore Barik , Justin E. Gottschlich
CPC classification number: G06F9/4881 , G06F9/5027
Abstract: A dynamic runtime scheduling system includes task manager circuitry capable of detecting a correspondence in at least a portion of the output arguments from one or more first tasks with at least a portion of the input arguments to one or more second tasks. Upon detecting the output arguments from the first task represents a superset of the second task input arguments, the task manager circuitry apportions the first task into a plurality of new subtasks. At least one of the new subtasks includes output arguments having a 1:1 correspondence to the second task input arguments. Upon detecting the output arguments from an first task represents a subset of the second task input arguments, the task manager circuitry may autonomously apportion the second task into a plurality of new subtasks. At least one of the new subtasks may include input arguments having a 1:1 correspondence to first task output arguments.
-
公开(公告)号:US09606919B2
公开(公告)日:2017-03-28
申请号:US14513065
申请日:2014-10-13
Applicant: Intel Corporation
Inventor: Yang Ni , Rajkishore Barik , Ali-Reza Adl-Tabatabai , Tatiana Shpeisman , Jayanth N. Rao , Ben J. Ashbaugh , Tomasz Janczak
IPC: G06F12/08 , G06F12/0806 , G06F15/167 , G06T1/60
CPC classification number: G06F12/0806 , G06F15/167 , G06T1/60
Abstract: A method and apparatus to facilitate shared pointers in a heterogeneous platform. In one embodiment of the invention, the heterogeneous or non-homogeneous platform includes, but is not limited to, a central processing core or unit, a graphics processing core or unit, a digital signal processor, an interface module, and any other form of processing cores. The heterogeneous platform has logic to facilitate sharing of pointers to a location of a memory shared by the CPU and the GPU. By sharing pointers in the heterogeneous platform, the data or information sharing between different cores in the heterogeneous platform can be simplified.
-
公开(公告)号:US20250061534A1
公开(公告)日:2025-02-20
申请号:US18819073
申请日:2024-08-29
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
IPC: G06T1/20 , G06F9/30 , G06F9/38 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084
Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
-
64.
公开(公告)号:US12217053B2
公开(公告)日:2025-02-04
申请号:US18528340
申请日:2023-12-04
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06F17/16 , G06N20/00 , G06T15/00
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US12147849B2
公开(公告)日:2024-11-19
申请号:US17493419
申请日:2021-10-04
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Stephan A. Herhut , Jaswanth Sreeram , Tatiana Shpeisman , Richard L. Hudson
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to steal work in heterogeneous computing systems. An apparatus includes load balancing circuitry to obtain tasks from a workload by encoding minimum and maximum index ranges of a data parallel operation, allocate a first task from the workload to a first work queue based on a first capability of first computation circuitry, the first computation circuitry to process the first task in the first work queue, and allocate a second task from the workload to a second work queue, second computation circuitry to process the second task in the second work queue. The apparatus further includes first work stealer logic to steal the second task from the second work queue using an atomic operation to access the second work queue.
-
公开(公告)号:US20240354559A1
公开(公告)日:2024-10-24
申请号:US18646021
申请日:2024-04-25
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Brian T. Lewis , Murali Sundaresan , Jeffrey Jackson , Feng Chen , Xiaoming Chen , Mike Macpherson
Abstract: A mechanism is described for facilitating smart distribution of resources for deep learning autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and introducing a library to a neural network application to determine optimal point at which to apply frequency scaling without degrading performance of the neural network application at a computing device.
-
67.
公开(公告)号:US20230394616A1
公开(公告)日:2023-12-07
申请号:US18334733
申请日:2023-06-14
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao
IPC: G06T1/20 , G06N3/063 , G06F9/38 , G06F9/30 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/04 , G06N3/08
CPC classification number: G06T1/20 , G06N3/063 , G06F9/3887 , G06F9/3895 , G06F9/3001 , G06F9/3851 , G06F9/3017 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/04 , G06N3/08
Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
-
公开(公告)号:US11748606B2
公开(公告)日:2023-09-05
申请号:US17317857
申请日:2021-05-11
Applicant: INTEL CORPORATION
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar
IPC: G06F7/50 , G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045 , G06T1/60
CPC classification number: G06N3/063 , G06F1/3287 , G06F1/3293 , G06F9/30014 , G06F9/30036 , G06F15/76 , G06F15/78 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/084 , G06T1/20 , G06T15/005 , G06T1/60
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20230040631A1
公开(公告)日:2023-02-09
申请号:US17881720
申请日:2022-08-05
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06T1/20 , G06F9/30 , G06F9/38 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , H03M7/30 , G06K9/62 , G06N20/00 , G06F12/02 , G06F9/48 , G06F17/16 , G06N3/04 , G06N3/08 , G06T1/60 , G06T15/00
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
-
70.
公开(公告)号:US20220357945A1
公开(公告)日:2022-11-10
申请号:US17834482
申请日:2022-06-07
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
-
-
-
-
-
-
-
-
-