-
公开(公告)号:US11373088B2
公开(公告)日:2022-06-28
申请号:US15859504
申请日:2017-12-30
申请人: Intel Corporation
发明人: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
摘要: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
-
公开(公告)号:US10831505B2
公开(公告)日:2020-11-10
申请号:US16147692
申请日:2018-09-29
申请人: Intel Corporation
发明人: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Andrey Ayupov
摘要: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
-
公开(公告)号:US20230053289A1
公开(公告)日:2023-02-16
申请号:US17845794
申请日:2022-06-21
申请人: Intel Corporation
发明人: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
摘要: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
-
公开(公告)号:US20200104126A1
公开(公告)日:2020-04-02
申请号:US16147696
申请日:2018-09-29
申请人: Intel Corporation
发明人: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi
IPC分类号: G06F9/30 , G06F9/38 , G06F17/16 , G06F12/0831 , G06F12/084 , G06F7/57
摘要: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.
-
公开(公告)号:US12039435B2
公开(公告)日:2024-07-16
申请号:US17845794
申请日:2022-06-21
申请人: Intel Corporation
发明人: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
摘要: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
-
公开(公告)号:US11379229B2
公开(公告)日:2022-07-05
申请号:US16987838
申请日:2020-08-07
申请人: INTEL CORPORATION
发明人: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Debbie Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond A. Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi
IPC分类号: G06F9/30 , G06F9/38 , G06F17/16 , G06F7/57 , G06F12/0831 , G06F12/084
摘要: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule matrix operations responsive to a matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit to multiply a block of a first matrix with a block of a second matrix to generate a product and to accumulate the product with a block of a third matrix, and wherein the matrix blocks are to be stored in registers within the lane; and broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes.
-
公开(公告)号:US10915328B2
公开(公告)日:2021-02-09
申请号:US16220528
申请日:2018-12-14
申请人: Intel Corporation
摘要: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.
-
公开(公告)号:US10776110B2
公开(公告)日:2020-09-15
申请号:US16147696
申请日:2018-09-29
申请人: Intel Corporation
发明人: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi
IPC分类号: G06F9/30 , G06F9/38 , G06F17/16 , G06F7/57 , G06F12/0831 , G06F12/084
摘要: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.
-
9.
公开(公告)号:US20200104139A1
公开(公告)日:2020-04-02
申请号:US16147692
申请日:2018-09-29
申请人: Intel Corporation
发明人: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Andrey Ayupov
摘要: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
-
公开(公告)号:US20190205737A1
公开(公告)日:2019-07-04
申请号:US15859504
申请日:2017-12-30
申请人: Intel Corporation
发明人: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
摘要: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.
-
-
-
-
-
-
-
-
-