专利检索 ap:("Intel Corporation") AND inv:"Andrey Ayupov" 第 1 页

1.

发明申请
APPARATUS AND METHOD FOR ADAPTABLE AND EFFICIENT LANE-WISE TENSOR PROCESSING 审中-公开

公开(公告)号：US20200104126A1

公开(公告)日：2020-04-02

申请号：US16147696

申请日：2018-09-29

申请人： Intel Corporation

发明人： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi

IPC分类号： G06F9/30 , G06F9/38 , G06F17/16 , G06F12/0831 , G06F12/084 , G06F7/57

摘要： An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.

2.

发明申请
System, Apparatus And Method For Program Order Queue (POQ) To Manage Data Dependencies In Processor Having Multiple Instruction Queues 审中-公开

公开(公告)号：US20200310815A1

公开(公告)日：2020-10-01

申请号：US16364688

申请日：2019-03-26

申请人： Intel Corporation

发明人： Andrey Ayupov , Srikanth T. Srinivasan , Jonathan D. Pearce , David B. Sheffield

IPC分类号： G06F9/38 , G06F9/30

摘要： In one embodiment, an apparatus includes: a plurality of registers; a first instruction queue to store first instructions; a second instruction queue to store second instructions; a program order queue having a plurality of portions each associated with one of the plurality of registers, each of the portions having entries to store a state of an instruction, the state comprising an encoding of a use of the register by the instruction and a source instruction queue for the instruction; and a dispatcher to dispatch for execution the first and second instructions from the first and second instruction queues based at least in part on information stored in the program order queue, to manage instruction dependencies between the first instructions and the second instructions. Other embodiments are described and claimed.

3.

发明授权
Architecture and method for data parallel single program multiple data (SPMD) execution 有权

公开(公告)号：US10831505B2

公开(公告)日：2020-11-10

申请号：US16147692

申请日：2018-09-29

申请人： Intel Corporation

发明人： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Andrey Ayupov

IPC分类号： G06F9/38 , G06F9/30

摘要： An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.

4.

发明授权
Apparatus and method for adaptable and efficient lane-wise tensor processing 有权

公开(公告)号：US11379229B2

公开(公告)日：2022-07-05

申请号：US16987838

申请日：2020-08-07

申请人： INTEL CORPORATION

发明人： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Debbie Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond A. Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi

IPC分类号： G06F9/30 , G06F9/38 , G06F17/16 , G06F7/57 , G06F12/0831 , G06F12/084

摘要： An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule matrix operations responsive to a matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit to multiply a block of a first matrix with a block of a second matrix to generate a product and to accumulate the product with a block of a third matrix, and wherein the matrix blocks are to be stored in registers within the lane; and broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes.

5.

发明授权
System, apparatus and method for program order queue (POQ) to manage data dependencies in processor having multiple instruction queues 有权

公开(公告)号：US11243775B2

公开(公告)日：2022-02-08

申请号：US16364688

申请日：2019-03-26

申请人： Intel Corporation

发明人： Andrey Ayupov , Srikanth T. Srinivasan , Jonathan D. Pearce , David B. Sheffield

IPC分类号： G06F9/38

摘要： In one embodiment, an apparatus includes: a plurality of registers; a first instruction queue to store first instructions; a second instruction queue to store second instructions; a program order queue having a plurality of portions each associated with one of the plurality of registers, each of the portions having entries to store a state of an instruction, the state comprising an encoding of a use of the register by the instruction and a source instruction queue for the instruction; and a dispatcher to dispatch for execution the first and second instructions from the first and second instruction queues based at least in part on information stored in the program order queue, to manage instruction dependencies between the first instructions and the second instructions. Other embodiments are described and claimed.

6.

发明授权
Apparatus and method for gang invariant operation optimizations using dynamic evaluation 有权

公开(公告)号：US11093250B2

公开(公告)日：2021-08-17

申请号：US16147694

申请日：2018-09-29

申请人： Intel Corporation

发明人： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jaewoong Sim , Andrey Ayupov

IPC分类号： G06F9/38 , G06F9/30 , G06F9/48 , G06F9/50

摘要： An apparatus and method for efficiently processing invariant operations on a parallel execution engine. For example, one embodiment of a processor comprises: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arrange the threads into parallel execution groups and schedule operations of the threads to be executed across the parallel execution lanes, wherein the front end circuitry is to dynamically evaluate one or more variables associated with the operations to determine if one or more conditionally invariant operations will be invariant across threads of a parallel execution group and/or across the parallel execution lanes; a scheduler of the front end circuitry to responsively schedule a shared thread upon a determination that a conditionally invariant operation will be invariant across threads of a parallel execution group and/or across the parallel execution lanes.

7.

发明授权
Apparatus and method for adaptable and efficient lane-wise tensor processing 有权

公开(公告)号：US10776110B2

公开(公告)日：2020-09-15

申请号：US16147696

申请日：2018-09-29

申请人： Intel Corporation

发明人： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi

IPC分类号： G06F9/30 , G06F9/38 , G06F17/16 , G06F7/57 , G06F12/0831 , G06F12/084

摘要： An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.

8.

发明申请
ARCHITECTURE AND METHOD FOR DATA PARALLEL SINGLE PROGRAM MULTIPLE DATA (SPMD) EXECUTION 审中-公开

公开(公告)号：US20200104139A1

公开(公告)日：2020-04-02

申请号：US16147692

申请日：2018-09-29

申请人： Intel Corporation

发明人： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Andrey Ayupov

IPC分类号： G06F9/38 , G06F9/30

摘要： An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类