Patent search: ap:("INTEL CORPORATION") AND inv:"Michael Espig"

1.

Granted invention patent
Instruction and logic for sum of square differences

Legal status: In force

Publication number: US12099838B2

Publication date: 2024-09-24

Application number: US17132464

Application date: 2020-12-23

Applicant: Intel Corporation

Inventor: Deepti Aggarwal, Michael Espig, Chekib Nouira, Robert Valentine, Mark Charney

IPC: G06F17/18, G06F9/30, G06F9/38, G06F17/16

CPC classification number: G06F9/3001, G06F9/3802, G06F9/3818, G06F17/16, G06F17/18

Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of squared differences (SSD) instruction; a decode circuit to decode the SSD instruction; and an execution circuit to, during an execution of the decoded SSD instruction, generate an SSD output vector based on a plurality of input vectors, the SSD output vector including a plurality of squared differences values. Other embodiments are described and claimed.
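
The arithmetic named in this abstract can be illustrated in software. The following is a minimal sketch, assuming two equal-length input vectors and element-wise squared differences; the function names are hypothetical and the code does not model the claimed fetch, decode, or execution circuits.

```python
def squared_differences(a, b):
    """Return the element-wise squared differences of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [(x - y) ** 2 for x, y in zip(a, b)]


def sum_of_squared_differences(a, b):
    """Reduce the squared-differences vector to a single scalar."""
    return sum(squared_differences(a, b))


a = [1, 4, 7, 10]
b = [3, 4, 5, 6]
print(squared_differences(a, b))         # [4, 0, 4, 16]
print(sum_of_squared_differences(a, b))  # 24
```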

2.

Granted invention patent
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements

Legal status: In force

Publication number: US11847185B2

Publication date: 2023-12-19

Application number: US17485055

Application date: 2021-09-24

Applicant: Intel Corporation

Inventor: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F17/16, G06F9/38, G06F9/30

CPC classification number: G06F17/16, G06F9/3001, G06F9/3016, G06F9/30101, G06F9/3802

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
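
The bitmask idea in this abstract can be sketched in software. The code below is an illustration only, assuming dense list-of-lists inputs: it builds a non-zero bitmask per row of the first matrix and per column of the second, ANDs them to find matching positions, and accumulates only those products into a copy of the third matrix. The function names are hypothetical, and the PE grid, the two-element broadcast, and the per-PE storage are not modeled.

```python
def nz_bitmask(values):
    """Return an integer whose bit i is set when values[i] is non-zero."""
    mask = 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
    return mask


def sparse_matmul_accumulate(a, b, c):
    """Return c + a @ b, multiplying only where both operands are non-zero."""
    rows, inner, cols = len(a), len(b), len(b[0])
    row_masks = [nz_bitmask(row) for row in a]
    col_masks = [nz_bitmask([b[k][j] for k in range(inner)]) for j in range(cols)]
    out = [row[:] for row in c]  # leave the caller's accumulator untouched
    for i in range(rows):
        for j in range(cols):
            matching = row_masks[i] & col_masks[j]  # positions non-zero in both
            k = 0
            while matching:
                if matching & 1:
                    out[i][j] += a[i][k] * b[k][j]
                matching >>= 1
                k += 1
    return out


a = [[1, 0, 2], [0, 0, 3]]
b = [[0, 4], [5, 0], [6, 0]]
c = [[1, 1], [1, 1]]
print(sparse_matmul_accumulate(a, b, c))  # [[13, 5], [19, 1]]
```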

3.

Granted invention patent
Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits

Legal status: In force

Publication number: US11366636B2

Publication date: 2022-06-21

Application number: US16919022

Application date: 2020-07-01

Applicant: INTEL CORPORATION

Inventor: Aditya Varma, Michael Espig

IPC: G06F7/544, G06F9/30, G06F7/483, G06N3/04, G06N3/063

Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.

4.

Granted invention patent
Systems and methods for performing matrix compress and decompress instructions

Legal status: In force

Publication number: US12175246B2

Publication date: 2024-12-24

Application number: US18460497

Application date: 2023-09-01

Applicant: Intel Corporation

Inventor: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke

IPC: G06F9/30, G06F9/38

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
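
The first compression option described in this abstract (packing non-zero elements and recording their positions in a header) can be sketched as follows. This is a minimal illustration, assuming a list-of-lists matrix and a header of (row, column) pairs; the layout and function names are hypothetical, and the second option (representing elements with fewer bits) is not shown.

```python
def compress(matrix):
    """Pack non-zero elements and record each one's (row, col) in a header."""
    header, packed = [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))
                packed.append(value)
    shape = (len(matrix), len(matrix[0]) if matrix else 0)
    return shape, header, packed


def decompress(shape, header, packed):
    """Rebuild the full matrix, filling unlisted positions with zero."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(header, packed):
        out[r][c] = value
    return out


m = [[0, 5, 0], [7, 0, 0]]
shape, header, packed = compress(m)
print(header, packed)                           # [(0, 1), (1, 0)] [5, 7]
print(decompress(shape, header, packed) == m)   # True
```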

5.

Invention patent application
METHOD AND APPARATUS FOR EFFICIENT BINARY AND TERNARY SUPPORT IN FUSED MULTIPLY-ADD (FMA) CIRCUITS

Legal status: In force

Publication number: US20220342641A1

Publication date: 2022-10-27

Application number: US17839905

Application date: 2022-06-14

Applicant: INTEL CORPORATION

Inventor: Aditya Varma, Michael Espig

IPC: G06F7/544, G06F9/30, G06F7/483

Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.

6.

Granted invention patent
Method and apparatus for approximation using polynomials

Legal status: In force

Publication number: US11327754B2

Publication date: 2022-05-10

Application number: US16366941

Application date: 2019-03-27

Applicant: Intel Corporation

Inventor: Jorge Parra, Dan Baum, Robert S. Chappell, Michael Espig, Varghese George, Alexander Heinecke, Christopher Hughes, Subramaniam Maiyuran, Prasoonkumar Surti, Ronen Zohar, Elmoustapha Ould-Ahmed-Vall

IPC: G06F9/30, G06F17/11, G06F7/544, G06F9/38, G06F7/552

Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location.
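
The range-dependent coefficient lookup in this abstract can be illustrated with a small software sketch. The example below assumes the function being approximated is exp, two input ranges, and degree-4 Taylor coefficients as the table contents; all of these choices, and every name in the code, are assumptions made for illustration and are not taken from the patent.

```python
import math


def taylor_exp_coeffs(center, degree):
    """Taylor coefficients of exp around center: exp(center) / k!."""
    return [math.exp(center) / math.factorial(k) for k in range(degree + 1)]


# Hypothetical lookup table: (range start, range end, expansion center, coefficients).
LOOKUP = [
    (0.0, 1.0, 0.5, taylor_exp_coeffs(0.5, 4)),
    (1.0, 2.0, 1.5, taylor_exp_coeffs(1.5, 4)),
]


def approx_exp(x):
    """Pick the coefficient set for x's range, then evaluate with Horner's rule."""
    for lo, hi, center, coeffs in LOOKUP:
        if lo <= x < hi:
            t = x - center
            result = 0.0
            for coeff in reversed(coeffs):
                result = result * t + coeff
            return result
    raise ValueError("x is outside the tabulated ranges")


def approx_exp_vector(values):
    """Apply the approximation to each data element, as a vector operation would."""
    return [approx_exp(v) for v in values]


for x, y in zip([0.25, 1.75], approx_exp_vector([0.25, 1.75])):
    print(x, y, math.exp(x))
```

Splitting the input domain keeps each polynomial's argument small, which is why a low-degree polynomial per range can stay close to the target function.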

7.

Granted invention patent
Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions

Legal status: In force

Publication number: US10942985B2

Publication date: 2021-03-09

Application number: US16236464

Application date: 2018-12-29

Applicant: Intel Corporation

Inventor: Michael Espig, Christopher J. Hughes, Jongsoo Park

IPC: G06F17/14, G06F17/16, G06F7/544, G06F9/30, G06F9/38

Abstract: Systems, methods, and apparatuses relating to performing fast Fourier transform (FFT) configuration and computation operations are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of processing element circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to cause the two-dimensional grid of processing element circuits to operate on a first packed data input value and a first complex twiddle factor value to produce a first result and a second result.
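
The abstract does not spell out the operation that consumes a twiddle factor and yields two results. One common FFT building block with that shape is the radix-2 butterfly, so the sketch below assumes it as a representative example; this is an assumption, not a description of the claimed instruction, and the function names are hypothetical. The matrix operations accelerator and its registers are not modeled.

```python
import cmath


def butterfly(a, b, twiddle):
    """Radix-2 butterfly: combine two values with a twiddle factor into two results."""
    t = twiddle * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 FFT built from the butterfly; length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # complex twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


print(butterfly(1 + 0j, 1 + 0j, 1 + 0j))  # ((2+0j), 0j)
print(fft([1, 2, 3, 4]))  # approximately [10, -2+2j, -2, -2-2j]
```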

8.

Granted invention patent
Systems for performing instructions for fast element unpacking into 2-dimensional registers

Legal status: In force

Publication number: US10896043B2

Publication date: 2021-01-19

Application number: US16146854

Application date: 2018-09-28

Applicant: Intel Corporation

Inventor: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall

IPC: G06F17/16, G06F12/02, G06F9/30, G06F12/06, G06F9/38, G06T1/20, G06F12/0897, G06F12/0875, G06F9/345

Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
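
The Array-of-Structures to Structure-of-Arrays unpacking described here can be sketched in software. The example assumes the AOS source is a flat list holding N structures of K elements each, with same-typed elements separated by a fixed stride; the function and parameter names are hypothetical, and the two-dimensional registers are not modeled.

```python
def aos_to_soa(flat, n_structs, k_fields, stride=None):
    """Unpack N structures of K elements into K groups of N same-typed elements."""
    if stride is None:
        stride = k_fields  # tightly packed structures
    if stride < k_fields:
        raise ValueError("stride must be at least the number of fields")
    needed = (n_structs - 1) * stride + k_fields if n_structs else 0
    if len(flat) < needed:
        raise ValueError("source is too short for the requested layout")
    return [
        [flat[s * stride + f] for s in range(n_structs)]
        for f in range(k_fields)
    ]


# Three (x, y) points stored as x0, y0, x1, y1, x2, y2.
points = [1, 10, 2, 20, 3, 30]
print(aos_to_soa(points, n_structs=3, k_fields=2))  # [[1, 2, 3], [10, 20, 30]]
```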

9.

Granted invention patent
Systems and methods for performing matrix compress and decompress instructions

Legal status: In force

Publication number: US10719323B2

Publication date: 2020-07-21

Application number: US16144902

Application date: 2018-09-27

Applicant: Intel Corporation

Inventor: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke

IPC: G06F9/30, G06F9/38

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

10.

Granted invention patent
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements

Legal status: In force

Publication number: US12287843B2

Publication date: 2025-04-29

Application number: US18502291

Application date: 2023-11-06

Applicant: Intel Corporation

Inventor: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30, G06F9/38, G06F17/16

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
