Patent search ap:("INTEL CORPORATION") AND inv:"Michael Espig" Page 2

11.

Granted invention patent
Apparatuses, methods, and systems for a packed data convolution instruction with shift control and width control [In force]

Publication No.: US12182570B2

Publication date: 2024-12-31

Application No.: US17359354

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Deepti Aggarwal, Michael Espig, Robert Valentine, Sumit Mohan, Prakaram Joshi, Richard Winterton

IPC: G06F9/30

Abstract: Systems, methods, and apparatuses to support packed data convolution instructions with shift control and width control are described. In one embodiment, a hardware processor includes a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction having fields that identify a first packed data source, a second packed data source, a packed data destination, a sliding window width, and a stride, and an opcode that indicates an execution circuit is to generate a first chunk of contiguous elements of the first packed data source having a width of the sliding window width, generate a second chunk of contiguous elements of the first packed data source having the width of the sliding window width and shifted by the stride, multiply each element of the first chunk by a corresponding element of a respective chunk of the second packed data source to generate a first set of products, add the first set of products together to generate a first sum, multiply each element of the second chunk by a corresponding element of a respective chunk of the second packed data source to generate a second set of products, add the second set of products together to generate a second sum, and store the first sum in a first element of the packed data destination and the second sum in a second element of the packed data destination; and the execution circuit is to execute the decoded single instruction according to the opcode.
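As a rough software illustration of the operation this abstract describes (not the patented hardware or its instruction encoding), the sliding-window dot products can be sketched in Python. The function name `packed_convolution`, the `num_outputs` parameter, and the rule that output i uses the i-th width-sized chunk of the second source are assumptions made for this example; the abstract only says "a respective chunk".

```python
def packed_convolution(src1, src2, width, stride, num_outputs):
    """Sliding-window dot products over src1 (illustrative sketch only).

    Output i is the dot product of the window src1[i*stride : i*stride + width]
    with the i-th width-sized chunk of src2. That chunk mapping is an
    assumption of this sketch, not something the abstract specifies.
    """
    dest = []
    for i in range(num_outputs):
        start = i * stride
        window = src1[start:start + width]          # chunk shifted by the stride
        coeffs = src2[i * width:(i + 1) * width]    # assumed "respective chunk"
        if len(window) != width or len(coeffs) != width:
            raise ValueError("window or coefficient chunk runs past the source")
        # Multiply element-wise, then add the products into a single sum.
        dest.append(sum(a * b for a, b in zip(window, coeffs)))
    return dest


pixels = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1, 1, 0, -1]   # two chunks of width 3
result = packed_convolution(pixels, kernel, width=3, stride=1, num_outputs=2)
print(result)  # [-2, -2]
```

With `width=3` and `stride=1`, the first window is `[1, 2, 3]` and the second is `[2, 3, 4]`, so consecutive outputs overlap by two source elements.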

12.

Invention application
INSTRUCTION AND LOGIC FOR SUM OF ABSOLUTE DIFFERENCES [In force]

Publication No.: US20220308881A1

Publication date: 2022-09-29

Application No.: US17214291

Application date: 2021-03-26

Applicant: Intel Corporation

Inventor: Deepti Aggarwal, Michael Espig, Robert Valentine, Mark Charney

IPC: G06F9/38 , G06F9/30

Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of absolute differences (SAD) instruction; a decode circuit to decode the SAD instruction; and an execution circuit to, during an execution of the decoded SAD instruction, generate an SAD output vector based on a plurality of input vectors, the SAD output vector including a plurality of absolute differences values. Other embodiments are described and claimed.
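For readers unfamiliar with the underlying arithmetic, the following Python sketch shows a vector of absolute differences and its sum for two input vectors. It illustrates only the general SAD computation; the function names are hypothetical, and the abstract does not say how the instruction lays out its output vector.

```python
def absolute_differences(a, b):
    """Element-wise |a[i] - b[i]| for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [abs(x - y) for x, y in zip(a, b)]


def sum_of_absolute_differences(a, b):
    """Classic SAD: the sum of the element-wise absolute differences."""
    return sum(absolute_differences(a, b))


block_a = [10, 20, 30, 40]
block_b = [12, 18, 33, 40]
diffs = absolute_differences(block_a, block_b)
sad = sum_of_absolute_differences(block_a, block_b)
print(diffs, sad)  # [2, 2, 3, 0] 7
```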

13.

Granted invention patent
Apparatuses, methods, and systems for stencil configuration and computation instructions [In force]

Publication No.: US10922077B2

Publication date: 2021-02-16

Application No.: US16236463

Application date: 2018-12-29

Applicant: Intel Corporation

Inventor: Michael Espig, Christopher J. Hughes

IPC: G06F17/16 , G06F9/30

Abstract: Systems, methods, and apparatuses relating to performing stencil configuration and computation operations are described. In one embodiment, a matrix operations accelerator circuit includes a two-dimensional grid of fused multiply accumulate circuits coupled by a network; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to: switch the matrix operations accelerator circuit from a first mode to a second mode where a first set of input values from the first plurality of registers is sent to a first plurality of fused multiply accumulate circuits that form a first row of the two-dimensional grid, a second set of input values from the first plurality of registers is sent to a second plurality of fused multiply accumulate circuits that form a second row of the two-dimensional grid, a first coefficient value from the second plurality of registers is broadcast to a third plurality of fused multiply accumulate circuits that form a first column of the two-dimensional grid, and a second coefficient value from the second plurality of registers is broadcast to a fourth plurality of fused multiply accumulate circuits that form a second column of the two-dimensional grid.

14.

Granted invention patent
Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits [In force]

Publication No.: US10713012B2

Publication date: 2020-07-14

Application No.: US16160853

Application date: 2018-10-15

Applicant: Intel Corporation

Inventor: Aditya Varma, Michael Espig

IPC: G06F7/544 , G06F9/30 , G06F7/483 , G06N3/04 , G06N3/063

Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying a multiply-accumulate or multiply-add operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range; control circuitry, responsive to a precision of the first and second operands being below a threshold, to cause the first operand and second operand to be processed by the second multiplication circuitry to generate the result; and adder circuitry to add the result to an accumulated value to generate a new accumulated value.
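The two-path idea in this abstract can be loosely modeled in Python: a cheap multiplication path is selected when both operands fall in a low-precision range, and a general path is used otherwise. Treating the low-precision range as ternary values (-1, 0, +1) is an assumption drawn from the patent title; the function names and the way the selection is made are hypothetical and do not describe the actual circuitry.

```python
def is_ternary(value):
    """True when the value is one of -1, 0, +1 (assumed low-precision range)."""
    return value in (-1, 0, 1)


def multiply_low_precision(multiplier, multiplicand):
    """Product of two ternary operands using only comparisons, no multiply."""
    if multiplier == 0 or multiplicand == 0:
        return 0
    # Both operands are +1 or -1: equal signs give +1, different signs give -1.
    return 1 if multiplier == multiplicand else -1


def multiply_accumulate(multiplier, multiplicand, accumulator):
    """Select a multiplication path by operand range, then accumulate."""
    if is_ternary(multiplier) and is_ternary(multiplicand):
        product = multiply_low_precision(multiplier, multiplicand)  # second path
    else:
        product = multiplier * multiplicand                         # first path
    return accumulator + product


print(multiply_accumulate(-1, 1, 5))  # 4, via the low-precision path
print(multiply_accumulate(3, 4, 5))   # 17, via the general path
```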

15.

Granted invention patent
Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits [In force]

Publication No.: US11836464B2

Publication date: 2023-12-05

Application No.: US17839905

Application date: 2022-06-14

Applicant: INTEL CORPORATION

Inventor: Aditya Varma, Michael Espig

IPC: G06F7/544 , G06F7/533 , G06F9/30 , G06F7/483 , G06N3/063 , G06N3/045

CPC classification number: G06F7/5443 , G06F7/483 , G06F7/533 , G06F9/3001 , G06F9/30014 , G06F9/3016 , G06F9/30112 , G06F2207/3812 , G06N3/045 , G06N3/063

Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.

16.

Invention publication
METHOD AND APPARATUS FOR SEPARABLE CONVOLUTION FILTER OPERATIONS ON MATRIX MULTIPLICATION ARRAYS [Under examination, published]

Publication No.: US20230185873A1

Publication date: 2023-06-15

Application No.: US17548344

Application date: 2021-12-10

Applicant: Intel Corporation

Inventor: Michael Espig, Deepti Aggarwal

IPC: G06F17/15 , G06F17/16

CPC classification number: G06F17/153 , G06F17/16

Abstract: Methods and apparatus relating to separable convolution filter operations on matrix multiplication arrays are described. In an embodiment, logic circuitry generates a first convolution kernel and a second convolution kernel based on a two-dimensional convolution kernel. A matrix processing array comprising a plurality of Fused Multiply-Add (FMA) blocks applies the first convolution kernel to input data during a first pass to generate an intermediate data and the matrix processing array applies the second convolution kernel to the intermediate data to generate output data. Other embodiments are also disclosed and claimed.
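The two-pass scheme in this abstract rests on a standard property of separable filters, which the Python sketch below illustrates: a rank-1 two-dimensional kernel is split into a column kernel and a row kernel, the row kernel is applied first to produce intermediate data, and the column kernel is then applied to that intermediate data. All function names are hypothetical, the kernel is assumed to be exactly rank 1 (the sketch does not check this), and no padding is applied at the borders. It says nothing about how the FMA array itself is driven.

```python
def separate_kernel(kernel):
    """Split a rank-1 kernel so that kernel[i][j] == column[i] * row[j]."""
    for r, kernel_row in enumerate(kernel):
        for c, pivot in enumerate(kernel_row):
            if pivot != 0:
                column = [kernel[i][c] for i in range(len(kernel))]
                row = [kernel[r][j] / pivot for j in range(len(kernel_row))]
                return column, row
    raise ValueError("kernel is all zeros")


def filter_rows(image, row):
    """First pass: slide the 1D row kernel along every image row."""
    n = len(row)
    return [[sum(line[x + j] * row[j] for j in range(n))
             for x in range(len(line) - n + 1)] for line in image]


def filter_columns(image, column):
    """Second pass: slide the 1D column kernel down every image column."""
    n = len(column)
    return [[sum(image[y + i][x] * column[i] for i in range(n))
             for x in range(len(image[0]))]
            for y in range(len(image) - n + 1)]


def filter_2d(image, kernel):
    """Direct 2D filtering, used here only as a reference result."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(len(image[0]) - kw + 1)]
            for y in range(len(image) - kh + 1)]


kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

column, row = separate_kernel(kernel)
intermediate = filter_rows(image, row)           # first pass
output = filter_columns(intermediate, column)    # second pass
print(output == filter_2d(image, kernel))        # True
```

For a k-by-k kernel, the two passes cost about 2k multiply-adds per output element instead of k squared, which is the usual motivation for separating the kernel.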

17.

Invention publication
APPLICATION NEGOTIABLE PLATFORM THERMAL AWARE SCHEDULER [Under examination, published]

Publication No.: US20230161941A1

Publication date: 2023-05-25

Application No.: US17532877

Application date: 2021-11-22

Applicant: Intel Corporation

Inventor: Prakaram Joshi, Deepti Aggarwal, Rajesh Poornachandran, Michael Espig

IPC: G06F30/398 , G06F30/392 , G06F9/48

CPC classification number: G06F30/398 , G06F30/392 , G06F9/4881 , G06F2119/02

Abstract: An embodiment of an integrated circuit may comprise a management controller and circuitry communicatively coupled to the management controller, the circuitry to dynamically determine a performance measurement for each of two or more circuit blocks based at least in part on the physical design layout of the two or more circuit blocks, and report a schedule recommendation to an operating system scheduler based at least in part on the determined performance measurements. Other embodiments are disclosed and claimed.

18.

Invention application
COMPRESSED WALLACE TREES IN FMA CIRCUITS [In force]

Publication No.: US20220365751A1

Publication date: 2022-11-17

Application No.: US17358722

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Aditya Varma, Mahesh Kumashikar, Michael Espig

IPC: G06F7/53 , G06F7/502 , G06F15/80

Abstract: An embodiment of an apparatus comprises one or more fractional width fused multiply-accumulate (FMA) circuits configured as a shared Wallace tree, and circuitry coupled to the one or more fractional width FMA circuits to provide one or more fractional width FMA operations through the one or more fractional width FMA circuits. Other embodiments are disclosed and claimed.

19.

Granted invention patent
Systems and methods for performing matrix compress and decompress instructions [In force]

Publication No.: US11249761B2

Publication date: 2022-02-15

Application No.: US16934003

Application date: 2020-07-20

Applicant: Intel Corporation

Inventor: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
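The first of the two compression options in this abstract, packing non-zero elements together and recording their positions in a header, can be sketched in Python as follows. The dictionary layout, the field names `shape`, `header`, and `values`, and the function names are assumptions of this example; the abstract does not specify the header format.

```python
def compress(matrix):
    """Pack non-zero elements and record each one's (row, column) position."""
    header = []   # positions of the non-zero elements, in row-major order
    values = []   # the non-zero elements themselves, packed together
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))
                values.append(value)
    return {"shape": (len(matrix), len(matrix[0])),
            "header": header,
            "values": values}


def decompress(compressed):
    """Rebuild the dense matrix from the packed values and the header."""
    rows, cols = compressed["shape"]
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(compressed["header"], compressed["values"]):
        matrix[r][c] = value
    return matrix


sparse = [[0, 5, 0],
          [0, 0, 0],
          [7, 0, 9]]
packed = compress(sparse)
print(packed["values"])   # [5, 7, 9]
print(packed["header"])   # [(0, 1), (2, 0), (2, 2)]
```

Calling `decompress(packed)` reproduces the original matrix, so this form of compression is lossless; it saves space only when most elements are zero.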

20.

Granted invention patent
Systems and methods for performing nibble-sized operations on matrix elements [In force]

Publication No.: US11886875B2

Publication date: 2024-01-30

Application No.: US16232599

Application date: 2018-12-26

Applicant: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall, Jonathan D. Pearce, Dan Baum, Guei-Yuan Lueh, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30

CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/30018 , G06F9/30038

Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
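The per-nibble behavior described in this abstract can be illustrated with a short Python sketch: each pair of corresponding elements is split into 4-bit partitions, an operation is applied to each pair of partitions, and each result is written to the matching partition of the destination element. The function names, the default 8-bit element size, and the choice to discard any carry out of a nibble are assumptions of this example.

```python
def nibble_op(a, b, op, element_bits=8):
    """Apply op to each pair of 4-bit partitions of two elements."""
    result = 0
    for shift in range(0, element_bits, 4):
        nibble_a = (a >> shift) & 0xF
        nibble_b = (b >> shift) & 0xF
        # Keep only the low 4 bits, so a carry never spills into the
        # neighboring nibble (assumed behavior for this sketch).
        result |= (op(nibble_a, nibble_b) & 0xF) << shift
    return result


def matrix_nibble_op(first, second, op, element_bits=8):
    """Apply nibble_op to every pair of corresponding matrix elements."""
    return [[nibble_op(a, b, op, element_bits) for a, b in zip(row1, row2)]
            for row1, row2 in zip(first, second)]


def add(x, y):
    return x + y


first = [[0x12, 0xFF]]
second = [[0x34, 0x11]]
destination = matrix_nibble_op(first, second, add)
print([[hex(value) for value in row] for row in destination])
# [['0x46', '0x0']]
```

In the second element, both nibbles overflow (0xF + 0x1) and wrap to zero independently, which is the difference from an ordinary 8-bit addition of 0xFF and 0x11.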
