-
Publication number: US12182570B2
Publication date: 2024-12-31
Application number: US17359354
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventor: Deepti Aggarwal , Michael Espig , Robert Valentine , Sumit Mohan , Prakaram Joshi , Richard Winterton
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses to support packed data convolution instructions with shift control and width control are described. In one embodiment, a hardware processor includes a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction having fields that identify a first packed data source, a second packed data source, a packed data destination, a sliding window width, and a stride, and an opcode that indicates an execution circuit is to generate a first chunk of contiguous elements of the first packed data source having a width of the sliding window width, generate a second chunk of contiguous elements of the first packed data source having the width of the sliding window width and shifted by the stride, multiply each element of the first chunk by a corresponding element of a respective chunk of the second packed data source to generate a first set of products, add the first set of products together to generate a first sum, multiply each element of the second chunk by a corresponding element of a respective chunk of the second packed data source to generate a second set of products, add the second set of products together to generate a second sum, and store the first sum in a first element of the packed data destination and the second sum in a second element of the packed data destination; and the execution circuit is to execute the decoded single instruction according to the opcode.
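The sliding-window arithmetic in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `packed_conv` and the `num_outputs` parameter are hypothetical, and the model assumes every window is multiplied by the same first `width` elements of the second source, whereas the abstract only says "a respective chunk".

```python
def packed_conv(src1, src2, width, stride, num_outputs):
    """Software model of sliding-window dot products (hypothetical helper).

    Window i covers src1[i*stride : i*stride + width]. Assumption: every
    window uses the same weight chunk, src2[:width].
    """
    weights = src2[:width]
    if len(weights) < width:
        raise ValueError("src2 holds fewer than `width` elements")
    dest = []
    for i in range(num_outputs):
        start = i * stride
        chunk = src1[start:start + width]
        if len(chunk) < width:
            raise ValueError("window runs past the end of src1")
        # Multiply element-wise, then add the products into a single sum.
        dest.append(sum(a * b for a, b in zip(chunk, weights)))
    return dest


if __name__ == "__main__":
    print(packed_conv([1, 2, 3, 4, 5, 6], [1, 2, 1], width=3, stride=1, num_outputs=4))
    print(packed_conv([1, 2, 3, 4, 5, 6], [1, 2, 1], width=3, stride=2, num_outputs=2))
```

With a stride of 1 the windows overlap by `width - 1` elements; a stride of 2 skips every other window and so yields every other sum.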
-
Publication number: US20220308881A1
Publication date: 2022-09-29
Application number: US17214291
Filing date: 2021-03-26
Applicant: Intel Corporation
Inventor: Deepti Aggarwal , Michael Espig , Robert Valentine , Mark Charney
Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of absolute differences (SAD) instruction; a decode circuit to decode the SAD instruction; and an execution circuit to, during an execution of the decoded SAD instruction, generate an SAD output vector based on a plurality of input vectors, the SAD output vector including a plurality of absolute differences values. Other embodiments are described and claimed.
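A sum of absolute differences over two input vectors can be modeled as below. The abstract does not say how elements are grouped in the output vector, so the `group` parameter and the function name `sad_vector` are assumptions made purely for illustration.

```python
def sad_vector(a, b, group=4):
    """Return one sum of absolute differences per `group` consecutive elements.

    Assumption: the grouping factor is a free parameter; the abstract does
    not specify the layout of the SAD output vector.
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    if len(a) % group != 0:
        raise ValueError("vector length must be a multiple of `group`")
    out = []
    for i in range(0, len(a), group):
        # Absolute difference per element pair, accumulated per group.
        out.append(sum(abs(x - y) for x, y in zip(a[i:i + group], b[i:i + group])))
    return out


if __name__ == "__main__":
    print(sad_vector([10, 20, 30, 40, 1, 2, 3, 4], [12, 18, 30, 45, 4, 3, 2, 1]))
```

Setting `group=1` reduces the model to a plain vector of absolute differences.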
-
13.
Publication number: US10922077B2
Publication date: 2021-02-16
Application number: US16236463
Filing date: 2018-12-29
Applicant: Intel Corporation
Inventor: Michael Espig , Christopher J. Hughes
Abstract: Systems, methods, and apparatuses relating to performing stencil configuration and computation operations are described. In one embodiment, a matrix operations accelerator circuit includes a two-dimensional grid of fused multiply accumulate circuits coupled by a network; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to: switch the matrix operations accelerator circuit from a first mode to a second mode where a first set of input values from the first plurality of registers is sent to a first plurality of fused multiply accumulate circuits that form a first row of the two-dimensional grid, a second set of input values from the first plurality of registers is sent to a second plurality of fused multiply accumulate circuits that form a second row of the two-dimensional grid, a first coefficient value from the second plurality of registers is broadcast to a third plurality of fused multiply accumulate circuits that form a first column of the two-dimensional grid, and a second coefficient value from the second plurality of registers is broadcast to a fourth plurality of fused multiply accumulate circuits that form a second column of the two-dimensional grid.
-
14.
Publication number: US10713012B2
Publication date: 2020-07-14
Application number: US16160853
Filing date: 2018-10-15
Applicant: Intel Corporation
Inventor: Aditya Varma , Michael Espig
Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying a multiply-accumulate or multiply-add operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range; control circuitry, responsive to a precision of the first and second operands being below a threshold, to cause the first operand and second operand to be processed by the second multiplication circuitry to generate the result; and adder circuitry to add the result to an accumulated value to generate a new accumulated value.
-
15.
Publication number: US11836464B2
Publication date: 2023-12-05
Application number: US17839905
Filing date: 2022-06-14
Applicant: INTEL CORPORATION
Inventor: Aditya Varma , Michael Espig
CPC classification number: G06F7/5443 , G06F7/483 , G06F7/533 , G06F9/3001 , G06F9/30014 , G06F9/3016 , G06F9/30112 , G06F2207/3812 , G06N3/045 , G06N3/063
Abstract: An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
-
16.
Publication number: US20230185873A1
Publication date: 2023-06-15
Application number: US17548344
Filing date: 2021-12-10
Applicant: Intel Corporation
Inventor: Michael Espig , Deepti Aggarwal
CPC classification number: G06F17/153 , G06F17/16
Abstract: Methods and apparatus relating to separable convolution filter operations on matrix multiplication arrays are described. In an embodiment, logic circuitry generates a first convolution kernel and a second convolution kernel based on a two-dimensional convolution kernel. A matrix processing array comprising a plurality of Fused Multiply-Add (FMA) blocks applies the first convolution kernel to input data during a first pass to generate an intermediate data and the matrix processing array applies the second convolution kernel to the intermediate data to generate output data. Other embodiments are also disclosed and claimed.
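The two-pass idea in this abstract can be sketched in plain Python. This is an illustrative model, not the described FMA array: all function names are hypothetical, the kernel is assumed to be exactly rank 1 (separable), and the filter is applied as a correlation (no kernel flip) over the "valid" region only.

```python
def split_kernel(k2d, tol=1e-9):
    """Factor a separable 2D kernel into (column kernel, row kernel)."""
    pivot = next(((r, c) for r, row in enumerate(k2d)
                  for c, v in enumerate(row) if v != 0), None)
    if pivot is None:
        raise ValueError("kernel is all zeros")
    r, c = pivot
    col_k = [row[c] for row in k2d]
    row_k = [v / k2d[r][c] for v in k2d[r]]
    # Check that the outer product of the two 1D kernels rebuilds the 2D kernel.
    for i, row in enumerate(k2d):
        for j, v in enumerate(row):
            if abs(col_k[i] * row_k[j] - v) > tol:
                raise ValueError("kernel is not separable")
    return col_k, row_k


def filter_rows(image, row_k):
    """First pass: slide the 1D row kernel along every image row."""
    n = len(row_k)
    return [[sum(line[x + j] * row_k[j] for j in range(n))
             for x in range(len(line) - n + 1)] for line in image]


def filter_cols(image, col_k):
    """Second pass: slide the 1D column kernel down every image column."""
    n = len(col_k)
    return [[sum(image[y + i][x] * col_k[i] for i in range(n))
             for x in range(len(image[0]))] for y in range(len(image) - n + 1)]


def separable_filter(image, k2d):
    col_k, row_k = split_kernel(k2d)
    intermediate = filter_rows(image, row_k)   # intermediate data
    return filter_cols(intermediate, col_k)    # output data


def direct_filter(image, k2d):
    """Reference single-pass 2D filter, used only to check the result."""
    kh, kw = len(k2d), len(k2d[0])
    return [[sum(image[y + i][x + j] * k2d[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(len(image[0]) - kw + 1)]
            for y in range(len(image) - kh + 1)]


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    blur = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    print(separable_filter(img, blur) == direct_filter(img, blur))
```

For a kernel of size k by k, the two 1D passes need about 2k multiply-adds per output element instead of k squared, which is the usual motivation for splitting the kernel.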
-
Publication number: US20230161941A1
Publication date: 2023-05-25
Application number: US17532877
Filing date: 2021-11-22
Applicant: Intel Corporation
Inventor: Prakaram Joshi , Deepti Aggarwal , Rajesh Poornachandran , Michael Espig
IPC: G06F30/398 , G06F30/392 , G06F9/48
CPC classification number: G06F30/398 , G06F30/392 , G06F9/4881 , G06F2119/02
Abstract: An embodiment of an integrated circuit may comprise a management controller and circuitry communicatively coupled to the management controller, the circuitry to dynamically determine a performance measurement for each of two or more circuit blocks based at least in part on the physical design layout of the two or more circuit blocks, and report a schedule recommendation to an operating system scheduler based at least in part on the determined performance measurements. Other embodiments are disclosed and claimed.
-
Publication number: US20220365751A1
Publication date: 2022-11-17
Application number: US17358722
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventor: Aditya Varma , Mahesh Kumashikar , Michael Espig
Abstract: An embodiment of an apparatus comprises one or more fractional width fused multiply-accumulate (FMA) circuits configured as a shared Wallace tree, and circuitry coupled to the one or more fractional width FMA circuits to provide one or more fractional width FMA operations through the one or more fractional width FMA circuits. Other embodiments are disclosed and claimed.
-
Publication number: US11249761B2
Publication date: 2022-02-15
Application number: US16934003
Filing date: 2020-07-20
Applicant: Intel Corporation
Inventor: Dan Baum , Michael Espig , James Guilford , Wajdi K. Feghali , Raanan Sade , Christopher J. Hughes , Robert Valentine , Bret Toll , Elmoustapha Ould-Ahmed-Vall , Mark J. Charney , Vinodh Gopal , Ronen Zohar , Alexander F. Heinecke
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
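The first compression option named in this abstract, packing non-zero elements and recording their positions in a header, might look like the following in software. The dictionary layout and the names `compress` and `decompress` are assumptions for illustration; the abstract does not define the actual encoding.

```python
def compress(matrix):
    """Pack non-zero elements and record each one's (row, col) in a header.

    Assumption: the packed form is a plain dict; the real header format is
    not described in the abstract.
    """
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))
                values.append(v)
    shape = (len(matrix), len(matrix[0]) if matrix else 0)
    return {"shape": shape, "header": header, "values": values}


def decompress(packed):
    """Rebuild the dense matrix from the packed values and header."""
    rows, cols = packed["shape"]
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(packed["header"], packed["values"]):
        out[r][c] = v
    return out


if __name__ == "__main__":
    dense = [[0, 7, 0], [0, 0, 0], [3, 0, 9]]
    packed = compress(dense)
    print(packed["values"], packed["header"])
    print(decompress(packed) == dense)
```

The scheme only saves space when the matrix is sparse enough that the header costs less than the zeros it replaces.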
-
Publication number: US11886875B2
Publication date: 2024-01-30
Application number: US16232599
Filing date: 2018-12-26
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Jonathan D. Pearce , Dan Baum , Guei-Yuan Lueh , Michael Espig , Christopher J. Hughes , Raanan Sade , Robert Valentine , Mark J. Charney , Alexander F. Heinecke
IPC: G06F9/30
CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/30018 , G06F9/30038
Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
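The per-nibble processing described here can be modeled with a few bit operations. This is a sketch with hypothetical names (`nibble_op`, `matrix_nibble_op`); it assumes unsigned elements of `bits` width and assumes each per-nibble result wraps modulo 16, neither of which the abstract states.

```python
def nibble_op(a, b, op, bits=8):
    """Apply `op` to each pair of 4-bit partitions of a and b.

    Assumptions: unsigned elements of `bits` width; each result is truncated
    to 4 bits so that it cannot spill into the neighboring nibble.
    """
    result = 0
    for shift in range(0, bits, 4):
        x = (a >> shift) & 0xF
        y = (b >> shift) & 0xF
        result |= (op(x, y) & 0xF) << shift
    return result


def matrix_nibble_op(m1, m2, op, bits=8):
    """Apply nibble_op to every pair of corresponding matrix elements."""
    return [[nibble_op(x, y, op, bits) for x, y in zip(r1, r2)]
            for r1, r2 in zip(m1, m2)]


if __name__ == "__main__":
    add = lambda x, y: x + y
    out = matrix_nibble_op([[0x1F, 0x22]], [[0x21, 0x11]], add)
    print([[hex(v) for v in row] for row in out])
```

Because each nibble is masked independently, the carry out of the low nibble in `0x1F + 0x21` is discarded rather than propagated into the high nibble.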