-
Publication No.: US20220318009A1
Publication Date: 2022-10-06
Application No.: US17573556
Filing Date: 2022-01-11
Applicant: Intel Corporation
Inventor: Venkateswara R. MADDURI , Carl MURRAY , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Robert VALENTINE , Jesus CORBAL
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
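The multiply-and-round behaviour above can be sketched as a small reference model. This is only an illustration under stated assumptions: the function name `vmul_high_round_u` is hypothetical, elements are assumed to be unsigned 16-bit values, and the rounding mode is assumed to be round-half-up (add 2^(bits-1) before keeping the high half), which the abstract does not specify.

```python
def vmul_high_round_u(src1, src2, bits=16):
    """Hypothetical model: unsigned element-wise multiply, keep the rounded high half.

    Assumes unsigned `bits`-wide elements and round-half-up rounding.
    """
    mask = (1 << bits) - 1
    out = []
    for a, b in zip(src1, src2):
        a &= mask  # treat each source element as an unsigned fixed-size value
        b &= mask
        product = a * b                         # double-sized product (up to 2*bits bits)
        rounded = product + (1 << (bits - 1))   # assumed rounding: add half an LSB of the high half
        out.append((rounded >> bits) & mask)    # most significant fixed-size portion
    return out


print([hex(x) for x in vmul_high_round_u([0xFFFF, 0x8000, 0x7FFF], [0xFFFF, 1, 1])])
```

Because the largest possible product is (2^bits - 1)^2, adding the rounding constant can never carry out of the double-sized value, so the rounded high half always fits the fixed-size destination element.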
-
Publication No.: US20220291927A1
Publication Date: 2022-09-15
Application No.: US17706428
Filing Date: 2022-03-28
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Milind B. GIRKAR , Zeev SPERBER , Mark J. CHARNEY , Rinat RAPPOPORT , Jesus CORBAL , Stanislav SHWARTSMAN , Igor YANOVER , Alexander F. HEINECKE , Barukh ZIV , Dan BAUM , Yuri GEBIL
Abstract: Embodiments detailed herein relate to matrix operations, in particular the storing of a matrix (tile) to memory. For example, support for a store instruction is described in the form of at least decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
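A minimal sketch of the tile-to-memory behaviour described in this abstract, assuming the destination memory information is a base address plus a row stride in bytes, that elements are unsigned 4-byte little-endian values, and that only the configured rows and columns are written. The function `tile_store` and its parameters are hypothetical illustrations, not the patent's interface.

```python
def tile_store(tile, rows, cols, memory, base, stride, elem_size=4):
    """Hypothetical model: write the configured rows/cols of a tile to byte memory.

    Assumes unsigned elements, little-endian layout, and a base + stride
    description of the destination memory.
    """
    for r in range(rows):                    # only configured rows are stored
        row_addr = base + r * stride         # each row starts `stride` bytes after the previous one
        for c in range(cols):                # only configured columns are stored
            addr = row_addr + c * elem_size
            memory[addr:addr + elem_size] = tile[r][c].to_bytes(elem_size, "little")


mem = bytearray(64)
tile_store([[1, 2], [3, 4], [99, 99]], rows=2, cols=2, memory=mem, base=0, stride=16)
```

In the usage line the third tile row is not configured, so its bytes in memory stay zero.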
-
Publication No.: US20220207107A1
Publication Date: 2022-06-30
Application No.: US17133473
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Robert VALENTINE , Daniel TOWNER , Amit GRADSTEIN , Mark Jay CHARNEY
IPC: G06F17/16
Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix. The decoder may also decode and the execution circuitry may execute a second complex matrix multiplication instruction to multiply real and imaginary values from the first plurality with corresponding imaginary and real values, respectively, from the second plurality to generate first and second pluralities of imaginary products, and to add corresponding imaginary products to produce a corresponding imaginary value in the result matrix.
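The split into two instructions can be illustrated with a reference model in which one function produces the real part of the result (real-times-real products minus imaginary-times-imaginary products) and a second produces the imaginary part (real-times-imaginary plus imaginary-times-real). This sketch assumes a standard matrix product that accumulates over the inner dimension and stores real and imaginary parts as separate matrices; the names `cmm_real` and `cmm_imag` are hypothetical.

```python
def cmm_real(a_re, a_im, b_re, b_im):
    """Hypothetical first instruction: real part = sum_k(a_re*b_re - a_im*b_im)."""
    m, k, n = len(a_re), len(b_re), len(b_re[0])
    return [[sum(a_re[i][p] * b_re[p][j] - a_im[i][p] * b_im[p][j] for p in range(k))
             for j in range(n)] for i in range(m)]


def cmm_imag(a_re, a_im, b_re, b_im):
    """Hypothetical second instruction: imaginary part = sum_k(a_re*b_im + a_im*b_re)."""
    m, k, n = len(a_re), len(b_re), len(b_re[0])
    return [[sum(a_re[i][p] * b_im[p][j] + a_im[i][p] * b_re[p][j] for p in range(k))
             for j in range(n)] for i in range(m)]


# (1+2j)*2 + (3-1j)*(1+1j) = (2+4j) + (4+2j) = 6+6j
re = cmm_real([[1, 3]], [[2, -1]], [[2], [1]], [[0], [1]])
im = cmm_imag([[1, 3]], [[2, -1]], [[2], [1]], [[0], [1]])
```

Keeping the two halves separate mirrors the abstract's description of a first instruction that produces real values and a second that produces imaginary values.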
-
Publication No.: US20220197654A1
Publication Date: 2022-06-23
Application No.: US17133400
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Robert VALENTINE , Daniel TOWNER , Amit GRADSTEIN , Mark Jay CHARNEY
Abstract: An apparatus and method for complex matrix conjugation. For example, one embodiment of a processor comprises: a decoder to decode a complex conjugate transpose instruction including a source operand to identify a complex source matrix and a destination operand to identify a complex result matrix, the complex source matrix to store a first plurality of complex values and the complex result matrix to store a second plurality of complex values, each complex value in the first and second plurality of complex values including a real component and an imaginary component; a plurality of registers or local memory to store all or a subset of the first plurality of complex values; and execution circuitry to execute the complex conjugate transpose instruction using matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values, and transpose hardware logic to perform a matrix transpose operation using the plurality of complex conjugate values to generate a result matrix.
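The two hardware steps named in this abstract, conjugation followed by a transpose, can be sketched as a short reference model using Python complex numbers. The function name `conj_transpose` is hypothetical, and the sketch says nothing about how the hardware stages data in registers or local memory.

```python
def conj_transpose(src):
    """Hypothetical model: complex conjugate of each element, then a matrix transpose."""
    rows, cols = len(src), len(src[0])
    conj = [[complex(z.real, -z.imag) for z in row] for row in src]  # conjugation step
    return [[conj[i][j] for i in range(rows)] for j in range(cols)]   # transpose step


result = conj_transpose([[1 + 2j, 3 - 4j]])
```

For a 1x2 input the result is a 2x1 matrix whose entries have their imaginary parts negated.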
-
Publication No.: US20220171627A1
Publication Date: 2022-06-02
Application No.: US17672253
Filing Date: 2022-02-15
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
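The first compression option in this abstract (pack the non-zero elements together and record each one's matrix position in a header) can be sketched as a compress/decompress pair. This covers only that first scheme, not the reduced-bit-width variant; the function names and the choice to keep the matrix shape alongside the header are assumptions made for illustration.

```python
def compress(matrix):
    """Hypothetical model: pack non-zero elements; header records (row, col) of each."""
    header, packed = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))  # matrix position of this non-zero element
                packed.append(v)       # non-zero values stored contiguously
    shape = (len(matrix), len(matrix[0]))  # assumed: shape kept so the matrix can be rebuilt
    return shape, header, packed


def decompress(shape, header, packed):
    """Inverse: scatter packed values back to their recorded positions."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, packed):
        out[r][c] = v
    return out


shape, header, packed = compress([[0, 5, 0], [7, 0, 0]])
```

Here `packed` holds `[5, 7]` and `header` holds their positions, so a sparse matrix shrinks to a short value list plus a position list.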
-
Publication No.: US20220164218A1
Publication Date: 2022-05-26
Application No.: US17381521
Filing Date: 2021-07-21
Applicant: Intel Corporation
Inventor: Rajesh M. SANKARAN , Gilbert NEIGER , Narayan RANGANATHAN , Stephen R. VAN DOREN , Joseph NUZMAN , Niall D. MCDONNELL , Michael A. O'HANLON , Lokpraveen B. MOSUR , Tracy Garrett DRYSDALE , Eriko NURVITADHI , Asit K. MISHRA , Ganesh VENKATESH , Deborah T. MARR , Nicholas P. CARTER , Jonathan D. PEARCE , Edward T. GROCHOWSKI , Richard J. GRECO , Robert VALENTINE , Jesus CORBAL , Thomas D. FLETCHER , Dennis R. BRADFORD , Dwight P. MANLEY , Mark J. CHARNEY , Jeffrey J. COOK , Paul CAPRIOLI , Koichi YAMADA , Kent D. GLOSSOP , David B. SHEFFIELD
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
Publication No.: US20220129268A1
Publication Date: 2022-04-28
Application No.: US17518336
Filing Date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: Venkateswara MADDURI , ElMoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause selection of 16 most significant bits of the first and second right-shifted quadwords to be written to 16 least significant bit regions of first and second quadword data element locations of a destination register.
-
Publication No.: US20220129267A1
Publication Date: 2022-04-28
Application No.: US17518291
Filing Date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: Venkateswara MADDURI , ElMoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one processor embodiment comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause selection of 32 most significant bits of the first and second right-shifted quadwords to be written to 32 least significant bit positions of first and second quadword data element locations of a destination register.
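The two right-shift abstracts above differ only in how many bits are selected (16 or 32). A minimal sketch under a literal reading: each packed quadword is arithmetically right-shifted (the sign bit is shifted in), the top `keep_bits` bits of the 64-bit result are taken, and they are placed in the low bits of the corresponding destination quadword. The function name is hypothetical, the model applies the operation to every element rather than just the first and second, and the upper destination bits are assumed to be zero, which neither abstract states.

```python
MASK64 = (1 << 64) - 1


def shift_select_msbs(src, shift, keep_bits=16):
    """Hypothetical model: arithmetic right shift of signed 64-bit elements,
    then the top `keep_bits` bits go to the low bits of each destination element."""
    out = []
    for q in src:
        # Python's >> on a negative int is arithmetic (shifts in the sign bit);
        # masking restores the 64-bit two's-complement bit pattern.
        shifted = (q >> shift) & MASK64
        msbs = shifted >> (64 - keep_bits)  # most significant keep_bits bits
        out.append(msbs)                    # upper destination bits assumed zero
    return out


print([hex(x) for x in shift_select_msbs([0x1234 << 48, -(1 << 60)], shift=4)])
```

With `keep_bits=16` this models the first application's behaviour; passing `keep_bits=32` models the second.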
-
Publication No.: US20220058021A1
Publication Date: 2022-02-24
Application No.: US17516023
Filing Date: 2021-11-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Menachem ADELMAN , Barukh ZIV , Alexander HEINECKE , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions include computing a dot product of signed words and accumulating in a doubleword with saturation; computing a dot product of bytes and accumulating into a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
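The signed-word variant can be sketched per doubleword lane: two pairs of signed 16-bit words are multiplied, the products are added to the 32-bit accumulator, and the exact sum is saturated to the signed 32-bit range. This is a per-lane illustration, not a tile-level model. It assumes two word pairs per lane and a single saturation applied after the exact sum, and `dot_words_acc_sat` is a hypothetical name.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate_i32(x):
    """Clamp an exact integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def dot_words_acc_sat(acc, a, b):
    """Hypothetical per-lane model: acc[i] += a[2i]*b[2i] + a[2i+1]*b[2i+1], saturated.

    acc holds signed 32-bit values; a and b hold signed 16-bit values, two per lane.
    """
    out = []
    for i, d in enumerate(acc):
        s = d + a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]  # exact sum, no wraparound
        out.append(saturate_i32(s))
    return out


print(dot_words_acc_sat([10, 0], [2, 3, -32768, -32768], [4, 5, -32768, -32768]))
```

In the second lane the two products sum to 2^31, one past the signed 32-bit maximum, so saturation clamps the result instead of letting it wrap to a negative value.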
-
Publication No.: US20210326131A1
Publication Date: 2021-10-21
Application No.: US17362854
Filing Date: 2021-06-29
Applicant: Intel Corporation
Inventor: Edward T. GROCHOWSKI , Asit K. MISHRA , Robert VALENTINE , Mark J. CHARNEY , Simon C. STEELY, JR.
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
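The idea of a completion progress indicator can be illustrated with a resumable matrix multiply that computes one result element at a time, stops when a simulated interruption occurs, and returns how many elements it has finished so that a later call can resume from that point. The per-element granularity, the row-major ordering, the `budget` parameter used to simulate the interruption, and the name `matmul_resumable` are all assumptions for illustration.

```python
def matmul_resumable(a, b, c, progress=0, budget=None):
    """Hypothetical model: compute c = a x b one result element at a time.

    progress: result elements already completed (the progress indicator).
    budget:   elements to compute before a simulated interruption
              (None means run to completion). Returns the updated progress.
    """
    m, k, n = len(a), len(b), len(b[0])
    total = m * n
    done_this_run = 0
    while progress < total:
        if budget is not None and done_this_run == budget:
            return progress                      # simulated interruption: report progress
        i, j = divmod(progress, n)               # row-major order of result elements
        c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
        progress += 1
        done_this_run += 1
    return progress


c = [[0, 0], [0, 0]]
p = matmul_resumable([[1, 2], [3, 4]], [[5, 6], [7, 8]], c, budget=3)  # interrupted after 3 elements
p = matmul_resumable([[1, 2], [3, 4]], [[5, 6], [7, 8]], c, progress=p)  # resumes and finishes
```

Because already-completed elements are never recomputed, resuming from the stored indicator gives the same final result as an uninterrupted run.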
-