SYSTEMS, METHODS, AND APPARATUSES FOR DOT PRODUCTION OPERATIONS

    Publication No.: EP4303724A1

    Publication date: 2024-01-10

    Application No.: EP23194771.4

    Filing date: 2017-07-01

    Abstract: Embodiments detailed herein relate to matrix operations. For example, a processor comprises decode circuitry to decode a single matrix instruction and execution circuitry to execute the single matrix instruction. The single matrix instruction has fields for an opcode, a plurality of identifiers corresponding to a first plurality of 4-bit sized data elements of a first source matrix, a second plurality of 4-bit sized data elements of a second source matrix, a plurality of doubleword-sized source data elements of a third source matrix, and a plurality of doubleword-sized result data elements of a result matrix, and bits indicating whether one or both of the first and second plurality of 4-bit sized data elements are signed or unsigned. The execution circuitry includes a multiply accumulate circuit, comprising: a multiplier to multiply each 4-bit sized data element of a first subset of the first plurality of 4-bit sized data elements with a corresponding 4-bit sized data element of a first subset of the second plurality of 4-bit sized data elements to generate a plurality of products; and an accumulator to add the plurality of products to a corresponding doubleword-sized source data element of the plurality of doubleword-sized source data elements to generate a corresponding doubleword-sized result data element of the plurality of doubleword-sized result data elements.
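
The multiply-accumulate step in the abstract above can be sketched in Python for a single doubleword lane. This is only an illustration under stated assumptions, not the claimed circuit: it assumes eight 4-bit elements are packed into one 32-bit word, lowest nibble first, and that the accumulator wraps around on overflow rather than saturating. The names `nibbles` and `dot4_accumulate` are hypothetical.

```python
def nibbles(dword, signed):
    """Split a 32-bit word into eight 4-bit elements, lowest nibble first."""
    out = []
    for i in range(8):
        n = (dword >> (4 * i)) & 0xF
        if signed and n >= 8:
            n -= 16  # reinterpret the nibble as two's complement
        out.append(n)
    return out


def dot4_accumulate(acc, a_dword, b_dword, a_signed=True, b_signed=True):
    """Add the eight 4-bit products to a doubleword accumulator.

    a_signed / b_signed stand in for the instruction bits that mark each
    source as signed or unsigned. Wrap-around is an assumption.
    """
    a = nibbles(a_dword, a_signed)
    b = nibbles(b_dword, b_signed)
    total = (acc + sum(x * y for x, y in zip(a, b))) & 0xFFFFFFFF
    # Present the result as a signed 32-bit integer.
    return total - (1 << 32) if total >= (1 << 31) else total
```

For example, `dot4_accumulate(10, 0x21, 0x43)` multiplies the element pairs (1, 3) and (2, 4) and adds the two products to the accumulator value 10.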

    SYSTEMS, METHODS, AND APPARATUSES FOR TILE MATRIX MULTIPLICATION AND ACCUMULATION

    Publication No.: EP4216057A1

    Publication date: 2023-07-26

    Application No.: EP23161367.0

    Filing date: 2017-07-01

    Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, an apparatus comprises an instruction decoder to decode a single instruction, the single instruction having fields to indicate an opcode, a first register to store a first source matrix, a second register to store a second source matrix, and a third register to store a 2 by 2 third source matrix, wherein the opcode is to indicate a matrix multiply-accumulate operation; and execution circuitry to perform the matrix multiply-accumulate operation. The matrix multiply-accumulate operation includes: multiplying a value corresponding to a first row and a first column of the first source matrix and a value corresponding to a first row and a first column of the second source matrix to generate a first product, multiplying a value corresponding to the first row and a second column of the first source matrix and a value corresponding to a second row and the first column of the second source matrix to generate a second product, summing the first product, the second product, and an initial value corresponding to an element position in a first row and a first column of the 2 by 2 third source matrix to generate a resulting value corresponding to the element position in a destination matrix, and storing the destination matrix in the third register.
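
The element-wise description above amounts to C[i][j] + sum over k of A[i][k] * B[k][j]. A minimal Python sketch follows, assuming matrices are plain lists of rows and ignoring tile registers, element types, and overflow behaviour. The function name `tile_mma` is hypothetical.

```python
def tile_mma(a, b, c):
    """Return c + a @ b for matrices given as lists of rows.

    For the 2 by 2 case in the abstract, the entry at row 0, column 0 is
    c[0][0] + a[0][0] * b[0][0] + a[0][1] * b[1][0].
    """
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

In the abstract the destination matrix is stored back into the register that held the third source matrix, so a caller modelling that would assign the result back to `c`.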

    BFLOAT16 SCALE AND/OR REDUCE INSTRUCTIONS
    Invention publication

    Publication No.: EP4141656A1

    Publication date: 2023-03-01

    Application No.: EP22185939.0

    Filing date: 2022-07-20

    Abstract: Techniques for scale and reduction of BF16 data elements are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a BF16 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a BF16 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
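
The scale operation described above, dst[i] = src1[i] * 2^floor(src2[i]), can be sketched as follows. This is an illustration only: it models BF16 elements as ordinary Python floats, assumes all inputs are finite, and does not model overflow, underflow, NaN handling, or masking. The name `bf16_scale` is hypothetical.

```python
import math


def bf16_scale(src1, src2):
    """Scale each element of src1 by 2 raised to floor(src2 element).

    Multiplying by a power of two changes only the exponent, so the result
    stays exact as long as it neither overflows nor becomes subnormal.
    """
    return [math.ldexp(a, math.floor(b)) for a, b in zip(src1, src2)]
```

Note that the floor is taken toward negative infinity, so a second-source element of -1.5 gives an exponent of -2, not -1.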

    SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSFORM MATRICES INTO ROW-INTERLEAVED FORMAT

    Publication No.: EP3916543A3

    Publication date: 2021-12-22

    Application No.: EP21187080.3

    Filing date: 2019-06-27

    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor comprises decode circuitry to decode a single instruction into a decoded single instruction and execution circuitry to execute the decoded single instruction according to an opcode. The single instruction has a first field to specify a source matrix, a second field to specify a destination matrix, and the opcode to indicate the execution circuitry is to cause a store of: a first element and a second element from a first column of the source matrix respectively into a first element and a second element in a first row of the destination matrix, a first element and a second element from a second column of the source matrix respectively into a third element and a fourth element in the first row of the destination matrix, a third element and a fourth element from the first column of the source matrix respectively into a first element and a second element in a second row of the destination matrix, and a third element and a fourth element from the second column of the source matrix respectively into a third element and a fourth element in the second row of the destination matrix.
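
The store pattern above places two consecutive elements of each source column side by side in one destination row, which corresponds to dest[r][2*c + k] = src[2*r + k][c] for k in {0, 1}. A minimal Python sketch, assuming matrices are lists of rows; the function name `row_interleave` and the `group` parameter are hypothetical.

```python
def row_interleave(src, group=2):
    """Interleave each run of `group` source rows into one destination row."""
    rows, cols = len(src), len(src[0])
    if rows % group:
        raise ValueError("row count must be a multiple of group")
    return [
        [src[r * group + k][c] for c in range(cols) for k in range(group)]
        for r in range(rows // group)
    ]
```

With a 4 by 2 source, the first destination row holds the first two elements of column one followed by the first two elements of column two, and the second destination row holds the third and fourth elements of each column in the same order.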

    SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO CONVERT TO 16-BIT FLOATING-POINT FORMAT

    Publication No.: EP3822774A1

    Publication date: 2021-05-19

    Application No.: EP20216494.3

    Filing date: 2019-10-08

    Abstract: Disclosed embodiments relate to a processor, a system on a chip and a system for executing a format conversion instruction. In one example, a processor having a plurality of cores, including a core that, in response to a format conversion instruction having a first source operand including a first 32-bit single-precision floating point data element, and a second source operand including a second 32-bit single-precision floating point data element, is to: convert the first 32-bit single-precision floating point data element to a first 16-bit floating point data element, wherein, when the first 32-bit single-precision floating point data element is a normal data element, conversion is to be performed according to a rounding mode specified by the format conversion instruction, and the first 16-bit floating point data element is to have a sign bit, an 8-bit exponent, seven explicit mantissa bits, and one implicit mantissa bit, and wherein, when the first 32-bit single-precision floating point data element is a not-a-number, NaN, data element, the first 16-bit floating point data element is to have a mantissa with a most significant bit set to one; convert the second 32-bit single-precision floating point data element to a second 16-bit floating point data element, wherein, when the second 32-bit single-precision floating point data element is a normal data element, conversion is to be performed according to the rounding mode, and the second 16-bit floating point data element is to have a sign bit, an 8-bit exponent, seven explicit mantissa bits, and one implicit mantissa bit, and wherein when the second 32-bit single-precision floating point data element is a NaN data element, the second 16-bit floating point data element is to have a mantissa with a most significant bit set to one; and store the first 16-bit floating point data element in a lower order half of a destination register and the second 16-bit floating point data element in a higher order half of the destination register.
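
The conversion and packing described above can be sketched on raw bit patterns. This is a simplified illustration, not the claimed hardware behaviour: it assumes inputs already fit in single precision, offers only two rounding modes (round to nearest even and truncation) as stand-ins for the instruction-specified mode, sends denormals and infinities through the same path as normal values, and models the destination as a single 32-bit word. The names `f32_to_bf16_bits` and `convert_pair` are hypothetical.

```python
import struct


def f32_to_bf16_bits(x, mode="nearest_even"):
    """Return the 16-bit pattern (1 sign, 8 exponent, 7 mantissa bits) for x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF and fraction != 0:
        # NaN input: keep the upper half and force the mantissa MSB to one.
        return (bits >> 16) | 0x0040
    if mode == "nearest_even":
        # Add a bias so that dropping the low 16 bits rounds ties to even.
        bits += 0x7FFF + ((bits >> 16) & 1)
    elif mode != "truncate":
        raise ValueError("unsupported rounding mode")
    return (bits >> 16) & 0xFFFF


def convert_pair(first, second, mode="nearest_even"):
    """Pack the first result into the low half and the second into the high half."""
    return (f32_to_bf16_bits(second, mode) << 16) | f32_to_bf16_bits(first, mode)
```

Here `convert_pair(1.0, -2.0)` places the pattern for 1.0 in bits 15..0 and the pattern for -2.0 in bits 31..16.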
