SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP4418136A3

    Publication Date: 2024-11-20

    Application No.: EP24187271.2

    Filing Date: 2016-10-20

    Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.
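
    A minimal Python sketch of the arithmetic in this abstract. It assumes 8-bit source elements with 32-bit accumulators (the abstract says only that accumulators are four times wider) and assumes that "with saturation" means clamping to the signed 32-bit range. The function names are hypothetical and are not instruction mnemonics.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate_int32(value: int) -> int:
    """Clamp an arbitrary-precision integer to the signed 32-bit range (assumed)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def dot4_signed_unsigned_sat(a, b, acc):
    """Hypothetical model: a holds signed int8, b holds unsigned uint8,
    acc holds signed int32, and len(a) == len(b) == 4 * len(acc)."""
    if len(a) != len(b) or len(a) != 4 * len(acc):
        raise ValueError("expected four source elements per accumulator")
    if not all(-128 <= x <= 127 for x in a) or not all(0 <= y <= 255 for y in b):
        raise ValueError("a must hold int8 values and b must hold uint8 values")
    result = []
    for i, acc_value in enumerate(acc):
        # Multiply the group of four element pairs that feeds accumulator i.
        products = [a[j] * b[j] for j in range(4 * i, 4 * i + 4)]
        # Python ints never overflow, so the exact sum is clamped once at the end.
        result.append(saturate_int32(acc_value + sum(products)))
    return result


print(dot4_signed_unsigned_sat([-1, 2, 3, 4, 127, 127, 127, 127],
                               [255, 1, 1, 1, 255, 255, 255, 255],
                               [10, INT32_MAX - 5]))
# [-236, 2147483647]
```

    In the usage line, the second accumulator would exceed the signed 32-bit maximum, so under the stated assumption it is clamped rather than wrapped.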

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP4198718A1

    Publication Date: 2023-06-21

    Application No.: EP23156307.3

    Filing Date: 2016-10-20

    Abstract: In some embodiments, an apparatus comprises: decode circuitry to decode a single instruction, the single instruction having fields to indicate an opcode, a packed destination operand, a first packed source operand, and a second packed source operand, wherein elements of the destination are 32 bits in size and elements of the first source and the second source are 16 bits in size; a register file having a plurality of packed data registers including registers for the destination and source operands; and execution circuitry, coupled to the decode circuitry. The execution circuitry is to perform operations corresponding to the instruction, including to, for each element position of the destination: multiply a first element from the first source and a first element from the second source to generate a first result, multiply a second element from the first source and a second element from the second source to generate a second result, add the first result and the second result to generate a third result, add the third result to an element from the element position of the destination to generate a fourth result, and store the fourth result in the element position of the destination.
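
    A sketch of the per-position steps, with the intermediate values named after the abstract's first through fourth results. It assumes signed 16-bit sources, that destination position i pairs with source elements 2i and 2i+1, and that the final addition wraps modulo 2^32 because this abstract does not mention saturation. All three are assumptions, and the function names are hypothetical.

```python
def wrap_int32(value: int) -> int:
    """Reduce value modulo 2**32 and reinterpret the bits as a signed int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pairwise_madd_accumulate(dest, src1, src2):
    """Hypothetical model: src1/src2 hold signed 16-bit values, dest signed 32-bit.
    Destination position i pairs (by assumption) with source elements 2i and 2i+1."""
    if len(src1) != len(src2) or len(src1) != 2 * len(dest):
        raise ValueError("expected two 16-bit source elements per destination element")
    updated = []
    for i, d in enumerate(dest):
        first = src1[2 * i] * src2[2 * i]            # first result
        second = src1[2 * i + 1] * src2[2 * i + 1]   # second result
        third = first + second                       # third result (exact in Python)
        fourth = wrap_int32(d + third)               # fourth result, assumed to wrap
        updated.append(fourth)                       # value stored back at position i
    return updated


print(pairwise_madd_accumulate([100, 0], [3, 4, -32768, -32768], [5, 6, -32768, -32768]))
# [139, -2147483648]
```

    Because only the final value is reduced modulo 2^32, wrapping the intermediate results as well would produce the same answer.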

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3989062A1

    Publication Date: 2022-04-27

    Application No.: EP21207387.8

    Filing Date: 2016-10-20

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed signed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed signed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed signed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
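
    The sketch below models one packed data element position as a raw 32-bit pattern holding two 16-bit words, which assumes a first size of 16 bits and a second size of 32 bits. It shows the sign extension step and the corner case where saturation matters: two products of -32768 * -32768 sum to 2^31, one more than the largest signed 32-bit value. Function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def words_of(lane: int):
    """Split a raw 32-bit lane into its two 16-bit words, low word first."""
    return [(lane >> shift) & 0xFFFF for shift in (0, 16)]


def madd_words_sat(acc_lanes, src1_lanes, src2_lanes):
    """Hypothetical model: source lanes are raw 32-bit patterns,
    accumulator lanes are signed int32 values."""
    out = []
    for acc, l1, l2 in zip(acc_lanes, src1_lanes, src2_lanes):
        a = [sign_extend(w, 16) for w in words_of(l1)]
        b = [sign_extend(w, 16) for w in words_of(l2)]
        total = acc + sum(x * y for x, y in zip(a, b))
        # Saturate only when the exact sum does not fit in 32 bits.
        out.append(max(INT32_MIN, min(INT32_MAX, total)))
    return out


print(madd_words_sat([0, 7], [0x80008000, 0xFFFF0002], [0x80008000, 0x00030004]))
# [2147483647, 12]
```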

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3971709A1

    Publication Date: 2022-03-23

    Application No.: EP21207379.5

    Filing Date: 2016-10-20

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed unsigned data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed signed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed unsigned data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
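
    Unlike the word variant in EP3989062A1, this one sign extends the bytes of the first source and zero extends those of the second, so the same bit pattern 0xFF contributes -1 from one operand and 255 from the other. The sketch assumes four bytes per 32-bit lane (low byte first) and clamping to the signed 32-bit range; the names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def bytes_of(lane: int):
    """Split a raw 32-bit lane into four bytes, lowest byte first."""
    return [(lane >> (8 * k)) & 0xFF for k in range(4)]


def sign_extend8(byte: int) -> int:
    """Treat a byte as a two's-complement int8."""
    return byte - 256 if byte & 0x80 else byte


def madd_bytes_su_sat(acc_lanes, src1_lanes, src2_lanes):
    """Hypothetical model: signed bytes from src1, unsigned bytes from src2."""
    out = []
    for acc, l1, l2 in zip(acc_lanes, src1_lanes, src2_lanes):
        signed = [sign_extend8(x) for x in bytes_of(l1)]  # sign extension
        unsigned = bytes_of(l2)                           # zero extension keeps the value
        total = acc + sum(s * u for s, u in zip(signed, unsigned))
        out.append(max(INT32_MIN, min(INT32_MAX, total)))  # assumed saturation rule
    return out


print(madd_bytes_su_sat([0, -2**31 + 10], [0x000000FF, 0x80808080], [0x000000FF, 0xFFFFFFFF]))
# [-255, -2147483648]
```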

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP4418136A2

    Publication Date: 2024-08-21

    Application No.: EP24187271.2

    Filing Date: 2016-10-20

    Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.

    SYSTEMS, METHODS, AND APPARATUSES FOR DOT PRODUCTION OPERATIONS

    Publication No.: EP4303724A1

    Publication Date: 2024-01-10

    Application No.: EP23194771.4

    Filing Date: 2017-07-01

    Abstract: Embodiments detailed herein relate to matrix operations. For example, a processor comprises decode circuitry to decode a single matrix instruction and execution circuitry to execute the single matrix instruction. The single matrix instruction has fields for an opcode, a plurality of identifiers corresponding to a first plurality of 4-bit sized data elements of a first source matrix, a second plurality of 4-bit sized data elements of a second source matrix, a plurality of doubleword-sized source data elements of a third source matrix, and a plurality of doubleword-sized result data elements of a result matrix, and bits indicating whether one or both of the first and second plurality of 4-bit sized data elements are signed or unsigned. The execution circuitry includes a multiply accumulate circuit, comprising: a multiplier to multiply each 4-bit sized data element of a first subset of the first plurality of 4-bit sized data elements with a corresponding 4-bit sized data element of a first subset of the second plurality of 4-bit sized data elements to generate a plurality of products; and an accumulator to add the plurality of products to a corresponding doubleword-sized source data element of the plurality of doubleword-sized source data elements to generate a corresponding doubleword-sized result data element of the plurality of doubleword-sized result data elements.
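
    A sketch of the 4-bit multiply-accumulate. It assumes that a "subset" is the eight 4-bit elements packed into one doubleword (low nibble first), that there is one signedness flag per source, and that the doubleword accumulation wraps, since the abstract does not mention saturation. The flag and function names are hypothetical.

```python
def wrap_int32(value: int) -> int:
    """Reduce value modulo 2**32 and reinterpret the bits as a signed int32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def nibbles_of(dword: int, signed: bool):
    """Split a 32-bit doubleword into eight 4-bit elements, low nibble first."""
    out = []
    for k in range(8):
        n = (dword >> (4 * k)) & 0xF
        if signed and n & 0x8:
            n -= 16  # two's-complement int4: 0x8..0xF map to -8..-1
        out.append(n)
    return out


def dot_int4_accumulate(acc, a_dword, b_dword, a_signed, b_signed):
    """Hypothetical model of one multiplier + accumulator pass."""
    a = nibbles_of(a_dword, a_signed)
    b = nibbles_of(b_dword, b_signed)
    products = [x * y for x, y in zip(a, b)]  # multiplier stage
    return wrap_int32(acc + sum(products))    # accumulator stage (assumed wraparound)


# The signedness bits change how the same nibble 0xF is read.
print(dot_int4_accumulate(0, 0xF, 0xF, True, False))   # -15
print(dot_int4_accumulate(0, 0xF, 0xF, False, False))  # 225
print(dot_int4_accumulate(0, 0xF, 0xF, True, True))    # 1
```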

    SYSTEMS, METHODS, AND APPARATUSES FOR TILE MATRIX MULTIPLICATION AND ACCUMULATION

    Publication No.: EP4216057A1

    Publication Date: 2023-07-26

    Application No.: EP23161367.0

    Filing Date: 2017-07-01

    Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, an apparatus comprises an instruction decoder to decode a single instruction, the single instruction having fields to indicate an opcode, a first register to store a first source matrix, a second register to store a second source matrix, and a third register to store a 2 by 2 third source matrix, wherein the opcode is to indicate a matrix multiply-accumulate operation; and execution circuitry to perform the matrix multiply-accumulate operation. The matrix multiply-accumulate operation includes: multiplying a value corresponding to a first row and a first column of the first source matrix and a value corresponding to a first row and a first column of the second source matrix to generate a first product, multiplying a value corresponding to the first row and a second column of the first source matrix and a value corresponding to a second row and the first column of the second source matrix to generate a second product, summing the first product, the second product, and an initial value corresponding to an element position in a first row and a first column of the 2 by 2 third source matrix to generate a resulting value corresponding to the element position in a destination matrix, and storing the destination matrix in the third register.
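
    The abstract spells out only the computation for the first row and first column of the destination. Applying the same pattern to every element position gives C[i][j] plus the sum over k of A[i][k] * B[k][j]. The sketch below assumes the first source is 2 by K and the second is K by 2 so that the result matches the 2 by 2 third source, and it uses plain Python lists with a hypothetical function name.

```python
def tile_mma(a, b, c):
    """Return c + a @ b for a (2 x K), b (K x 2), c (2 x 2), using plain lists."""
    rows, inner, cols = len(a), len(b), len(c[0])
    if (any(len(row) != inner for row in a)
            or any(len(row) != cols for row in b)
            or len(c) != rows):
        raise ValueError("shape mismatch")
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[10, 20], [30, 40]]
# Element [0][0] is a[0][0]*b[0][0] + a[0][1]*b[1][0] + c[0][0] = 5 + 14 + 10,
# which is the first-row, first-column computation the abstract describes.
print(tile_mma(a, b, c))  # [[29, 42], [73, 90]]
```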
