SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP4418136A2

    Publication date: 2024-08-21

    Application No.: EP24187271.2

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F15/76

    Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.
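
The accumulation described in this abstract can be sketched in software as follows. This is only an illustrative model, not the claimed circuitry: it assumes 8-bit source elements (so the accumulators, being four times larger, are 32 bits wide) and assumes saturation is applied once to the full sum; the abstract does not specify either detail. The function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate_i32(value):
    """Clamp an arbitrary Python integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def dot4_accumulate_saturating(signed_src, unsigned_src, accumulators):
    """Model of the abstract's operation (assumed 8-bit sources, 32-bit accumulators).

    signed_src   : signed elements, each in -128..127
    unsigned_src : unsigned elements, each in 0..255
    accumulators : signed 32-bit elements, one per group of four source elements
    """
    if not (len(signed_src) == len(unsigned_src) == 4 * len(accumulators)):
        raise ValueError("expected four source elements per accumulator")

    results = []
    for i, acc in enumerate(accumulators):
        total = acc
        # Multiply element-wise and accumulate the products in a group of four.
        for j in range(4 * i, 4 * i + 4):
            total += signed_src[j] * unsigned_src[j]
        # Assumption: saturate once, after the whole group has been summed.
        results.append(saturate_i32(total))
    return results
```

For example, with accumulators `[100, INT32_MAX]`, the first group can cancel to zero while the second group, which adds a positive sum to an accumulator already at the maximum, stays clamped at `INT32_MAX` instead of wrapping.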

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3971710A1

    Publication date: 2022-03-23

    Application No.: EP21207389.4

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result; and store the addition result in the corresponding packed data element position of the packed data source/destination operand.
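
The sign-extend, zero-extend, multiply, and add steps in this abstract can be illustrated with a small software model. It is a sketch under stated assumptions: four bytes per destination element and a 32-bit destination are assumed (the abstract only says the second size is greater than the first), and because no saturation is mentioned, the model assumes the stored sum wraps in two's complement. All names are hypothetical.

```python
def sign_extend8(raw):
    """Interpret a raw byte pattern (0..255) as a signed value (-128..127)."""
    raw &= 0xFF
    return raw - 0x100 if raw & 0x80 else raw


def zero_extend8(raw):
    """Interpret a raw byte pattern as an unsigned value (0..255)."""
    return raw & 0xFF


def wrap_int32(value):
    """Assumption: results wrap modulo 2**32 and are read as signed 32-bit."""
    value &= 0xFFFF_FFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def fma_bytes(dest, src1, src2, group=4):
    """dest: signed 32-bit elements; src1/src2: raw byte patterns, `group` per dest element."""
    if not (len(src1) == len(src2) == group * len(dest)):
        raise ValueError("source length must be group * len(dest)")

    out = []
    for pos, acc in enumerate(dest):
        total = acc
        for k in range(group * pos, group * (pos + 1)):
            # First source is sign extended, second source is zero extended.
            total += sign_extend8(src1[k]) * zero_extend8(src2[k])
        out.append(wrap_int32(total))
    return out
```

Note that the same raw byte `0xFF` contributes `-1` when it comes from the first source and `255` when it comes from the second, which is why the two operands are not interchangeable.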

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP4198718A1

    Publication date: 2023-06-21

    Application No.: EP23156307.3

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, an apparatus comprises: decode circuitry to decode a single instruction, the single instruction having fields to indicate an opcode, a packed destination operand, a first packed source operand, and a second packed source operand, wherein elements of the destination are 32 bits in size and elements of the first source and the second source are 16 bits in size; a register file having a plurality of packed data registers including registers for the destination and source operands; and execution circuitry, coupled to the decode circuitry. The execution circuitry is to perform operations corresponding to the instruction, including to, for each element position of the destination: multiply a first element from the first source and a first element from the second source to generate a first result, multiply a second element from the first source and a second element from the second source to generate a second result, add the first result and the second result to generate a third result, add the third result to an element from the element position of the destination to generate a fourth result, and store the fourth result in the element position of the destination.
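
The four named results in this abstract map directly onto a short loop. The sketch below is a hedged illustration only: it assumes the 16-bit elements are signed, that the two source elements for destination position `i` sit at indices `2*i` and `2*i + 1`, and that the stored 32-bit value wraps on overflow; none of these points is stated in the abstract. The function names are hypothetical.

```python
def wrap_int32(value):
    """Assumption: the stored result wraps modulo 2**32 (signed 32-bit view)."""
    value &= 0xFFFF_FFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def fma_words(dest, src1, src2):
    """dest: 32-bit elements; src1/src2: 16-bit elements, two per dest element."""
    if not (len(src1) == len(src2) == 2 * len(dest)):
        raise ValueError("expected two 16-bit source elements per destination element")

    out = []
    for i, dest_elem in enumerate(dest):
        first = src1[2 * i] * src2[2 * i]            # first result
        second = src1[2 * i + 1] * src2[2 * i + 1]   # second result
        third = first + second                       # third result
        fourth = wrap_int32(third + dest_elem)       # fourth result
        out.append(fourth)                           # stored back at position i
    return out
```

Called as `fma_words([10, -5], [3, -4, 1000, 2000], [5, 6, -1000, 2])`, the first position computes `15 + (-24) + 10` and the second computes `-1000000 + 4000 + (-5)`.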

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3971711A1

    Publication date: 2022-03-23

    Application No.: EP21207395.1

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result; and store the addition result in the corresponding packed data element position of the packed data source/destination operand.
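
To show what "a corresponding packed data element position" means at the bit level, the sketch below models a single 32-bit lane as a raw bit pattern and pulls the two 16-bit words out of it. It is an assumption-laden illustration: the lane width (32 bits), the word count per lane (two), and wraparound on overflow are all assumed, since the abstract leaves the sizes open and does not mention saturation. The function names are hypothetical.

```python
def sign_extend16(raw):
    """Interpret the low 16 bits of `raw` as a signed word (-32768..32767)."""
    raw &= 0xFFFF
    return raw - 0x1_0000 if raw & 0x8000 else raw


def fma_word_lane(dest_lane, src1_lane, src2_lane):
    """All arguments and the return value are raw 32-bit patterns (0..0xFFFFFFFF).

    The low word sits in bits 0..15 and the high word in bits 16..31 of each
    source lane (assumed layout).
    """
    total = dest_lane
    for shift in (0, 16):
        a = sign_extend16(src1_lane >> shift)  # sign extend word from first source
        b = sign_extend16(src2_lane >> shift)  # sign extend word from second source
        total += a * b
    # Assumption: keep only the low 32 bits, i.e. wrap in two's complement.
    return total & 0xFFFF_FFFF
```

Because the final mask works modulo 2**32, the destination lane can be added as a raw pattern without first converting it to a signed value; the resulting bit pattern is the same.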

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3989062A1

    Publication date: 2022-04-27

    Application No.: EP21207387.8

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed signed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed signed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed signed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
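
The saturation step in this abstract matters even when the destination starts at zero, which a small model makes visible. The sketch assumes two signed 16-bit words per 32-bit destination element and a single saturation applied to the full sum; the abstract does not fix these sizes or the point at which saturation occurs. The function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fma_words_saturating(dest, src1, src2):
    """dest: signed 32-bit elements; src1/src2: signed 16-bit words, two per dest element."""
    if not (len(src1) == len(src2) == 2 * len(dest)):
        raise ValueError("expected two words per destination element")

    out = []
    for i, acc in enumerate(dest):
        # Python integers are unbounded, so `total` plays the role of the
        # wider intermediate "addition result".
        total = acc
        for k in (2 * i, 2 * i + 1):
            total += src1[k] * src2[k]
        # Saturate only if the addition result does not fit in 32 bits.
        out.append(max(INT32_MIN, min(INT32_MAX, total)))
    return out
```

With both word pairs equal to `-32768 * -32768`, the two products alone sum to `2**31`, one more than the largest signed 32-bit value, so the stored result is clamped to `2**31 - 1` even though the destination element was zero.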

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3971709A1

    Publication date: 2022-03-23

    Application No.: EP21207379.5

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed unsigned data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed signed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed unsigned data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
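
A compact model of this signed-by-unsigned byte variant with saturation is sketched below, using Python `bytes` objects to stand in for the packed source operands. It assumes four bytes per 32-bit destination element and one saturation applied to the complete sum, neither of which the abstract states explicitly. The function name is hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fma_su_bytes_saturating(dest, src1, src2):
    """dest: signed 32-bit elements; src1/src2: `bytes`, four per dest element (assumed)."""
    if not (len(src1) == len(src2) == 4 * len(dest)):
        raise ValueError("expected four bytes per destination element")

    out = []
    for i, acc in enumerate(dest):
        total = acc
        for k in range(4 * i, 4 * i + 4):
            # Sign extend: read the byte from the first source as signed.
            a = int.from_bytes(src1[k:k + 1], "little", signed=True)
            # Zero extend: indexing `bytes` already yields 0..255.
            b = src2[k]
            total += a * b
        # Store the addition result, or the saturated result if it overflows.
        out.append(max(INT32_MIN, min(INT32_MAX, total)))
    return out
```

A group such as signed bytes `0x80` (that is, `-128`) against unsigned bytes `0xFF` (that is, `255`) pushes an accumulator already at the minimum further negative, and the model clamps it at `-(2**31)` rather than letting it wrap to a positive value.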