SYSTEMS AND METHODS TO TRANSPOSE VECTORS ON-THE-FLY WHILE LOADING FROM MEMORY

    Publication No.: EP4375835A3

    Publication date: 2024-08-14

    Application No.: EP24169357.1

    Filing date: 2019-10-15

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor comprises: a register file comprising one or more vector registers; a memory interface to read a plurality of data elements from a memory; fetch circuitry to fetch an instruction; decode circuitry to decode the instruction, and execution circuitry to execute the instruction. The instruction includes a plurality of fields to indicate an opcode, a subset of the plurality of data elements to be broadcast, and locations of the plurality of data elements, the plurality of data elements arranged in a corresponding plurality of relative positions, wherein the plurality of data elements include a first group of data elements and a second group of data elements. The execution circuitry performs a permute operation and a broadcast operation in accordance with the instruction, wherein the broadcast operation is to cause the subset of the plurality of data elements to be broadcast to a plurality of the relative positions associated with a corresponding plurality of other subsets of the plurality of data elements, the subset of the plurality of data elements to replace the other corresponding subsets at the plurality of relative positions.
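
One possible reading of the permute-and-broadcast behavior in this abstract can be sketched in Python. The function name, the `perm` index list, the fixed `group_size`, and the order of the two steps (permute first, then broadcast) are all assumptions made for illustration; the abstract does not specify an encoding or an ordering.

```python
def load_permute_broadcast(memory, perm, group_size, broadcast_group):
    """Hypothetical model: permute elements while loading, then broadcast one group.

    memory          -- list of data elements as they sit in memory
    perm            -- perm[i] is the memory index loaded into position i (assumed)
    group_size      -- number of elements per subset (assumed to be uniform)
    broadcast_group -- index of the subset that replaces all other subsets
    """
    # Permute step: gather elements into their new relative positions.
    loaded = [memory[i] for i in perm]

    # Broadcast step: the chosen subset overwrites every other subset
    # at that subset's relative positions.
    start = broadcast_group * group_size
    subset = loaded[start:start + group_size]
    return subset * (len(loaded) // group_size)


result = load_permute_broadcast([10, 11, 12, 13], perm=[1, 0, 3, 2],
                                group_size=2, broadcast_group=0)
```

Here the permuted load yields `[11, 10, 13, 12]`, and broadcasting group 0 replaces the second pair with a copy of the first.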

    SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

    Publication No.: EP4276609A3

    Publication date: 2024-02-14

    Application No.: EP23200278.2

    Filing date: 2019-10-08

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processing unit comprises fetch circuitry to fetch an instruction, decode circuitry to decode the instruction, the instruction having a first field to specify a first storage location of a plurality of data elements corresponding to a first matrix having M rows by N columns of 32-bit single precision floating-point data elements, a second field to specify a second storage location of a plurality of data elements corresponding to a second matrix having M rows by K columns of pairs of 16-bit floating-point data elements having a bfloat16 format, and a third field to specify a third storage location of a plurality of data elements corresponding to a third matrix having K rows by N columns of pairs of 16-bit floating-point data elements having the bfloat16 format, and execution circuitry coupled with the decode circuitry, the execution circuitry to perform operations corresponding to the instruction.
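
The matrix shapes named in this abstract (an M x N matrix of 32-bit floats, an M x K matrix of bfloat16 pairs, and a K x N matrix of bfloat16 pairs) suggest a pairwise dot product accumulated into the first matrix. The Python sketch below assumes exactly that; the abstract only says the execution circuitry performs "operations corresponding to the instruction", so the accumulation rule, the function names, and the truncating bfloat16 conversion are illustrative assumptions. Python floats are doubles, so single-precision accumulation rounding is not modeled.

```python
import struct


def to_bf16(x: float) -> float:
    """Reduce a value to bfloat16 precision by dropping the low 16 bits of its
    float32 encoding (truncation is an assumption; hardware may round)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def tile_dot_bf16(c, a, b):
    """Assumed semantics: c[m][n] += sum over k of the 2-element dot product
    of the bfloat16 pair a[m][k] with the bfloat16 pair b[k][n].

    c -- M x N list of floats (accumulator, updated in place)
    a -- M x K list of (float, float) pairs
    b -- K x N list of (float, float) pairs
    """
    M, N, K = len(c), len(c[0]), len(a[0])
    for m in range(M):
        for n in range(N):
            acc = c[m][n]
            for k in range(K):
                a0, a1 = a[m][k]
                b0, b1 = b[k][n]
                acc += to_bf16(a0) * to_bf16(b0) + to_bf16(a1) * to_bf16(b1)
            c[m][n] = acc
    return c


c = tile_dot_bf16([[0.5]], [[(1.0, 2.0)]], [[(3.0, 4.0)]])
```

With M = N = K = 1, the single output element becomes 0.5 + 1.0 * 3.0 + 2.0 * 4.0.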

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    Published application

    Publication No.: EP4318229A1

    Publication date: 2024-02-07

    Application No.: EP23189559.0

    Filing date: 2023-08-03

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.

    CONVERSION INSTRUCTIONS
    Published application

    Publication No.: EP4202659A1

    Publication date: 2023-06-28

    Application No.: EP22210978.7

    Filing date: 2022-12-02

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Techniques for data type conversion are described. An example uses an instruction that is to include fields for an opcode, an identification of source operand location, and an identification of destination operand location, wherein the opcode is to indicate instruction processing circuitry is to convert a 16-bit floating-point value from the identified source operand location into a 32-bit floating point value and store that 32-bit floating point value in one or more data element positions of the identified destination operand.
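
The abstract does not say which 16-bit floating-point format is converted, so the Python sketch below shows two common candidates as assumptions: bfloat16, whose bit pattern is the upper half of a float32, and IEEE half precision (FP16). The function names are hypothetical. Both widenings are exact, so the returned Python float holds the same value the 32-bit result would.

```python
import struct


def bf16_bits_to_fp32(bits16: int) -> float:
    """Assumed bfloat16 source: widen by placing the 16 bits in the upper
    half of a 32-bit word and zero-filling the low 16 mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def fp16_bits_to_fp32(bits16: int) -> float:
    """Assumed IEEE half-precision source: decode with struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<H", bits16 & 0xFFFF))[0]


one_from_bf16 = bf16_bits_to_fp32(0x3F80)   # bfloat16 encoding of 1.0
one_from_fp16 = fp16_bits_to_fp32(0x3C00)   # FP16 encoding of 1.0
```

Storing the result "in one or more data element positions" could then be modeled as writing the converted value to each selected destination slot.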

    INSTRUCTIONS TO CONVERT FROM FP16 TO BF8
    Published application

    Publication No.: EP4020178A1

    Publication date: 2022-06-29

    Application No.: EP21198429.9

    Filing date: 2021-09-23

    Applicant: INTEL Corporation

    IPC classification: G06F9/30

    Abstract: Techniques for converting FP16 to BF8 using bias are described. An exemplary embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand.
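
A bias-based FP16-to-BF8 conversion might look like the Python sketch below. It assumes BF8 is the 1-5-2 (sign, exponent, mantissa) layout, which makes a BF8 value the upper byte of an FP16 encoding, and it assumes the bias is an 8-bit integer added to the FP16 magnitude bits before the low byte is dropped. The abstract states neither detail; the function names, the element ordering across the two sources, the bias-to-element mapping, and the NaN and overflow handling are also assumptions.

```python
import struct


def fp16_bits(x: float) -> int:
    """Return the IEEE half-precision bit pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_to_bf8_with_bias(x: float, bias: int) -> int:
    """Assumed rule: add an 8-bit bias to the FP16 magnitude, then keep
    the upper 8 bits as the BF8 (1-5-2) encoding."""
    bits = fp16_bits(x)
    sign, mag = bits & 0x8000, bits & 0x7FFF
    if mag > 0x7C00:
        # NaN input: force a NaN pattern so truncation cannot yield infinity.
        return (sign >> 8) | 0x7F
    # In this sketch, a carry past the largest finite value stops at infinity.
    mag = min(mag + (bias & 0xFF), 0x7C00)
    return (sign | mag) >> 8


def convert_packed(src1, src2, srcdst_bias):
    """Hypothetical packed form: elements of src1 then src2, one bias each."""
    halves = list(src1) + list(src2)
    return [fp16_to_bf8_with_bias(x, b) for x, b in zip(halves, srcdst_bias)]


packed = convert_packed([1.0], [1.125], [0x00, 0x80])
```

With a bias of 0 the conversion truncates, so 1.0 stays `0x3C`. With a bias of `0x80` the value 1.125 (`0x3C80`) carries into the kept byte and becomes `0x3D`, which is 1.25 in the assumed BF8 layout.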

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3989062A1

    Publication date: 2022-04-27

    Application No.: EP21207387.8

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed signed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed signed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed signed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
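
The per-element steps in this abstract (sign extend, multiply pairwise, add to the accumulator, saturate, store) can be sketched in Python. The concrete sizes are assumptions: 16-bit words as the first size, 32 bits as the second size, and two words per destination element. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate_signed(value: int, bits: int) -> int:
    """Clamp value to the signed range representable in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def fma_words_saturate(srcdst, src1, src2, words_per_elem=2):
    """srcdst holds signed 32-bit accumulators; src1 and src2 hold raw 16-bit words."""
    out = []
    for i, acc in enumerate(srcdst):
        total = acc
        for j in range(words_per_elem):
            idx = i * words_per_elem + j
            # Sign extend both words, then multiply at full precision.
            total += sign_extend(src1[idx], 16) * sign_extend(src2[idx], 16)
        # Saturate only if the sum no longer fits the destination width.
        out.append(saturate_signed(total, 32))
    return out


normal = fma_words_saturate([10], [0xFFFF, 2], [3, 4])        # -1*3 + 2*4 + 10
clamped = fma_words_saturate([2**31 - 1], [1, 0], [1, 0])     # overflow case
```

Python integers are unbounded, so the intermediate sum never wraps and the saturation step alone decides the stored value.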

    SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD

    Publication No.: EP3971709A1

    Publication date: 2022-03-23

    Application No.: EP21207379.5

    Filing date: 2016-10-20

    Applicant: INTEL Corporation

    IPC classification: G06F9/30 G06F15/76

    Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed unsigned data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed signed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed unsigned data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
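
This variant mixes signedness: bytes from the first source are sign extended and bytes from the second source are zero extended before the multiply. The Python sketch below assumes 8-bit source elements, 32-bit destination elements, and four bytes per destination element; those sizes and the function name are illustrative, not taken from the abstract.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def fma_signed_unsigned_bytes(srcdst, src1, src2, bytes_per_elem=4):
    """srcdst holds signed 32-bit accumulators; src1 and src2 hold raw bytes."""
    out = []
    for i, acc in enumerate(srcdst):
        total = acc
        for j in range(bytes_per_elem):
            idx = i * bytes_per_elem + j
            s = src1[idx] & 0xFF
            s = s - 256 if s & 0x80 else s   # sign extend the first source byte
            u = src2[idx] & 0xFF             # zero extend the second source byte
            total += s * u
        # Saturate to the signed 32-bit range before storing.
        out.append(max(INT32_MIN, min(INT32_MAX, total)))
    return out


mixed = fma_signed_unsigned_bytes([0], [0xFF, 1, 0, 0], [0xFF, 2, 0, 0])
```

The byte `0xFF` reads as -1 in the first source and as 255 in the second, so the first product is -255 and the element total is -253.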