MATRIX TRANSPOSE AND MULTIPLY
    Invention publication

    Publication No.: EP4468146A2

    Publication date: 2024-11-27

    Application No.: EP24205150.6

    Application date: 2020-11-26

    Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor comprises: a plurality of registers to store a plurality of packed data elements including a first plurality of packed data elements of a first source matrix tile and a second plurality of packed data elements of a second source matrix tile, the first and second source matrix tiles comprising respective portions of a first source matrix and a second source matrix, and each packed data element of the plurality of packed data elements having an element width; a decoder to decode one or more instructions, at least one instruction of the one or more instructions including an opcode field configured to specify an opcode, a first source operand configured to indicate the first source matrix tile, a second source operand configured to indicate the second source matrix tile, and a destination operand configured to indicate a result matrix tile; and execution circuitry to, in response to the one or more instructions, transpose the first source matrix tile in accordance with a granularity equal to the element width to generate a first transposed source matrix tile and multiply the first transposed source matrix tile and the second source matrix tile. The execution circuitry comprises: a plurality of multipliers to multiply data elements of the first transposed source matrix tile and corresponding data elements of the second source matrix tile to produce a corresponding plurality of products; and one or more accumulators to add groups of the products to generate corresponding result data elements in the result matrix tile.
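
    As a rough functional model (not the claimed circuitry), the operation amounts to computing A^T · B one tile at a time. The Python sketch below assumes tiles are plain row-major lists of numbers; the names `transpose` and `transpose_multiply` are hypothetical, and hardware details such as element width, tile size limits and register layout are not modelled.

```python
from typing import List

Tile = List[List[float]]


def transpose(tile: Tile) -> Tile:
    """Element-granularity transpose: element (r, c) moves to (c, r)."""
    return [list(col) for col in zip(*tile)]


def transpose_multiply(a: Tile, b: Tile) -> Tile:
    """Return transpose(a) @ b.

    Each `products` list models the multipliers feeding one result element;
    the sum models the accumulator that adds that group of products.
    """
    if len(a) != len(b):
        raise ValueError("rows of a must equal rows of b (the shared K dimension)")
    at = transpose(a)  # first transposed source tile, shape (cols of a, K)
    k_dim = len(b)
    result: Tile = []
    for i in range(len(at)):
        row = []
        for j in range(len(b[0])):
            products = [at[i][k] * b[k][j] for k in range(k_dim)]  # multipliers
            row.append(sum(products))                              # accumulator
        result.append(row)
    return result
```

    Calling it with a = [[1, 2], [3, 4], [5, 6]] and b = [[1, 0], [0, 1], [1, 1]] returns [[6, 8], [8, 10]].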

    MATRIX TRANSPOSE AND MULTIPLY
    Invention publication

    Publication No.: EP4462249A2

    Publication date: 2024-11-13

    Application No.: EP24203555.8

    Application date: 2020-11-26

    Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, an apparatus comprises decode circuitry to decode an instance of an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, a second source operand field to specify a second source matrix location, and a third operand field to specify a source/destination matrix location; and execution circuitry to, in response to the opcode of the decoded instance of the instruction, transpose columns of data element pairs of the first source matrix into rows, perform a dot product of data element pairs of the transposed columns of data element pairs of the first source matrix and corresponding row data element pairs of the second source matrix, and add a result of the dot product to a corresponding row data element of the source/destination matrix.
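
    The pair-wise variant can be sketched the same way. The layout is an assumption made for illustration: each source is a list of rows whose entries are 2-tuples (one data element pair), the first source has K pair-rows and M pair-columns, the second source has K pair-rows and N pair-columns, and the source/destination matrix is M x N. The function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]
PairTile = List[List[Pair]]


def transpose_pairs(tile: PairTile) -> PairTile:
    """Turn each column of data element pairs into a row; each pair stays intact."""
    return [list(col) for col in zip(*tile)]


def transpose_pair_dot_accumulate(src1: PairTile, src2: PairTile,
                                  srcdst: List[List[float]]) -> List[List[float]]:
    """Return srcdst plus, for each (m, n), the pair-wise dot product of
    transposed row m of src1 with column n of src2."""
    if len(src1) != len(src2):
        raise ValueError("both sources need the same number of pair-rows (K)")
    at = transpose_pairs(src1)         # at[m][k] is the pair src1[k][m]
    out = [row[:] for row in srcdst]   # leave the caller's matrix untouched
    for m in range(len(at)):
        for n in range(len(src2[0])):
            acc = 0.0
            for k in range(len(src2)):
                a0, a1 = at[m][k]
                b0, b1 = src2[k][n]
                acc += a0 * b0 + a1 * b1  # two-element dot product for one pair
            out[m][n] += acc              # add into the source/destination element
    return out
```

    With src1 = [[(1, 2), (0, 1)], [(3, 4), (1, 0)]], src2 = [[(1, 1)], [(2, 0)]] and srcdst = [[10.0], [0.0]], the result is [[19.0], [3.0]].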

    APPARATUS AND METHOD FOR DOWN-CONVERTING AND INTERLEAVING MULTIPLE FLOATING POINT VALUES

    Publication No.: EP3716048A1

    Publication date: 2020-09-30

    Application No.: EP20155995.2

    Application date: 2020-02-07

    Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth pluralities of packed data elements to be encoded with fewer bits than each of the first and second pluralities of packed data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; and interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
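
    The abstract does not name the number formats. The sketch below assumes FP32 sources down-converted to bfloat16 with round-to-nearest-even, and an element order that places each converted first-source element directly before the corresponding converted second-source element. All three choices are assumptions, and the function names are hypothetical. Inputs are Python floats that must be representable as FP32; outputs are 16-bit integer bit patterns.

```python
import struct
from typing import List


def fp32_to_bf16_bits(x: float) -> int:
    """Down-convert one FP32 value to a bfloat16 bit pattern (round to nearest even)."""
    if x != x:  # NaN: return a canonical quiet NaN instead of rounding the payload
        return 0x7FC0
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # exact ties round to the even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def down_convert_and_interleave(src1: List[float], src2: List[float]) -> List[int]:
    """Down-convert both sources, then interleave: s1[0], s2[0], s1[1], s2[1], ..."""
    if len(src1) != len(src2):
        raise ValueError("sources must hold the same number of elements")
    third = [fp32_to_bf16_bits(x) for x in src1]   # converted first source
    fourth = [fp32_to_bf16_bits(x) for x in src2]  # converted second source
    dst: List[int] = []
    for a, b in zip(third, fourth):
        dst.extend((a, b))
    return dst
```

    For example, down_convert_and_interleave([1.0, 2.0], [-1.0, 3.0]) returns [0x3F80, 0xBF80, 0x4000, 0x4040], so each destination slot holds half as many bits as a source element.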

    SYSTEMS, METHODS, AND APPARATUSES FOR DOT PRODUCTION OPERATIONS

    Publication No.: EP4053695A1

    Publication date: 2022-09-07

    Application No.: EP22169888.9

    Application date: 2017-07-01

    Abstract: Embodiments detailed herein relate to matrix operations. For example, an apparatus comprises programmable configuration storage, decode circuitry and execution circuitry. The programmable configuration storage is to store configuration information for a first matrix, a second matrix, and a third matrix, the configuration information including a first value corresponding to a first number of rows for the first matrix, a second value corresponding to a second number of columns for the first matrix, a third value corresponding to a third number of rows for the second matrix, a fourth value corresponding to a fourth number of columns for the second matrix, a fifth value corresponding to a fifth number of rows for the third matrix, a sixth value corresponding to a sixth number of columns for the third matrix, and a start row value corresponding to a row of a corresponding matrix at which to restart execution of at least one of a plurality of matrix instructions. The decode circuitry is to decode the plurality of matrix instructions, including a single instruction to perform dot-product and accumulation, the single instruction having a first operand to specify a first register, a second operand to specify a second register, and a third operand to specify a third register. The execution circuitry is to perform one or more operations corresponding to the single instruction, including: performing dot-products on elements of the second matrix from the second register and elements of the third matrix from the third register to generate one or more resulting elements, and accumulating the one or more resulting elements into the first matrix in the first register.
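
    A functional sketch of the configuration storage and the resumable dot-product instruction follows. The `TileConfig` fields, the `dot_product_accumulate` function and its `max_rows` parameter (used only to simulate an interruption) are hypothetical. The start row behaviour is also an assumption: here it advances as each row of the first matrix completes, so a restarted call skips finished rows, and it is reset to 0 once the instruction finishes.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


@dataclass
class TileConfig:
    # Hypothetical layout: rows/columns of the first (accumulator),
    # second and third matrices, plus the restart row.
    rows1: int
    cols1: int
    rows2: int
    cols2: int
    rows3: int
    cols3: int
    start_row: int = 0


def dot_product_accumulate(cfg: TileConfig, first: Matrix, second: Matrix,
                           third: Matrix, max_rows: Optional[int] = None) -> bool:
    """first += second @ third, row by row, starting at cfg.start_row.

    max_rows simulates an interruption: at most that many rows are processed
    in this call. Returns True once every row of `first` is done.
    """
    if not (cfg.rows1 == cfg.rows2 and cfg.cols2 == cfg.rows3
            and cfg.cols1 == cfg.cols3):
        raise ValueError("configured shapes are incompatible")
    done = 0
    while cfg.start_row < cfg.rows1:
        if max_rows is not None and done == max_rows:
            return False  # interrupted: start_row records where to resume
        i = cfg.start_row
        for j in range(cfg.cols1):
            first[i][j] += sum(second[i][k] * third[k][j] for k in range(cfg.cols2))
        cfg.start_row += 1
        done += 1
    cfg.start_row = 0  # finished: the next instruction starts from row 0
    return True
```

    Splitting the work with max_rows=1 and then resuming yields the same first matrix as a single uninterrupted call, which is the purpose of recording a restart row: already-accumulated rows are not added twice.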

    MATRIX TRANSPOSE AND MULTIPLY
    Invention publication

    Publication No.: EP4468146A3

    Publication date: 2025-02-19

    Application No.: EP24205150.6

    Application date: 2020-11-26

    Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor comprises: a plurality of registers to store a plurality of packed data elements including a first plurality of packed data elements of a first source matrix tile and a second plurality of packed data elements of a second source matrix tile, the first and second source matrix tiles comprising respective portions of a first source matrix and a second source matrix, and each packed data element of the plurality of packed data elements having an element width; a decoder to decode one or more instructions, at least one instruction of the one or more instructions including an opcode field configured to specify an opcode, a first source operand configured to indicate the first source matrix tile, a second source operand configured to indicate the second source matrix tile, and a destination operand configured to indicate a result matrix tile; and execution circuitry to, in response to the one or more instructions, transpose the first source matrix tile in accordance with a granularity equal to the element width to generate a first transposed source matrix tile and multiply the first transposed source matrix tile and the second source matrix tile. The execution circuitry comprises: a plurality of multipliers to multiply data elements of the first transposed source matrix tile and corresponding data elements of the second source matrix tile to produce a corresponding plurality of products; and one or more accumulators to add groups of the products to generate corresponding result data elements in the result matrix tile.

    MATRIX TRANSPOSE AND MULTIPLY
    Invention publication

    Publication No.: EP4462249A3

    Publication date: 2025-02-19

    Application No.: EP24203555.8

    Application date: 2020-11-26

    Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, an apparatus comprises decode circuitry to decode an instance of an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, a second source operand field to specify a second source matrix location, and a third operand field to specify a source/destination matrix location; and execution circuitry to, in response to the opcode of the decoded instance of the instruction, transpose columns of data element pairs of the first source matrix into rows, perform a dot product of data element pairs of the transposed columns of data element pairs of the first source matrix and corresponding row data element pairs of the second source matrix, and add a result of the dot product to a corresponding row data element of the source/destination matrix.
