APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS OF A MATRIX OPERATIONS ACCELERATOR

    公开(公告)号:US20240078283A1

    公开(公告)日:2024-03-07

    申请号:US18360793

    申请日:2023-07-27

    CPC classification number: G06F17/16 G06F9/3001 G06F9/30036 G06F9/3851 G06F9/34

    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.

    INSTRUCTIONS AND SUPPORT FOR CALCULATING PREFIX SUMS

    公开(公告)号:US20230409333A1

    公开(公告)日:2023-12-21

    申请号:US17843823

    申请日:2022-06-17

    CPC classification number: G06F9/3818 G06F9/30105

    Abstract: Techniques for performing prefix sums in response to a single instruction are describe are described. In some examples, the single instruction includes fields for an opcode, one or fields to reference a first source operand, one or fields to reference a second source operand, one or fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the single instruction, to at least: perform a prefix sum by for each non-masked data element position of the second source operand adding a data element of that data element position to each data element of preceding data element positions and adding at least one data element of a defined data element position of the first source operand, and store each prefix sum for each data element position of the second source operand into a corresponding data element position of the destination operand.

    BFLOAT16 SCALE AND/OR REDUCE INSTRUCTIONS

    公开(公告)号:US20230068781A1

    公开(公告)日:2023-03-02

    申请号:US17463382

    申请日:2021-08-31

    Abstract: Techniques for scale and reduction of BF16 data elements are described. An exemplary instruction includes fields for an having fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a BF16 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a BF16 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.

    APPARATUS AND METHOD FOR COMPLEX MATRIX MULTIPLICATION

    公开(公告)号:US20220207107A1

    公开(公告)日:2022-06-30

    申请号:US17133473

    申请日:2020-12-23

    Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix. The decoder may also decode and the execution circuitry may execute a second complex matrix multiplication instruction to multiply real and imaginary values from the first plurality with corresponding imaginary and real values, respectively, from the second plurality to generate first and second pluralities of imaginary products, and to add corresponding imaginary products to produce a corresponding imaginary value in the result matrix.

    APPARATUS AND METHOD FOR COMPLEX MATRIX CONJUGATE TRANSPOSE

    公开(公告)号:US20220197654A1

    公开(公告)日:2022-06-23

    申请号:US17133400

    申请日:2020-12-23

    Abstract: An apparatus and method for complex matrix conjugation. For example, one embodiment of a processor comprises: a decoder to decode a complex conjugate transpose instruction including a source operand to identify a complex source matrix and a destination operand to identify a complex result matrix, the complex source matrix to store a first plurality of complex values and the complex result matrix to store a second plurality of complex values, each complex value in the first and second plurality of complex values including a real component and an imaginary component; a plurality of registers or local memory to store all or a subset of the first plurality of complex values; and execution circuitry to execute the complex conjugate transpose instruction using matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values, and transpose hardware logic to perform a matrix transpose operation using the plurality of complex conjugate values to generate a result matrix.

Patent Agency Ranking