Accelerator systems and methods for matrix operations

    公开(公告)号:US10942738B2

    公开(公告)日:2021-03-09

    申请号:US16368973

    申请日:2019-03-29

    Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.

    Apparatuses, methods, and systems for instructions of a matrix operations accelerator

    公开(公告)号:US12204605B2

    公开(公告)日:2025-01-21

    申请号:US18360793

    申请日:2023-07-27

    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.

    APPARATUSES, METHODS, AND SYSTEMS FOR TRANSPOSE INSTRUCTIONS OF A MATRIX OPERATIONS ACCELERATOR

    公开(公告)号:US20200310803A1

    公开(公告)日:2020-10-01

    申请号:US16370894

    申请日:2019-03-30

    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.

    Apparatus and method for power virus protection in a processor

    公开(公告)号:US12099597B2

    公开(公告)日:2024-09-24

    申请号:US18384996

    申请日:2023-10-30

    Abstract: An apparatus and method for intelligent power virus protection in a processor. For example, one embodiment of a processor comprises: first circuitry including an instruction fetch circuit to fetch instructions, each instruction comprising an instruction type and an associated width comprising a number of bits associated with source and/or destination operand values associated with the instruction; detection circuitry to detect one or more instructions of a particular type and/or width; evaluation circuitry to evaluate an impact of power virus protection (PVP) circuitry when executing the one or more instructions based on the detected instruction types and/or widths; and control circuitry, based on the evaluation, to configure the PVP circuitry in accordance with the evaluation performed by the evaluation circuitry.

    Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator

    公开(公告)号:US10990397B2

    公开(公告)日:2021-04-27

    申请号:US16370894

    申请日:2019-03-30

    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.

    Apparatus and method for power virus protection in a processor

    公开(公告)号:US11809549B2

    公开(公告)日:2023-11-07

    申请号:US16728843

    申请日:2019-12-27

    Abstract: An apparatus and method for intelligent power virus protection in a processor. For example, one embodiment of a processor comprises: first circuitry including an instruction fetch circuit to fetch instructions, each instruction comprising an instruction type and an associated width comprising a number of bits associated with source and/or destination operand values associated with the instruction; detection circuitry to detect one or more instructions of a particular type and/or width; evaluation circuitry to evaluate an impact of power virus protection (PVP) circuitry when executing the one or more instructions based on the detected instruction types and/or widths; and control circuitry, based on the evaluation, to configure the PVP circuitry in accordance with the evaluation performed by the evaluation circuitry.

    Apparatuses, methods, and systems for instructions of a matrix operations accelerator

    公开(公告)号:US11714875B2

    公开(公告)日:2023-08-01

    申请号:US16729361

    申请日:2019-12-28

    CPC classification number: G06F17/16 G06F9/3001 G06F9/30036 G06F9/3851 G06F9/34

    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.

    ACCELERATOR SYSTEMS AND METHODS FOR MATRIX OPERATIONS

    公开(公告)号:US20200310794A1

    公开(公告)日:2020-10-01

    申请号:US16368973

    申请日:2019-03-29

    Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.

Patent Agency Ranking