-
公开(公告)号:US10942738B2
公开(公告)日:2021-03-09
申请号:US16368973
申请日:2019-03-29
Applicant: Intel Corporation
Inventor: Zeev Sperber , Amit Gradstein , Simon Rubanovich , Igor Yanover , Gavri Berger , Eyal Hadas , Saeed Kharouf , Ron Schneider , Sagi Meller , Jose Yallouz
Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.
-
公开(公告)号:US12204605B2
公开(公告)日:2025-01-21
申请号:US18360793
申请日:2023-07-27
Applicant: Intel Corporation
Inventor: Amit Gradstein , Simon Rubanovich , Sagi Meller , Saeed Kharouf , Gavri Berger , Zeev Sperber , Jose Yallouz , Ron Schneider
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.
-
公开(公告)号:US11714875B2
公开(公告)日:2023-08-01
申请号:US16729361
申请日:2019-12-28
Applicant: Intel Corporation
Inventor: Amit Gradstein , Simon Rubanovich , Sagi Meller , Saeed Kharouf , Gavri Berger , Zeev Sperber , Jose Yallouz , Ron Schneider
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30036 , G06F9/3851 , G06F9/34
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.
-
公开(公告)号:US20200310794A1
公开(公告)日:2020-10-01
申请号:US16368973
申请日:2019-03-29
Applicant: Intel Corporation
Inventor: ZEEV SPERBER , Amit Gradstein , Simon Rubanovich , Igor Yanover , Gavri Berger , Eyal Hadas , Saeed Kharouf , Ron Schneider , Sagi Meller , Jose Yallouz
Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.
-
-
-