-
公开(公告)号:US10942738B2
公开(公告)日:2021-03-09
申请号:US16368973
申请日:2019-03-29
Applicant: Intel Corporation
Inventor: Zeev Sperber , Amit Gradstein , Simon Rubanovich , Igor Yanover , Gavri Berger , Eyal Hadas , Saeed Kharouf , Ron Schneider , Sagi Meller , Jose Yallouz
Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.
-
公开(公告)号:US12204605B2
公开(公告)日:2025-01-21
申请号:US18360793
申请日:2023-07-27
Applicant: Intel Corporation
Inventor: Amit Gradstein , Simon Rubanovich , Sagi Meller , Saeed Kharouf , Gavri Berger , Zeev Sperber , Jose Yallouz , Ron Schneider
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.
-
3.
公开(公告)号:US20200310803A1
公开(公告)日:2020-10-01
申请号:US16370894
申请日:2019-03-30
Applicant: Intel Corporation
Inventor: Amit Gradstein , Simon Rubanovich , Sagi Meller , Zeev Sperber , Jose Yallouz , Robert Valentine
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.
-
公开(公告)号:US12099597B2
公开(公告)日:2024-09-24
申请号:US18384996
申请日:2023-10-30
Applicant: Intel Corporation
Inventor: Alexander Gendler , Sagi Meller , Gavri Berger , Igor Yanover
CPC classification number: G06F21/54 , G06F1/32 , G06F9/3802 , G06F21/561 , G06F2221/034
Abstract: An apparatus and method for intelligent power virus protection in a processor. For example, one embodiment of a processor comprises: first circuitry including an instruction fetch circuit to fetch instructions, each instruction comprising an instruction type and an associated width comprising a number of bits associated with source and/or destination operand values associated with the instruction; detection circuitry to detect one or more instructions of a particular type and/or width; evaluation circuitry to evaluate an impact of power virus protection (PVP) circuitry when executing the one or more instructions based on the detected instruction types and/or widths; and control circuitry, based on the evaluation, to configure the PVP circuitry in accordance with the evaluation performed by the evaluation circuitry.
-
5.
公开(公告)号:US10990397B2
公开(公告)日:2021-04-27
申请号:US16370894
申请日:2019-03-30
Applicant: Intel Corporation
Inventor: Amit Gradstein , Simon Rubanovich , Sagi Meller , Zeev Sperber , Jose Yallouz , Robert Valentine
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.
-
公开(公告)号:US11809549B2
公开(公告)日:2023-11-07
申请号:US16728843
申请日:2019-12-27
Applicant: Intel Corporation
Inventor: Alexander Gendler , Sagi Meller , Gavri Berger , Igor Yanover
CPC classification number: G06F21/54 , G06F1/32 , G06F9/3802 , G06F21/561 , G06F2221/034
Abstract: An apparatus and method for intelligent power virus protection in a processor. For example, one embodiment of a processor comprises: first circuitry including an instruction fetch circuit to fetch instructions, each instruction comprising an instruction type and an associated width comprising a number of bits associated with source and/or destination operand values associated with the instruction; detection circuitry to detect one or more instructions of a particular type and/or width; evaluation circuitry to evaluate an impact of power virus protection (PVP) circuitry when executing the one or more instructions based on the detected instruction types and/or widths; and control circuitry, based on the evaluation, to configure the PVP circuitry in accordance with the evaluation performed by the evaluation circuitry.
-
公开(公告)号:US11714875B2
公开(公告)日:2023-08-01
申请号:US16729361
申请日:2019-12-28
Applicant: Intel Corporation
Inventor: Amit Gradstein , Simon Rubanovich , Sagi Meller , Saeed Kharouf , Gavri Berger , Zeev Sperber , Jose Yallouz , Ron Schneider
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30036 , G06F9/3851 , G06F9/34
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.
-
公开(公告)号:US20200310794A1
公开(公告)日:2020-10-01
申请号:US16368973
申请日:2019-03-29
Applicant: Intel Corporation
Inventor: ZEEV SPERBER , Amit Gradstein , Simon Rubanovich , Igor Yanover , Gavri Berger , Eyal Hadas , Saeed Kharouf , Ron Schneider , Sagi Meller , Jose Yallouz
Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.
-
-
-
-
-
-
-