-
公开(公告)号:US20210349720A1
公开(公告)日:2021-11-11
申请号:US17382917
申请日:2021-07-22
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Rinat RAPPOPORT , Stanislav SHWARTSMAN , Dan BAUM , Igor YANOVER , Elmoustapha OULD-AHMED-VALL , Menachem ADELMAN , Jesus CORBAL , Yuri GEBIL , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
-
公开(公告)号:US20200241873A1
公开(公告)日:2020-07-30
申请号:US16487784
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Jesus CORBAL , Alexander F. HEINECKE , Barukh ZIV , Elmoustapha OULD-AHMED-VALL , Stanislav SHWARTSMAN
Abstract: Embodiments detailed herein relate to matrix operations. In particular, performing a matrix operation of zeroing a matrix in response to a single instruction. For example, a processor detailed which includes decode circuitry to decode an instruction having fields for an opcode and a source/destination matrix operand identifier; and execution circuitry to execute the decoded instruction to zero each data element of the identified source/destination matrix.
-
公开(公告)号:US20200233667A1
公开(公告)日:2020-07-23
申请号:US16487787
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Rinat RAPPOPORT , Stanislav SHWARTSMAN , Dan BAUM , Igor YANOVER , Elmoustapha OULD-AHMED-VALL , Menachem ADELMAN , Jesus CORBAL , Yuri GEBIL , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
-
公开(公告)号:US20240078283A1
公开(公告)日:2024-03-07
申请号:US18360793
申请日:2023-07-27
Applicant: Intel Corporation
Inventor: Amit GRADSTEIN , Simon RUBANOVICH , Sagi MELLER , Saeed KHAROUF , Gavri BERGER , Zeev SPERBER , Jose YALLOUZ , Ron SCHNEIDER
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30036 , G06F9/3851 , G06F9/34
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.
-
公开(公告)号:US20230072105A1
公开(公告)日:2023-03-09
申请号:US17463410
申请日:2021-08-31
Applicant: Intel Corporation
Inventor: Alexander HEINECKE , Menachem ADELMAN , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Mark CHARNEY , Evangelos GEORGANAS , Dhiraj KALAMKAR , Christopher HUGHES , Cristina ANDERSON
IPC: G06F9/30
Abstract: Techniques for comparing BF16 data elements are described. An exemplary BF16 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
-
公开(公告)号:US20230068781A1
公开(公告)日:2023-03-02
申请号:US17463382
申请日:2021-08-31
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Alexander HEINECKE , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Mark CHARNEY , Evangelos GEORGANAS , Dhiraj KALAMKAR , Christopher HUGHES , Cristina ANDERSON
IPC: G06F9/30
Abstract: Techniques for scale and reduction of BF16 data elements are described. An exemplary instruction includes fields for an having fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a BF16 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a BF16 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
-
公开(公告)号:US20220291927A1
公开(公告)日:2022-09-15
申请号:US17706428
申请日:2022-03-28
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Milind B. GIRKAR , Zeev SPERBER , Mark J. CHARNEY , Rinat RAPPOPORT , Jesus CORBAL , Stanislav SHWARTSMAN , Igor YANOVER , Alexander F. HEINECKE , Barukh ZIV , Dan BAUM , Yuri GEBIL
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information
-
公开(公告)号:US20220058021A1
公开(公告)日:2022-02-24
申请号:US17516023
申请日:2021-11-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Menachem ADELMAN , Barukh ZIV , Alexander HEINECKE , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating in to a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
-
公开(公告)号:US20210132943A1
公开(公告)日:2021-05-06
申请号:US16486960
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Menachem ADELMAN , Barukh ZIV , Alexander HEINECKE , Simon RUBANOVICH
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating in to a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
-
10.
公开(公告)号:US20210124580A1
公开(公告)日:2021-04-29
申请号:US17133078
申请日:2020-12-23
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
-
-
-
-
-
-
-
-
-