-
公开(公告)号:US12182571B2
公开(公告)日:2024-12-31
申请号:US18100194
申请日:2023-01-23
Applicant: Intel Corporation
Inventor: Robert Valentine , Menachem Adelman , Milind B. Girkar , Zeev Sperber , Mark J. Charney , Bret L. Toll , Rinat Rappoport , Jesus Corbal , Stanislav Shwartsman , Dan Baum , Igor Yanover , Alexander F. Heinecke , Barukh Ziv , Elmoustapha Ould-Ahmed-Vall , Yuri Gebil , Raanan Sade
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
-
公开(公告)号:US11567765B2
公开(公告)日:2023-01-31
申请号:US16487766
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Menachem Adelman , Milind B. Girkar , Zeev Sperber , Mark J. Charney , Bret L. Toll , Rinat Rappoport , Jesus Corbal , Stanislav Shwartsman , Dan Baum , Igor Yanover , Alexander F. Heinecke , Barukh Ziv , Elmoustapha Ould-Ahmed-Vall , Yuri Gebil
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
-
公开(公告)号:US20240329938A1
公开(公告)日:2024-10-03
申请号:US18607024
申请日:2024-03-15
Applicant: Intel Corporation
Inventor: Menachem Adelman , Robert Valentine , Barukh Ziv , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark J. Charney , Christopher J. Hughes , Alexander F. Heinecke , Evangelos Georganas , Binh Pham
CPC classification number: G06F7/78 , G06F9/3001 , G06F9/3016 , G06F17/16
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
-
公开(公告)号:US20210406018A1
公开(公告)日:2021-12-30
申请号:US16914347
申请日:2020-06-27
Applicant: INTEL CORPORATION
Inventor: Menachem Adelman , Robert Valentine , Barukh Ziv , Yaroslav Pollak , Gideon Stupp , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark Charney , Christopher Hughes , Alexander Heinecke
Abstract: Systems, methods, and apparatuses relating to one or more instructions that utilize direct paths for loading data into a tile from a vector register and/or storing data from a tile into a vector register are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a plurality of registers that represents a two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a coupling to a cache; and a hardware processor core comprising: a vector register, a decoder to decode a single instruction into a decoded single instruction, the single instruction including a first field that identifies the two-dimensional matrix, a second field that identifies a set of elements of the two-dimensional matrix, and a third field that identifies the vector register, and an execution circuit to execute the decoded single instruction to cause a store of the set of elements from the plurality of registers that represents the two-dimensional matrix into the vector register by a coupling of the hardware processor core to the matrix operations accelerator circuit that is separate from the coupling to the cache.
-
5.
公开(公告)号:US20240045685A1
公开(公告)日:2024-02-08
申请号:US17958381
申请日:2022-10-01
Applicant: Intel Corporation
Inventor: Menachem Adelman , Amit Gradstein , Alexander Heinecke , Christopher Hughes , Naveen Mellempudi , Shahar Mizrahi , Dana Rip , Simon Rubanovich , Uri Sherman , Guy Boudoukh , Evangelos Georganas , Nilesh Jain , Barukh Ziv
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30025 , G06F9/3001
Abstract: Systems, methods, and apparatuses relating sparsity based FMA. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of FP8 data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform a FMA.
-
公开(公告)号:US11360770B2
公开(公告)日:2022-06-14
申请号:US16487784
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Menachem Adelman , Zeev Sperber , Mark J. Charney , Bret L. Toll , Jesus Corbal , Alexander F. Heinecke , Barukh Ziv , Elmoustapha Ould-Ahmed-Vall , Stanislav Shwartsman
Abstract: Embodiments detailed herein relate to matrix operations. In particular, performing a matrix operation of zeroing a matrix in response to a single instruction. For example, a processor detailed which includes decode circuitry to decode an instruction having fields for an opcode and a source/destination matrix operand identifier; and execution circuitry to execute the decoded instruction to zero each data element of the identified source/destination matrix.
-
公开(公告)号:US11163565B2
公开(公告)日:2021-11-02
申请号:US16486960
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Dan Baum , Zeev Sperber , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Bret L. Toll , Mark J. Charney , Menachem Adelman , Barukh Ziv , Alexander Heinecke , Simon Rubanovich
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating in to a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
-
公开(公告)号:US11972230B2
公开(公告)日:2024-04-30
申请号:US16914318
申请日:2020-06-27
Applicant: Intel Corporation
Inventor: Menachem Adelman , Robert Valentine , Barukh Ziv , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark J. Charney , Christopher J. Hughes , Alexander F. Heinecke , Evangelos Georganas , Binh Pham
CPC classification number: G06F7/78 , G06F9/3001 , G06F9/3016 , G06F17/16
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
-
公开(公告)号:US20230409326A1
公开(公告)日:2023-12-21
申请号:US17841558
申请日:2022-06-15
Applicant: Intel Corporation
Inventor: Menachem Adelman , Amit Gradstein , Simon Rubanovich , Barukh Ziv , Uri Sherman , Dana Rip , Shahar Mizrahi , Dan Baum , Rinat Rappoport , Nilesh Jain , Zeev Sperber , Gideon Stupp , Alexander Heinecke , Christopher Hughes , Evangelos Georganas
CPC classification number: G06F9/30145 , G06F9/30178 , G06F9/30047 , G06F9/3887 , G06N3/04
Abstract: Techniques and mechanisms for processor circuitry to execute a load and expand instruction of an instruction set to generate decompressed matrix data. In an embodiment, the instruction comprises a source operand which indicates a location from which compressed matrix data, and corresponding metadata, are to be accessed. A destination operand of the instruction indicates a location which is to receive decompressed metadata, which is generated, during execution of the instruction, based on the compressed matrix data and the corresponding metadata. The metadata comprises compression mask information which identifies which elements of the matrix have been masked from the compressed matrix data. In another embodiment, the instruction further comprises a count operand which identifies a total number of the unmasked matrix elements which are represented in the compressed matrix data.
-
公开(公告)号:US11288069B2
公开(公告)日:2022-03-29
申请号:US16487755
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Menachem Adelman , Elmoustapha Ould-Ahmed-Vall , Bret L. Toll , Milind B. Girkar , Zeev Sperber , Mark J. Charney , Rinat Rappoport , Jesus Corbal , Stanislav Shwartsman , Igor Yanover , Alexander F. Heinecke , Barukh Ziv , Dan Baum , Yuri Gebil
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information
-
-
-
-
-
-
-
-
-