-
Publication No.: EP4336369A3
Publication date: 2024-06-19
Application No.: EP24153968.3
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark J. , Ziv, Barukh , Heinecke, Alexander , Girkar, Milind , Rubanovich, Simon
CPC classification number: G06F9/30036 , G06F2212/455 , G06F12/0207 , G06F2212/454 , G06F9/3001 , G06F7/5443 , G06F9/3861 , G06F9/30014 , G06F9/3016
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: add a first data value at that data element position to a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the addition into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
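The element-wise addition described in this abstract can be illustrated with a minimal sketch. It assumes each "plurality of vectors" is modelled as a list of equal-length Python lists (one list per row vector); the function name `tile_add` is hypothetical and does not come from the patent.

```python
def tile_add(src1, src2):
    """Element-wise add of two tiles, each modelled as a list of row vectors.

    Assumption: both sources have the same number of vectors and the same
    number of data elements per vector.
    """
    if len(src1) != len(src2) or any(len(a) != len(b) for a, b in zip(src1, src2)):
        raise ValueError("source tiles must have matching shapes")
    # For each data element position of each source vector, add the two
    # corresponding values and store the sum at the same position in the
    # corresponding destination vector.
    return [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(src1, src2)]


dst = tile_add([[1, 2], [3, 4]], [[10, 20], [30, 40]])
print(dst)  # [[11, 22], [33, 44]]
```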
-
Publication No.: EP4468146A2
Publication date: 2024-11-27
Application No.: EP24205150.6
Filing date: 2020-11-26
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Sperber, Zeev , Charney, Mark J. , Hughes, Christopher J. , Heinecke, Alexander F. , Georganas, Evangelos , Pham, Binh
IPC: G06F9/30
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor comprises: a plurality of registers to store a plurality of packed data elements including a first plurality of packed data elements of a first source matrix tile and a second plurality of packed data elements of a second source matrix tile, the first and second source matrix tiles comprising respective portions of a first source matrix and a second source matrix, and wherein each packed data element of the plurality of packed data elements has an element width; a decoder to decode one or more instructions, at least one instruction of the one or more instructions including an opcode field configured to specify an opcode, a first source operand configured to indicate the first source matrix tile, a second source operand configured to indicate the second source matrix tile, and a destination operand configured to indicate a result matrix tile; and execution circuitry, in response to the one or more instructions, to transpose the first source matrix tile in accordance with a granularity equal to the element width to generate a first transposed source matrix tile and to multiply the first transposed source matrix tile and the second source matrix tile. The execution circuitry comprises: a plurality of multipliers to multiply data elements of the first transposed source matrix tile and corresponding data elements of the second source matrix tile to produce a corresponding plurality of products; and one or more accumulators to add groups of the products to generate corresponding result data elements in the result matrix tile.
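As a rough illustration of the transpose-then-multiply behaviour summarized above, the sketch below models tiles as lists of rows and transposes at single-element granularity. The names `transpose` and `transpose_and_multiply` are hypothetical, and the tile layout (first source stored as K rows by M columns) is an assumption made for the example.

```python
def transpose(tile):
    """Transpose a tile element by element (rows become columns)."""
    return [list(col) for col in zip(*tile)]


def transpose_and_multiply(src1, src2):
    """Compute transpose(src1) x src2 for tiles given as lists of rows.

    Assumption: src1 is K x M and src2 is K x N, so the result is M x N.
    """
    a_t = transpose(src1)  # M x K
    k = len(src2)
    if any(len(row) != k for row in a_t):
        raise ValueError("inner dimensions do not match")
    n = len(src2[0]) if src2 else 0
    # Each result element is a sum (accumulation) of K products.
    return [[sum(a_t[i][p] * src2[p][j] for p in range(k)) for j in range(n)]
            for i in range(len(a_t))]


result = transpose_and_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)  # [[26, 30], [38, 44]]
```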
-
Publication No.: EP4462249A2
Publication date: 2024-11-13
Application No.: EP24203555.8
Filing date: 2020-11-26
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Sperber, Zeev , Charney, Mark J. , Hughes, Christopher J. , Heinecke, Alexander F. , Georganas, Evangelos , Pham, Binh
IPC: G06F9/30
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, an apparatus comprises decode circuitry to decode an instance of an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, a second source operand field to specify a second source matrix location, and a third operand field to specify a source/destination matrix location; and execution circuitry to, in response to the opcode of the decoded instance of the instruction, transpose columns of data element pairs of the first source matrix into rows, perform a dot product of data element pairs of the transposed columns of data element pairs of the first source matrix and corresponding row data element pairs of the second source matrix, and add a result of the dot product to a corresponding row data element of the source/destination matrix.
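One possible reading of the pair-wise transpose and dot-product accumulation is sketched below. It is an interpretation, not the patented implementation: each matrix element is assumed to be a 2-tuple (a "data element pair"), the first source is assumed to be stored as K rows by M columns of pairs, the second as K rows by N columns of pairs, and the function name `transpose_pairs_dot_accumulate` is hypothetical.

```python
def transpose_pairs_dot_accumulate(acc, src1, src2):
    """Accumulate pair-wise dot products of transpose(src1) and src2 into acc.

    Assumptions (illustrative only):
      src1: K x M matrix whose elements are (x0, x1) pairs
      src2: K x N matrix whose elements are (y0, y1) pairs
      acc:  M x N source/destination matrix of scalars, updated in place
    """
    # Transpose at pair granularity: each pair moves as one unit.
    src1_t = [list(col) for col in zip(*src1)]  # M x K
    for m, row in enumerate(src1_t):
        for n in range(len(acc[m])):
            for k, (x0, x1) in enumerate(row):
                y0, y1 = src2[k][n]
                # Dot product of the two pairs, added to the accumulator.
                acc[m][n] += x0 * y0 + x1 * y1
    return acc


acc = [[100], [200]]
transpose_pairs_dot_accumulate(acc, [[(1, 2), (3, 4)]], [[(5, 6)]])
print(acc)  # [[117], [239]]
```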
-
Publication No.: EP3716048A1
Publication date: 2020-09-30
Application No.: EP20155995.2
Filing date: 2020-02-07
Applicant: Intel Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Heinecke, Alexander , Georganas, Evangelos
IPC: G06F9/30
Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; and interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
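The down-convert-and-interleave step can be sketched as follows. The abstract does not name the data formats or the interleave order, so this example makes two labelled assumptions: the sources hold 32-bit floats that are narrowed to 16 bits by keeping the upper half of the bit pattern (bfloat16 by truncation), and the destination alternates elements from the first and second source. The function names are hypothetical.

```python
import struct


def to_bf16_bits(x):
    """Down-convert a float to 16 bits by truncating its float32 encoding.

    Assumption: bfloat16-style truncation (keep the upper 16 bits, no rounding).
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def downconvert_and_interleave(src1, src2):
    """Narrow both sources, then interleave them into one destination list.

    Assumption: element i of src1 lands at position 2*i and element i of
    src2 at position 2*i + 1.
    """
    if len(src1) != len(src2):
        raise ValueError("sources must hold the same number of elements")
    dst = []
    for a, b in zip(src1, src2):
        dst.append(to_bf16_bits(a))
        dst.append(to_bf16_bits(b))
    return dst


dst = downconvert_and_interleave([1.0, 2.0], [0.5, -1.0])
print([hex(v) for v in dst])  # ['0x3f80', '0x3f00', '0x4000', '0xbf80']
```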
-
Publication No.: EP4354303A2
Publication date: 2024-04-17
Application No.: EP24153964.2
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark J. , Ziv, Barukh , Heinecke, Alexander , Girkar, Milind , Rubanovich, Simon
IPC: G06F12/02
CPC classification number: G06F9/30036 , G06F2212/455 , G06F12/0207 , G06F2212/454 , G06F9/3001 , G06F7/5443 , G06F9/3861 , G06F9/30014 , G06F9/3016
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: subtract, from a first data value at that data element position, a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the subtraction into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
-
Publication No.: EP4336369A2
Publication date: 2024-03-13
Application No.: EP24153968.3
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark J. , Ziv, Barukh , Heinecke, Alexander , Girkar, Milind , Rubanovich, Simon
IPC: G06F12/02
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: add a first data value at that data element position to a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the addition into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
-
Publication No.: EP4293503A1
Publication date: 2023-12-20
Application No.: EP23171339.7
Filing date: 2023-05-03
Applicant: Intel Corporation
Inventor: Adelman, Menachem , Gradstein, Amit , Rubanovich, Simon , Ziv, Barukh , Sherman, Uri , Rip, Dana , Mizrahi, Shahar , Baum, Dan , Rappoport, Rinat , Jain, Nilesh , Sperber, Zeev , Stupp, Gideon , Heinecke, Alexander , Hughes, Christopher , Georganas, Evangelos
IPC: G06F9/30
Abstract: Techniques and mechanisms for processor circuitry to execute a load and expand instruction of an instruction set to generate decompressed matrix data. In an embodiment, the instruction comprises a source operand which indicates a location from which compressed matrix data, and corresponding metadata, are to be accessed. A destination operand of the instruction indicates a location which is to receive decompressed metadata, which is generated, during execution of the instruction, based on the compressed matrix data and the corresponding metadata. The metadata comprises compression mask information which identifies which elements of the matrix have been masked from the compressed matrix data. In another embodiment, the instruction further comprises a count operand which identifies a total number of the unmasked matrix elements which are represented in the compressed matrix data.
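A minimal sketch of the load-and-expand idea follows. It assumes the metadata is a flat list of 0/1 mask bits in row-major order, that a 1 bit marks an element present in the compressed data, and that masked-out elements are restored as zeros; the abstract does not confirm any of these details. The name `load_and_expand` is hypothetical.

```python
def load_and_expand(compressed, mask, rows, cols):
    """Rebuild a rows x cols matrix from compressed values and a mask.

    Assumptions (illustrative only):
      compressed: the unmasked elements, in row-major order
      mask:       rows*cols bits, 1 = element present, 0 = element masked out
      masked-out elements are filled with 0
    """
    if len(mask) != rows * cols:
        raise ValueError("mask length must equal rows * cols")
    # The count of unmasked elements must match the compressed payload,
    # mirroring the role of the count operand mentioned in the abstract.
    if sum(mask) != len(compressed):
        raise ValueError("mask population does not match compressed length")
    values = iter(compressed)
    flat = [next(values) if bit else 0 for bit in mask]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


tile = load_and_expand([7, 8, 9], [1, 0, 0, 1, 0, 1], rows=2, cols=3)
print(tile)  # [[7, 0, 0], [8, 0, 9]]
```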
-
Publication No.: EP4053695A1
Publication date: 2022-09-07
Application No.: EP22169888.9
Filing date: 2017-07-01
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark , Adelman, Menachem , Ziv, Barukh , Heinecke, Alexander , Rubanovich, Simon
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, an apparatus comprises programmable configuration storage, decode circuitry and execution circuitry. The programmable configuration storage is to store configuration information for a first matrix, a second matrix, and a third matrix, the configuration information including a first value corresponding to a first number of rows for the first matrix, a second value corresponding to a second number of columns for the first matrix, a third value corresponding to a third number of rows for the second matrix, a fourth value corresponding to a fourth number of columns for the second matrix, a fifth value corresponding to a fifth number of rows for the third matrix, a sixth value corresponding to a sixth number of columns for the third matrix, and a start row value corresponding to a row of a corresponding matrix at which to restart execution of at least one of a plurality of matrix instructions. The decode circuitry is to decode the plurality of matrix instructions, including a single instruction to perform dot-product and accumulation, the single instruction having a first operand to specify a first register, a second operand to specify a second register, and a third operand to specify a third register. The execution circuitry is to perform one or more operations corresponding to the single instruction, including: performing dot-products on elements of the second matrix from the second register and elements of the third matrix from the third register to generate one or more resulting elements, and accumulating the one or more resulting elements into the first matrix in the first register.
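The interplay of the configuration storage, the start row value, and the dot-product-accumulate instruction can be sketched roughly as below. The data structures are assumptions made for illustration: `TileShape` stands in for the per-matrix row and column values, `shapes` and `tiles` are dictionaries keyed by a register number, and `start_row` models resuming a partially completed instruction at a given result row. None of these names come from the patent.

```python
from dataclasses import dataclass


@dataclass
class TileShape:
    """Configured dimensions for one matrix register (hypothetical model)."""
    rows: int
    cols: int


def dot_product_accumulate(shapes, tiles, dst, src1, src2, start_row=0):
    """Accumulate tiles[src1] x tiles[src2] into tiles[dst], in place.

    Rows before start_row are skipped, modelling a restart after an
    interruption (assumption about how the start row value is used).
    """
    c, a, b = shapes[dst], shapes[src1], shapes[src2]
    if (c.rows, c.cols) != (a.rows, b.cols) or a.cols != b.rows:
        raise ValueError("configured tile shapes are incompatible")
    for i in range(start_row, c.rows):
        for j in range(c.cols):
            tiles[dst][i][j] += sum(
                tiles[src1][i][k] * tiles[src2][k][j] for k in range(a.cols)
            )
    return tiles[dst]


shapes = {0: TileShape(2, 2), 1: TileShape(2, 3), 2: TileShape(3, 2)}
tiles = {
    0: [[1, 1], [1, 1]],
    1: [[1, 2, 3], [4, 5, 6]],
    2: [[1, 0], [0, 1], [1, 1]],
}
print(dot_product_accumulate(shapes, tiles, dst=0, src1=1, src2=2))
# [[5, 6], [11, 12]]
```

Calling the function with `start_row=1` on fresh inputs would update only the second row of the destination, which is the behaviour the start row value is meant to model here.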
-
Publication No.: EP4468146A3
Publication date: 2025-02-19
Application No.: EP24205150.6
Filing date: 2020-11-26
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Sperber, Zeev , Charney, Mark J. , Hughes, Christopher J. , Heinecke, Alexander F. , Georganas, Evangelos , Pham, Binh
IPC: G06F9/30
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor comprises: a plurality of registers to store a plurality of packed data elements including a first plurality of packed data elements of a first source matrix tile and a second plurality of packed data elements of a second source matrix tile, the first and second source matrix tiles comprising respective portions of a first source matrix and a second source matrix, and wherein each packed data element of the plurality of packed data elements has an element width; a decoder to decode one or more instructions, at least one instruction of the one or more instructions including an opcode field configured to specify an opcode, a first source operand configured to indicate the first source matrix tile, a second source operand configured to indicate the second source matrix tile, and a destination operand configured to indicate a result matrix tile; and execution circuitry, in response to the one or more instructions, to transpose the first source matrix tile in accordance with a granularity equal to the element width to generate a first transposed source matrix tile and to multiply the first transposed source matrix tile and the second source matrix tile. The execution circuitry comprises: a plurality of multipliers to multiply data elements of the first transposed source matrix tile and corresponding data elements of the second source matrix tile to produce a corresponding plurality of products; and one or more accumulators to add groups of the products to generate corresponding result data elements in the result matrix tile.
-
Publication No.: EP4462249A3
Publication date: 2025-02-19
Application No.: EP24203555.8
Filing date: 2020-11-26
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Sperber, Zeev , Charney, Mark J. , Hughes, Christopher J. , Heinecke, Alexander F. , Georganas, Evangelos , Pham, Binh
IPC: G06F9/30
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, an apparatus comprises decode circuitry to decode an instance of an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, a second source operand field to specify a second source matrix location, and a third operand field to specify a source/destination matrix location; and execution circuitry to, in response to the opcode of the decoded instance of the instruction, transpose columns of data element pairs of the first source matrix into rows, perform a dot product of data element pairs of the transposed columns of data element pairs of the first source matrix and corresponding row data element pairs of the second source matrix, and add a result of the dot product to a corresponding row data element of the source/destination matrix.