-
Publication No.: EP4418136A3
Publication date: 2024-11-20
Application No.: EP24187271.2
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.
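The arithmetic described in this abstract can be illustrated with a minimal Python sketch. It assumes 8-bit source elements and 32-bit accumulators (the abstract only says the accumulators are four times larger), and it assumes saturation is applied once, after the four products have been added to the accumulator. The function names are hypothetical and do not come from the patent.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def saturate_i32(x):
    """Clamp x to the signed 32-bit range (assumed accumulator width)."""
    return max(INT32_MIN, min(INT32_MAX, x))

def dot4_accumulate_saturate(signed_src, unsigned_src, acc):
    """Multiply signed by unsigned elements, then accumulate in groups of four.

    signed_src   -- signed source elements (assumed 8-bit: -128..127)
    unsigned_src -- unsigned source elements (assumed 8-bit: 0..255)
    acc          -- signed accumulators, one per group of four products
    """
    assert len(signed_src) == len(unsigned_src) == 4 * len(acc)
    results = []
    for i, a in enumerate(acc):
        products = [signed_src[4 * i + j] * unsigned_src[4 * i + j]
                    for j in range(4)]
        # Assumption: a single saturation after the full sum.
        results.append(saturate_i32(a + sum(products)))
    return results
```

For example, `dot4_accumulate_saturate([1, -2, 3, -4], [10, 20, 30, 40], [5])` yields `[-95]`, and an accumulator already at the 32-bit maximum stays clamped there when the products are positive.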
-
Publication No.: EP4198718A1
Publication date: 2023-06-21
Application No.: EP23156307.3
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, an apparatus comprises: decode circuitry to decode a single instruction, the single instruction having fields to indicate an opcode, a packed destination operand, a first packed source operand, and a second packed source operand, wherein elements of the destination are 32 bits in size and elements of the first source and the second source are 16 bits in size; a register file having a plurality of packed data registers including registers for the destination and source operands; and execution circuitry, coupled to the decode circuitry. The execution circuitry is to perform operations corresponding to the instruction, including to, for each element position of the destination: multiply a first element from the first source and a first element from the second source to generate a first result, multiply a second element from the first source and a second element from the second source to generate a second result, add the first result and the second result to generate a third result, add the third result to an element from the element position of the destination to generate a fourth result, and store the fourth result in the element position of the destination.
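The four intermediate results named in this abstract map directly onto a short Python sketch. The abstract does not state signedness or overflow behavior, so this sketch assumes signed 16-bit sources and wraps the final sum to 32 bits; both are assumptions, and the function names are hypothetical.

```python
def wrap_i32(x):
    """Wrap x to a signed 32-bit value (assumed overflow behavior)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def dot2_accumulate(src1, src2, dest):
    """For each 32-bit destination element, multiply two pairs of
    16-bit source elements, add the products, and add the destination."""
    assert len(src1) == len(src2) == 2 * len(dest)
    out = []
    for i, d in enumerate(dest):
        first = src1[2 * i] * src2[2 * i]            # first result
        second = src1[2 * i + 1] * src2[2 * i + 1]   # second result
        third = first + second                       # third result
        fourth = wrap_i32(third + d)                 # fourth result
        out.append(fourth)
    return out
```

With `src1 = [1, 2, 3, 4]`, `src2 = [5, 6, 7, 8]`, and `dest = [100, 200]`, the sketch returns `[117, 253]`.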
-
Publication No.: EP4336369A3
Publication date: 2024-06-19
Application No.: EP24153968.3
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert; Baum, Dan; Sperber, Zeev; Corbal, Jesus; Ould-Ahmed-Vall, ElMoustapha; Toll, Bret L.; Charney, Mark J.; Ziv, Barukh; Heinecke, Alexander; Girkar, Milind; Rubanovich, Simon
CPC classification number: G06F9/30036, G06F2212/455, G06F12/0207, G06F2212/454, G06F9/3001, G06F7/5443, G06F9/3861, G06F9/30014, G06F9/3016
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: add a first data value at that data element position to a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the addition into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
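As a rough illustration, the element-wise operation in this abstract can be modeled in Python by treating each "plurality of source vectors" as a list of equal-length rows. The function name is hypothetical, and element widths and overflow handling are not modeled.

```python
def add_vector_groups(first, second):
    """Add corresponding elements of corresponding vectors.

    first, second -- lists of equal-length vectors (rows of two sources)
    Returns the destination vectors as a new list of lists.
    """
    assert len(first) == len(second)
    dest = []
    for row_a, row_b in zip(first, second):
        assert len(row_a) == len(row_b)
        dest.append([x + y for x, y in zip(row_a, row_b)])
    return dest
```

For instance, `add_vector_groups([[1, 2], [3, 4]], [[10, 20], [30, 40]])` gives `[[11, 22], [33, 44]]`.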
-
Publication No.: EP3989062A1
Publication date: 2022-04-27
Application No.: EP21207387.8
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed signed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed signed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed signed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
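The sign-extend, multiply, add, and saturate sequence above can be sketched in Python starting from raw bit patterns. The sketch assumes 16-bit words and a 32-bit source/destination element, so two words map to each destination position; the abstract only requires the second size to be greater than the first. All function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def saturate(value, bits):
    """Clamp value to the signed range representable in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def word_dot_saturate(src1_words, src2_words, dest, per_element=2):
    """src1_words, src2_words: raw 16-bit patterns (0..0xFFFF).
    dest: signed 32-bit source/destination elements."""
    assert len(src1_words) == len(src2_words) == per_element * len(dest)
    out = []
    for i, d in enumerate(dest):
        total = d
        for j in range(per_element):
            a = sign_extend(src1_words[per_element * i + j], 16)
            b = sign_extend(src2_words[per_element * i + j], 16)
            total += a * b
        # Saturate only if the sum no longer fits in the destination width.
        out.append(saturate(total, 32))
    return out
```

For example, `word_dot_saturate([0xFFFF, 0x0002], [0x0003, 0x0004], [10])` returns `[15]`, because `0xFFFF` sign-extends to -1.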
-
Publication No.: EP3971709A1
Publication date: 2022-03-23
Application No.: EP21207379.5
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed unsigned data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed signed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed unsigned data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
-
Publication No.: EP4418136A2
Publication date: 2024-08-21
Application No.: EP24187271.2
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
IPC: G06F15/76
CPC classification number: G06F15/76, G06F9/30036, G06F9/30014, G06F9/30038, G06F9/30018
Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.
-
Publication No.: EP4354303A3
Publication date: 2024-06-26
Application No.: EP24153964.2
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert; Baum, Dan; Sperber, Zeev; Corbal, Jesus; Ould-Ahmed-Vall, ElMoustapha; Toll, Bret L.; Charney, Mark J.; Ziv, Barukh; Heinecke, Alexander; Girkar, Milind; Rubanovich, Simon
CPC classification number: G06F9/30036, G06F2212/455, G06F12/0207, G06F2212/454, G06F9/3001, G06F7/5443, G06F9/3861, G06F9/30014, G06F9/3016
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: subtract, from a first data value at that data element position, a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the subtraction into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
-
Publication No.: EP4303724A1
Publication date: 2024-01-10
Application No.: EP23194771.4
Filing date: 2017-07-01
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Baum, Dan; Sperber, Zeev; Corbal, Jesus; Ould-Ahmed-Vall, ElMoustapha; Toll, Bret L.; Charney, Mark J.; Adelman, Menachem; Ziv, Barukh; Heinecke, Alexander; Rubanovich, Simon
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, a processor comprises decode circuitry to decode a single matrix instruction and execution circuitry to execute the single matrix instruction. The single matrix instruction has fields for an opcode, a plurality of identifiers corresponding to a first plurality of 4-bit sized data elements of a first source matrix, a second plurality of 4-bit sized data elements of a second source matrix, a plurality of doubleword-sized source data elements of a third source matrix, and a plurality of doubleword-sized result data elements of a result matrix, and bits indicating whether one or both of the first and second plurality of 4-bit sized data elements are signed or unsigned. The execution circuitry includes a multiply accumulate circuit, comprising: a multiplier to multiply each 4-bit sized data element of a first subset of the first plurality of 4-bit sized data elements with a corresponding 4-bit sized data element of a first subset of the second plurality of 4-bit sized data elements to generate a plurality of products; and an accumulator to add the plurality of products to a corresponding doubleword-sized source data element of the plurality of doubleword-sized source data elements to generate a corresponding doubleword-sized result data element of the plurality of doubleword-sized result data elements.
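A small Python sketch can show how the signed/unsigned indicator bits change the interpretation of 4-bit elements before the multiply-accumulate. The sketch models one result element only, represents the indicator bits as two boolean parameters, and does not model doubleword overflow; these choices and the function names are assumptions, not details taken from the patent.

```python
def decode_nibble(nibble, signed):
    """Interpret a 4-bit pattern as signed (-8..7) or unsigned (0..15)."""
    nibble &= 0xF
    return nibble - 16 if signed and nibble >= 8 else nibble

def nibble_mac(a_nibbles, b_nibbles, c, a_signed=True, b_signed=True):
    """Multiply corresponding 4-bit elements and add the products to the
    doubleword-sized source element c, returning the result element."""
    assert len(a_nibbles) == len(b_nibbles)
    products = [decode_nibble(x, a_signed) * decode_nibble(y, b_signed)
                for x, y in zip(a_nibbles, b_nibbles)]
    return c + sum(products)
```

With `a_nibbles = [0xF, 0x1]`, `b_nibbles = [0x2, 0x3]`, and `c = 100`, the result is 101 when the first source is treated as signed (`0xF` is -1) and 133 when it is treated as unsigned (`0xF` is 15).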
-
Publication No.: EP4216057A1
Publication date: 2023-07-26
Application No.: EP23161367.0
Filing date: 2017-07-01
Applicant: INTEL Corporation
Inventor: Valentine, Robert; Sperber, Zeev; Charney, Mark J.; Toll, Bret L.; Rappoport, Rinat; Shwartsman, Stanislav; Baum, Dan; Yanover, Igor; Ould-Ahmed-Vall, ElMoustapha; Adelman, Menachem; Corbal, Jesus; Gebil, Yuri; Rubanovich, Simon
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, an apparatus comprises an instruction decoder to decode a single instruction, the single instruction having fields to indicate an opcode, a first register to store a first source matrix, a second register to store a second source matrix, and a third register to store a 2 by 2 third source matrix, wherein the opcode is to indicate a matrix multiply-accumulate operation; and execution circuitry to perform the matrix multiply-accumulate operation. The matrix multiply-accumulate operation includes: multiplying a value corresponding to a first row and a first column of the first source matrix and a value corresponding to a first row and a first column of the second source matrix to generate a first product, multiplying a value corresponding to the first row and a second column of the first source matrix and a value corresponding to a second row and the first column of the second source matrix to generate a second product, summing the first product, the second product, and an initial value corresponding to an element position in a first row and a first column of the 2 by 2 third source matrix to generate a resulting value corresponding to the element position in a destination matrix, and storing the destination matrix in the third register.
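The multiply-accumulate spelled out for the first row and first column generalizes to every element position of the 2 by 2 matrix. The Python sketch below assumes, for illustration only, that both source matrices are also 2 by 2 and that values are plain integers; the function name is hypothetical.

```python
def matmul_accumulate_2x2(a, b, c):
    """Return C + A x B for 2 by 2 matrices given as nested lists.

    For position (0, 0) this computes a[0][0]*b[0][0] + a[0][1]*b[1][0]
    plus the initial value c[0][0], matching the first and second
    products described in the abstract.
    """
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(2))
             for j in range(2)]
            for i in range(2)]
```

For example, with `a = [[1, 2], [3, 4]]`, `b = [[5, 6], [7, 8]]`, and `c = [[1, 1], [1, 1]]`, the destination matrix is `[[20, 23], [44, 51]]`. In the abstract the destination is written back to the third register; the sketch returns a new matrix instead.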
-
Publication No.: EP4137941A1
Publication date: 2023-02-22
Application No.: EP22196776.3
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert; Baum, Dan; Sperber, Zeev; Corbal, Jesus; Ould-Ahmed-Vall, Elmoustapha; Toll, Bret L.; Charney, Mark J.; Ziv, Barukh; Heinecke, Alexander; Girkar, Milind; Rubanovich, Simon
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier, wherein each of the first source matrix operand, the second source matrix operand, and the destination matrix operand corresponds to a two-dimensional matrix of values, and execution circuitry to execute the decoded instruction to, for each data element position of the identified first source matrix operand: multiply a first data value at that data element position by a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the multiplication into a corresponding data element position of the identified destination matrix operand.