-
Publication No.: EP4354303A2
Publication date: 2024-04-17
Application No.: EP24153964.2
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark J. , Ziv, Barukh , Heinecke, Alexander , Girkar, Milind , Rubanovich, Simon
IPC: G06F12/02
CPC classification number: G06F9/30036 , G06F2212/455 , G06F12/0207 , G06F2212/454 , G06F9/3001 , G06F7/5443 , G06F9/3861 , G06F9/30014 , G06F9/3016
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: subtract, from a first data value at that data element position, a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the subtraction into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
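Illustrative sketch (not part of the patent record): the element-wise behavior described in this abstract can be modeled in a few lines of Python. The function name `multi_vector_sub` and the list-of-lists representation of a "plurality of vectors" are assumptions made for illustration only; replacing `-` with `+` would model the addition variant of the same operation.

```python
def multi_vector_sub(src1, src2):
    """Software model: dst[v][i] = src1[v][i] - src2[v][i] for every vector v.

    src1 and src2 are lists of equally sized vectors (hypothetical layout).
    """
    if len(src1) != len(src2):
        raise ValueError("source groups must contain the same number of vectors")
    dst = []
    for v1, v2 in zip(src1, src2):
        if len(v1) != len(v2):
            raise ValueError("corresponding vectors must have the same length")
        # Subtract the second source element from the first at each position.
        dst.append([a - b for a, b in zip(v1, v2)])
    return dst


first = [[5, 7, 9], [1, 2, 3]]
second = [[1, 1, 1], [3, 2, 1]]
print(multi_vector_sub(first, second))  # [[4, 6, 8], [-2, 0, 2]]
```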
-
Publication No.: EP4336369A2
Publication date: 2024-03-13
Application No.: EP24153968.3
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark J. , Ziv, Barukh , Heinecke, Alexander , Girkar, Milind , Rubanovich, Simon
IPC: G06F12/02
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, for identifying a first plurality of source vectors, for identifying a second plurality of source vectors, and for identifying a plurality of destination vectors; and execution circuitry to execute the decoded instruction to, for each data element position of each of the identified first plurality of source vectors: add a first data value at that data element position to a second data value at a corresponding data element position of a corresponding one of the identified second plurality of source vectors, and store a result of the addition into a corresponding data element position of a corresponding one of the identified plurality of destination vectors.
-
Publication No.: EP4250101A2
Publication date: 2023-09-27
Application No.: EP23191570.3
Filing date: 2011-09-30
Applicant: Intel Corporation
Inventor: Valentine, Robert C. , Corbal San Adrian, Jesus , Sans Espasa, Roger , Cavin, Robert D. , Toll, Bret L. , Duran Galan, Santiago , Wiedemeier, Jeffrey G. , Samudrala, Sridhar , Girkar, Milind Baburao , Grochowski, Edward Thomas , Hall, Jonathan Cannon , Bradford, Dennis R. , Ould-Ahmed-Vall, ElMoustapha , Abel, James C. , Charney, Mark , Abraham, Seth , Sair, Suleyman , Forsyth, Andrew Thomas , Yount, Charles , Wu, Lisa
IPC: G06F9/30
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is to execute an instruction set. The instruction set includes a first instruction format, wherein the first instruction format includes a first plurality of templates, wherein the first instruction format has a plurality of fields including a base operation field, a data element width field, and a write mask field, wherein the first instruction format supports, through different values in the base operation field, specification of different vector operations, wherein each of the vector operations is to generate a destination vector operand including a plurality of data elements at different data element positions, wherein the first instruction format supports, through different values in the data element width field, specification of different data element widths, wherein the base operation field, the data element width field, and the write mask field may each store only one value on each occurrence of an instruction in the first instruction format in instruction streams. The processor includes a decode unit to decode the occurrences of the instructions in the first plurality of templates, including to: distinguish, for each of the occurrences, which one of the data element widths to use based on a value in the data element width field; and distinguish, for each of the occurrences, the data elements resulting from the occurrence's vector operation to be reflected in the destination vector operand's corresponding data element positions based on the write mask field's content and the data element width for the occurrence. Different values that may be stored in the write mask field distinguish different write mask registers, of a set of write mask registers, that are to store configurable write masks. The data element width for the occurrence distinguishes which of the data element positions of the destination vector operand correspond with which bits of the configurable write masks.
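Illustrative sketch (not part of the patent record): the interaction between the data element width and the write mask can be modeled as follows. The sketch assumes one mask bit per data element, with the low-order bits used first, and assumes that masked-off positions keep their previous destination value; the function name `masked_vector_op`, the parameter names, and the 128-bit vector length in the example are all hypothetical.

```python
def masked_vector_op(op, src1, src2, dst, mask, vector_bits=128, elem_bits=32):
    """Apply op element-wise, writing only positions whose mask bit is 1.

    The element width decides how many elements the vector holds and
    therefore how many low-order mask bits are consulted (assumption).
    """
    count = vector_bits // elem_bits
    if not (len(src1) == len(src2) == len(dst) == count):
        raise ValueError(f"expected {count} elements per operand")
    out = list(dst)
    for i in range(count):
        if (mask >> i) & 1:      # mask bit i governs element position i
            out[i] = op(src1[i], src2[i])
    return out


def add(a, b):
    return a + b

# Four 32-bit elements: mask bits 0..3 are consulted.
print(masked_vector_op(add, [1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 0, 0], 0b0101))
# Two 64-bit elements: only mask bits 0..1 are consulted.
print(masked_vector_op(add, [1, 2], [10, 20], [0, 0], 0b0101, elem_bits=64))
```

With the same mask value, the first call updates positions 0 and 2 of four elements, while the second call updates only position 0 of two elements, because the wider element width leaves fewer positions for the mask to govern.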
-
Publication No.: EP4148563A1
Publication date: 2023-03-15
Application No.: EP22203441.5
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Ryvchin, Galina , Majcher, Piotr , Charney, Mark J. , Ould-Ahmed-Vall, ElMoustapha , Corbal, Jesus , Girkar, Milind B. , Sperber, Zeev , Rubanovich, Simon , Gradstein, Amit
Abstract: In some embodiments, an apparatus with execution circuitry is provided. The execution circuitry is to execute a single instruction to, for each result packed data element: preserve an existing value of the result packed data element or set the result packed data element to zero if a corresponding bit value in a writemask register is set to a first value; and if the corresponding bit value in the writemask register is set to a second value, then: multiply a first number of a first source packed data elements with corresponding packed data elements of a second source packed data elements to produce a first number of products, add the first number of products to a corresponding packed data element from a third source packed data elements to produce the result packed data element of a second size in a corresponding position in a source/destination packed data register.
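Illustrative sketch (not part of the patent record): a minimal Python model of the masked multiply-accumulate described in this abstract. It assumes that a mask bit of 0 is the "first value" (keep or zero the element) and 1 is the "second value" (compute), that four narrow source elements feed each wider result element, and that the accumulator doubles as the third source. Element widths are not modeled, and the name `masked_dot_accumulate` is hypothetical.

```python
def masked_dot_accumulate(src1, src2, acc, mask, group=4, zeroing=False):
    """For each result position i: if mask bit i is set, return
    acc[i] + sum of `group` products; otherwise keep acc[i] or write 0."""
    if len(src1) != len(src2) or len(src1) != group * len(acc):
        raise ValueError("sources must hold `group` elements per accumulator")
    out = []
    for i, current in enumerate(acc):
        if not (mask >> i) & 1:
            # Masked-off position: zero it or preserve the existing value.
            out.append(0 if zeroing else current)
            continue
        base = i * group
        products = [src1[base + k] * src2[base + k] for k in range(group)]
        out.append(current + sum(products))
    return out


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 1, 1, 1, 1, 1, 1, 1]
print(masked_dot_accumulate(a, b, [100, 200], mask=0b10))                # [100, 226]
print(masked_dot_accumulate(a, b, [100, 200], mask=0b10, zeroing=True))  # [0, 226]
```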
-
Publication No.: EP4053695A1
Publication date: 2022-09-07
Application No.: EP22169888.9
Filing date: 2017-07-01
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark , Adelman, Menachem , Ziv, Barukh , Heinecke, Alexander , Rubanovich, Simon
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, an apparatus comprises programmable configuration storage, decode circuitry and execution circuitry. The programmable configuration storage is to store configuration information for a first matrix, a second matrix, and a third matrix, the configuration information including a first value corresponding to a first number of rows for the first matrix, a second value corresponding to a second number of columns for the first matrix, a third value corresponding to a third number of rows for the second matrix, a fourth value corresponding to a fourth number of columns for the second matrix, a fifth value corresponding to a fifth number of rows for the third matrix, a sixth value corresponding to the sixth number of columns for the third matrix, and a start row value corresponding to a row of a corresponding matrix at which to restart execution of at least one of a plurality of matrix instructions. The decode circuitry is to decode the plurality of matrix instructions, including a single instruction to perform dot-product and accumulation, the single instruction having a first operand to specify a first register, a second operand to specify a second register, and a third operand to specify a third register. The execution circuitry is to perform one or more operations corresponding to the single instruction, including: performing dot-products on elements of the second matrix from the second register and elements of the third matrix from the third register to generate one or more resulting elements, and accumulating the one or more resulting elements into the first matrix in the first register.
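Illustrative sketch (not part of the patent record): the configured dot-product-and-accumulate operation can be modeled in software roughly as below. `TileConfig`, `tile_dot_accumulate`, and the plain row-major list layout are assumptions made for illustration; the `start_row` parameter is a simplified stand-in for the start row value in the abstract, modeled here as "rows before this index are already done".

```python
from dataclasses import dataclass


@dataclass
class TileConfig:
    """Hypothetical per-matrix configuration: row and column counts."""
    rows: int
    cols: int


def tile_dot_accumulate(c, a, b, cfg_c, cfg_a, cfg_b, start_row=0):
    """Accumulate the product of a and b into c, in place: c += a x b.

    c is the first (accumulator) matrix; a and b are the second and third.
    Rows of c below start_row are skipped, modeling a restarted operation.
    """
    if (cfg_a.cols != cfg_b.rows or cfg_c.rows != cfg_a.rows
            or cfg_c.cols != cfg_b.cols):
        raise ValueError("configured tile shapes are not compatible")
    for i in range(start_row, cfg_c.rows):
        for j in range(cfg_c.cols):
            # Dot product of row i of a with column j of b.
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(cfg_a.cols))
    return c


shape = TileConfig(rows=2, cols=2)
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 0], [0, 1]]
print(tile_dot_accumulate(c, a, b, shape, shape, shape))  # [[20, 22], [43, 51]]
```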
-
Publication No.: EP3971711A1
Publication date: 2022-03-23
Application No.: EP21207395.1
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Ryvchin, Galina , Majcher, Piotr , Charney, Mark J. , Ould-Ahmed-Vall, ElMoustapha , Corbal, Jesus , Girkar, Milind B. , Sperber, Zeev , Rubanovich, Simon , Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result; and store the addition result in the corresponding packed data element position of the packed data source/destination operand.
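Illustrative sketch (not part of the patent record): the sign-extend, multiply, and accumulate sequence in this abstract can be modeled as below. The sketch assumes 16-bit source words, 32-bit source/destination elements, two words per destination element, and wrap-around (non-saturating) 32-bit addition; none of these specifics are stated in the abstract, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def word_multiply_accumulate(acc, src1, src2, words_per_element=2):
    """For each 32-bit accumulator element, add the products of the
    corresponding sign-extended 16-bit word pairs from src1 and src2."""
    if len(src1) != len(src2) or len(src1) != words_per_element * len(acc):
        raise ValueError("sources must hold words_per_element words per element")
    out = []
    for i, current in enumerate(acc):
        total = sign_extend(current, 32)
        for k in range(words_per_element):
            idx = i * words_per_element + k
            total += sign_extend(src1[idx], 16) * sign_extend(src2[idx], 16)
        out.append(sign_extend(total, 32))  # wrap to 32 bits (assumption)
    return out


# 0xFFFF is -1, 0x8000 is -32768, and 0xFFFE is -2 as signed 16-bit words.
print(word_multiply_accumulate([10, 0], [0xFFFF, 2, 3, 0x8000], [5, 4, 0xFFFE, 1]))
# [13, -32774]
```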
-
Publication No.: EP3916543A2
Publication date: 2021-12-01
Application No.: EP21187080.3
Filing date: 2019-06-27
Applicant: INTEL Corporation
Inventor: Sade, Raanan , Valentine, Robert , Toll, Bret , Hughes, Christopher J. , Heinecke, Alexander F. , Ould-Ahmed-Vall, ElMoustapha , Charney, Mark J.
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor comprises decode circuitry to decode a single instruction into a decoded single instruction and execution circuitry to execute the decoded single instruction according to an opcode. The single instruction has a first field to specify a source matrix, a second field to specify a destination matrix, and the opcode to indicate the execution circuitry is to cause a store of: a first element and a second element from a first column of the source matrix respectively into a first element and a second element in a first row of the destination matrix, a first element and a second element from a second column of the source matrix respectively into a third element and a fourth element in the first row of the destination matrix, a third element and a fourth element from the first column of the source matrix respectively into a first element and a second element in a second row of the destination matrix, and a third element and a fourth element from the second column of the source matrix respectively into a third element and a fourth element in the second row of the destination matrix.
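Illustrative sketch (not part of the patent record): the element movement listed in this abstract amounts to merging each adjacent pair of source rows into one destination row, so that destination[i][2*j + k] = source[2*i + k][j] for k in {0, 1}. The function name `row_interleave` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def row_interleave(src):
    """Merge each adjacent pair of rows into one row-interleaved row."""
    if len(src) % 2:
        raise ValueError("source matrix needs an even number of rows")
    dst = []
    for r in range(0, len(src), 2):
        upper, lower = src[r], src[r + 1]
        merged = []
        for top, bottom in zip(upper, lower):
            # Two consecutive elements of one source column land side by side.
            merged.extend((top, bottom))
        dst.append(merged)
    return dst


source = [[1, 2],
          [3, 4],
          [5, 6],
          [7, 8]]
print(row_interleave(source))  # [[1, 3, 2, 4], [5, 7, 6, 8]]
```

In the example, the first column of the source is 1, 3, 5, 7 and the second is 2, 4, 6, 8. The first destination row therefore holds the first two elements of column one followed by the first two of column two, matching the order given in the abstract.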
-
Publication No.: EP4418136A3
Publication date: 2024-11-20
Application No.: EP24187271.2
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Ryvchin, Galina , Majcher, Piotr , Charney, Mark J. , Ould-Ahmed-Vall, ElMoustapha , Corbal, Jesus , Girkar, Milind B. , Sperber, Zeev , Rubanovich, Simon , Gradstein, Amit
Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.
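Illustrative sketch (not part of the patent record): a Python model of the grouped multiply-accumulate with saturation described in this abstract. It assumes 8-bit source elements and 32-bit accumulators (the abstract states only the four-to-one size ratio), and it applies saturation once to the final sum of each group; the function name `dot4_accumulate_saturate` is hypothetical.

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def dot4_accumulate_saturate(acc, signed_src, unsigned_src):
    """Multiply signed bytes by unsigned bytes, sum the products in groups
    of four, add each group to its accumulator, and clamp to 32 bits."""
    if len(signed_src) != len(unsigned_src) or len(signed_src) != 4 * len(acc):
        raise ValueError("sources must hold four elements per accumulator")
    out = []
    for i, current in enumerate(acc):
        total = current
        for k in range(4):
            s, u = signed_src[4 * i + k], unsigned_src[4 * i + k]
            if not -128 <= s <= 127 or not 0 <= u <= 255:
                raise ValueError("source element out of 8-bit range")
            total += s * u
        # Saturate instead of wrapping on overflow.
        out.append(max(INT32_MIN, min(INT32_MAX, total)))
    return out


signed = [1, -2, 3, -4, 127, 127, 127, 127]
unsigned = [10, 20, 30, 40, 255, 255, 255, 255]
print(dot4_accumulate_saturate([100, INT32_MAX], signed, unsigned))
# [0, 2147483647]
```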
-
Publication No.: EP4290371A3
Publication date: 2024-03-13
Application No.: EP23205691.1
Filing date: 2019-06-27
Applicant: Intel Corporation
Inventor: Sade, Raanan , Valentine, Robert , Toll, Bret , Hughes, Christopher J. , Heinecke, Alexander F. , Ould-Ahmed-Vall, ElMoustapha , Charney, Mark J.
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, an apparatus comprises: a plurality of registers, each register of the plurality of registers to store a plurality of matrix data elements and matrix processing circuitry to execute a matrix processing instruction to multiply a first source tile of a first matrix and a second source tile of a second matrix, the first source tile comprising rows and columns of a first subset of data elements of the first source matrix and the second source tile comprising rows and columns of a second subset of data elements of the second source matrix. The matrix processing circuitry comprises: circuitry to transform the first source tile by merging adjacent pairs of rows of the first source tile to generate corresponding row-interleaved data element sequences, each row-interleaved data element sequence to be loaded in a corresponding register of the plurality of registers; a set of multipliers to perform a parallel multiplication of each data element of the first subset of data elements stored in the corresponding registers of the plurality of registers with a corresponding data element of the second subset of data elements to generate a corresponding plurality of products; and accumulator circuitry to add the plurality of products to corresponding accumulated data elements of an accumulation matrix to generate corresponding result data elements of a result matrix.
-
Publication No.: EP4290371A2
Publication date: 2023-12-13
Application No.: EP23205691.1
Filing date: 2019-06-27
Applicant: Intel Corporation
Inventor: Sade, Raanan , Valentine, Robert , Toll, Bret , Hughes, Christopher J. , Heinecke, Alexander F. , Ould-Ahmed-Vall, ElMoustapha , Charney, Mark J.
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, an apparatus comprises: a plurality of registers, each register of the plurality of registers to store a plurality of matrix data elements and matrix processing circuitry to execute a matrix processing instruction to multiply a first source tile of a first matrix and a second source tile of a second matrix, the first source tile comprising rows and columns of a first subset of data elements of the first source matrix and the second source tile comprising rows and columns of a second subset of data elements of the second source matrix. The matrix processing circuitry comprises: circuitry to transform the first source tile by merging adjacent pairs of rows of the first source tile to generate corresponding row-interleaved data element sequences, each row-interleaved data element sequence to be loaded in a corresponding register of the plurality of registers; a set of multipliers to perform a parallel multiplication of each data element of the first subset of data elements stored in the corresponding registers of the plurality of registers with a corresponding data element of the second subset of data elements to generate a corresponding plurality of products; and accumulator circuitry to add the plurality of products to corresponding accumulated data elements of an accumulation matrix to generate corresponding result data elements of a result matrix.