-
Publication number: EP4361802A2
Publication date: 2024-05-01
Application number: EP24157718.8
Application date: 2019-06-26
Applicant: Intel Corporation
Inventor: Toll, Bret , Hughes, Christopher J. , Baum, Dan , Ould-Ahmed-Vall, ElMoustapha , Sade, Raanan , Valentine, Robert , Charney, Mark J. , Heinecke, Alexander F.
IPC: G06F9/30
CPC classification number: G06F9/30036 , G06F9/30032
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor comprises decode circuitry to decode an instruction, the instruction to specify a two-dimensional, 2D, tile storage of the processor, either of multiple rows or multiple columns of the 2D tile storage, multiple vector registers of the processor, and a size of elements of the multiple rows or multiple columns of the 2D tile storage as any one of 8-bits, 16-bits, 32-bits, and 64-bits; and execution circuitry, coupled with the decode circuitry, to perform operations corresponding to the instruction, the operations to include storing elements from each row of the multiple rows or each column of the multiple columns of the 2D tile storage to a corresponding one of the multiple vector registers as a corresponding one-dimensional, 1D, vector.
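The row or column extraction described in this abstract can be sketched in software as follows. This is only an illustration, assuming the tile is modeled as a list of lists; the function names `rows_to_vectors` and `columns_to_vectors` are hypothetical, and the element width (8, 16, 32, or 64 bits) is not modeled.

```python
def rows_to_vectors(tile, first_row, count):
    """Return `count` consecutive tile rows, each as its own 1D list."""
    return [list(tile[r]) for r in range(first_row, first_row + count)]


def columns_to_vectors(tile, first_col, count):
    """Return `count` consecutive tile columns, each as its own 1D list."""
    return [[row[c] for row in tile] for c in range(first_col, first_col + count)]


tile = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
row_vectors = rows_to_vectors(tile, 1, 2)        # [[4, 5, 6], [7, 8, 9]]
column_vectors = columns_to_vectors(tile, 0, 2)  # [[1, 4, 7], [2, 5, 8]]
```

Each returned list stands in for one destination vector register.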
-
Publication number: EP4303724A1
Publication date: 2024-01-10
Application number: EP23194771.4
Application date: 2017-07-01
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, ElMoustapha , Toll, Bret L. , Charney, Mark J. , Adelman, Menachem , Ziv, Barukh , Heinecke, Alexander , Rubanovich, Simon
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, a processor comprises decode circuitry to decode a single matrix instruction and execution circuitry to execute the single matrix instruction. The single matrix instruction has fields for an opcode, a plurality of identifiers corresponding to a first plurality of 4-bit sized data elements of a first source matrix, a second plurality of 4-bit sized data elements of a second source matrix, a plurality of doubleword-sized source data elements of a third source matrix, and a plurality of doubleword-sized result data elements of a result matrix, and bits indicating whether one or both of the first and second plurality of 4-bit sized data elements are signed or unsigned. The execution circuitry includes a multiply accumulate circuit, comprising: a multiplier to multiply each 4-bit sized data element of a first subset of the first plurality of 4-bit sized data elements with a corresponding 4-bit sized data element of a first subset of the second plurality of 4-bit sized data elements to generate a plurality of products; and an accumulator to add the plurality of products to a corresponding doubleword-sized source data element of the plurality of doubleword-sized source data elements to generate a corresponding doubleword-sized result data element of the plurality of doubleword-sized result data elements.
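A minimal software sketch of one multiply-accumulate lane from this abstract is given below. The names `to_signed4` and `nibble_dot_accumulate` are hypothetical. The abstract does not say how overflow of the doubleword result is handled; wrapping to a signed 32-bit value is an assumption made here.

```python
def to_signed4(nibble):
    """Interpret the low 4 bits as a two's-complement value in [-8, 7]."""
    nibble &= 0xF
    return nibble - 16 if nibble >= 8 else nibble


def nibble_dot_accumulate(a_nibbles, b_nibbles, acc, a_signed=True, b_signed=True):
    """Add the products of paired 4-bit elements to a 32-bit accumulator."""
    total = acc
    for a, b in zip(a_nibbles, b_nibbles):
        a_val = to_signed4(a) if a_signed else (a & 0xF)
        b_val = to_signed4(b) if b_signed else (b & 0xF)
        total += a_val * b_val
    # Assumption: wrap the result to a signed 32-bit doubleword.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


# 0xF is -1 when signed and 15 when unsigned.
signed_result = nibble_dot_accumulate([0xF, 0x1], [0x2, 0x3], 10)           # 11
unsigned_result = nibble_dot_accumulate([0xF, 0x1], [0x2, 0x3], 10,
                                        a_signed=False, b_signed=False)     # 43
```

The two flags play the role of the bits that mark each source as signed or unsigned.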
-
Publication number: EP4216057A1
Publication date: 2023-07-26
Application number: EP23161367.0
Application date: 2017-07-01
Applicant: INTEL Corporation
Inventor: Valentine, Robert , Sperber, Zeev , Charney, Mark J. , Toll, Bret L. , Rappoport, Rinat , Shwartsman, Stanislav , Baum, Dan , Yanover, Igor , Ould-Ahmed-Vall, ElMoustapha , Adelman, Menachem , Corbal, Jesus , Gebil, Yuri , Rubanovich, Simon
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, an apparatus comprises an instruction decoder to decode a single instruction, the single instruction having fields to indicate an opcode, a first register to store a first source matrix, a second register to store a second source matrix, and a third register to store a 2 by 2 third source matrix, wherein the opcode is to indicate a matrix multiply-accumulate operation; and execution circuitry to perform the matrix multiply-accumulate operation. The matrix multiply-accumulate operation includes: multiplying a value corresponding to a first row and a first column of the first source matrix and a value corresponding to a first row and a first column of the second source matrix to generate a first product, multiplying a value corresponding to the first row and a second column of the first source matrix and a value corresponding to a second row and the first column of the second source matrix to generate a second product, summing the first product, the second product, and an initial value corresponding to an element position in a first row and a first column of the 2 by 2 third source matrix to generate a resulting value corresponding to the element position in a destination matrix, and storing the destination matrix in the third register.
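The multiply-accumulate described in this abstract amounts to C = C + A × B, with the result written back over the third source. The sketch below assumes matrices are plain lists of lists; `matmul_accumulate` is a hypothetical name, not one used in the patent.

```python
def matmul_accumulate(a, b, c):
    """Return c + a @ b, computed element by element."""
    inner = len(b)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
# Element (0, 0): a[0][0]*b[0][0] + a[0][1]*b[1][0] + c[0][0] = 5 + 14 + 1 = 20
c = matmul_accumulate(a, b, c)  # [[20, 23], [44, 51]]
```

Reassigning `c` mirrors the abstract's step of storing the destination matrix in the third register.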
-
Publication number: EP4141656A1
Publication date: 2023-03-01
Application number: EP22185939.0
Application date: 2022-07-20
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Heinecke, Alexander , Valentine, Robert , Sperber, Zeev , Gradstein, Amit , Charney, Mark , Georganas, Evangelos , Kalamkar, Dhiraj , Hughes, Christopher , Anderson, Cristina
IPC: G06F9/30
Abstract: Techniques for scale and reduction of BF16 data elements are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a BF16 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a BF16 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
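The per-element scale operation in this abstract computes x · 2^floor(y). A minimal sketch using ordinary Python floats follows; the function names are hypothetical, BF16 storage is not modeled, and special values (NaN, infinities, overflow, underflow) are left out. Scaling by a power of two changes only the exponent, so for inputs that are valid BF16 values the result is exact as long as it stays in the normal range.

```python
import math


def scale_by_power_of_two(x, y):
    """Return x * 2**floor(y) for finite x and y."""
    return math.ldexp(x, math.floor(y))


def scale_packed(src1, src2):
    """Apply the scale operation at every element position."""
    return [scale_by_power_of_two(x, y) for x, y in zip(src1, src2)]


# floor(2.7) = 2 and floor(-1.2) = -2
result = scale_packed([3.0, 1.5], [2.7, -1.2])  # [12.0, 0.375]
```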
-
Publication number: EP4137941A1
Publication date: 2023-02-22
Application number: EP22196776.3
Application date: 2017-07-01
Applicant: Intel Corporation
Inventor: Valentine, Robert , Baum, Dan , Sperber, Zeev , Corbal, Jesus , Ould-Ahmed-Vall, Elmoustapha , Toll, Bret L. , Charney, Mark J. , Ziv, Barukh , Heinecke, Alexander , Girkar, Milind , Rubanovich, Simon
Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier, wherein each of the first source matrix operand, the second source matrix operand, and the destination matrix operand corresponds to a two-dimensional matrix of values, and execution circuitry to execute the decoded instruction to, for each data element position of the identified first source matrix operand: multiply a first data value at that data element position by a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the multiplication into a corresponding data element position of the identified destination matrix operand.
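Note that this abstract describes an element-wise product, not a matrix multiplication: each destination element depends on one element from each source. A small sketch, with the hypothetical name `elementwise_multiply` and matrices modeled as lists of lists:

```python
def elementwise_multiply(a, b):
    """Multiply matching element positions of two equally shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("source matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


dest = elementwise_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # [[5, 12], [21, 32]]
```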
-
56.
Publication number: EP3974967A1
Publication date: 2022-03-30
Application number: EP21192634.0
Application date: 2021-08-23
Applicant: INTEL Corporation
Inventor: Heinecke, Alexander F , Valentine, Robert , Charney, Mark J , Adelman, Menachem , Hughes, Christopher J , Georganas, Evangelos , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to instructions to convert 16-bit floating-point formats are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a source vector comprising N plurality of 16-bit half-precision floating-point elements, and a destination vector to store N plurality of 16-bit bfloat floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the source vector from 16-bit half-precision floating-point format to 16-bit bfloat floating-point format and store each converted element into a corresponding location of the destination vector, decode circuitry to decode the fetched single instruction into a decoded single instruction, and the execution circuitry to respond to the decoded single instruction as specified by the opcode.
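The half-precision to bfloat16 conversion can be imitated on raw bit patterns as sketched below. The function names are hypothetical. The abstract does not state a rounding mode or NaN treatment; round-to-nearest-even and a quieted NaN are assumptions here. The sketch decodes the 16-bit half value through Python's `struct` format `e`, re-encodes it as a 32-bit float, and keeps the rounded upper 16 bits.

```python
import struct


def fp16_bits_to_bf16_bits(half_bits):
    """Convert one IEEE half-precision bit pattern to a bfloat16 bit pattern."""
    value = struct.unpack("<e", struct.pack("<H", half_bits))[0]  # exact decode
    f32_bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if value != value:
        # Assumption: NaN stays NaN, with the top mantissa bit set.
        return ((f32_bits >> 16) | 0x0040) & 0xFFFF
    # Assumption: round to nearest, ties to even, on the 16 discarded bits.
    rounded = f32_bits + 0x7FFF + ((f32_bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF


def convert_vector(half_vector):
    """Convert every element of a vector of half-precision bit patterns."""
    return [fp16_bits_to_bf16_bits(h) for h in half_vector]


# 0x3C00 is 1.0 in half precision; 0x3F80 is 1.0 in bfloat16.
converted = convert_vector([0x3C00, 0xC000, 0x7C00])  # [0x3F80, 0xC000, 0x7F80]
```

Half precision carries 10 explicit mantissa bits and bfloat16 only 7, so three bits of precision can be lost, while every finite half value fits within the bfloat16 exponent range.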
-
Publication number: EP3929736A1
Publication date: 2021-12-29
Application number: EP20214433.3
Application date: 2020-12-16
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Pollak, Yaroslav , Stupp, Gideon , Gradstein, Amit , Rubanovich, Simon , Sperber, Zeev , Charney, Mark , Hughes, Christopher , Heinecke, Alexander
Abstract: Systems, methods, and apparatuses relating to one or more instructions that utilize direct paths for loading data into a tile from a vector register and/or storing data from a tile into a vector register are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a plurality of registers that represents a two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a coupling to a cache; and a hardware processor core comprising: a vector register, a decoder to decode a single instruction into a decoded single instruction, the single instruction including a first field that identifies the two-dimensional matrix, a second field that identifies a set of elements of the two-dimensional matrix, and a third field that identifies the vector register, and an execution circuit to execute the decoded single instruction to cause a store of the set of elements from the plurality of registers that represents the two-dimensional matrix into the vector register by a coupling of the hardware processor core to the matrix operations accelerator circuit that is separate from the coupling to the cache.
-
Publication number: EP3929733A1
Publication date: 2021-12-29
Application number: EP20209948.7
Application date: 2020-11-26
Applicant: INTEL Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Sperber, Zeev , Charney, Mark J. , Hughes, Christopher J. , Heinecke, Alexander F. , Georganas, Evangelos , Pham, Binh
IPC: G06F9/30
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
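The combined operation in this abstract is D = Aᵀ × B. A minimal sketch, assuming list-of-lists matrices and the hypothetical name `transpose_multiply`:

```python
def transpose_multiply(a, b):
    """Return transpose(a) @ b; a and b must have the same number of rows."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of rows")
    a_t = [list(col) for col in zip(*a)]   # transposed first source
    b_cols = list(zip(*b))                 # columns of the second source
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a_t]


a = [[1, 2], [3, 4], [5, 6]]   # 3 x 2, so transpose(a) is 2 x 3
b = [[1, 0], [0, 1], [1, 1]]   # 3 x 2
dest = transpose_multiply(a, b)  # [[6, 8], [8, 10]]
```

Fusing the two steps in one instruction avoids writing the transposed matrix to a separate location first; the sketch builds `a_t` explicitly only for readability.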
-
59.
Publication number: EP3916543A3
Publication date: 2021-12-22
Application number: EP21187080.3
Application date: 2019-06-27
Applicant: INTEL Corporation
Inventor: Sade, Raanan , Valentine, Robert , Toll, Bret , Hughes, Christopher J. , Heinecke, Alexander F. , Ould-Ahmed-Vall, ElMoustapha , Charney, Mark J.
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor comprises decode circuitry to decode a single instruction into a decoded single instruction and execution circuitry to execute the decoded single instruction according to an opcode. The single instruction has a first field to specify a source matrix, a second field to specify a destination matrix, and the opcode to indicate the execution circuitry is to cause a store of: a first element and a second element from a first column of the source matrix respectively into a first element and a second element in a first row of the destination matrix, a first element and a second element from a second column of the source matrix respectively into a third element and a fourth element in the first row of the destination matrix, a third element and a fourth element from the first column of the source matrix respectively into a first element and a second element in a second row of the destination matrix, and a third element and a fourth element from the second column of the source matrix respectively into a third element and a fourth element in the second row of the destination matrix.
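The element mapping spelled out in this abstract can be written as dest[r][2·c + k] = src[2·r + k][c] for k in {0, 1}. The sketch below generalizes the pair size to a `group` parameter, which is an assumption beyond the abstract; `row_interleave` is a hypothetical name.

```python
def row_interleave(src, group=2):
    """Pack `group` vertically adjacent elements of each column side by side."""
    rows, cols = len(src), len(src[0])
    if rows % group:
        raise ValueError("row count must be a multiple of the group size")
    return [
        [src[r * group + k][c] for c in range(cols) for k in range(group)]
        for r in range(rows // group)
    ]


src = [
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
]
# First column is 1, 3, 5, 7; second column is 2, 4, 6, 8.
dest = row_interleave(src)  # [[1, 3, 2, 4], [5, 7, 6, 8]]
```

A 4 × 2 source becomes a 2 × 4 destination: the first two elements of each column fill the first row, and the next two fill the second row.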
-
60.
Publication number: EP3822774A1
Publication date: 2021-05-19
Application number: EP20216494.3
Application date: 2019-10-08
Applicant: INTEL Corporation
Inventor: Heinecke, Alexander F. , Valentine, Robert , Charney, Mark J. , Sade, Raanan , Adelman, Menachem , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
Abstract: Disclosed embodiments relate to a processor, a system on a chip and a system for executing a format conversion instruction. In one example, a processor having a plurality of cores, including a core that, in response to a format conversion instruction having a first source operand including a first 32-bit single-precision floating point data element, and a second source operand including a second 32-bit single-precision floating point data element, is to: convert the first 32-bit single-precision floating point data element to a first 16-bit floating point data element, wherein, when the first 32-bit single-precision floating point data element is a normal data element, conversion is to be performed according to a rounding mode specified by the format conversion instruction, and the first 16-bit floating point data element is to have a sign bit, an 8-bit exponent, seven explicit mantissa bits, and one implicit mantissa bit, and wherein, when the first 32-bit single-precision floating point data element is a not-a-number, NaN, data element, the first 16-bit floating point data element is to have a mantissa with a most significant bit set to one; convert the second 32-bit single-precision floating point data element to a second 16-bit floating point data element, wherein, when the second 32-bit single-precision floating point data element is a normal data element, conversion is to be performed according to the rounding mode, and the second 16-bit floating point data element is to have a sign bit, an 8-bit exponent, seven explicit mantissa bits, and one implicit mantissa bit, and wherein when the second 32-bit single-precision floating point data element is a NaN data element, the second 16-bit floating point data element is to have a mantissa with a most significant bit set to one; and store the first 16-bit floating point data element in a lower order half of a destination register and the second 16-bit floating point data element in a higher order half of the destination register.
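The conversion and packing steps of this abstract can be sketched on bit patterns as follows. The function names are hypothetical. The abstract lets the instruction select the rounding mode; this sketch implements only round-to-nearest-even as one possible mode, and it applies the same rounding to denormal inputs, which is an assumption the abstract does not cover.

```python
import struct


def fp32_to_bf16_bits(value):
    """Convert a float to a 16-bit pattern: sign, 8-bit exponent, 7 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep the upper half and set the most significant mantissa bit.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Assumed mode: round to nearest, ties to even, on the 16 discarded bits.
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF


def convert_pair(first, second):
    """Pack the first result into the low half and the second into the high half."""
    return fp32_to_bf16_bits(first) | (fp32_to_bf16_bits(second) << 16)


packed = convert_pair(1.0, -2.0)  # 0xC0003F80
```

Here 1.0 converts to 0x3F80 and lands in the lower 16 bits, while -2.0 converts to 0xC000 and lands in the upper 16 bits of the 32-bit destination.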