-
1.
Publication No.: EP4398097A3
Publication date: 2024-09-04
Application No.: EP24177816.6
Filing date: 2021-09-14
Applicant: INTEL Corporation
Inventor: Mellempudi, Naveen , Heinecke, Alexander F. , Valentine, Robert , Charney, Mark J. , Hughes, Christopher J. , Georganas, Evangelos , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
CPC classification number: G06F9/30014 , G06F9/30036 , G06F9/30025
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. For example, a processor comprises: a plurality of vector registers to store a plurality of packed data elements including 8-bit floating point data elements and 32-bit floating point data elements; decode circuitry to decode a single matrix multiplication instruction having fields to indicate an opcode and locations of an M by K first source matrix including a first plurality of the 8-bit floating point data elements, a K by N second source matrix including a second plurality of the 8-bit floating point data elements, and an M by N third source matrix having a plurality of 32-bit floating point data elements, each of the first and second plurality of 8-bit floating point data elements comprising a sign bit, a 5-bit exponent value, and a 2-bit mantissa value; and execution circuitry comprising matrix acceleration circuitry to accelerate matrix operations, wherein responsive to the single matrix multiplication instruction, the execution circuitry is to generate each 32-bit floating point result data element of a result matrix based on a corresponding row of the first plurality of 8-bit floating point data elements and a corresponding column of the second plurality of 8-bit floating point data elements, the execution circuitry to generate a respective plurality of products corresponding to the corresponding row of the first plurality of 8-bit floating point data elements and the corresponding column of the second plurality of 8-bit floating point data elements and to accumulate the plurality of products with a corresponding 32-bit floating point data element of the third source matrix to generate the 32-bit floating point result data element of the result matrix.
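The abstract fixes the FP8 element layout (one sign bit, a 5-bit exponent, a 2-bit mantissa) but not the exponent bias, the special-value encodings, or the rounding order of the accumulation. The following Python sketch is a reference model under those gaps. It assumes an exponent bias of 15 with IEEE-style infinities, NaNs, and subnormals (the convention commonly called E5M2), and it rounds to FP32 after every addition. The function names are illustrative, not Intel's.

```python
import math
import struct


def e5m2_to_float(byte: int) -> float:
    """Decode one FP8 byte laid out as 1 sign bit, 5 exponent bits, 2 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 2) & 0x1F
    man = byte & 0x3
    if exp == 0x1F:                      # all-ones exponent: Inf or NaN (IEEE-style, assumed)
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                         # zero or subnormal: no implicit leading 1
        return sign * (man / 4.0) * 2.0 ** -14
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - 15)   # bias 15 (assumed)


def round_to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.

    Raises OverflowError if x is finite but beyond the FP32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_e5m2_fp32(a, b, c):
    """Return C + A @ B: A is M x K and B is K x N (FP8 bytes), C is M x N (FP32 values)."""
    n_cols = len(b[0])
    result = [list(row) for row in c]
    for m, a_row in enumerate(a):
        for n in range(n_cols):
            acc = result[m][n]
            for k, a_byte in enumerate(a_row):
                # The product of two E5M2 values fits exactly in binary32, so only the
                # addition rounds. Summing in binary64 and then rounding to binary32
                # equals one correctly rounded FP32 addition.
                prod = e5m2_to_float(a_byte) * e5m2_to_float(b[k][n])
                acc = round_to_fp32(acc + prod)
            result[m][n] = acc
    return result


# 1x2 times 2x1, accumulated onto 0.5: 0.5 + 1.0*2.0 + 2.0*1.5 = 5.5
print(matmul_e5m2_fp32([[0x3C, 0x40]], [[0x40], [0x3E]], [[0.5]]))  # [[5.5]]
```

Rounding after each addition is only one possible accumulation order; hardware may sum several products before rounding, which can change the last bit of some results.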
-
2.
Publication No.: EP4318224A1
Publication date: 2024-02-07
Application No.: EP23182966.4
Filing date: 2023-07-03
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Charney, Mark , Georganas, Evangelos , Gradstein, Amit , Hughes, Christopher , Mellempudi, Naveen , Rubanovich, Simon , Sherman, Uri , Sperber, Zeev , Valentine, Robert
IPC: G06F9/30
Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
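The conversion step can be modeled in software. The following Python sketch assumes that "bfloat8" denotes the layout with a sign bit, 5 exponent bits (bias 15), and 2 mantissa bits, that rounding is round-to-nearest-even, and that out-of-range values overflow to infinity rather than saturating; none of these details is stated in the abstract. Source elements are taken as packed little-endian FP32, and the function names are hypothetical.

```python
import math
import struct


def float_to_e5m2(x: float) -> int:
    """Round x to the nearest E5M2 value (ties to even) and return its 8-bit encoding."""
    if math.isnan(x):
        return 0x7E                        # a quiet NaN pattern (assumed encoding)
    sign = 0x80 if math.copysign(1.0, x) < 0 else 0x00
    a = abs(x)
    if math.isinf(a):
        return sign | 0x7C
    if a == 0.0:
        return sign                        # +0 or -0
    _, e = math.frexp(a)                   # a = f * 2**e with 0.5 <= f < 1
    e = max(e - 1, -14)                    # binade exponent; subnormals reuse the lowest binade
    quantum = 2.0 ** (e - 2)               # spacing of E5M2 values in this binade (2 mantissa bits)
    n = round(a / quantum)                 # exact scaling; Python's round() breaks ties to even
    code = ((e + 15) << 2) + n - 4         # n == 8 carries into the next exponent automatically
    return sign | min(code, 0x7C)          # anything past the largest finite value becomes Inf


def convert_packed_fp32_to_bf8(src: bytes) -> bytes:
    """Convert packed little-endian FP32 elements to packed E5M2 bytes, one per element."""
    count = len(src) // 4
    values = struct.unpack(f"<{count}f", src[: 4 * count])
    return bytes(float_to_e5m2(v) for v in values)


src = struct.pack("<4f", 1.0, -2.0, 1.375, 70000.0)
print(convert_packed_fp32_to_bf8(src).hex())  # 3cc03e7c
```

Rounding directly from the source value, rather than going FP32 to FP16 to FP8, avoids the double-rounding errors such a cascade can introduce.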
-
3.
Publication No.: EP4276608A3
Publication date: 2024-01-10
Application No.: EP23195872.9
Filing date: 2021-09-14
Applicant: Intel Corporation
Inventor: Mellempudi, Naveen , Heinecke, Alexander F. , Valentine, Robert , Charney, Mark J. , Hughes, Christopher J. , Georganas, Evangelos , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. For example, a processing unit comprises circuitry to perform operations corresponding to an instruction, the instruction to specify a first matrix having M rows by 4*K columns of 8-bit floating-point data elements, a second matrix having 4*K rows by N columns of 8-bit floating-point data elements, and a third matrix having M rows by N columns of 32-bit single precision floating-point data elements. The operations include, for each row m of the M rows of the first matrix, and for each column n of the N columns of the second matrix: convert 4*K 8-bit floating-point data elements of the row m of the first matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than an 8-bit floating-point data element, and convert 4*K 8-bit floating-point data elements of the column n of the second matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than the 8-bit floating-point data element; multiply the 4*K higher precision floating-point data elements corresponding to the row m of the first matrix with corresponding ones of the 4*K higher precision floating-point data elements corresponding to the column n of the second matrix to generate 4*K products; accumulate the 4*K products with a 32-bit single precision floating-point data element corresponding to a row m of the M rows, and a column n of the N columns, of the third matrix, to generate a result 32-bit single precision floating-point data element; and store the result 32-bit single precision floating-point data element at the row m and the column n of the third matrix.
-
4.
Publication No.: EP4276608A2
Publication date: 2023-11-15
Application No.: EP23195872.9
Filing date: 2021-09-14
Applicant: Intel Corporation
Inventor: Mellempudi, Naveen , Heinecke, Alexander F. , Valentine, Robert , Charney, Mark J. , Hughes, Christopher J. , Georganas, Evangelos , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. For example, a processing unit comprises circuitry to perform operations corresponding to an instruction, the instruction to specify a first matrix having M rows by 4*K columns of 8-bit floating-point data elements, a second matrix having 4*K rows by N columns of 8-bit floating-point data elements, and a third matrix having M rows by N columns of 32-bit single precision floating-point data elements. The operations include, for each row m of the M rows of the first matrix, and for each column n of the N columns of the second matrix: convert 4*K 8-bit floating-point data elements of the row m of the first matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than an 8-bit floating-point data element, and convert 4*K 8-bit floating-point data elements of the column n of the second matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than the 8-bit floating-point data element; multiply the 4*K higher precision floating-point data elements corresponding to the row m of the first matrix with corresponding ones of the 4*K higher precision floating-point data elements corresponding to the column n of the second matrix to generate 4*K products; accumulate the 4*K products with a 32-bit single precision floating-point data element corresponding to a row m of the M rows, and a column n of the N columns, of the third matrix, to generate a result 32-bit single precision floating-point data element; and store the result 32-bit single precision floating-point data element at the row m and the column n of the third matrix.
-
5.
Publication No.: EP4064040A1
Publication date: 2022-09-28
Application No.: EP22153430.8
Filing date: 2022-01-26
Applicant: Intel Corporation
Inventor: Mellempudi, Naveen , Maiyuran, Subramaniam , George, Varghese , Fu, Fangwen , Mu, Shuai , Pal, Supratim , Xiong, Wei
Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
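The abstract names multipliers, shifters, and adders as the building blocks of each systolic layer. The following Python sketch is one behavioral way to see why shifters are needed, not the patented circuit: each hypothetical layer multiplies the integer significands of one FP8 pair, shifts the product and the incoming partial sum to a common exponent, and adds them exactly, so rounding happens only once at the end. The E5M2 layout (bias 15) and finite inputs are assumptions; the abstract says only that the format is indicated by the instruction.

```python
import math


def e5m2_parts(byte: int) -> tuple[int, int, int]:
    """Split a finite E5M2 byte into (sign, integer significand, exponent): value = sign * sig * 2**exp."""
    sign = -1 if byte & 0x80 else 1
    exp_field = (byte >> 2) & 0x1F
    man = byte & 0x3
    if exp_field == 0x1F:
        raise ValueError("Inf/NaN inputs are not modeled in this sketch")
    if exp_field == 0:                     # subnormal: no implicit leading 1
        return sign, man, 1 - 15 - 2
    return sign, 4 + man, exp_field - 15 - 2   # implicit 1 gives a significand in [4, 7]


def systolic_dot(a: bytes, b: bytes) -> float:
    """Dot product of two E5M2 vectors, one (hypothetical) systolic layer per element pair."""
    acc_sig, acc_exp = 0, 0                # exact partial sum: acc_sig * 2**acc_exp
    for x, y in zip(a, b):                 # each iteration plays the role of one layer
        sx, mx, ex = e5m2_parts(x)
        sy, my, ey = e5m2_parts(y)
        prod_sig, prod_exp = sx * sy * mx * my, ex + ey      # multiplier
        common = min(acc_exp, prod_exp)
        acc_sig <<= acc_exp - common                          # shifters align both operands
        prod_sig <<= prod_exp - common
        acc_sig, acc_exp = acc_sig + prod_sig, common         # adder (exact, no rounding)
    return math.ldexp(acc_sig, acc_exp)    # the only rounding: int -> float at the end


big, tiny = 0x7B, 0x01                     # 57344.0 and 2**-16 in E5M2
exact = systolic_dot(bytes([big, tiny, big | 0x80]), bytes([big, tiny, big]))
print(exact == 2.0 ** -32)                 # True; summing the same products in float order gives 0.0
```

The usage line shows the point of exact alignment: the tiny middle product survives cancellation of the two large ones, whereas ordinary left-to-right floating-point summation loses it.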
-
6.
Publication No.: EP4398097A2
Publication date: 2024-07-10
Application No.: EP24177816.6
Filing date: 2021-09-14
Applicant: INTEL Corporation
Inventor: Mellempudi, Naveen , Heinecke, Alexander F. , Valentine, Robert , Charney, Mark J. , Hughes, Christopher J. , Georganas, Evangelos , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
CPC classification number: G06F9/30014 , G06F9/30036 , G06F9/30025
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. For example, a processor comprises: a plurality of vector registers to store a plurality of packed data elements including 8-bit floating point data elements and 32-bit floating point data elements; decode circuitry to decode a single matrix multiplication instruction having fields to indicate an opcode and locations of an M by K first source matrix including a first plurality of the 8-bit floating point data elements, a K by N second source matrix including a second plurality of the 8-bit floating point data elements, and an M by N third source matrix having a plurality of 32-bit floating point data elements, each of the first and second plurality of 8-bit floating point data elements comprising a sign bit, a 5-bit exponent value, and a 2-bit mantissa value; and execution circuitry comprising matrix acceleration circuitry to accelerate matrix operations, wherein responsive to the single matrix multiplication instruction, the execution circuitry is to generate each 32-bit floating point result data element of a result matrix based on a corresponding row of the first plurality of 8-bit floating point data elements and a corresponding column of the second plurality of 8-bit floating point data elements, the execution circuitry to generate a respective plurality of products corresponding to the corresponding row of the first plurality of 8-bit floating point data elements and the corresponding column of the second plurality of 8-bit floating point data elements and to accumulate the plurality of products with a corresponding 32-bit floating point data element of the third source matrix to generate the 32-bit floating point result data element of the result matrix.
-
7.
Publication No.: EP4318225A1
Publication date: 2024-02-07
Application No.: EP23182973.0
Filing date: 2023-07-03
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Georganas, Evangelos , Gradstein, Amit , Hughes, Christopher , Mellempudi, Naveen , Rubanovich, Simon , Sherman, Uri , Sperber, Zeev
IPC: G06F9/30
Abstract: Techniques for FP8 classification or manipulation using single instructions are described. An exemplary instruction includes fields for an opcode, an identification of a location of a packed data source operand, an indication of one or more classification checks to perform, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a classification according to the indicated one or more classification checks and store a result of the classification in a corresponding data element position of the destination operand.
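The abstract leaves the set of classification checks and the result encoding open. The following Python sketch borrows the check-selector bit layout of the existing AVX-512 VFPCLASS instructions (bit 0 QNaN, 1 +0, 2 -0, 3 +Inf, 4 -Inf, 5 denormal, 6 negative finite, 7 SNaN) and writes 1 or 0 per element. It also assumes the E5M2 layout with IEEE-style NaNs, where the top mantissa bit marks a quiet NaN. All of these choices, and the function names, are assumptions for illustration.

```python
# Check-selector bits, mirroring the AVX-512 VFPCLASS imm8 layout (assumed for FP8).
QNAN, POS_ZERO, NEG_ZERO, POS_INF, NEG_INF, DENORMAL, NEG_FINITE, SNAN = (1 << i for i in range(8))


def e5m2_classes(byte: int) -> int:
    """Return the set of classes (as a bit mask) that one E5M2 byte belongs to."""
    negative = bool(byte & 0x80)
    exp_all_ones = (byte & 0x7C) == 0x7C
    exp_all_zeros = (byte & 0x7C) == 0
    man = byte & 0x3
    is_zero = exp_all_zeros and not man
    classes = 0
    if exp_all_ones and man:
        classes |= QNAN if man & 0x2 else SNAN   # top mantissa bit set -> quiet NaN
    if exp_all_ones and not man:
        classes |= NEG_INF if negative else POS_INF
    if is_zero:
        classes |= NEG_ZERO if negative else POS_ZERO
    if exp_all_zeros and man:
        classes |= DENORMAL
    if negative and not exp_all_ones and not is_zero:
        classes |= NEG_FINITE                    # negative, finite, nonzero (includes denormals)
    return classes


def fp8_classify(src: bytes, checks: int) -> list[int]:
    """Per element: 1 if the element matches any selected check, else 0."""
    return [1 if e5m2_classes(b) & checks else 0 for b in src]


# Flag NaNs and infinities among 1.0, qNaN, -Inf, and the smallest denormal.
print(fp8_classify(bytes([0x3C, 0x7E, 0xFC, 0x01]), QNAN | SNAN | POS_INF | NEG_INF))  # [0, 1, 1, 0]
```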
-
8.
Publication No.: EP4318223A1
Publication date: 2024-02-07
Application No.: EP23182952.4
Filing date: 2023-07-03
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Georganas, Evangelos , Hughes, Christopher , Mellempudi, Naveen , Rubanovich, Simon , Sherman, Uri , Sperber, Zeev , Gradstein, Amit
IPC: G06F9/30
Abstract: Techniques for comparing FP8 data elements are described. An exemplary FP8 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
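The abstract says the comparison updates a flags register but not which flags. The following Python sketch assumes the convention of the existing x86 COMISS/UCOMISS instructions (unordered: ZF=PF=CF=1; greater: all clear; less: CF=1; equal: ZF=1; the other arithmetic flags, which those instructions clear, are not modeled) and the E5M2 element layout. Both assumptions and the function names are illustrative.

```python
import math


def e5m2_to_float(byte: int) -> float:
    """Decode an E5M2 byte (sign, 5-bit exponent with bias 15, 2-bit mantissa; assumed layout)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp, man = (byte >> 2) & 0x1F, byte & 0x3
    if exp == 0x1F:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        return sign * (man / 4.0) * 2.0 ** -14
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - 15)


def fp8_compare(src1: bytes, src2: bytes, position: int = 0) -> dict[str, int]:
    """Compare the elements at one position and return the resulting ZF/PF/CF values."""
    a, b = e5m2_to_float(src1[position]), e5m2_to_float(src2[position])
    if math.isnan(a) or math.isnan(b):
        zf, pf, cf = 1, 1, 1            # unordered
    elif a > b:
        zf, pf, cf = 0, 0, 0
    elif a < b:
        zf, pf, cf = 0, 0, 1
    else:
        zf, pf, cf = 1, 0, 0            # equal, including +0 versus -0
    return {"ZF": zf, "PF": pf, "CF": cf}


print(fp8_compare(bytes([0x3C]), bytes([0x40])))  # {'ZF': 0, 'PF': 0, 'CF': 1}: 1.0 < 2.0
```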
-
9.
Publication No.: EP4020169A1
Publication date: 2022-06-29
Application No.: EP21196474.7
Filing date: 2021-09-14
Applicant: INTEL Corporation
Inventor: Mellempudi, Naveen , Heinecke, Alexander F. , Valentine, Robert , Charney, Mark J. , Hughes, Christopher J. , Georganas, Evangelos , Sperber, Zeev , Gradstein, Amit , Rubanovich, Simon
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate a plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
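Here each source-matrix element is a 32-bit container holding four FP8 values. The following Python sketch assumes, by analogy with the dword layout of existing AMX integer dot-product instructions such as TDPBSSD, that element A[m][k] pairs with B[k][n] and that byte i of one pairs with byte i of the other, bytes taken least-significant first. It also assumes the E5M2 layout and rounds to FP32 after each addition. These details and the function names are assumptions, not taken from the patent.

```python
import math
import struct


def e5m2_to_float(byte: int) -> float:
    """Decode an E5M2 byte (sign, 5-bit exponent with bias 15, 2-bit mantissa; assumed layout)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp, man = (byte >> 2) & 0x1F, byte & 0x3
    if exp == 0x1F:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        return sign * (man / 4.0) * 2.0 ** -14
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - 15)


def to_fp32(x: float) -> float:
    """Round to the nearest binary32 value (raises OverflowError beyond the FP32 range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def unpack_quad(word: int) -> list[int]:
    """Split a 32-bit element into its four FP8 bytes, least-significant byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def dp_fp8_quads(c, a, b):
    """Return C with C[m][n] += sum over k and i of A[m][k].byte[i] * B[k][n].byte[i].

    a is M x K and b is K x N, both holding 32-bit words; c is M x N FP32 values.
    """
    result = [list(row) for row in c]
    for m, a_row in enumerate(a):
        for n in range(len(b[0])):
            acc = result[m][n]
            for k, word in enumerate(a_row):
                for x, y in zip(unpack_quad(word), unpack_quad(b[k][n])):
                    # Converted E5M2 values and their products are exact in single precision.
                    acc = to_fp32(acc + e5m2_to_float(x) * e5m2_to_float(y))
            result[m][n] = acc
    return result


# One 32-bit element per operand: (1.0, 1.0, 0.5, 2.0) . (2.0, 1.5, 1.0, 1.0) = 6.0, plus 1.0.
print(dp_fp8_quads([[1.0]], [[0x40383C3C]], [[0x3C3C3E40]]))  # [[7.0]]
```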
-
10.
Publication No.: EP4485181A2
Publication date: 2025-01-01
Application No.: EP24205439.3
Filing date: 2022-01-26
Applicant: INTEL Corporation
Inventor: Mellempudi, Naveen , Maiyuran, Subramaniam , George, Varghese , Fu, Fangwen , Mu, Shuai , Pal, Supratim , Xiong, Wei
IPC: G06F9/38
Abstract: An apparatus comprises decode circuitry to decode a single matrix instruction having fields to indicate an opcode and locations of a first source matrix including a first plurality of 8-bit floating point data elements encoded in a first 8-bit floating point format, a second source matrix including a second plurality of 8-bit floating point data elements encoded in a second 8-bit floating point format, and a third source matrix including a plurality of 32-bit floating point data elements. The apparatus further comprises execution circuitry, responsive to the single matrix instruction, to generate a plurality of products based on the first plurality of 8-bit floating point data elements of the first source matrix and the second plurality of 8-bit floating point data elements of the second source matrix, and accumulate each product of the plurality of products with a corresponding 32-bit floating point data element of the third source matrix to generate a corresponding 32-bit floating point result data element of a result matrix.
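The abstract allows the two source matrices to use different FP8 encodings without naming them. The following Python sketch assumes the two widely used layouts: E4M3 (bias 7, no infinities, only S.1111.111 as NaN, largest value 448) for the first source and E5M2 (bias 15, IEEE-style specials) for the second, with FP32 accumulation rounded after each addition. The format names, this pairing, and the function names are assumptions for illustration.

```python
import math
import struct


def decode_e4m3(byte: int) -> float:
    """1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits; no infinities (assumed variant)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp, man = (byte >> 3) & 0xF, byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan                            # the only NaN encodings
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6      # zero or subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def decode_e5m2(byte: int) -> float:
    """1 sign bit, 5 exponent bits (bias 15), 2 mantissa bits; IEEE-style Inf/NaN."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp, man = (byte >> 2) & 0x1F, byte & 0x3
    if exp == 0x1F:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        return sign * (man / 4.0) * 2.0 ** -14
    return sign * (1.0 + man / 4.0) * 2.0 ** (exp - 15)


DECODERS = {"e4m3": decode_e4m3, "e5m2": decode_e5m2}


def to_fp32(x: float) -> float:
    """Round to the nearest binary32 value (raises OverflowError beyond the FP32 range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fp8_matmul(a, fmt_a, b, fmt_b, c):
    """Return C + A @ B, decoding A with fmt_a and B with fmt_b; C holds FP32 values."""
    dec_a, dec_b = DECODERS[fmt_a], DECODERS[fmt_b]
    result = [list(row) for row in c]
    for m, a_row in enumerate(a):
        for n in range(len(b[0])):
            acc = result[m][n]
            for k, x in enumerate(a_row):
                acc = to_fp32(acc + dec_a(x) * dec_b(b[k][n]))
            result[m][n] = acc
    return result


# A in E4M3: [1.5, 2.0]; B in E5M2: [[2.0], [0.5]]; C = [[0.25]] -> 0.25 + 3.0 + 1.0
print(mixed_fp8_matmul([[0x3C, 0x40]], "e4m3", [[0x40], [0x38]], "e5m2", [[0.25]]))  # [[4.25]]
```

Keeping the decoders in a table makes the format of each operand a parameter, which mirrors the abstract's point that the two sources need not share an encoding.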
-