-
Publication number: EP4318225A1
Publication date: 2024-02-07
Application number: EP23182973.0
Filing date: 2023-07-03
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Georganas, Evangelos , Gradstein, Amit , Hughes, Christopher , Mellempudi, Naveen , Rubanovich, Simon , Sherman, Uri , Sperber, Zeev
IPC: G06F9/30
Abstract: Techniques for FP8 classification or manipulation using single instructions are described. An exemplary instruction includes fields for an opcode, an identification of a location of a packed data source operand, an indication of one or more classification checks to perform, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a classification according to the indicated one or more classification checks and store a result of the classification in a corresponding data element position of the destination operand.
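The per-element classification described in this abstract can be sketched as follows. This is an illustrative model only: it assumes the E5M2 FP8 layout (1 sign bit, 5 exponent bits, 2 mantissa bits) with IEEE-style special values, and the check names and bit assignments in `Check` are hypothetical, not taken from the patent.

```python
from enum import IntFlag


class Check(IntFlag):
    # Hypothetical check bits; the patent's actual encoding is not given here.
    NAN = 1
    ZERO = 2
    INF = 4
    DENORMAL = 8
    NEGATIVE = 16  # assumed meaning: sign bit set and the value is not NaN


def classify_e5m2(byte: int) -> Check:
    """Return every class that one E5M2-encoded byte belongs to."""
    sign = byte >> 7
    exp = (byte >> 2) & 0x1F
    man = byte & 0x3
    result = Check(0)
    if exp == 0x1F:
        result |= Check.NAN if man else Check.INF
    elif exp == 0:
        result |= Check.DENORMAL if man else Check.ZERO
    if sign and not (exp == 0x1F and man):
        result |= Check.NEGATIVE
    return result


def classify_packed(src: bytes, checks: Check) -> list:
    """For each element position, store 1 if any requested check matches."""
    return [1 if classify_e5m2(b) & checks else 0 for b in src]
```

For example, `classify_packed(bytes([0x00, 0x7C, 0x7E]), Check.NAN | Check.INF)` marks the infinity and the NaN but not the zero.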
-
Publication number: EP4318223A1
Publication date: 2024-02-07
Application number: EP23182952.4
Filing date: 2023-07-03
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Georganas, Evangelos , Hughes, Christopher , Mellempudi, Naveen , Rubanovich, Simon , Sherman, Uri , Sperber, Zeev , Gradstein, Amit
IPC: G06F9/30
Abstract: Techniques for comparing FP8 data elements are described. An exemplary FP8 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
-
Publication number: EP4141655A1
Publication date: 2023-03-01
Application number: EP22183762.8
Filing date: 2022-07-08
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Valentine, Robert , Sperber, Zeev , Gradstein, Amit , Charney, Mark , Georganas, Evangelos , Kalamkar, Dhiraj , Hughes, Christopher , Anderson, Cristina
IPC: G06F9/30
Abstract: Techniques for comparing BF16 data elements are described. An exemplary BF16 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
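A compare-and-set-flags operation of this kind can be sketched as below. The sketch makes two assumptions that the abstract does not state: the compared position is element 0, and the flag encoding follows the existing x86 COMISS convention (unordered sets ZF, PF, and CF; greater clears all three; less sets CF; equal sets ZF). The function names are hypothetical.

```python
import math
import struct


def bf16_to_float(bits: int) -> float:
    """Decode BF16, which is the upper 16 bits of an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def compare_update_flags(src1, src2, pos=0):
    """Compare the BF16 elements at `pos` and return the resulting flags."""
    a = bf16_to_float(src1[pos])
    b = bf16_to_float(src2[pos])
    if math.isnan(a) or math.isnan(b):
        return {"ZF": 1, "PF": 1, "CF": 1}  # unordered
    if a > b:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if a < b:
        return {"ZF": 0, "PF": 0, "CF": 1}
    return {"ZF": 1, "PF": 0, "CF": 0}  # equal, including +0 versus -0
```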
-
Publication number: EP3812883A1
Publication date: 2021-04-28
Application number: EP20215256.7
Filing date: 2019-06-25
Applicant: Intel Corporation
Inventor: Henry, Gregory , Heinecke, Alexander
Abstract: Embodiments detailed herein relate to instructions to perform a matrix multiplication. An exemplary processor comprises a cache to store data, and a plurality of cores coupled to the cache. At least one core of the plurality of cores comprises execution circuitry to execute one or more instructions to perform a matrix multiplication with a first source matrix and a second source matrix to generate a result matrix. The execution circuitry is to convert a first plurality of data elements of the first source matrix and a second plurality of data elements of the second source matrix from a single-precision floating point data format to a reduced precision floating point format having fewer mantissa bits than the single-precision floating point format and a same number of exponent bits as the single-precision floating point format; and perform a plurality of parallel fused multiply-add operations to multiply the first plurality of data elements in the reduced precision floating point format by corresponding data elements of the second plurality of data elements in the reduced precision floating point format to generate a plurality of products, and to add the plurality of products to accumulated values to generate single-precision floating point data elements of the result matrix.
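The data flow in this abstract can be sketched in software as follows. The sketch assumes the reduced-precision format is BF16 (8 exponent bits like FP32, 7 mantissa bits) and that conversion rounds to nearest even; neither is stated in the abstract. The helper names are hypothetical, inputs are assumed to be representable in FP32, and the accumulation only approximates a hardware fused multiply-add.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bf16(x: float) -> float:
    """Round an FP32 value to BF16 (round to nearest even), returned as float."""
    if x != x:  # NaN passes through unchanged
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matmul_reduced(a, b, c):
    """Return c + a @ b with BF16 inputs and FP32 accumulation."""
    m, k, n = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    for i in range(m):
        for j in range(n):
            acc = out[i][j]
            for p in range(k):
                # The product of two BF16 values is exact in double precision.
                prod = round_to_bf16(a[i][p]) * round_to_bf16(b[p][j])
                acc = to_f32(acc + prod)
            out[i][j] = acc
    return out
```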
-
Publication number: EP3798823A1
Publication date: 2021-03-31
Application number: EP20178989.8
Filing date: 2020-06-09
Applicant: Intel Corporation
Inventor: Pillai, Kamlesh R. , Hughes, Christopher J. , Heinecke, Alexander
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable from a first mode where a respective output of each of a first proper subset of fused multiply accumulate circuits of the two-dimensional grid is transmitted downstream to a respective input of each of a second proper subset of fused multiply accumulate circuits of the two-dimensional grid to form output values from at least one first input two-dimensional matrix and at least one second input two-dimensional matrix, and store the output values in resultant storage, to a second mode where the respective output of each of the first proper subset of fused multiply accumulate circuits of the two-dimensional grid form first output values from a first subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the first output values in the resultant storage, and a respective output of each of the second proper subset of fused multiply accumulate circuits of the two-dimensional grid form second output values from a second subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the second output values in the resultant storage.
-
Publication number: EP3716048A1
Publication date: 2020-09-30
Application number: EP20155995.2
Filing date: 2020-02-07
Applicant: Intel Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Heinecke, Alexander , Georganas, Evangelos
IPC: G06F9/30
Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
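A minimal software sketch of down-converting and interleaving follows. It assumes FP32 sources, BF16 results, round-to-nearest-even conversion, and an interleave order that places the element from the first source before the element from the second; the abstract only requires that the results use fewer bits and are interleaved. The function names are hypothetical.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Down-convert an FP32 value to a 16-bit BF16 pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:  # NaN: keep it a quiet NaN
        return (bits >> 16) | 0x0040
    return (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16


def downconvert_interleave(src1, src2):
    """Return [bf16(src1[0]), bf16(src2[0]), bf16(src1[1]), bf16(src2[1]), ...]."""
    if len(src1) != len(src2):
        raise ValueError("sources must have the same element count")
    dst = []
    for a, b in zip(src1, src2):
        dst.append(f32_to_bf16_bits(a))  # assumed: even positions from src1
        dst.append(f32_to_bf16_bits(b))  # assumed: odd positions from src2
    return dst
```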
-
Publication number: EP4321992A3
Publication date: 2024-05-01
Application number: EP23210931.4
Filing date: 2020-02-07
Applicant: Intel Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Heinecke, Alexander , Georganas, Evangelos
IPC: G06F9/30
CPC classification number: G06F7/483 , G06F2207/3828 , G06F9/30032 , G06F9/30036 , G06F9/30025
Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
-
Publication number: EP4357914A2
Publication date: 2024-04-24
Application number: EP24161663.0
Filing date: 2022-07-08
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Valentine, Robert , Sperber, Zeev , Gradstein, Amit , Charney, Mark , Georganas, Evangelos , Kalamkar, Dhiraj , Hughes, Christopher , Anderson, Cristina
IPC: G06F9/30
CPC classification number: G06F9/30036 , G06F9/30021 , G06F9/30094
Abstract: Techniques for comparing BF16 data elements are described. An exemplary instruction is to cause operations including to: provide, for each data element position of BF16 data elements of first and second packed data source operands, a data element result, wherein: for a predicate value that is a first value, the data element result is to include a corresponding data element that is a result of either a maximum comparison or a minimum comparison of a pair of corresponding BF16 data elements, wherein, when the BF16 data elements of the pair of corresponding BF16 data elements are both zero, of either sign, the data element result is to include the corresponding BF16 data element of the second packed data source operand; and for a predicate value that is a second value, the data element result is to include a corresponding data element that is either zero or remains unchanged.
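The predicated minimum/maximum rule in this abstract can be sketched as below. Python floats stand in for BF16 values, a mask bit of 1 is assumed to be the "first value" of the predicate, and the function and parameter names are hypothetical. NaN handling is not described in the abstract; in this sketch a NaN in either source simply falls through to the second source.

```python
def minmax_masked(src1, src2, mask, dst, op="max", zeroing=False):
    """Per-element min or max with a predicate mask."""
    out = []
    for a, b, m, old in zip(src1, src2, mask, dst):
        if not m:
            # Masked-off element: either zero or left unchanged.
            out.append(0.0 if zeroing else old)
        elif a == 0.0 and b == 0.0:
            # Both zero, of either sign: take the second source element.
            out.append(b)
        elif op == "max":
            out.append(a if a > b else b)
        else:
            out.append(a if a < b else b)
    return out
```

With `src1 = [0.0, 1.0]` and `src2 = [-0.0, 3.0]`, a maximum returns `-0.0` in the first position because both inputs are zero and the second source wins.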
-
Publication number: EP4321992A2
Publication date: 2024-02-14
Application number: EP23210931.4
Filing date: 2020-02-07
Applicant: Intel Corporation
Inventor: Adelman, Menachem , Valentine, Robert , Ziv, Barukh , Gradstein, Amit , Rubanovich, Simon , Heinecke, Alexander , Georganas, Evangelos
IPC: G06F9/30
Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
-
Publication number: EP4318224A1
Publication date: 2024-02-07
Application number: EP23182966.4
Filing date: 2023-07-03
Applicant: Intel Corporation
Inventor: Heinecke, Alexander , Adelman, Menachem , Charney, Mark , Georganas, Evangelos , Gradstein, Amit , Hughes, Christopher , Mellempudi, Naveen , Rubanovich, Simon , Sherman, Uri , Sperber, Zeev , Valentine, Robert
IPC: G06F9/30
Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
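The half-precision case of this conversion can be sketched as follows. The sketch assumes that bfloat8 means the E5M2 layout, which shares its sign and exponent fields with FP16, so the conversion reduces to rounding away the low 8 mantissa bits. Round-to-nearest-even is also an assumption, and the function names are hypothetical. The FP32 source case is not covered.

```python
import struct


def fp16_bits(x: float) -> int:
    """Bit pattern of x as IEEE-754 half precision (x must fit the FP16 range)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_bits_to_e5m2(h: int) -> int:
    """Round a 16-bit FP16 pattern to an 8-bit E5M2 pattern."""
    exp = (h >> 10) & 0x1F
    man = h & 0x3FF
    top = h >> 8
    if exp == 0x1F and man != 0:
        return top | 0x02  # NaN: force a nonzero mantissa so it stays NaN
    low = h & 0xFF
    if low > 0x80 or (low == 0x80 and (top & 1)):
        top += 1  # a carry into the exponent may round up to infinity
    return top


def convert_packed(src):
    """Convert each element and store it in the matching destination position."""
    return bytes(fp16_bits_to_e5m2(fp16_bits(x)) for x in src)
```

Here 1.125 lies halfway between the E5M2 values 1.0 and 1.25 and rounds to the even mantissa (1.0), while 65504.0 exceeds the largest finite E5M2 value and rounds to infinity.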
-