-
公开(公告)号:US20240248720A1
公开(公告)日:2024-07-25
申请号:US18627907
申请日:2024-04-05
Applicant: Intel Corporation
Inventor: Alexander Heinecke , Naveen Mellempudi , Robert Valentine , Mark Charney , Christopher Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich
CPC classification number: G06F9/30145 , G06F7/49947 , G06F9/30025 , G06F9/30036 , H03M7/24
Abstract: Techniques for converting FP16 data elements to BF8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include a one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
-
公开(公告)号:US20240143328A1
公开(公告)日:2024-05-02
申请号:US18386407
申请日:2023-11-02
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30036 , G06F9/30043
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
-
公开(公告)号:US20240143325A1
公开(公告)日:2024-05-02
申请号:US18386771
申请日:2023-11-03
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
CPC classification number: G06F9/30036 , G06F9/30101 , G06F17/16
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
-
公开(公告)号:US11941395B2
公开(公告)日:2024-03-26
申请号:US17134008
申请日:2020-12-24
Applicant: Intel Corporation
Inventor: Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Menachem Adelman , Christopher J. Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich
CPC classification number: G06F9/3001 , G06F7/5443 , G06F9/30145 , G06F9/3802 , G06F17/16 , G06N3/08
Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a M by N destination matrix having single-precision elements, an M by K first source matrix, and a K by N second source matrix, the source matrices having elements that each comprise a pair of half-precision floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the half-precision floating-point values to single-precision values, a multiplication of converted single-precision values from first values of the pairs together to generate a first result, a multiplication of converted single-precision values from second values of the pairs together to generate a second result, and an accumulation of the first result and the second result with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
-
5.
公开(公告)号:US20240045691A1
公开(公告)日:2024-02-08
申请号:US17958374
申请日:2022-10-01
Applicant: Intel Corporation
Inventor: Naveen Mellempudi , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Alexander Heinecke , Simon Rubanovich , Uri Sherman , Zeev Sperber
CPC classification number: G06F9/3802 , G06F7/4876 , G06F17/16 , G06F9/3001
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
-
公开(公告)号:US20240045687A1
公开(公告)日:2024-02-08
申请号:US17958365
申请日:2022-10-01
Applicant: Intel Corporation
Inventor: Alexander Heinecke , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber
IPC: G06F9/30
CPC classification number: G06F9/3016 , G06F9/3013
Abstract: Techniques for FP8 classification or manipulation using single instructions are described. An exemplary instruction includes fields for an opcode, an identification of a location of a packed data source operand, an indication of one or more classification checks to perform, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a classification according to the indicated one or more classification checks and store a result of the classification in a corresponding data element position of the destination operand.
-
公开(公告)号:US20240045682A1
公开(公告)日:2024-02-08
申请号:US17958370
申请日:2022-10-01
Applicant: Intel Corporation
Inventor: Alexander Heinecke , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30036 , G06F9/3001
Abstract: Techniques for scale and reduction of FP8 data elements are described. An exemplary instruction includes fields for an having fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a FP8 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a FP8 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
-
公开(公告)号:US20230418602A1
公开(公告)日:2023-12-28
申请号:US18456699
申请日:2023-08-28
Applicant: INTEL CORPORATION
Inventor: Robert Valentine , Galina Ryvchin , Piotr Majcher , Mark J. Charney , Elmoustapha Ould-Ahmed-Vall , Jesus Corbal , Milind B. Girkar , Zeev Sperber , Simon Rubanovich , Amit Gradstein
CPC classification number: G06F9/30014 , G06F7/5443 , G06F9/3818 , G06F9/30036 , G06F9/30105 , G06F9/30018
Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.
-
公开(公告)号:US20230376313A1
公开(公告)日:2023-11-23
申请号:US17747919
申请日:2022-05-18
Applicant: Intel Corporation
Inventor: Menachem Adelman , Amit Gradstein , Cristina Anderson , Marius Cornea-Hasegan
CPC classification number: G06F9/30145 , G06F9/30021 , G06F7/22 , G06F7/483
Abstract: Techniques for instructions for min-max operations are described. An example apparatus comprises decoder circuitry to decode a single instruction, the single instruction to include fields for identifiers of a first source operand, a second source operand, an a destination operand, a field for an immediate operand, and a field for an opcode, the opcode to indicate execution circuitry is to perform a min-max operation, and execution circuitry to execute the decoded instruction according to the opcode to perform the min-max operation to determine a particular operation of five or more minimum and maximum operations in accordance with a value of the immediate operand, perform the determined particular operation on the identified first source operand and the identified second source operand to return a result, and store the result into the identified destination operand. Other examples are described and claimed.
-
10.
公开(公告)号:US20230315450A1
公开(公告)日:2023-10-05
申请号:US18313026
申请日:2023-05-05
Applicant: Intel Corporation
Inventor: Naveen Mellempudi , Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich
CPC classification number: G06F9/30036 , G06F7/49915 , G06F9/30196 , G06F9/3887
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
-
-
-
-
-
-
-
-
-