-
公开(公告)号:US11972230B2
公开(公告)日:2024-04-30
申请号:US16914318
申请日:2020-06-27
申请人: Intel Corporation
发明人: Menachem Adelman , Robert Valentine , Barukh Ziv , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark J. Charney , Christopher J. Hughes , Alexander F. Heinecke , Evangelos Georganas , Binh Pham
CPC分类号: G06F7/78 , G06F9/3001 , G06F9/3016 , G06F17/16
摘要: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
-
公开(公告)号:US20240045683A1
公开(公告)日:2024-02-08
申请号:US17958371
申请日:2022-10-01
申请人: Intel Corporation
发明人: Alexander Heinecke , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber
IPC分类号: G06F9/30
CPC分类号: G06F9/30145 , G06F9/30036 , G06F9/3001
摘要: Techniques for performing square root or reciprocal square root calculations on FP8 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of a FP8 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
-
公开(公告)号:US20240045677A1
公开(公告)日:2024-02-08
申请号:US17958378
申请日:2022-10-01
申请人: Intel Corporation
发明人: Alexander Heinecke , Menachem Adelman , Mark Charney , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber , Robert Valentine
IPC分类号: G06F9/30
CPC分类号: G06F9/30025 , G06F9/3016
摘要: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include a one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
-
公开(公告)号:US20240045654A1
公开(公告)日:2024-02-08
申请号:US17958373
申请日:2022-10-01
申请人: Intel Corporation
发明人: Alexander Heinecke , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber
IPC分类号: G06F7/483
CPC分类号: G06F7/483
摘要: Techniques for performing arithmetic operations on FP8 values are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on FP8 data elements in that data element position in FP8 format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand.
-
公开(公告)号:US11893389B2
公开(公告)日:2024-02-06
申请号:US18190761
申请日:2023-03-27
申请人: Intel Corporation
发明人: Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Raanan Sade , Menachem Adelman , Zeev Sperber , Amit Gradstein , Simon Rubanovich
CPC分类号: G06F9/30036 , G06F9/3001 , G06F9/3016 , G06F9/3802
摘要: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
-
公开(公告)号:US20230409326A1
公开(公告)日:2023-12-21
申请号:US17841558
申请日:2022-06-15
申请人: Intel Corporation
发明人: Menachem Adelman , Amit Gradstein , Simon Rubanovich , Barukh Ziv , Uri Sherman , Dana Rip , Shahar Mizrahi , Dan Baum , Rinat Rappoport , Nilesh Jain , Zeev Sperber , Gideon Stupp , Alexander Heinecke , Christopher Hughes , Evangelos Georganas
CPC分类号: G06F9/30145 , G06F9/30178 , G06F9/30047 , G06F9/3887 , G06N3/04
摘要: Techniques and mechanisms for processor circuitry to execute a load and expand instruction of an instruction set to generate decompressed matrix data. In an embodiment, the instruction comprises a source operand which indicates a location from which compressed matrix data, and corresponding metadata, are to be accessed. A destination operand of the instruction indicates a location which is to receive decompressed metadata, which is generated, during execution of the instruction, based on the compressed matrix data and the corresponding metadata. The metadata comprises compression mask information which identifies which elements of the matrix have been masked from the compressed matrix data. In another embodiment, the instruction further comprises a count operand which identifies a total number of the unmasked matrix elements which are represented in the compressed matrix data.
-
公开(公告)号:US11816483B2
公开(公告)日:2023-11-14
申请号:US15859268
申请日:2017-12-29
申请人: Intel Corporation
发明人: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
CPC分类号: G06F9/30036 , G06F9/30101 , G06F17/16
摘要: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
-
公开(公告)号:US11809869B2
公开(公告)日:2023-11-07
申请号:US15858937
申请日:2017-12-29
申请人: Intel Corporation
发明人: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC分类号: G06F9/30
CPC分类号: G06F9/30145 , G06F9/30036 , G06F9/30043
摘要: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
-
公开(公告)号:US11789729B2
公开(公告)日:2023-10-17
申请号:US15858916
申请日:2017-12-29
申请人: Intel Corporation
发明人: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall
CPC分类号: G06F9/3001 , G06F9/3005 , G06F9/3016 , G06F9/30036 , G06F9/30043 , G06F9/30076 , G06F9/30109 , G06F9/30123 , G06F9/30145 , G06F9/383 , G06F9/3824
摘要: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
-
公开(公告)号:US11709678B2
公开(公告)日:2023-07-25
申请号:US17335284
申请日:2021-06-01
申请人: Intel Corporation
发明人: Zeev Sperber , Tomer Weiner , Amit Gradstein , Simon Rubanovich , Alex Gerber , Itai Ravid
CPC分类号: G06F9/3016 , G06F9/3001 , G06F9/30094 , G06F9/30145 , G06F9/30167 , G06F9/384 , G06F9/3861
摘要: In one embodiment, a processor includes fetch logic to fetch instructions, decode logic to decode the fetched instructions, and execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer.
-
-
-
-
-
-
-
-
-