-
Publication No.: US20240045683A1
Publication date: 2024-02-08
Application No.: US17958371
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
IPC classification: G06F9/30
CPC classification: G06F9/30145, G06F9/30036, G06F9/3001
Abstract: Techniques for performing square root or reciprocal square root calculations on FP8 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of an FP8 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
-
Publication No.: US20240045677A1
Publication date: 2024-02-08
Application No.: US17958378
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
IPC classification: G06F9/30
CPC classification: G06F9/30025, G06F9/3016
Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
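To make the conversion step concrete, here is a minimal Python sketch of narrowing FP16 values to an 8-bit format. It assumes the E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits) often called bfloat8, and round-to-nearest-even; the abstract specifies neither the bit layout nor the rounding mode, and all function names are hypothetical.

```python
import math
import struct


def fp16_to_e5m2(value: float) -> int:
    """Round one FP16-representable value to an 8-bit E5M2 code.

    Assumption: E5M2 shares FP16's sign and exponent fields, so the code is
    the top byte of the FP16 bit pattern after rounding away 8 mantissa bits.
    """
    # struct's "e" format is IEEE 754 half precision; it raises OverflowError
    # for finite values outside the FP16 range.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    if math.isnan(value):
        # Keep the sign and return a NaN code (exponent all ones, mantissa nonzero).
        return ((bits >> 8) & 0x80) | 0x7E
    # Round to nearest, ties to even: add 0x7F plus the bit that will become
    # the least significant bit of the result, then drop the low byte.
    lsb = (bits >> 8) & 1
    return ((bits + 0x7F + lsb) >> 8) & 0xFF


def convert_packed(values) -> bytes:
    """Convert each source element and store it in the matching position."""
    return bytes(fp16_to_e5m2(v) for v in values)


def e5m2_to_float(code: int) -> float:
    """Decode an E5M2 code by zero-extending it to an FP16 bit pattern."""
    return struct.unpack("<e", bytes([0, code]))[0]


packed = convert_packed([1.0, 1.125, 1.375, -1.0])
print([e5m2_to_float(c) for c in packed])  # [1.0, 1.0, 1.5, -1.0]
```

The two middle inputs sit exactly halfway between representable E5M2 values, so they show the tie-to-even behaviour: 1.125 rounds down to 1.0 and 1.375 rounds up to 1.5.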
-
Publication No.: US20240045654A1
Publication date: 2024-02-08
Application No.: US17958373
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
IPC classification: G06F7/483
CPC classification: G06F7/483
Abstract: Techniques for performing arithmetic operations on FP8 values are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on FP8 data elements in that data element position in FP8 format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand.
-
Publication No.: US11494163B2
Publication date: 2022-11-08
Application No.: US16562979
Filing date: 2019-09-06
Applicant: Intel Corporation
Inventors: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George
Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.
-
Publication No.: US09996347B2
Publication date: 2018-06-12
Application No.: US14582784
Filing date: 2014-12-24
Applicant: Intel Corporation
Inventors: Victor Lee, Ugonna Echeruo, George Chrysos, Naveen Mellempudi
IPC classification: G06F9/30
CPC classification: G06F9/30036
Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
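The element counts in this abstract can be illustrated with a short Python sketch. It models a register file as a list of equal-length lists and makes two assumptions the abstract does not state: the first group is taken from the upper end of the named register and the second from the lower end of the next one, and the "next logical register" wraps around at the end of the file. The function name is hypothetical.

```python
def read_with_elemental_offset(registers, reg_index, offset):
    """Build a data vector that starts `offset` elements into a register.

    Takes (total - offset) elements from the named register and `offset`
    elements from the next logical register, then combines them.
    """
    reg = registers[reg_index]
    total = len(reg)
    if not 0 <= offset < total:
        raise ValueError("offset must be in the range [0, total)")
    # Assumption: the register after the last one wraps back to register 0.
    next_reg = registers[(reg_index + 1) % len(registers)]
    first = reg[offset:]        # total - offset elements
    second = next_reg[:offset]  # offset elements
    return first + second


registers = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(read_with_elemental_offset(registers, 0, 1))  # [1, 2, 3, 4]
```

The combined vector always has the full register length, so an execution unit could operate on it exactly as on an unshifted register.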
-
Publication No.: US12056489B2
Publication date: 2024-08-06
Application No.: US18313026
Filing date: 2023-05-05
Applicant: Intel Corporation
Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
CPC classification: G06F9/30036, G06F7/49915, G06F9/30196, G06F9/3887
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate a plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
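The convert, multiply, and accumulate sequence described here can be sketched in Python as follows. This is an illustration under assumptions, not the patented design: it assumes the 8-bit values use the E5M2 layout (so a code is the top byte of an FP16 pattern), it pairs source elements in the usual matrix-multiply pattern, and it accumulates in Python's double-precision floats rather than modelling single-precision rounding. The function names are hypothetical.

```python
import struct


def e5m2_to_float(code: int) -> float:
    """Decode an 8-bit E5M2 code by zero-extending it to an FP16 pattern."""
    return struct.unpack("<e", bytes([0, code & 0xFF]))[0]


def fp8_tile_dot_product(dst, src_a, src_b):
    """Accumulate FP8 quadruple products into the destination matrix in place.

    src_a is M x K and src_b is K x N; every element is a 4-tuple of FP8
    codes. dst is M x N and holds the running sums.
    """
    rows, inner, cols = len(src_a), len(src_b), len(src_b[0])
    for m in range(rows):
        for n in range(cols):
            acc = dst[m][n]  # previous contents of the destination element
            for k in range(inner):
                # Convert each pair of FP8 values, multiply, and accumulate.
                for a, b in zip(src_a[m][k], src_b[k][n]):
                    acc += e5m2_to_float(a) * e5m2_to_float(b)
            dst[m][n] = acc
    return dst


# One element per matrix: (1, 2, 0, -1) . (2, 2, 1, 1) = 5, added to 10.
a = [[(0x3C, 0x40, 0x00, 0xBC)]]
b = [[(0x40, 0x40, 0x3C, 0x3C)]]
print(fp8_tile_dot_product([[10.0]], a, b))  # [[15.0]]
```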
-
Publication No.: US20240045689A1
Publication date: 2024-02-08
Application No.: US17958377
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
CPC classification: G06F9/3016, G06F7/4876, G06F17/16, G06F9/3802, G06F9/3013, G06F9/3001
Abstract: Disclosed embodiments relate to systems and methods for performing 8-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply pairs of 8-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
-
Publication No.: US20240045684A1
Publication date: 2024-02-08
Application No.: US17958380
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
IPC classification: G06F9/30
CPC classification: G06F9/30145, G06F9/30036, G06F9/30018
Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.
-
Publication No.: US11893490B2
Publication date: 2024-02-06
Application No.: US18060414
Filing date: 2022-11-30
Applicant: Intel Corporation
IPC classification: G06N3/08, G06N5/04, G06T15/00, G06F9/46, G06N3/063, G06N3/084, G06N3/044, G06N3/045, G06T17/20, G06T15/80, G06T17/10, G06T15/04, G06V10/94
CPC classification: G06N3/08, G06F9/46, G06N3/044, G06N3/045, G06N3/063, G06N3/084, G06N5/04, G06T15/005, G06T15/04, G06T15/80, G06T17/10, G06T17/20, G06V10/94
Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
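As a rough illustration of the per-layer scaling idea, the Python sketch below quantizes a layer's weights and activations to 8-bit codes and then rescales the result. It assumes a signed 8-bit integer target and a symmetric scale chosen so the largest magnitude maps to 127; the abstract only says "an 8-bit datatype" and does not say how the scale factor is determined. All names are hypothetical.

```python
def quantize_per_layer(values, qmax=127):
    """Return (8-bit integer codes, scale) using one scale for the whole layer."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Assumption: symmetric scaling so the largest magnitude maps to qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def layer_output(weights, activations):
    """Dot product computed on 8-bit codes, then rescaled to floating point."""
    q_w, s_w = quantize_per_layer(weights)
    q_x, s_x = quantize_per_layer(activations)
    acc = sum(w * x for w, x in zip(q_w, q_x))  # integer accumulation
    # The output is built from the converted data and both scale factors.
    return acc * s_w * s_x


codes, scale = quantize_per_layer([4.0, -4.0, 0.0])
print(codes)  # [127, -127, 0]
```

Using a single scale per layer keeps the rescaling step to one multiplication per output, at the cost of coarser resolution for layers whose values span a wide range.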
-
Publication No.: US20220413803A1
Publication date: 2022-12-29
Application No.: US17304803
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventors: Jorge Parra, Fangwen Fu, Subramaniam Maiyuran, Varghese George, Mike Macpherson, Supratim Pal, Chandra Gurram, Sabareesh Ganapathy, Sasikanth Avancha, Dharma Teja Vooturi, Naveen Mellempudi, Dipankar Das
Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.