INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    2.
    发明公开

    公开(公告)号:US20240045684A1

    公开(公告)日:2024-02-08

    申请号:US17958380

    申请日:2022-10-01

    申请人: Intel Corporation

    IPC分类号: G06F9/30

    摘要: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.

    Apparatus and method for vector multiply and accumulate of packed bytes

    公开(公告)号:US11768681B2

    公开(公告)日:2023-09-26

    申请号:US15879419

    申请日:2018-01-24

    申请人: Intel Corporation

    IPC分类号: G06F9/30 G06F9/38

    摘要: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; a third source register to store a plurality of packed doublewords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed bytes to generate a first and second plurality of words corresponding to the first and second plurality of packed bytes; multiplier circuitry to multiply each of the first plurality of words with a corresponding one of the second plurality of words to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed doubleword value from a first doubleword location in the third source register to generate a first accumulated doubleword result; a destination register to store the first accumulated doubleword result in the first doubleword location.

    Computer processor for higher precision computations using a mixed-precision decomposition of operations

    公开(公告)号:US11544057B2

    公开(公告)日:2023-01-03

    申请号:US17069230

    申请日:2020-10-13

    申请人: INTEL CORPORATION

    摘要: Embodiments detailed herein relate to arithmetic operations of float-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS OF A MATRIX OPERATIONS ACCELERATOR

    公开(公告)号:US20200026745A1

    公开(公告)日:2020-01-23

    申请号:US16586114

    申请日:2019-09-27

    申请人: Intel Corporation

    IPC分类号: G06F17/16 G06F9/50 G06F9/30

    摘要: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable from a first mode where a respective output of each of a first proper subset of fused multiply accumulate circuits of the two-dimensional grid is transmitted downstream to a respective input of each of a second proper subset of fused multiply accumulate circuits of the two-dimensional grid to form output values from at least one first input two-dimensional matrix and at least one second input two-dimensional matrix, and store the output values in resultant storage, to a second mode where the respective output of each of the first proper subset of fused multiply accumulate circuits of the two-dimensional grid form first output values from a first subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the first output values in the resultant storage, and a respective output of each of the second proper subset of fused multiply accumulate circuits of the two-dimensional grid form second output values from a second subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the second output values in the resultant storage.