A COMPUTER PROCESSOR FOR HIGHER PRECISION COMPUTATIONS USING A MIXED-PRECISION DECOMPOSITION OF OPERATIONS

    Publication No.: EP3812883A1

    Publication date: 2021-04-28

    Application No.: EP20215256.7

    Filing date: 2019-06-25

    Abstract: Embodiments detailed herein relate to instructions to perform a matrix multiplication. An exemplary processor comprises a cache to store data, and a plurality of cores coupled to the cache. At least one core of the plurality of cores comprises execution circuitry to execute one or more instructions to perform a matrix multiplication with a first source matrix and a second source matrix to generate a result matrix. The execution circuitry is to convert a first plurality of data elements of the first source matrix and a second plurality of data elements of the second source matrix from a single-precision floating point data format to a reduced precision floating point format having fewer mantissa bits than the single-precision floating point format and a same number of exponent bits as the single-precision floating point format; and perform a plurality of parallel fused multiply-add operations to multiply the first plurality of data elements in the reduced precision floating point format by corresponding data elements of the second plurality of data elements in the reduced precision floating point format to generate a plurality of products, and to add the plurality of products to accumulated values to generate single-precision floating point data elements of the result matrix.
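
The conversion and accumulation described in this abstract can be illustrated with a minimal software sketch. It assumes the reduced precision format is bfloat16-like (8 exponent bits, 7 mantissa bits) and that rounding is round-to-nearest-even; the abstract fixes neither detail. The function names `to_reduced` and `matmul_reduced` are hypothetical, and the scalar loop only approximates what parallel fused multiply-add hardware would compute.

```python
import struct

def f32_bits(x):
    """Bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Single-precision value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def to_reduced(x):
    """Round an FP32 value to 7 mantissa bits, keeping all 8 exponent bits.

    Assumption: bfloat16-like layout with round-to-nearest-even.
    """
    b = f32_bits(x)
    if (b & 0x7F800000) == 0x7F800000:
        # Inf stays Inf; NaN keeps a nonzero mantissa via the quiet bit.
        quiet = 0x00400000 if (b & 0x007FFFFF) else 0
        return bits_f32((b & 0xFFFF0000) | quiet)
    rounding = 0x7FFF + ((b >> 16) & 1)
    return bits_f32((b + rounding) & 0xFFFF0000)

def matmul_reduced(a, b):
    """Multiply reduced-precision inputs and accumulate in single precision."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                # The product of two 8-bit significands is exact in a Python float.
                prod = to_reduced(a[i][p]) * to_reduced(b[p][j])
                # Round the running sum back to FP32 after each step.
                acc = bits_f32(f32_bits(acc + prod))
            c[i][j] = acc
    return c
```

Because the inputs keep the FP32 exponent range, only mantissa precision is lost on conversion, while the accumulator retains full single precision.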

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS OF A MATRIX OPERATIONS ACCELERATOR

    Publication No.: EP3798823A1

    Publication date: 2021-03-31

    Application No.: EP20178989.8

    Filing date: 2020-06-09

    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable from a first mode where a respective output of each of a first proper subset of fused multiply accumulate circuits of the two-dimensional grid is transmitted downstream to a respective input of each of a second proper subset of fused multiply accumulate circuits of the two-dimensional grid to form output values from at least one first input two-dimensional matrix and at least one second input two-dimensional matrix, and store the output values in resultant storage, to a second mode where the respective output of each of the first proper subset of fused multiply accumulate circuits of the two-dimensional grid form first output values from a first subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the first output values in the resultant storage, and a respective output of each of the second proper subset of fused multiply accumulate circuits of the two-dimensional grid form second output values from a second subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the second output values in the resultant storage.
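
The two modes in this abstract can be pictured with a toy software model. Everything here is an assumption made for illustration: the grid is modeled as `depth` rows of multiply-accumulate steps, the first proper subset is taken to be the upper half of the rows and the second the lower half, and the names `run_rows` and `grid` are hypothetical. The actual circuit layout is not given in the abstract.

```python
def run_rows(a, b, rows, acc):
    """Apply one multiply-accumulate step per grid row, passing acc downstream."""
    for p in rows:
        acc = [
            [acc[i][j] + a[i][p] * b[p][j] for j in range(len(b[0]))]
            for i in range(len(a))
        ]
    return acc

def grid(a, b, depth, chained):
    """Toy model of a grid with `depth` rows, split into two halves.

    chained=True  (first mode): the first half feeds the second half,
                  producing one result accumulated over all rows.
    chained=False (second mode): each half works on its own slice of the
                  inputs and produces its own result.
    """
    half = depth // 2
    first, second = range(0, half), range(half, depth)

    def zero():
        return [[0.0] * len(b[0]) for _ in a]

    if chained:
        upstream = run_rows(a, b, first, zero())
        return [run_rows(a, b, second, upstream)]
    return [run_rows(a, b, first, zero()), run_rows(a, b, second, zero())]

a = [[1.0, 2.0, 3.0, 4.0]]
b = [[1.0], [1.0], [1.0], [1.0]]
one_result = grid(a, b, depth=4, chained=True)
two_results = grid(a, b, depth=4, chained=False)
```

In this model the first mode yields a single output matrix, and the second mode yields two independent outputs, one per half of the grid.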

    APPARATUS AND METHOD FOR DOWN-CONVERTING AND INTERLEAVING MULTIPLE FLOATING POINT VALUES

    Publication No.: EP3716048A1

    Publication date: 2020-09-30

    Application No.: EP20155995.2

    Filing date: 2020-02-07

    Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
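
A minimal software sketch of the down-convert and interleave step follows. It assumes the source elements are FP32 values and the destination elements are 16-bit bfloat16-like patterns rounded to nearest even, and it places elements from the first source in even destination positions; the abstract states none of these choices. The names `f32_to_bf16_bits` and `downconvert_interleave` are hypothetical.

```python
import struct

def f32_to_bf16_bits(x):
    """Down-convert one FP32 value to a 16-bit pattern (assumed bfloat16 layout)."""
    b = struct.unpack("<I", struct.pack("<f", x))[0]
    if (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF):
        # NaN: keep the top bits and force the quiet bit so it stays a NaN.
        return (b >> 16) | 0x0040
    # Round to nearest, ties to even, then keep the upper 16 bits.
    return ((b + 0x7FFF + ((b >> 16) & 1)) >> 16) & 0xFFFF

def downconvert_interleave(src1, src2):
    """Down-convert both sources and interleave the results element by element."""
    if len(src1) != len(src2):
        raise ValueError("sources must hold the same number of elements")
    dest = []
    for x, y in zip(src1, src2):
        dest.append(f32_to_bf16_bits(x))  # assumed: first source in even slots
        dest.append(f32_to_bf16_bits(y))  # assumed: second source in odd slots
    return dest

packed = downconvert_interleave([1.0, 2.0], [-1.0, 0.5])
```

Since each output element uses half the bits of an input element, the results of two source registers fit in one destination register of the same width.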

    APPARATUS AND METHOD FOR DOWN-CONVERTING AND INTERLEAVING MULTIPLE FLOATING POINT VALUES

    Publication No.: EP4321992A3

    Publication date: 2024-05-01

    Application No.: EP23210931.4

    Filing date: 2020-02-07

    Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.

    APPARATUS AND METHOD FOR DOWN-CONVERTING AND INTERLEAVING MULTIPLE FLOATING POINT VALUES

    Publication No.: EP4321992A2

    Publication date: 2024-02-14

    Application No.: EP23210931.4

    Filing date: 2020-02-07

    Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    Invention publication

    Publication No.: EP4318224A1

    Publication date: 2024-02-07

    Application No.: EP23182966.4

    Filing date: 2023-07-03

    Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
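
The packed conversion in this abstract can be sketched in software for the FP16 source case. The sketch assumes bfloat8 means a 1-5-2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits) and that rounding is round-to-nearest-even; the abstract specifies neither, and the FP32 source case is not covered here. The names `fp16_to_bf8_bits`, `bf8_to_float`, and `convert_packed` are hypothetical.

```python
import struct

def fp16_bits(x):
    """16-bit IEEE half-precision pattern for x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def fp16_to_bf8_bits(h):
    """Convert a 16-bit FP16 pattern to an 8-bit pattern (assumed 1-5-2 layout)."""
    if (h & 0x7C00) == 0x7C00 and (h & 0x03FF):
        # NaN: keep sign and exponent, force a nonzero mantissa.
        return (h >> 8) | 0x02
    # Both formats share 5 exponent bits, so rounding the upper byte suffices.
    return ((h + 0x7F + ((h >> 8) & 1)) >> 8) & 0xFF

def bf8_to_float(b):
    """Decode an 8-bit pattern by widening it back to FP16."""
    return struct.unpack("<e", struct.pack("<H", (b & 0xFF) << 8))[0]

def convert_packed(values):
    """Convert each packed element and store it at the same element position."""
    return [fp16_to_bf8_bits(fp16_bits(v)) for v in values]

packed = convert_packed([1.0, 1.125, 1.375, -2.0])
```

Under the assumed layout the exponent field is unchanged, so the conversion reduces to rounding away the low 8 mantissa bits of each element.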
