APPARATUS AND METHOD FOR VECTOR MULTIPLY AND ACCUMULATE OF SIGNED DOUBLEWORDS

    Publication No.: US20190196827A1

    Publication date: 2019-06-27

    Application No.: US15850180

    Filing date: 2017-12-21

    Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed signed doubleword data elements; a second source register to store a second plurality of packed signed doubleword data elements; a third source register to store a plurality of packed signed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed signed doubleword data elements from the first source register with third and fourth packed signed doubleword data elements from the second source register, respectively, to generate first and second temporary signed quadword products, the multiplier circuitry to select the first, second, third, and fourth signed doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary signed quadword product with a first packed signed quadword value read from the third source register to generate a first accumulated signed quadword result and to combine the second temporary signed quadword product with a second packed signed quadword value read from the third source register to generate a second accumulated signed quadword result; a destination register or the third source register to store the first accumulated signed quadword result in a first signed quadword data element position and to store the second accumulated signed quadword result in a second signed quadword data element position.
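The operation in the abstract above can be sketched in software. This is a minimal illustration, not the patented circuitry: the 128-bit register width, the even/odd element selection driven by a hypothetical `use_odd` flag (standing in for the opcode), and wrap-around on overflow are all assumptions, since the abstract does not fix them.

```python
def wrap_s64(x):
    """Wrap an integer into the signed 64-bit two's-complement range."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x


def mul_acc_dwords(src1, src2, acc, use_odd=False):
    """Multiply selected signed doublewords and accumulate into quadwords.

    src1, src2: four signed 32-bit integers each (assumed 128-bit registers).
    acc:        two signed 64-bit integers (the third source register).
    use_odd:    hypothetical stand-in for the opcode's element selection;
                False picks elements 0 and 2, True picks elements 1 and 3.
    """
    start = 1 if use_odd else 0
    result = []
    for q in range(2):
        i = start + 2 * q
        product = src1[i] * src2[i]              # temporary signed quadword product
        result.append(wrap_s64(acc[q] + product))  # assumed wrap-around, not saturation
    return result


print(mul_acc_dwords([2, 3, -4, 5], [10, 100, 7, -1], [1, -1]))
```

A product of two signed 32-bit values always fits in 64 bits, so only the accumulation step can overflow; a saturating variant would clamp there instead of wrapping.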

    FIXED POINT TO FLOATING POINT CONVERSION
    Patent application

    Publication No.: US20190196818A1

    Publication date: 2019-06-27

    Application No.: US16291245

    Filing date: 2019-03-04

    CPC classification number: G06F9/30025 G06F7/483 H03M7/24

    Abstract: Embodiments of instructions and methods of execution of said instructions and resources to execute said instructions are detailed. For example, in an embodiment, a processor is described comprising: decode circuitry to decode an instruction having fields for an opcode, a packed data source operand identifier, and a packed data destination operand identifier; and execution circuitry to execute the decoded instruction to convert a data element from a least significant packed data element position of the identified packed data source operand from a fixed-point representation to a floating point representation, store the floating point representation into a 32-bit least significant packed data element position of the identified packed data destination operand, and zero all remaining packed data elements of the identified packed data destination operand.
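As a rough software analogue of the conversion described above, the sketch below converts only the lowest element and zeroes the rest. The fixed-point layout is an assumption: the abstract does not give one, so a signed value with 16 fractional bits (`frac_bits=16`) and a four-lane destination are used purely for illustration.

```python
import struct


def fixed_to_float_low(src, frac_bits=16, lanes=4):
    """Convert the least significant fixed-point element to a 32-bit float.

    src:       packed source elements (signed integers); only src[0] is used.
    frac_bits: assumed number of fractional bits (hypothetical parameter).
    lanes:     assumed number of 32-bit positions in the destination.
    """
    value = src[0] / (1 << frac_bits)
    # Round-trip through a 4-byte encoding so the result has float32 precision.
    as_f32 = struct.unpack("<f", struct.pack("<f", value))[0]
    # Lowest position holds the result; all remaining positions are zeroed.
    return [as_f32] + [0.0] * (lanes - 1)


print(fixed_to_float_low([0x00018000, 7, 7, 7]))
```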

    SYSTEMS, APPARATUSES, AND METHODS FOR MULTIPLICATION AND ACCUMULATION OF VECTOR PACKED SIGNED VALUES

    Publication No.: US20190102198A1

    Publication date: 2019-04-04

    Application No.: US15721616

    Filing date: 2017-09-29

    Abstract: Embodiments of systems, apparatuses, and methods for multiplication and accumulation of signed data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected signed data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first signed result values, sum the plurality of first signed result values to generate one or more second signed result values, accumulate the one or more second signed result values with one or more data values from a destination operand to generate one or more third signed result values, and store the one or more third signed result values in one or more packed data element positions in the destination operand.
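The three stages named in the abstract (multiply, sum, accumulate) might look like the following in software. The element sizes are assumptions: signed 16-bit inputs grouped in adjacent pairs per signed 32-bit destination lane, with wrap-around on overflow. The abstract itself leaves the selection and widths open.

```python
def wrap_s32(x):
    """Wrap an integer into the signed 32-bit two's-complement range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def dot_accumulate_words(src1, src2, dest):
    """Pairwise multiply, sum each pair, and accumulate into dest lanes.

    src1, src2: signed 16-bit values, two per destination lane (assumed).
    dest:       signed 32-bit accumulators, one per lane.
    """
    out = []
    for lane, acc in enumerate(dest):
        # First signed results: one product per selected element pair.
        products = [src1[2 * lane + k] * src2[2 * lane + k] for k in range(2)]
        # Second signed result: the sum of those products.
        pair_sum = sum(products)
        # Third signed result: the sum accumulated with the destination value.
        out.append(wrap_s32(acc + pair_sum))
    return out


print(dot_accumulate_words([1, 2, -3, 4], [5, 6, 7, -8], [100, -100]))
```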

    Dual sum of quadword 16×16 multiply and accumulate

    Publication No.: US12204903B2

    Publication date: 2025-01-21

    Application No.: US17359522

    Filing date: 2021-06-26

    Abstract: Techniques for matrix multiplication are described. In some examples, a single instruction is used that has a format of fields for an opcode, one or more fields to indicate a location of a source/destination operand, one or more fields to indicate a location of a first source operand, and one or more fields to indicate a location of a second source operand. The opcode is to indicate that execution circuitry is to: multiply values from corresponding data elements of the first and second sources, add a first subset of the multiplied values to a first value from the source/destination operand and store in a first data element position of the source/destination operand, and add a second subset of the multiplied values to a second value from the source/destination operand and store in a second data element position of the source/destination operand.
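A small sketch of the dual-sum behavior follows. How the products are partitioned is an assumption here: the lower half of the products feeds the first accumulator and the upper half feeds the second. Overflow handling is left out because the abstract does not say whether results wrap or saturate.

```python
def dual_sum_mac(src1, src2, srcdst):
    """Multiply corresponding elements, then add two product subsets
    to the two values held in the source/destination operand.

    src1, src2: equal-length lists of signed 16-bit values (assumed).
    srcdst:     two accumulator values; treated as quadwords (assumed).
    """
    products = [a * b for a, b in zip(src1, src2)]
    half = len(products) // 2
    # Assumed partition: low half -> first position, high half -> second.
    first = srcdst[0] + sum(products[:half])
    second = srcdst[1] + sum(products[half:])
    # Overflow handling omitted: Python integers do not overflow.
    return [first, second]


print(dual_sum_mac([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 2, 2, 2, 2], [10, -10]))
```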

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    Patent publication

    Publication No.: US20240045684A1

    Publication date: 2024-02-08

    Application No.: US17958380

    Filing date: 2022-10-01

    CPC classification number: G06F9/30145 G06F9/30036 G06F9/30018

    Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.
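One plausible reading of "conversion using bias" is sketched below; it is an assumption, not a statement of what the patent claims. BF8 is taken to be the 1-5-2 format (sign, 5 exponent bits, 2 mantissa bits), which shares its exponent width with FP16, so a conversion can add an 8-bit bias to the FP16 bit pattern and keep the upper byte. The one-bias-byte-per-element layout and the helper names are hypothetical, and only finite inputs are handled.

```python
import struct


def fp16_bits(x):
    """Return the 16-bit IEEE half-precision pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_to_bf8_biased(bits16, bias):
    """Convert one FP16 bit pattern to an 8-bit 1-5-2 pattern.

    Assumed scheme: add the bias byte to the 16-bit pattern, then drop the
    low 8 mantissa bits. Infinities and NaNs are not covered by this sketch.
    """
    if (bits16 & 0x7C00) == 0x7C00:
        raise ValueError("non-finite inputs are outside this sketch")
    return ((bits16 + (bias & 0xFF)) >> 8) & 0xFF


def convert_packed(src1, src2, biases):
    """Convert elements of both sources, one bias byte per element (assumed)."""
    elements = list(src1) + list(src2)
    return [fp16_to_bf8_biased(b, bias) for b, bias in zip(elements, biases)]


x = fp16_bits(1.125)                      # 0x3C80
print(hex(fp16_to_bf8_biased(x, 0x00)))   # bias 0 truncates toward 1.0
print(hex(fp16_to_bf8_biased(x, 0x80)))   # bias 0x80 carries up to 1.25
```

Under this reading, a fixed bias of 0x80 approximates round-half-up, while a random bias per element would give stochastic rounding.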

    Apparatus and method for vector multiply and accumulate of packed bytes

    Publication No.: US11768681B2

    Publication date: 2023-09-26

    Application No.: US15879419

    Filing date: 2018-01-24

    Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; a third source register to store a plurality of packed doublewords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed bytes to generate a first and second plurality of words corresponding to the first and second plurality of packed bytes; multiplier circuitry to multiply each of the first plurality of words with a corresponding one of the second plurality of words to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed doubleword value from a first doubleword location in the third source register to generate a first accumulated doubleword result; a destination register to store the first accumulated doubleword result in the first doubleword location.
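The pipeline in this abstract (extend, multiply, add, accumulate) can be mimicked in a few lines. The grouping of four products per doubleword, the `signed1`/`signed2` flags standing in for the instruction's choice of sign- or zero-extension, and wrap-around on overflow are assumptions made for illustration.

```python
def extend_byte(raw, signed):
    """Interpret a raw byte (0..255) as a sign- or zero-extended integer."""
    return raw - 256 if signed and raw >= 128 else raw


def wrap_s32(x):
    """Wrap an integer into the signed 32-bit two's-complement range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def byte_multiply_accumulate(src1, src2, acc, signed1=False, signed2=True):
    """Multiply packed bytes and accumulate groups of four into doublewords.

    src1, src2: raw byte values (0..255), four per doubleword lane (assumed).
    acc:        signed 32-bit values from the third source register.
    signed1/2:  hypothetical flags selecting sign- or zero-extension.
    """
    out = []
    for lane, acc_value in enumerate(acc):
        temp_sum = 0
        for k in range(4):
            a = extend_byte(src1[4 * lane + k], signed1)  # extension circuitry
            b = extend_byte(src2[4 * lane + k], signed2)
            temp_sum += a * b                             # multiplier + adder
        out.append(wrap_s32(acc_value + temp_sum))        # accumulation
    return out


print(byte_multiply_accumulate([255, 1, 2, 3], [255, 1, 1, 1], [10]))
```

With the default flags the byte 255 in `src1` is read as 255 and the byte 255 in `src2` as -1, which is why the same bit patterns produce different sums under different extension choices.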
