APPARATUS AND METHOD FOR VECTOR MULTIPLY AND ACCUMULATE OF PACKED WORDS

    Publication No.: US20190227797A1

    Publication date: 2019-07-25

    Application No.: US15879420

    Filing date: 2018-01-24

    Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed words; a second source register to store a second plurality of packed words; a third source register to store a plurality of packed quadwords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed words to generate a first and second plurality of doublewords corresponding to the first and second plurality of packed words; multiplier circuitry to multiply each of the first plurality of doublewords with a corresponding one of the second plurality of doublewords to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed quadword value from a first quadword location in the third source register to generate a first accumulated quadword result; a destination register to store the first accumulated quadword result in the first quadword location.
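
    The data flow in this abstract can be modelled in software. The Python sketch below is illustrative only and is not taken from the patent: the function names, the grouping of four word lanes per quadword accumulator, and the wrap-around behaviour on 64-bit overflow are all assumptions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac_packed_words(src1, src2, acc, signed=True, group=4):
    """Multiply packed 16-bit words and accumulate into packed quadwords.

    src1, src2: lists of 16-bit words (raw bit patterns).
    acc: list of quadword accumulators (the third source register).
    group: word lanes summed per quadword (assumed, not from the abstract).
    """
    results = []
    for q, acc_q in enumerate(acc):
        temp_sum = 0
        for i in range(q * group, (q + 1) * group):
            if signed:
                # Sign-extend each word before multiplying.
                a, b = sign_extend(src1[i], 16), sign_extend(src2[i], 16)
            else:
                # Zero-extend each word before multiplying.
                a, b = src1[i] & 0xFFFF, src2[i] & 0xFFFF
            temp_sum += a * b  # temporary product added to the temporary sum
        # Assumed: the accumulated result wraps to a signed 64-bit value.
        results.append(sign_extend(acc_q + temp_sum, 64))
    return results


src1 = [1, 2, 3, 4, 0xFFFF, 0, 0, 0]
src2 = [1, 1, 1, 1, 2, 0, 0, 0]
print(mac_packed_words(src1, src2, [10, 100]))                # [20, 98]
print(mac_packed_words(src1, src2, [10, 100], signed=False))  # [20, 131170]
```

    The word 0xFFFF is read as -1 in the sign-extended case and as 65535 in the zero-extended case, which is why the second accumulator differs between the two calls.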

    APPARATUS AND METHOD FOR SHIFTING PACKED QUADWORDS AND EXTRACTING PACKED WORDS

    Publication No.: US20190196822A1

    Publication date: 2019-06-27

    Application No.: US15851145

    Filing date: 2017-12-21

    Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 16 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 16 least significant bit region.
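
    A minimal Python sketch of the shift-and-extract behaviour described above follows. It is an interpretation, not the patent's implementation: the function names are hypothetical, and leaving the upper 48 bits of each destination quadword unchanged is an assumption, since the abstract only specifies the 16 least significant bits.

```python
MASK64 = (1 << 64) - 1


def shift_extract_word(qword, shift):
    """Left-shift a 64-bit quadword and return a 16-bit word.

    The word is the top 16 bits of the shifted value, with its most
    significant bit replaced by the original sign bit.
    """
    qword &= MASK64
    sign = qword >> 63                  # preserved copy of the sign bit
    shifted = (qword << shift) & MASK64  # zeroes fill the exposed low bits
    top16 = shifted >> 48               # 16 most significant bits
    return (sign << 15) | (top16 & 0x7FFF)


def shift_extract_packed(src, shift, dest):
    """Apply shift_extract_word to each quadword lane.

    Assumed: only the low 16 bits of each destination quadword are written.
    """
    return [
        (d & ~0xFFFF & MASK64) | shift_extract_word(q, shift)
        for q, d in zip(src, dest)
    ]


src = [0x8000123400000000, 0x0000FFFF00000000]
dest = [0xAAAA00000000BBBB, 0]
print([hex(x) for x in shift_extract_packed(src, 16, dest)])
# ['0xaaaa000000009234', '0x7fff']
```

    In the first lane the original sign bit (1) overrides the top bit of 0x1234, giving 0x9234. In the second lane the original sign bit (0) overrides the top bit of 0xFFFF, giving 0x7FFF.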

    APPARATUS AND METHOD FOR VECTOR MULTIPLY AND ACCUMULATE OF UNSIGNED DOUBLEWORDS

    Publication No.: US20190196812A1

    Publication date: 2019-06-27

    Application No.: US15850412

    Filing date: 2017-12-21

    Abstract: An apparatus and method for performing signed multiplication of packed signed/unsigned doublewords and accumulation with a quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; a third source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed doubleword data elements from the first source register with third and fourth packed doubleword data elements from the second source register, respectively, to generate first and second temporary quadword products, the multiplier circuitry to select the first, second, third, and fourth doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary quadword product with a first packed quadword value read from the third source register to generate a first accumulated quadword result and to combine the second temporary quadword product with a second packed quadword value read from the third source register to generate a second accumulated quadword result; a destination register or the third source register to store the first accumulated quadword result in a first quadword data element position and to store the second accumulated quadword result in a second quadword data element position.
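
    The following Python sketch models the unsigned doubleword case named in the title. It rests on several assumptions that are not in the abstract: the `lanes` parameter stands in for the opcode-based element selection, both sources use the same lane index, and the 64-bit accumulation wraps rather than saturates.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mac_unsigned_dwords(src1, src2, acc, lanes=(0, 2)):
    """Multiply selected unsigned doublewords and accumulate into quadwords.

    src1, src2: lists of 32-bit doublewords.
    acc: list of quadword accumulators (the third source register).
    lanes: doubleword positions to multiply (hypothetical stand-in for
           the opcode-based selection).
    """
    results = []
    for lane, acc_q in zip(lanes, acc):
        # 32-bit x 32-bit unsigned multiply yields a 64-bit temporary product.
        product = (src1[lane] & MASK32) * (src2[lane] & MASK32)
        # Assumed: the sum wraps modulo 2**64.
        results.append((acc_q + product) & MASK64)
    return results


src1 = [0xFFFFFFFF, 7, 3, 9]
src2 = [2, 7, 5, 9]
print(mac_unsigned_dwords(src1, src2, [1, 10]))                # [8589934591, 25]
print(mac_unsigned_dwords(src1, src2, [1, 10], lanes=(1, 3)))  # [50, 91]
```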

    APPARATUS AND METHOD FOR PERFORMING TRANSFORMS OF PACKED COMPLEX DATA HAVING REAL AND IMAGINARY COMPONENTS

    Publication No.: US20190102195A1

    Publication date: 2019-04-04

    Application No.: US15721471

    Filing date: 2017-09-29

    Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; a third source register to store a third plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first and second source registers to multiply based on an immediate of the first instruction, the multiplier circuitry to multiply first packed data elements from the first source register with second packed data elements from the second source register in accordance with the immediate to generate a plurality of real and imaginary products, adder circuitry to select real and imaginary data elements in the third source register based on the immediate, the adder circuitry to add and subtract selected real and imaginary values from the real and imaginary products to generate first real and imaginary results; scaling, rounding, and/or saturation circuitry to scale, round, and/or saturate the first real and imaginary results to generate final real and imaginary data elements; and a destination register to store the final real and imaginary data elements in specified data element positions.
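
    One common operation with this multiply, add/subtract, then scale/round/saturate shape is a radix-2 butterfly, and the Python sketch below models that case. It is a guess at a representative use, not the patent's method: the Q15 fixed-point format, the round-half-up rule, the function names, and the omission of the immediate-based element selection are all assumptions.

```python
def saturate16(x):
    """Clamp to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def round_shift(x, shift):
    """Arithmetic right shift with round-half-up (assumed rounding mode)."""
    if shift == 0:
        return x
    return (x + (1 << (shift - 1))) >> shift


def butterfly(a, b, c, shift=15):
    """Return (c + a*b, c - a*b) for complex Q15 values given as (re, im)."""
    (ar, ai), (br, bi), (cr, ci) = a, b, c
    # Real and imaginary products of the complex multiplication a*b.
    pr = ar * br - ai * bi
    pi = ar * bi + ai * br
    # Bring c into the product's fixed-point format before add/subtract.
    cr_s, ci_s = cr << shift, ci << shift
    plus = (saturate16(round_shift(cr_s + pr, shift)),
            saturate16(round_shift(ci_s + pi, shift)))
    minus = (saturate16(round_shift(cr_s - pr, shift)),
             saturate16(round_shift(ci_s - pi, shift)))
    return plus, minus


# 0.5 * 0.5 = 0.25; added to and subtracted from c = 0.25.
print(butterfly((16384, 0), (16384, 0), (8192, 0)))  # ((16384, 0), (0, 0))
```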

    APPARATUS AND METHOD FOR COMPLEX BY COMPLEX CONJUGATE MULTIPLICATION

    Publication No.: US20190102193A1

    Publication date: 2019-04-04

    Application No.: US15721448

    Filing date: 2017-09-29

    Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result, accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
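
    For reference, the imaginary part of a complex value multiplied by a complex conjugate is Im(a * conj(b)) = a.im * b.re - a.re * b.im, which matches the add-and-subtract pattern over imaginary products described above. The Python sketch below is a simplified model under assumptions: one complex pair per accumulator lane, unbounded integers in place of fixed-width registers, and a hypothetical function name.

```python
def conj_mul_imag_acc(a, b, dest):
    """Accumulate Im(a[k] * conj(b[k])) into dest[k].

    a, b: lists of (real, imaginary) integer pairs.
    dest: list of accumulator values read from the destination register.
    """
    results = []
    for (ar, ai), (br, bi), d in zip(a, b, dest):
        # Added imaginary product minus subtracted imaginary product.
        temp = ai * br - ar * bi
        results.append(d + temp)  # combine with existing destination data
    return results


a = [(1, 2), (3, 4)]
b = [(5, 6), (7, 8)]
print(conj_mul_imag_acc(a, b, [100, 200]))  # [104, 204]

# Cross-check against Python's built-in complex arithmetic.
print((complex(1, 2) * complex(5, 6).conjugate()).imag)  # 4.0
```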

    APPARATUS AND METHOD FOR MULTIPLICATION AND ACCUMULATION OF COMPLEX AND REAL PACKED DATA ELEMENTS

    Publication No.: US20190102183A1

    Publication date: 2019-04-04

    Application No.: US15721459

    Filing date: 2017-09-29

    Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products to generate a first temporary result and to add a second subset of the plurality of imaginary products to generate a second temporary result; accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
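
    Here the imaginary products are only added, which corresponds to Im(a * b) = a.im * b.re + a.re * b.im, with several products summed before accumulation. The Python sketch below assumes two complex pairs per accumulator lane (the `group` parameter) and uses unbounded integers; both choices, and the function name, are illustrative rather than taken from the patent.

```python
def complex_mul_imag_acc(a, b, dest, group=2):
    """Sum Im(a[i] * b[i]) over each group of pairs and accumulate into dest.

    a, b: lists of (real, imaginary) integer pairs.
    dest: accumulator values read from the destination register.
    group: complex pairs summed per accumulator (assumed).
    """
    results = []
    for k, d in enumerate(dest):
        temp = 0
        lo, hi = k * group, (k + 1) * group
        for (ar, ai), (br, bi) in zip(a[lo:hi], b[lo:hi]):
            temp += ai * br + ar * bi  # two imaginary products per pair
        results.append(d + temp)
    return results


a = [(1, 2), (3, 4), (1, 0), (0, 1)]
b = [(5, 6), (7, 8), (2, 3), (4, 5)]
print(complex_mul_imag_acc(a, b, [10, 20]))  # [78, 27]
```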

    INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY

    Publication No.: US20190095643A1

    Publication date: 2019-03-28

    Application No.: US16141283

    Filing date: 2018-09-25

    Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
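
    The replicate-then-mask behaviour can be modelled on a byte string. In the Python sketch below, the 64-bit and 32-bit element sizes, the 32-byte vector width, zeroing of masked-off elements, and the function name are all assumptions chosen for illustration; the abstract states only that one element size and granularity is twice the other.

```python
def broadcast_masked(structure, elem_bytes, vector_bytes, mask):
    """Replicate a packed structure across a vector, then mask per element.

    structure: list of unsigned element values forming the packed structure.
    elem_bytes: size of each element in bytes (sets the mask granularity).
    vector_bytes: size of the result vector in bytes.
    mask: integer whose bit i keeps (1) or zeroes (0) result element i.
    """
    packed = b"".join(v.to_bytes(elem_bytes, "little") for v in structure)
    replicated = packed * (vector_bytes // len(packed))
    out = bytearray()
    for i in range(vector_bytes // elem_bytes):
        chunk = replicated[i * elem_bytes:(i + 1) * elem_bytes]
        # Assumed: masked-off elements are zeroed rather than merged.
        out += chunk if (mask >> i) & 1 else bytes(elem_bytes)
    return bytes(out)


# First instruction: two 64-bit values, masked at 64-bit granularity.
coarse = broadcast_masked([0x1111111111111111, 0x2222222222222222], 8, 32, 0b0101)
# Second instruction: two 32-bit values, masked at 32-bit granularity.
fine = broadcast_masked([0xAAAAAAAA, 0xBBBBBBBB], 4, 32, 0b00001111)
print(coarse.hex())
print(fine.hex())
```

    The same 32-byte vector takes a 4-bit mask in the first call and an 8-bit mask in the second, which is the sense in which the second granularity is twice as fine.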

    APPARATUSES, METHODS, AND SYSTEMS FOR MIXING VECTOR OPERATIONS

    Publication No.: US20180088946A1

    Publication date: 2018-03-29

    Application No.: US15277963

    Filing date: 2016-09-27

    Abstract: Systems, methods, and apparatuses relating to mixing vector operations are described. In one embodiment, a processor includes a decoder to decode an instruction; and an execution unit to execute the decoded instruction to: receive a first input operand of a first data vector, a second input operand of a second data vector, and a third input operand of a control value vector, perform a first operation on data in a same element position of the first and second data vectors for each same element position of the control value vector having a first control value, perform a second, different operation on data in a same element position of the first and second data vectors for each same element position of the control value vector having a second, different control value, and output results from each first operation and each second operation into each corresponding element position in an output vector.
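
    The per-element selection between two operations can be sketched in a few lines of Python. The choice of addition and subtraction as the two operations, the control values 0 and 1, and the function name are assumptions for illustration; the abstract does not name specific operations.

```python
import operator


def mixed_vector_op(a, b, control, ops):
    """Apply ops[control[i]] to (a[i], b[i]) for every element position i."""
    return [ops[c](x, y) for x, y, c in zip(a, b, control)]


# Hypothetical mapping from control value to operation.
ops = {0: operator.add, 1: operator.sub}

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
control = [0, 1, 0, 1]
print(mixed_vector_op(a, b, control, ops))  # [11, -18, 33, -36]
```

    A single instruction of this kind replaces the sequence of computing both results for every element and then blending them with a mask.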

    APPARATUS AND METHOD OF MASK PERMUTE INSTRUCTIONS

    Publication No.: US20170322905A1

    Publication date: 2017-11-09

    Application No.: US15495933

    Filing date: 2017-04-24

    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
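
    A permute followed by a mask at matching granularity can be modelled as below. This Python sketch is an assumption-laden illustration: the abstract does not state the three bit widths, so the element size is a parameter and the sizes used in the example are arbitrary; zeroing of masked-off elements and the function name are also assumed.

```python
def permute_masked(src, indices, elem_bytes, mask):
    """Route source elements to output positions, then mask per element.

    src: source vector as bytes.
    indices: for each output position, the source element index to route in.
    elem_bytes: element size in bytes; also the masking granularity.
    mask: integer whose bit i keeps (1) or zeroes (0) output element i.
    """
    out = bytearray()
    for i, idx in enumerate(indices):
        elem = src[idx * elem_bytes:(idx + 1) * elem_bytes]
        # Assumed: masked-off elements are zeroed rather than merged.
        out += elem if (mask >> i) & 1 else bytes(elem_bytes)
    return bytes(out)


src = bytes(range(8))
# Reverse four 2-byte elements, masking off output element 2.
print(permute_masked(src, [3, 2, 1, 0], 2, 0b1011).hex())  # 0607040500000001
# Swap two 4-byte elements, keeping only output element 0.
print(permute_masked(src, [1, 0], 4, 0b01).hex())          # 0405060700000000
```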
