Apparatus and method of mask permute instructions

    Publication No.: US10467185B2

    Publication date: 2019-11-05

    Application No.: US15495933

    Filing date: 2017-04-24

    Abstract: An apparatus is described that has instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into the output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations have one of three available bit widths, one for each of the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
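
    Illustrative sketch (not taken from the patent): the routing layer and masking layer described in the abstract can be modeled in scalar Python as below. The function names, the 64-bit register size, the 16- and 32-bit element widths, and the choice of merge masking (masked-out elements keep the old destination value) are all assumptions made for illustration.

```python
def split_elements(reg, elem_bits, total_bits):
    """Split an integer 'register' into elements, lowest element first."""
    elem_mask = (1 << elem_bits) - 1
    return [(reg >> (i * elem_bits)) & elem_mask
            for i in range(total_bits // elem_bits)]


def join_elements(elems, elem_bits):
    """Pack a list of elements back into one integer register."""
    reg = 0
    for i, elem in enumerate(elems):
        reg |= elem << (i * elem_bits)
    return reg


def masked_permute(src, indices, mask, old_dst, elem_bits, total_bits=64):
    """Hypothetical model: permute src by indices, then apply a per-element mask."""
    src_elems = split_elements(src, elem_bits, total_bits)
    dst_elems = split_elements(old_dst, elem_bits, total_bits)
    n = len(src_elems)
    if len(indices) != n:
        raise ValueError("need one index per output element")
    # Routing layer: each output location picks one input location.
    routed = [src_elems[indices[i] % n] for i in range(n)]
    # Masking layer: one mask bit per element, at the element's own width.
    # Assumption: merge masking, so a clear bit keeps the old destination.
    out = [routed[i] if (mask >> i) & 1 else dst_elems[i] for i in range(n)]
    return join_elements(out, elem_bits)
```

    Calling the same function with elem_bits=16 and elem_bits=32 shows how one mask register can be read at different granularities: bit i of the mask always governs element i, whatever the element width.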

    Packed rotate processors, methods, systems, and instructions

    Publication No.: US10324718B2

    Publication date: 2019-06-18

    Application No.: US15864158

    Filing date: 2018-01-08

    Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
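
    Illustrative sketch (not taken from the patent): a scalar model of a masked packed rotate, assuming 32-bit elements, a single rotation amount shared by all elements, a left rotation, and zero as the default masked-out value. The function names and parameters are hypothetical.

```python
def rotate_left(value, amount, bits=32):
    """Rotate a 'bits'-wide unsigned value left by 'amount' positions."""
    amount %= bits
    width_mask = (1 << bits) - 1
    value &= width_mask
    return ((value << amount) | (value >> (bits - amount))) & width_mask


def masked_packed_rotate(src, mask, amount, masked_out_value=0, bits=32):
    """Rotate each element whose mask bit is set; others get masked_out_value."""
    return [
        rotate_left(elem, amount, bits) if (mask >> i) & 1 else masked_out_value
        for i, elem in enumerate(src)
    ]
```

    Each result element corresponds to the mask bit in the same relative position, which is the pairing the abstract describes.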

    PROVIDING VECTOR HORIZONTAL COMPARE FUNCTIONALITY WITHIN A VECTOR REGISTER

    Publication No.: US20170235572A1

    Publication date: 2017-08-17

    Application No.: US15585505

    Filing date: 2017-05-03

    CPC classification number: G06F9/30018 G06F9/30021 G06F9/30036

    Abstract: A processor includes a vector register including data fields to store values of vector elements of data, and a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the plurality of data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
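
    Illustrative sketch (not taken from the patent): one possible reading of the horizontal compare, in scalar Python. It assumes a set mask bit means "unmasked", uses the first unmasked element as the reference value, and flips the mask bit of every unmasked element that differs from it. The abstract does not say which element counts as the "unequal" one, so that choice is an assumption, as is the function name.

```python
def horizontal_compare(values, mask):
    """Return the updated mask after comparing all unmasked elements."""
    active = [i for i in range(len(values)) if (mask >> i) & 1]
    if not active:
        return mask  # nothing to compare
    reference = values[active[0]]  # assumption: first unmasked element
    new_mask = mask
    for i in active[1:]:
        if values[i] != reference:
            new_mask ^= 1 << i  # flip the bit to signal the inequality
    return new_mask
```

    A caller can detect that an inequality occurred by checking whether the returned mask differs from the mask passed in.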

    Instruction and Logic for Permute with Out of Order Loading

    Publication No.: US20170177345A1

    Publication date: 2017-06-22

    Application No.: US14975390

    Filing date: 2015-12-18

    Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from a plurality of structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into a plurality of preliminary vector registers with a first indexed layout of elements and a second indexed layout of elements. A plurality of the preliminary vector registers are to be loaded with the first indexed layout of elements. A common register of the preliminary vector registers is to be loaded with the second indexed layout of elements. The core also includes logic to apply permute instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective source vector registers.
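
    Illustrative sketch (not taken from the patent): the general idea of turning an array of structures into strided data with permutes, modeled in scalar Python. The two-source permute2 helper, the 4-lane registers, and the 3-field structures are hypothetical, and the sketch does not model the patent's specific first and second indexed layouts or its out-of-order loading.

```python
def load_registers(memory, lanes):
    """Load consecutive memory into 'preliminary' registers of 'lanes' elements."""
    return [memory[i:i + lanes] for i in range(0, len(memory), lanes)]


def permute2(a, b, idx):
    """Hypothetical two-source permute: idx selects from the concatenation a + b."""
    table = a + b
    return [table[i] for i in idx]


def gather_field(regs, field, num_fields, lanes):
    """Collect one field of every structure into a single register."""
    result = [0] * lanes
    for j, reg in enumerate(regs):
        idx = []
        for k in range(lanes):
            pos = k * num_fields + field  # flat position of structure k's field
            if j * lanes <= pos < (j + 1) * lanes:
                idx.append(lanes + pos - j * lanes)  # take from this register
            else:
                idx.append(k)  # keep what is already in result
        result = permute2(result, reg, idx)
    return result
```

    With memory holding four (x, y, z) structures back to back, gather_field(regs, 0, 3, 4) yields all four x values in one register, which is the "corresponding indexed elements in a same register" outcome the abstract describes.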

    Instructions and Logic for Blend and Permute Operation Sequences

    Publication No.: US20170177344A1

    Publication date: 2017-06-22

    Application No.: US14974729

    Filing date: 2015-12-18

    Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into preliminary vector registers. The source data is to be unaligned as resident in the vector registers. The core includes logic to apply blend instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective interim vector registers, and to apply further blend instructions to contents of the interim vector registers to cause additional indexed elements from the structures to be loaded into respective source vector registers.
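
    Illustrative sketch (not taken from the patent): a scalar Python model of collecting one field from three preliminary registers with two blends, followed by one in-register permute to restore structure order. The blend and permute helpers, the 4-lane registers, the data values, and the final permute step are assumptions; the abstract itself names only the blend stages.

```python
def blend(a, b, mask):
    """Per-lane select: take b[i] where mask bit i is set, otherwise a[i]."""
    return [b[i] if (mask >> i) & 1 else a[i] for i in range(len(a))]


def permute(a, idx):
    """Single-source permute: out[i] = a[idx[i]]."""
    return [a[i] for i in idx]


# Four (x, y, z) structures stored back to back; x values are 0, 10, 20, 30.
r0 = [0, 1, 2, 10]
r1 = [11, 12, 20, 21]
r2 = [22, 30, 31, 32]

# First blend: pull x2 (lane 2 of r1) into an interim register.
interim = blend(r0, r1, 0b0100)
# Second blend: pull x3 (lane 1 of r2) into the same register.
blended = blend(interim, r2, 0b0010)
# A blend never moves data across lanes, so reorder the lanes at the end.
x_values = permute(blended, [0, 3, 2, 1])
```

    This works here because each x value happens to sit in a different lane of its register, so the blends never collide.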

    Efficient zero-based decompression
    Patent type: Granted invention patent

    Publication No.: US10540177B2

    Publication date: 2020-01-21

    Application No.: US15438712

    Filing date: 2017-02-21

    Abstract: A processor core includes a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing the set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result, and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements, this is repeated until all valid data is processed.
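
    Illustrative sketch (not taken from the patent): the mask-driven loop in the abstract, modeled in scalar Python. The encoding is an assumption: nonzero values are literals, and a zero acts as a marker followed by the number of zeros to emit. The function name is hypothetical, and a Python list stands in for the vector register and buffer.

```python
def decompress_zero_rle(src):
    """Expand an assumed 'zero marker + run length' encoding."""
    out = []
    data = list(src)
    while data:
        # First mask: bit i is set where element i equals zero.
        eq_zero_mask = sum(1 << i for i, v in enumerate(data) if v == 0)
        if eq_zero_mask == 0:
            out.extend(data)  # only literals remain
            break
        # Trailing-zero count of the mask = number of leading literals.
        literals = (eq_zero_mask & -eq_zero_mask).bit_length() - 1
        # Second mask: selects exactly those leading literals.
        copy_mask = (1 << literals) - 1
        out.extend(v for i, v in enumerate(data) if (copy_mask >> i) & 1)
        if literals + 1 >= len(data):
            raise ValueError("zero marker without a run length")
        run_length = data[literals + 1]  # number of zeros to emit
        out.extend([0] * run_length)
        # Shift the source past the literals, the marker, and the count.
        data = data[literals + 2:]
    return out
```

    For example, under this assumed encoding the input [5, 6, 0, 3, 7, 0, 1, 9] expands to [5, 6, 0, 0, 0, 7, 0, 9].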
