Methods, apparatus, instructions and logic to provide vector packed tuple cross-comparison functionality

    Publication No.: US10203955B2

    Publication date: 2019-02-12

    Application No.: US14588247

    Filing date: 2014-12-31

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/38

    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
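
    The cross-comparison described in this abstract can be sketched in scalar Python. This is only an illustrative model, not the patented instruction: the function name, the immediate encoding in COMPARISONS, the zeroing of masked-off elements, and the left-shift direction for the third-register option are all assumptions, since the abstract does not specify them.

```python
import operator

# Hypothetical immediate encoding; the abstract only states that the
# comparison type is indicated by an immediate operand.
COMPARISONS = {0: operator.eq, 1: operator.lt, 2: operator.le, 3: operator.ne}


def tuple_cross_compare(src1, src2, tuple_size, imm, write_mask=None, shifts=None):
    """Return one bit-mask per element of src1.

    Bit j of the mask for element i is set when the comparison selected by
    imm holds between src1[i] and the j-th element of the matching tuple
    in src2.
    """
    if len(src1) != len(src2) or len(src1) % tuple_size != 0:
        raise ValueError("registers must have equal length, a multiple of tuple_size")
    compare = COMPARISONS[imm]
    masks = []
    for i, a in enumerate(src1):
        # Assumption: a masked-off element produces an all-zero bit-mask.
        if write_mask is not None and not (write_mask >> i) & 1:
            masks.append(0)
            continue
        base = (i // tuple_size) * tuple_size  # first index of this tuple
        mask = 0
        for j in range(tuple_size):
            if compare(a, src2[base + j]):
                mask |= 1 << j
        # Assumption: the optional third register shifts each mask left.
        if shifts is not None:
            mask <<= shifts[i]
        masks.append(mask)
    return masks
```

    With src1 = [1, 2, 3, 4, 5, 6, 7, 8], src2 = [4, 3, 2, 1, 5, 5, 8, 8], tuple_size = 4 and the equality encoding (imm = 0), the sketch yields the masks [8, 4, 2, 1, 3, 0, 0, 12].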

    APPARATUS AND METHOD FOR VECTOR COMPRESSION
    Patent application

    Publication No.: US20180309461A1

    Publication date: 2018-10-25

    Application No.: US15922642

    Filing date: 2018-03-15

    Applicant: Intel Corporation

    IPC classification: H03M7/30 G06F9/30 G06F15/80

    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.
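
    The two stages named in this abstract (building a control vector from the bit mask, then shuffling by it) can be modelled as follows. This is a minimal sketch under stated assumptions, not the claimed hardware: the function names are hypothetical, valid elements are packed toward index 0, and invalid elements are placed after them, a detail the abstract leaves open.

```python
def build_control_vector(mask, length):
    """Derive shuffle indices from a bit mask.

    Bit i of mask set means element i of the source is valid. Indices of
    valid elements come first, in source order. Assumption: indices of
    invalid elements follow, also in source order.
    """
    valid = [i for i in range(length) if (mask >> i) & 1]
    invalid = [i for i in range(length) if not (mask >> i) & 1]
    return valid + invalid


def shuffle(source, control):
    """Permute source so that destination[k] = source[control[k]]."""
    return [source[i] for i in control]


def compress(source, mask):
    """Compress source by mask; return (destination, control vector)."""
    control = build_control_vector(mask, len(source))
    return shuffle(source, control), control
```

    For example, compressing ["a", "b", "c", "d", "e", "f", "g", "h"] with mask 0b10100110 gives the control vector [1, 2, 5, 7, 0, 3, 4, 6], so the valid elements "b", "c", "f", "h" occupy the first four destination slots.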

    Apparatus and method for vector compression

    Publication No.: US09929745B2

    Publication date: 2018-03-27

    Application No.: US14499038

    Filing date: 2014-09-26

    Applicant: INTEL CORPORATION

    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.

    APPARATUS AND METHOD FOR VECTOR COMPRESSION
    Patent application (status: granted)

    Publication No.: US20160094241A1

    Publication date: 2016-03-31

    Application No.: US14499038

    Filing date: 2014-09-26

    Applicant: INTEL CORPORATION

    IPC classification: H03M7/30 G06F17/16

    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.

    Apparatuses, methods, and systems for instructions for downconverting a tile row and interleaving with a register

    Publication No.: US12086595B2

    Publication date: 2024-09-10

    Application No.: US17214853

    Filing date: 2021-03-27

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.
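
    The downconvert-and-interleave behaviour described here can be illustrated with a short scalar model. The abstract does not name the element types, the rounding mode, or the interleave order, so the sketch assumes a 32-bit float tile downconverted to 16-bit bfloat16 bit patterns by truncation, with each second-source element placed before its downconverted partner. All function names are hypothetical.

```python
import struct


def fp32_to_bf16_bits(x):
    """Return the upper 16 bits of the IEEE-754 binary32 encoding of x.

    Assumption: downconversion is modelled as plain truncation.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def downconvert_row_and_interleave(tile, row_index, src2):
    """Downconvert one tile row and interleave it with src2.

    tile is a list of rows of floats (first source operand), row_index
    selects the row, and src2 is a list of 16-bit integers (second source
    operand). Assumed layout: dest[2k] = src2[k], dest[2k + 1] = converted[k].
    """
    row = tile[row_index]
    if len(row) != len(src2):
        raise ValueError("row and second source must have the same length")
    converted = [fp32_to_bf16_bits(x) for x in row]
    dest = []
    for other, narrow in zip(src2, converted):
        dest.extend((other, narrow))
    return dest
```

    Under these assumptions, row 1 of the tile [[1.0, 2.0], [0.5, -1.0]] interleaved with [0x1111, 0x2222] produces [0x1111, 0x3F00, 0x2222, 0xBF80].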