METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY
    2.
    Invention patent application
    Legal status: in force

    Publication No.: US20140189309A1

    Publication date: 2014-07-03

    Application No.: US13731008

    Filing date: 2012-12-29

    IPC classification: G06F9/30

    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register having a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of which stores a count of the number of most significant contiguous bits set to zero in the corresponding data field of the register. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of the data fields in the register and store the counts in the corresponding data fields of the destination register. Vector leading zero count instructions can be used to generate permute controls, and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
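
    A minimal Python sketch of the per-field leading zero count and of one way such counts can become permute controls. A vector register is modeled as a list of 32-bit integers. The per-lane conflict mask (bit j set when an earlier lane j uses the same index) and the names `vector_lzcnt`, `conflict_masks`, and `permute_controls` are illustrative assumptions, not details taken from the application. The sketch shows how a leading zero count turns that mask into the position of the nearest earlier lane that touches the same index, which a gather-modify-scatter loop could use to forward partial results.

```python
def vector_lzcnt(lanes, width=32):
    """Count the most significant contiguous zero bits in each lane."""
    mask = (1 << width) - 1
    counts = []
    for value in lanes:
        value &= mask
        # bit_length() is the index of the highest set bit plus one,
        # so the remaining high-order bits are the leading zeros.
        counts.append(width - value.bit_length())
    return counts


def conflict_masks(indices):
    """Hypothetical helper: bit j of lane i is set if j < i and indices match."""
    masks = []
    for i, idx in enumerate(indices):
        m = 0
        for j in range(i):
            if indices[j] == idx:
                m |= 1 << j
        masks.append(m)
    return masks


def permute_controls(indices, width=32):
    """Map each lane to the nearest earlier lane with the same index, or -1."""
    counts = vector_lzcnt(conflict_masks(indices), width)
    # (width - 1) - lzcnt is the position of the highest set bit. An all-zero
    # mask gives lzcnt == width, i.e. -1, meaning "no earlier dependency".
    return [(width - 1) - c for c in counts]
```

For the index vector `[3, 1, 3, 3]`, lane 2 depends on lane 0 and lane 3 depends on lane 2, so `permute_controls` yields `[-1, -1, 0, 2]`.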

    LOOP VECTORIZATION METHODS AND APPARATUS
    4.
    Invention patent application
    Legal status: in force

    Publication No.: US20140095850A1

    Publication date: 2014-04-03

    Application No.: US13994549

    Filing date: 2012-09-28

    IPC classification: G06F9/38

    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the set of iterations of the loop according to the first control mask.
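
    A minimal sketch of the two steps in this abstract, assuming one vector lane per loop iteration, a bit value of 1 as the "execute" value, and 0 as the "bypass" value. The function names and the example loop condition `a[i] > 0` are hypothetical. Compression packs the indexes of the active iterations contiguously, in the spirit of a vector compress operation, so later work runs only on iterations whose condition held.

```python
def control_mask(values, condition):
    """Evaluate the loop condition for one set of iterations (one per lane)."""
    mask = 0
    for lane, value in enumerate(values):
        if condition(value):
            mask |= 1 << lane  # first value (1): execute the loop operation
        # otherwise the bit stays 0: second value, the operation is bypassed
    return mask


def compress_indexes(indexes, mask):
    """Pack the indexes of active iterations contiguously, preserving order."""
    return [idx for lane, idx in enumerate(indexes) if (mask >> lane) & 1]


# Hypothetical loop body guarded by the condition: if a[i] > 0: ...
a = [5, -2, 7, 0, 3, -1, 8, 4]
mask = control_mask(a, lambda v: v > 0)
active = compress_indexes(list(range(len(a))), mask)
```

Here `mask` is `0b11010101` and `active` is `[0, 2, 4, 6, 7]`.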

    UNIQUE PACKED DATA ELEMENT IDENTIFICATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
    7.
    Invention patent application
    Legal status: pending (published)

    Publication No.: US20140351567A1

    Publication date: 2014-11-27

    Application No.: US13977686

    Filing date: 2011-12-30

    IPC classification: G06F9/30

    Abstract: A method of one aspect includes receiving a unique packed data element identification instruction. The instruction indicates a source packed data having a plurality of packed data elements and indicates a destination storage location. In response to the instruction, a unique packed data element identification result is stored in the destination storage location. The result indicates which of the plurality of packed data elements are unique in the source packed data. Other methods, apparatus, systems, and instructions are disclosed.
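
    A minimal sketch of what the identification result could look like, modeling the source packed data as a Python list and the result as a bitmask with one bit per element. The abstract does not define "unique" precisely; this sketch assumes it means "occurs exactly once in the source", and another reading (for example, flagging only the first occurrence of each value) would change the result. The name `unique_element_mask` is hypothetical.

```python
from collections import Counter


def unique_element_mask(src):
    """Set bit i when src[i] occurs exactly once in src (assumed meaning)."""
    counts = Counter(src)
    mask = 0
    for lane, value in enumerate(src):
        if counts[value] == 1:
            mask |= 1 << lane
    return mask
```

For the source `[4, 7, 4, 9, 2, 7]`, only the values 9 and 2 appear once, so the result is `0b011000`.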

    Vectorization Of Collapsed Multi-Nested Loops
    8.
    Invention patent application
    Legal status: pending (published)

    Publication No.: US20140188961A1

    Publication date: 2014-07-03

    Application No.: US13728439

    Filing date: 2012-12-27

    IPC classification: G06F17/11

    Abstract: In an embodiment, a method of vectorizing a collapsed multi-nested loop includes executing the collapsed loop in a vector unit of a processor to obtain a vector of offsets. For each of a plurality of iterations, this includes calculating a scalar offset into a multi-dimensional data structure, storing the scalar offset in a data element of a first vector register, and updating a loop counter value of a multi-dimensional loop counter vector. In turn, a plurality of data elements are loaded from the multi-dimensional data structure using a base value and indexes from the vector of offsets, at least one computation is performed on the loaded data elements to obtain a plurality of results, and the results are stored into the multi-dimensional data structure using the base value and the indexes from the vector of offsets. Other embodiments are described and claimed.
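
    A minimal sketch of the flow in this abstract, assuming a row-major array stored in a flat Python list, a vector length `vlen`, and per-dimension strides. The names `offsets_for_chunk` and `process` are hypothetical. The multi-dimensional loop counter is advanced like an odometer (the innermost dimension increments and carries outward), each step emits one scalar offset, and the resulting offset vector drives a gather, a computation, and a scatter back to the same locations.

```python
def offsets_for_chunk(counter, dims, strides, vlen):
    """Emit up to vlen scalar offsets, advancing the multi-dimensional counter."""
    offsets = []
    for _ in range(vlen):
        if counter[0] >= dims[0]:
            break  # the collapsed iteration space is exhausted
        offsets.append(sum(c * s for c, s in zip(counter, strides)))
        # Increment the innermost counter and propagate carries outward.
        d = len(dims) - 1
        counter[d] += 1
        while d > 0 and counter[d] == dims[d]:
            counter[d] = 0
            d -= 1
            counter[d] += 1
    return offsets


def process(data, base, dims, strides, vlen, fn):
    """Gather, compute, and scatter one vector of offsets at a time."""
    counter = [0] * len(dims)
    while True:
        offs = offsets_for_chunk(counter, dims, strides, vlen)
        if not offs:
            break
        loaded = [data[base + o] for o in offs]  # gather via base + index
        results = [fn(x) for x in loaded]
        for o, r in zip(offs, results):  # scatter via the same indexes
            data[base + o] = r
```

For example, visiting a 2 x 3 sub-block of a 3 x 4 row-major array (`dims=[2, 3]`, `strides=[4, 1]`) with `vlen=4` produces the offset vectors `[0, 1, 2, 4]` and then `[5, 6]`.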

    INSTRUCTION FOR SHIFTING BITS LEFT WITH PULLING ONES INTO LESS SIGNIFICANT BITS
    9.
    Invention patent application
    Legal status: in force

    Publication No.: US20140095830A1

    Publication date: 2014-04-03

    Application No.: US13630131

    Filing date: 2012-09-28

    IPC classification: G06F9/315

    Abstract: A mask generating instruction is executed by a processor to improve the efficiency of vector operations on an array of data elements. The processor includes vector registers, one of which stores data elements of the array. The processor further includes execution circuitry to receive a mask generating instruction that specifies at least a first operand and a second operand. Responsive to the mask generating instruction, the execution circuitry shifts the bits of the first operand to the left by the number of positions defined in the second operand, pulling in a one bit from the right each time a most significant bit of the first operand is shifted out on the left, to generate a result. Each bit in the result corresponds to one of the data elements of the array.
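
    A minimal sketch of the shift-left-with-ones behavior on a fixed-width operand. The 16-bit default width, the saturation of counts at or above the width, and the name `shift_left_pull_ones` are assumptions for illustration. Each shifted-in position on the right receives a 1, and bits pushed past the most significant position are discarded.

```python
def shift_left_pull_ones(value, count, width=16):
    """Shift `value` left by `count`, filling the vacated low bits with ones."""
    mask = (1 << width) - 1
    count = min(count, width)  # assumption: counts >= width give all ones
    # (1 << count) - 1 supplies the ones pulled in from the right; the final
    # AND drops bits shifted out past the operand's most significant bit.
    return ((value << count) | ((1 << count) - 1)) & mask
```

One plausible use, assumed here, is building a mask for a loop remainder. Shifting an all-zero first operand by 5 gives `0b11111`, which enables only the lowest five elements of a 16-element vector.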
