1. METHOD AND APPARATUS FOR VECTORIZING HISTOGRAM LOOPS

    Publication No.: WO2019005166A1

    Publication date: 2019-01-03

    Application No.: PCT/US2017/040509

    Filing date: 2017-06-30

    Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements and determining a number of instances of each distinct data value within the vector. A system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, a source vector identifier, and an immediate value, wherein the execution circuit is to, for each data element position of a source vector, determine a number of matching data element positions in the source vector storing a same data value as stored at the data element position, the matching data element positions located between the data element position and a least significant data element position of the source vector, and store in a corresponding data element position of a destination vector identified by the destination vector identifier, a value representing the number of matching data element positions.
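
The per-element count described in this abstract can be modelled in scalar code. The following is a minimal sketch, not the claimed hardware: the function names are hypothetical, position 0 is assumed to be the least significant element, the instruction's immediate value is not modelled because the abstract does not say what it controls, and the histogram helper is only one assumed way such counts might be used.

```python
def count_prior_matches(src):
    """For each position i, count the positions j < i holding the same value.

    Position 0 is treated as the least significant element (an assumption).
    """
    return [sum(1 for j in range(i) if src[j] == src[i])
            for i in range(len(src))]


def histogram_add(hist, indices):
    """Assumed usage: update each bin once, from the last occurrence of its index.

    The last occurrence of a value has count + 1 total occurrences in the vector.
    """
    counts = count_prior_matches(indices)
    last_position = {value: i for i, value in enumerate(indices)}
    for value, i in last_position.items():
        hist[value] += counts[i] + 1
    return hist


if __name__ == "__main__":
    print(count_prior_matches([3, 1, 3, 3, 1]))         # [0, 0, 1, 2, 1]
    print(histogram_add([0, 0, 0, 0], [3, 1, 3, 3, 1]))  # [0, 2, 0, 3]
```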

    2. VECTORIZATION OF COLLAPSED MULTI-NESTED LOOPS
    Patent application, pending (published)

    Publication No.: WO2014105208A1

    Publication date: 2014-07-03

    Application No.: PCT/US2013/048794

    Filing date: 2013-06-29

    Abstract: In an embodiment a method of vectorizing a collapsed multi-nested loop includes executing, in a vector unit of a processor, the collapsed loop to obtain a vector of offsets, including for each of a plurality of iterations, calculating a scalar offset into a multi-dimensional data structure, storing the scalar offset in a data element of a first vector register, and updating a loop counter value of a multi-dimensional loop counter vector. In turn, a plurality of data elements are loaded from the multi-dimensional data structure using a base value and indexes from the vector of offsets, at least one computation is performed on the loaded plurality of data elements to obtain a plurality of results, and the plurality of results are stored into the multi-dimensional data structure using the base value and the indexes from the vector of offsets. Other embodiments are described and claimed.
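
The offset-vector, gather, compute, and scatter sequence in this abstract can be illustrated with a scalar model. This is a sketch under stated assumptions: the function names, the row-major layout, the zero-based loop bounds, and the multiply-by-a-factor computation are all illustrative and do not come from the patent. The Python list `data` stands in for the base value.

```python
def collapsed_offsets(bounds, dims, count):
    """Produce `count` scalar offsets for a loop nest collapsed into one loop.

    bounds: trip count of each loop level (outermost first).
    dims:   size of each array dimension (row-major layout is assumed).
    """
    counter = [0] * len(bounds)          # multi-dimensional loop counter
    offsets = []
    for _ in range(count):
        offset = 0
        for c, d in zip(counter, dims):  # row-major linearisation
            offset = offset * d + c
        offsets.append(offset)
        # Advance the counter like an odometer, innermost level fastest.
        for k in reversed(range(len(bounds))):
            counter[k] += 1
            if counter[k] < bounds[k]:
                break
            counter[k] = 0
    return offsets


def scale_block(data, bounds, dims, factor):
    """Gather by offsets, compute, then scatter back to the same offsets."""
    count = 1
    for b in bounds:
        count *= b
    offsets = collapsed_offsets(bounds, dims, count)
    loaded = [data[o] for o in offsets]        # gather
    results = [x * factor for x in loaded]     # computation
    for o, r in zip(offsets, results):         # scatter
        data[o] = r
    return data


if __name__ == "__main__":
    print(collapsed_offsets((2, 3), (4, 5), 6))  # [0, 1, 2, 5, 6, 7]
```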

    3. READ AND WRITE MASKS UPDATE INSTRUCTION FOR VECTORIZATION OF RECURSIVE COMPUTATIONS OVER INDEPENDENT DATA
    Patent application, pending (published)

    Publication No.: WO2014051737A1

    Publication date: 2014-04-03

    Application No.: PCT/US2013/045505

    Filing date: 2013-06-12

    CPC classification number: G06F9/30036 G06F9/30018 G06F9/30032 G06F9/3013

    Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
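
The abstract does not say which mask bits are inverted or how the "given number" is chosen, so the sketch below rests entirely on assumptions: it flips the lowest-order zero bits of the first mask and the lowest-order one bits of the second mask, and takes the given number to be the smaller of the two available counts. The function name and the `width` parameter are hypothetical.

```python
def mask_update(mask1, mask2, width):
    """Invert the same number of bits in two masks (assumed semantics).

    Assumptions, not taken from the abstract:
      * zero bits of mask1 and one bits of mask2 are the candidates,
      * candidates are taken from the least significant end,
      * the number inverted is min(#zeros in mask1, #ones in mask2).
    """
    zeros1 = [i for i in range(width) if not (mask1 >> i) & 1]
    ones2 = [i for i in range(width) if (mask2 >> i) & 1]
    n = min(len(zeros1), len(ones2))
    for i in zeros1[:n]:
        mask1 |= 1 << i        # 0 -> 1 in the first mask
    for i in ones2[:n]:
        mask2 &= ~(1 << i)     # 1 -> 0 in the second mask
    return mask1, mask2


if __name__ == "__main__":
    print(mask_update(0b0111, 0b0110, 4))  # (15, 4)
```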

    4. VECTOR MOVE INSTRUCTION CONTROLLED BY READ AND WRITE MASKS
    Patent application, pending (published)

    Publication No.: WO2014051733A2

    Publication date: 2014-04-03

    Application No.: PCT/US2013/045429

    Filing date: 2013-06-12

    CPC classification number: G06F15/8084 G06F9/3885

    Abstract: A processor executes a vector move instruction to move data elements from a second vector register to a first vector register under the control of a first mask register and a second mask register. A register file within the processor includes the first vector register, the second vector register, the first mask register and the second mask register. In response to the vector move instruction, execution circuitry in the processor is to replace a given number of target data elements in the first vector register with the given number of source data elements in the second vector register. Each source data element corresponds to a mask bit in the second mask register having a second bit value, and each target data element corresponds to a mask bit in the first mask register having a first bit value.
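
A scalar model may help in reading this abstract. The sketch below assumes that the first bit value is 0 and the second bit value is 1, that elements are paired in order from the least significant position, and that the number moved is the smaller of the two candidate counts. None of these choices is stated in the abstract, and the function name is hypothetical.

```python
def masked_move(dst, src, mask1, mask2):
    """Replace target elements of dst with source elements of src.

    Targets: positions where mask1 is 0 (assumed "first bit value").
    Sources: positions where mask2 is 1 (assumed "second bit value").
    zip() stops at the shorter list, so min(#targets, #sources) elements move.
    """
    targets = [i for i, bit in enumerate(mask1) if bit == 0]
    sources = [i for i, bit in enumerate(mask2) if bit == 1]
    out = list(dst)
    for t, s in zip(targets, sources):
        out[t] = src[s]
    return out


if __name__ == "__main__":
    # Targets are positions 1 and 3; sources are positions 1 and 2.
    print(masked_move([10, 11, 12, 13], [20, 21, 22, 23],
                      [1, 0, 1, 0], [0, 1, 1, 0]))  # [10, 21, 12, 22]
```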

    5. LOOP VECTORIZATION METHODS AND APPARATUS
    Patent application, pending (published)

    Publication No.: WO2014051459A1

    Publication date: 2014-04-03

    Application No.: PCT/RU2012/000794

    Filing date: 2012-09-28

    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.
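
The control-mask and index-compression steps can be sketched in scalar form. In this illustration the first value is assumed to be 1 and the second value 0; the function names and the sample condition are hypothetical.

```python
def control_mask(values, condition):
    """Set a mask bit to 1 where the loop operation should execute, else 0."""
    return [1 if condition(v) else 0 for v in values]


def compress_indexes(indexes, mask):
    """Keep only the indexes whose mask bit is set, packed contiguously."""
    return [i for i, bit in zip(indexes, mask) if bit]


if __name__ == "__main__":
    values = [5, -2, 7, 0]
    mask = control_mask(values, lambda v: v > 0)
    print(mask)                                # [1, 0, 1, 0]
    print(compress_indexes(range(4), mask))    # [0, 2]
```

Packing the surviving indexes lets later vector operations run only on iterations that actually execute the conditional operation.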

    6. INSTRUCTION FOR ELEMENT OFFSET CALCULATION IN A MULTI-DIMENSIONAL ARRAY
    Patent application, pending (published)

    Publication No.: WO2013095601A1

    Publication date: 2013-06-27

    Application No.: PCT/US2011/067078

    Filing date: 2011-12-23

    Abstract: An apparatus is described having functional unit logic circuitry. The functional unit logic circuitry has a first register to store a first input vector operand having an element for each dimension of a multi-dimensional data structure. Each element of the first vector operand specifies the size of its respective dimension. The functional unit has a second register to store a second input vector operand specifying coordinates of a particular segment of the multi-dimensional structure. The functional unit also has logic circuitry to calculate an address offset for the particular segment relative to an address of an origin segment of the multi-dimensional structure.
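
The offset calculation can be written out as a short scalar function. This sketch assumes a row-major layout with the outermost dimension first and returns the offset in elements rather than bytes; the abstract does not specify either point, and the function name is hypothetical.

```python
def element_offset(dims, coords):
    """Offset of `coords` from the origin of an array with sizes `dims`.

    Row-major order is assumed: the last coordinate varies fastest.
    The result is counted in elements, not bytes.
    """
    offset = 0
    for size, coord in zip(dims, coords):
        offset = offset * size + coord
    return offset


if __name__ == "__main__":
    # (1 * 5 + 2) * 6 + 3 = 45
    print(element_offset((4, 5, 6), (1, 2, 3)))  # 45
```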

    7. METHOD AND APPARATUS FOR VECTORIZING INDIRECT UPDATE LOOPS

    Publication No.: WO2019005165A1

    Publication date: 2019-01-03

    Application No.: PCT/US2017/040508

    Filing date: 2017-06-30

    Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, and a source vector identifier, wherein the execution circuit is to, for each data element position of a source vector identified by the source vector identifier, determine a nearest matching data element position in the source vector storing a same data value as stored at the data element position, the nearest matching data element position located between the data element position and a least significant data element position of the source vector, and store, in a corresponding data element position of a destination vector identified by the destination vector identifier, a value identifying the determined nearest data element position.
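
The nearest-match search in this abstract can be modelled with a single pass over the source vector. This is a sketch, not the claimed circuit: position 0 is assumed to be the least significant element, and the `-1` value stored when no earlier match exists is an assumption, since the abstract does not say how that case is encoded. The function name is hypothetical.

```python
def nearest_prior_match(src, no_match=-1):
    """For each position, return the closest lower position with the same value.

    `no_match` is a hypothetical sentinel for elements with no earlier match.
    """
    last_seen = {}   # value -> most recent position holding it
    result = []
    for i, value in enumerate(src):
        result.append(last_seen.get(value, no_match))
        last_seen[value] = i
    return result


if __name__ == "__main__":
    print(nearest_prior_match([3, 1, 3, 3, 1]))  # [-1, -1, 0, 2, 1]
```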

    8. INSTRUCTION FOR SHIFTING BITS LEFT WITH PULLING ONES INTO LESS SIGNIFICANT BITS
    Patent application, pending (published)

    Publication No.: WO2014051782A1

    Publication date: 2014-04-03

    Application No.: PCT/US2013/047669

    Filing date: 2013-06-25

    Abstract: A mask generating instruction is executed by a processor to improve efficiency of vector operations on an array of data elements. The processor includes vector registers, one of which stores data elements of an array. The processor further includes execution circuitry to receive a mask generating instruction that specifies at least a first operand and a second operand. Responsive to the mask generating instruction, the execution circuitry is to shift bits of the first operand to the left by a number of times defined in the second operand, and pull in a bit of one from the right each time a most significant bit of the first operand is shifted out from the left to generate a result. Each bit in the result corresponds to one of the data elements of the array.
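
One reading of the shift described here is that every single-bit left shift discards the most significant bit and inserts a one at the least significant end. The sketch below models that reading; the 8-bit default width, the clamping of the shift count to the width, and the function name are assumptions.

```python
def shift_left_pull_ones(value, count, width=8):
    """Shift `value` left by `count` bits, filling vacated low bits with ones.

    Assumes a one is pulled in on every shift step, and that shift counts
    larger than `width` behave like `width` (result is all ones).
    """
    full = (1 << width) - 1
    count = min(count, width)
    return ((value << count) | ((1 << count) - 1)) & full


if __name__ == "__main__":
    # 0b00010010 shifted left by 3 with ones pulled in gives 0b10010111.
    print(bin(shift_left_pull_ones(0b00010010, 3)))  # 0b10010111
```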
