APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION
    5.
    Patent application
    Status: In force

    Publication No.: US20140189323A1

    Publication date: 2014-07-03

    Application No.: US13997183

    Filing date: 2011-12-23

    IPC classification: G06F9/30

    Abstract: An apparatus and method for propagating conditionally evaluated values. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
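
The method in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `propagate_indices` and the `leading_fill` parameter are hypothetical, the mask is modeled as a list of 0/1 values, and the abstract does not say what is stored for false values that precede the first true value, so the sketch stores a caller-chosen placeholder there.

```python
def propagate_indices(mask, leading_fill=None):
    """Model of the index-propagation step described in the abstract.

    mask          -- list of truthy/falsy values, one per bit position
    leading_fill  -- ASSUMPTION: placeholder for false values that come
                     before the first true value (not specified in the abstract)
    """
    vl = len(mask)          # vector length of the input mask register
    out = []
    last_true = None        # bit position of the most recent true value
    for pos, bit in enumerate(mask):
        if bit:
            # First result: the bit position of the true value itself.
            last_true = pos
            out.append(pos)
        elif last_true is not None:
            # Second result: vector length + position of the last true value.
            out.append(vl + last_true)
        else:
            out.append(leading_fill)
    return out


result = propagate_indices([0, 1, 0, 0, 1, 0, 1, 0])
print(result)  # [None, 1, 9, 9, 4, 12, 6, 14]
```

Because every propagated entry is offset by the vector length, a consumer can tell a "real" position (less than the vector length) from a propagated one without a second mask.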

    Apparatus and method for vectorization with speculation support
    6.
    Granted patent
    Status: In force

    Publication No.: US09268626B2

    Publication date: 2016-02-23

    Application No.: US13997664

    Filing date: 2011-12-23

    IPC classification: G06F11/00 G06F11/07 G06F9/30

    Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.
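
A rough Python model of the gather behavior described above may help. Everything here is an assumption made for illustration: the name `speculative_gather`, memory modeled as a dict in which a missing address stands in for a fault, and the choice to raise `MemoryError` when the first active element faults (the abstract only covers faults on elements other than the first active one).

```python
def speculative_gather(addresses, active, memory):
    """Gather with fault reporting through an output mask.

    addresses -- per-element addresses (first vector register)
    active    -- per-element 0/1 flags marking active elements
    memory    -- dict address -> value; a missing key models a fault (ASSUMPTION)
    Returns (data, fault_mask).
    """
    n = len(addresses)
    data = [None] * n        # first output vector register
    fault_mask = [0] * n     # output mask register
    faulted = False
    loaded_any = False
    for i in range(n):
        if not active[i]:
            continue
        if not faulted:
            if addresses[i] in memory:
                data[i] = memory[addresses[i]]
                loaded_any = True
                continue
            if not loaded_any:
                # ASSUMPTION: a fault on the first active element is a real fault.
                raise MemoryError(f"fault at address {addresses[i]}")
            faulted = True
        # The faulting element and every active element after it are flagged.
        fault_mask[i] = 1
    return data, fault_mask


mem = {0: 10, 4: 20, 8: 30}
print(speculative_gather([0, 4, 99, 8], [1, 1, 1, 1], mem))
# ([10, 20, None, None], [0, 0, 1, 1])
```

The caller can then retry only the flagged elements, which is what makes the load usable for speculative vectorization.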

    INSTRUCTION TO REDUCE ELEMENTS IN A VECTOR REGISTER WITH STRIDED ACCESS PATTERN
    7.
    Patent application
    Status: In force

    Publication No.: US20140189288A1

    Publication date: 2014-07-03

    Application No.: US13993653

    Filing date: 2012-12-28

    IPC classification: G06F9/30

    Abstract: A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data element of a third vector register. The previous data element is located more than one element position away from the current element position.
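
The strided reduction can be illustrated with a short Python sketch. The names `strided_reduce`, `stride`, and `op` are hypothetical, and two behaviors are assumptions not stated in the abstract: elements whose mask bit is clear, and elements closer than `stride` to the start of the register, are left unchanged.

```python
import operator


def strided_reduce(v1, v3, mask, stride=2, op=operator.add):
    """Sequential reduction with a non-unit stride.

    v1     -- first vector register (updated element by element)
    v3     -- third vector register (supplies the second operand)
    mask   -- per-element 0/1 flags; ASSUMPTION: 0 leaves the element unchanged
    stride -- distance to the "previous" element; must exceed 1
    op     -- associative binary operation (addition by default)
    """
    if stride < 2:
        raise ValueError("the abstract describes a stride greater than one")
    out = list(v1)
    for i in range(stride, len(out)):
        if mask[i]:
            # Combine the already-updated element `stride` positions back
            # with the current element of the third register.
            out[i] = op(out[i - stride], v3[i])
    return out


print(strided_reduce([1, 2, 0, 0, 0, 0], [0, 0, 3, 4, 5, 6], [1] * 6))
# [1, 2, 4, 6, 9, 12]
```

With a stride of 2, the even and odd positions form two independent running sums, which is the kind of interleaved access pattern the instruction targets.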

    APPARATUS AND METHOD FOR SELECTING ELEMENTS OF A VECTOR COMPUTATION
    8.
    Patent application
    Status: Under examination (published)

    Publication No.: US20130332701A1

    Publication date: 2013-12-12

    Application No.: US13996521

    Filing date: 2011-12-23

    IPC classification: G06F9/30

    Abstract: An apparatus and method are described for selecting elements to be used in a vector computation. For example, a method according to one embodiment includes the following operations: specifying, using an immediate value, whether to identify the first, last, or next-after-last active element of an input mask register; identifying the first, last, or next-after-last active element in the input mask register according to the immediate value; reading a value from an input vector register corresponding to the identified element in the input mask register; and writing the value to an output vector register.
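
As a hedged illustration of the selection step, the Python sketch below models the immediate as a small enum. The encoding (0, 1, 2), the names `Select` and `select_element`, and the decision to return `None` when no element can be selected are all assumptions; the abstract also does not say where in the output register the value is written, so the sketch simply returns the selected value.

```python
from enum import IntEnum


class Select(IntEnum):
    # ASSUMPTION: hypothetical encoding of the immediate value.
    FIRST = 0
    LAST = 1
    NEXT_AFTER_LAST = 2


def select_element(mask, vec, imm):
    """Return the element of `vec` chosen by `imm` relative to the active mask bits."""
    active = [i for i, bit in enumerate(mask) if bit]
    if not active:
        return None                      # ASSUMPTION: nothing to select
    mode = Select(imm)
    if mode is Select.FIRST:
        idx = active[0]
    elif mode is Select.LAST:
        idx = active[-1]
    else:
        idx = active[-1] + 1             # element right after the last active one
    if idx >= len(vec):
        return None                      # ASSUMPTION: position past the register end
    return vec[idx]


mask = [0, 1, 1, 0]
vec = [10, 20, 30, 40]
print([select_element(mask, vec, m) for m in (0, 1, 2)])  # [20, 30, 40]
```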

    Speculative non-faulting loads and gathers
    9.
    Granted patent
    Status: In force

    Publication No.: US09189236B2

    Publication date: 2015-11-17

    Application No.: US13725907

    Filing date: 2012-12-21

    IPC classification: G06F11/00 G06F9/30 G06F11/07

    Abstract: According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand.
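
The three-operand load described here can be modeled loosely in Python. This is a sketch under several assumptions: the function name `speculative_masked_load` is hypothetical, memory is a flat list with out-of-range addresses standing in for unreadable locations, elements are read from consecutive addresses, and an unreadable element clears its mask bit instead of faulting. The abstract itself does not describe what happens when a speculative read fails.

```python
def speculative_masked_load(dest, bitmask, memory, base):
    """Masked load that suppresses faults.

    dest    -- current contents of the destination (first operand)
    bitmask -- per-element 0/1 flags (second operand)
    memory  -- flat list modeling memory (ASSUMPTION)
    base    -- starting address (third operand)
    Returns (new_dest, completed_mask).
    """
    out = list(dest)
    completed = list(bitmask)
    for i, bit in enumerate(bitmask):
        if not bit:
            continue                     # masked-off element: leave dest untouched
        addr = base + i
        if 0 <= addr < len(memory):
            out[i] = memory[addr]
        else:
            # ASSUMPTION: report the failed element by clearing its mask bit.
            completed[i] = 0
    return out, completed


print(speculative_masked_load([0, 0, 0, 0], [1, 0, 1, 1], [5, 6, 7], base=1))
# ([6, 0, 0, 0], [1, 0, 0, 0])
```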

    AUTOMATIC LOOP VECTORIZATION USING HARDWARE TRANSACTIONAL MEMORY
    10.
    Patent application
    Status: In force

    Publication No.: US20150268940A1

    Publication date: 2015-09-24

    Application No.: US14222040

    Filing date: 2014-03-21

    IPC classification: G06F9/45

    CPC classification: G06F8/452

    Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
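
The transactional-plus-fallback structure can be mimicked in plain Python, with an exception standing in for a transaction abort. This is a loose analogy and not generated compiler output: the loop being vectorized, the chunk width `VL`, the `STOP` sentinel, and the helper names are all invented for illustration, and the dynamic dependence check with explicit abort is not modeled.

```python
VL = 4        # ASSUMPTION: hypothetical vector width
STOP = -1     # ASSUMPTION: sentinel that ends the original scalar loop


def load(table, idx):
    """Model a memory read that faults on an invalid address."""
    if not 0 <= idx < len(table):
        raise IndexError(idx)
    return table[idx]


def scalar_chunk(src, table, dst, start, stop):
    """Fallback: the original loop body, one element at a time."""
    for i in range(start, stop):
        if src[i] == STOP:
            return True                  # the loop exits here
        dst[i] = load(table, src[i])
    return False


def run(src, table):
    dst = [None] * len(src)
    fallbacks = 0
    i = 0
    while i < len(src):
        stop = min(i + VL, len(src))
        try:
            # "Transaction": read all lanes speculatively, commit only on success.
            loaded = [load(table, s) for s in src[i:stop]]
            dst[i:stop] = loaded
            done = False
        except IndexError:
            # "Abort": nothing was committed, so rerun the chunk with scalar code.
            fallbacks += 1
            done = scalar_chunk(src, table, dst, i, stop)
        if done:
            break
        i = stop
    return dst, fallbacks


print(run([0, 1, 2, 0, 1, STOP, 2, 2], [10, 11, 12]))
# ([10, 11, 12, 10, 11, None, None, None], 1)
```

The second chunk reads past the loop's exit point, the speculative read "faults", and the scalar fallback reproduces the exact behavior of the original loop for that chunk.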
