Apparatus and method for performing permute operations
    33.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09513918B2

    Publication date: 2016-12-06

    Application No.: US13995974

    Filing date: 2011-12-22

    IPC classification: G06F9/30 G06F9/38

    Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking is not implemented for a particular data element, selecting data elements from a first source operand and a second source operand, based on index values stored in the destination operand, to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand or the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, performing a designated masking operation with respect to that particular data element.
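The masked two-source permute described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `permute2_masked`, the `masked` list of booleans, the `zeroing` flag, and the wrap-around of out-of-range indices are all hypothetical. The abstract says only that "a designated masking operation" is performed; the sketch assumes it is either keeping the old destination value (merging) or writing zero.

```python
def permute2_masked(dest, src1, src2, masked, zeroing=False):
    """Two-source permute with per-element masking (illustrative sketch).

    dest   -- holds the index values on entry; the result is returned as a new list
    src1   -- first source operand (list of elements)
    src2   -- second source operand (same length as src1)
    masked -- masked[i] is True when masking is implemented for element i
    """
    n = len(dest)
    assert len(src1) == len(src2) == len(masked) == n

    # Concatenating the sources lets any element of either source
    # reach any destination position.
    table = list(src1) + list(src2)

    result = []
    for i in range(n):
        if not masked[i]:
            # Assumption: out-of-range indices wrap around the 2n-entry table.
            result.append(table[dest[i] % (2 * n)])
        elif zeroing:
            # Assumed masking operation 1: write zero.
            result.append(0)
        else:
            # Assumed masking operation 2: keep the old destination value.
            result.append(dest[i])
    return result


src1 = [10, 11, 12, 13]
src2 = [20, 21, 22, 23]
indices = [7, 5, 4, 3]
masked = [False, True, False, False]

merged = permute2_masked(indices, src1, src2, masked)
zeroed = permute2_masked(indices, src1, src2, masked, zeroing=True)
```

Here element 1 is masked, so it either keeps its old value (5) or becomes zero, while the other positions are filled from the combined eight-entry table.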


    APPARATUS AND METHOD FOR PERFORMING PERMUTE OPERATIONS
    34.
    Type: Patent application
    Legal status: In force

    Publication No.: US20150026439A1

    Publication date: 2015-01-22

    Application No.: US13995974

    Filing date: 2011-12-22

    IPC classification: G06F9/30

    Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking is not implemented for a particular data element, selecting data elements from a first source operand and a second source operand, based on index values stored in the destination operand, to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand or the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, performing a designated masking operation with respect to that particular data element.


    Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits
    38.
    Type: Granted patent
    Legal status: In force

    Publication No.: US08078836B2

    Publication date: 2011-12-13

    Application No.: US11967211

    Filing date: 2007-12-30

    IPC classification: G06F15/16

    Abstract: In-lane vector shuffle operations are described. In one embodiment, a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to the per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to the per-lane control bits contains data elements from every lane portion of the first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
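The key idea in the abstract is that one common set of control bits is reused for every lane. A minimal Python sketch follows, assuming four elements per lane and two control bits per destination position (an 8-bit control field). The names `shuffle_in_lane` and `shuffle2_in_lane` are hypothetical, and the two-source variant assumes that the low half of each destination lane is taken from the first source and the high half from the second; the abstract does not fix that layout.

```python
LANE_ELEMS = 4  # assumption: four data elements per lane


def shuffle_in_lane(src, control):
    """Single-source shuffle: the same control bits are applied to every lane."""
    assert len(src) % LANE_ELEMS == 0
    dest = []
    for start in range(0, len(src), LANE_ELEMS):
        lane = src[start:start + LANE_ELEMS]
        for pos in range(LANE_ELEMS):
            # Two control bits per destination position select a source element
            # from within the same lane.
            sel = (control >> (2 * pos)) & 0b11
            dest.append(lane[sel])
    return dest


def shuffle2_in_lane(src1, src2, control):
    """Two-source shuffle (assumed layout: low half from src1, high half from src2)."""
    assert len(src1) == len(src2) and len(src1) % LANE_ELEMS == 0
    dest = []
    for start in range(0, len(src1), LANE_ELEMS):
        lane1 = src1[start:start + LANE_ELEMS]
        lane2 = src2[start:start + LANE_ELEMS]
        for pos in range(LANE_ELEMS):
            sel = (control >> (2 * pos)) & 0b11
            source_lane = lane1 if pos < LANE_ELEMS // 2 else lane2
            dest.append(source_lane[sel])
    return dest


a = [0, 1, 2, 3, 4, 5, 6, 7]          # two lanes of four elements
b = [10, 11, 12, 13, 14, 15, 16, 17]

reversed_lanes = shuffle_in_lane(a, 0b00011011)   # selectors 3, 2, 1, 0
mixed_lanes = shuffle2_in_lane(a, b, 0b00011011)
```

With this control value each lane of `a` is reversed independently; no element crosses a lane boundary, which is what distinguishes an in-lane shuffle from a full permute.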


    Apparatus and method for two micro-operation flow using source override
    40.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07451294B2

    Publication date: 2008-11-11

    Application No.: US10631629

    Filing date: 2003-07-30

    IPC classification: G06F9/30

    Abstract: A method and apparatus are described for a two micro-operation flow using source override. In one embodiment, the method includes identifying a macro-instruction having one or more streaming single-instruction, multiple-data (SIMD) extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and is executed by a register alias table (RAT) when uOPs with matching logical source and destination registers are detected in the same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.
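The last sentence of the abstract, splitting a 128-bit operation into a lower-half and an upper-half uOP on a 64-bit datapath, can be illustrated with a small Python model. This is a deliberately simplified sketch: `decode_to_uops`, `execute`, and the register names are hypothetical, registers are modeled as Python integers below 2**128, and the RAT source-override logic itself is not modeled.

```python
MASK64 = (1 << 64) - 1


def decode_to_uops(op, dst, src):
    """Decode one 128-bit macro-instruction into two 64-bit uOPs (hypothetical format)."""
    return [(op, dst, src, "lo"), (op, dst, src, "hi")]


def execute(uops, regs):
    """Run each uOP on its own 64-bit half of the 128-bit registers."""
    for op, dst, src, half in uops:
        shift = 0 if half == "lo" else 64
        a = (regs[dst] >> shift) & MASK64
        b = (regs[src] >> shift) & MASK64
        result = op(a, b) & MASK64
        # Write back only the half this uOP owns; the other half is untouched.
        regs[dst] = (regs[dst] & ~(MASK64 << shift)) | (result << shift)
    return regs


x = 0x0123456789ABCDEFFEDCBA9876543210
y = (1 << 128) - 1
regs = {"xmm0": x, "xmm1": y}

# A 128-bit XOR carried out as two independent 64-bit uOPs.
execute(decode_to_uops(lambda a, b: a ^ b, "xmm0", "xmm1"), regs)
```

For an operation whose halves are independent, as here, the two uOPs together give the same result as a single full-width operation. The override mechanism in the abstract concerns cases where the source and destination registers of the uOPs match, which this sketch does not attempt to reproduce.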
