Elimination of potential renaming stalls due to use of partial registers
    1.
    Granted patent
    Status: Active

    Publication No.: US07162614B2

    Publication date: 2007-01-09

    Application No.: US10608121

    Filing date: 2003-06-30

    IPC classification: G06F9/38

    Abstract: Two or more pointers, each of which indicates where values of a respective group of bits of a source of a particular micro-operation will be found when the particular micro-operation is executed, may not all point to the same register. Renaming of the source of the particular micro-operation may be enabled by generating one or more new micro-operations that merge the values into a single register. The one or more new micro-operations are inserted into a sequence of micro-operations that includes the particular micro-operation. Once the source of the particular micro-operation has been renamed, subsequent micro-operations in the sequence may be renamed, if appropriate, and executed, without having to wait for the values to be calculated.

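The merge step described in this abstract can be sketched as a toy renamer. This is a minimal illustration, not the patented design: the two bit groups (`low`, `high`), the `Renamer` class, the `merge` micro-operation name, and the `p0`, `p1`, ... physical register names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count

GROUPS = ("low", "high")  # hypothetical bit groups of one architectural register


@dataclass
class Uop:
    op: str      # operation name, e.g. "merge"
    dest: str    # physical register written
    srcs: tuple  # physical registers read


class Renamer:
    def __init__(self):
        self._ids = count()
        self.rat = {}  # architectural register -> {bit group: physical register}

    def _new_phys(self):
        return f"p{next(self._ids)}"

    def write(self, arch, groups=GROUPS):
        """Rename a write to the given bit groups of `arch`."""
        phys = self._new_phys()
        entry = self.rat.setdefault(arch, {})
        for g in groups:
            entry[g] = phys
        return phys

    def read_full(self, arch):
        """Rename a full-width read of `arch`.

        Returns (inserted_uops, physical_source). If the per-group pointers
        disagree, a merge uop is generated and every pointer is redirected
        to its destination, so the reader has a single source to rename to.
        """
        entry = self.rat[arch]
        pointers = tuple(entry[g] for g in GROUPS)
        if len(set(pointers)) == 1:
            return [], pointers[0]
        merged = self._new_phys()
        for g in GROUPS:
            entry[g] = merged
        return [Uop("merge", merged, pointers)], merged


r = Renamer()
r.write("eax")                      # full-width write
r.write("eax", groups=("low",))     # partial write: pointers now differ
inserted, src = r.read_full("eax")  # one merge uop is inserted before the reader
```

Renaming proceeds as soon as the merge micro-operation has a destination register; only the merge itself has to wait for the partial values.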

    Apparatus and method for two micro-operation flow using source override
    2.
    Granted patent
    Status: Lapsed

    Publication No.: US07451294B2

    Publication date: 2008-11-11

    Application No.: US10631629

    Filing date: 2003-07-30

    IPC classification: G06F9/30

    Abstract: A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single-instruction, multiple-data (SIMD) extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOPs with matching logical source and destination registers are detected in the same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.

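One possible reading of the source-override behaviour is sketched below. It assumes that "source override" means forwarding a same-cycle destination to a later micro-operation's matching source, and that both halves of a 128-bit register share one logical name; both are simplifying assumptions, and `rename_group`, `free_regs`, and the register names are hypothetical.

```python
def rename_group(uops, rat, free_regs, disable_override=False):
    """Rename uops that reach the RAT in the same clock cycle.

    uops: list of (logical_dest, [logical_srcs]) in program order.
    rat: dict mapping logical -> physical register, updated in place.
    free_regs: list of free physical registers, consumed from the front.
    """
    in_flight = {}  # logical registers written earlier in this cycle
    renamed = []
    for dest, srcs in uops:
        phys_srcs = []
        for s in srcs:
            if s in in_flight and not disable_override:
                # Source override: read the same-cycle producer's register.
                phys_srcs.append(in_flight[s])
            else:
                # Override disabled or no match: read the mapping held
                # by the RAT at the start of the cycle.
                phys_srcs.append(rat[s])
        phys_dest = free_regs.pop(0)
        in_flight[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    rat.update(in_flight)
    return renamed


# A 128-bit "xmm1 = xmm1 op xmm2" split into a lower-half and an upper-half uop.
flow = [("xmm1", ["xmm1", "xmm2"]), ("xmm1", ["xmm1", "xmm2"])]
rat = {"xmm1": "p1", "xmm2": "p2"}
result = rename_group(flow, rat, ["p10", "p11"], disable_override=True)
```

With the override disabled, the second uOP still reads the pre-instruction value of `xmm1`, which is what keeps the two-uOP flow equivalent to the single original instruction in this model.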

    Apparatus and method for two micro-operation flow using source override
    3.
    Patent application
    Status: Lapsed

    Publication No.: US20050027967A1

    Publication date: 2005-02-03

    Application No.: US10631629

    Filing date: 2003-07-30

    IPC classification: G06F9/30 G06F9/318 G06F9/38

    Abstract: A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single-instruction, multiple-data (SIMD) extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOPs with matching logical source and destination registers are detected in the same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.


    APPARATUS AND METHOD FOR PERFORMING A PERMUTE OPERATION
    5.
    Patent application
    Status: Active

    Publication No.: US20150026440A1

    Publication date: 2015-01-22

    Application No.: US13996072

    Filing date: 2011-12-23

    IPC classification: G06F9/30

    Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking is not implemented for a particular data element, then selecting data elements from the destination operand and a second source operand based on index values stored in a first source operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the destination operand or the second source operand may be copied to any one of the data element positions within the destination operand; if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.

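The per-element selection in this abstract can be sketched as follows. This is an illustration under stated assumptions: the function name `permute2_masked`, the ordering of the lookup table (destination elements first, then the second source), the wrap-around of out-of-range indices, and the two "designated masking operations" (keep the old destination value, or write zero) are all hypothetical choices.

```python
def permute2_masked(dest, index_src, src2, masked, zeroing=False):
    """Two-source permute with a per-element mask.

    dest, src2: equal-length lists of data elements.
    index_src: one index per destination position (the first source operand).
    masked: one bool per destination position; True means masking applies.
    zeroing: the designated masking operation, zero (True) or keep (False).
    """
    n = len(dest)
    table = list(dest) + list(src2)  # any of the 2n elements may be selected
    out = []
    for i in range(n):
        if masked[i]:
            # Designated masking operation for this element.
            out.append(0 if zeroing else dest[i])
        else:
            # Copy the element chosen by the index value.
            out.append(table[index_src[i] % (2 * n)])
    return out


result = permute2_masked(
    dest=[10, 11, 12, 13],
    index_src=[7, 0, 4, 3],
    src2=[20, 21, 22, 23],
    masked=[False, True, False, False],
)
```

Because the output is built from a snapshot of the destination, overwriting one position never changes what a later index selects.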

    Mixing instructions with different register sizes
    6.
    Granted patent
    Status: Active

    Publication No.: US08694758B2

    Publication date: 2014-04-08

    Application No.: US11965667

    Filing date: 2007-12-27

    IPC classification: G06F9/34

    Abstract: When legacy instructions that can only operate on smaller registers are mixed with new instructions in a processor with larger registers, special handling and architecture are used to prevent the legacy instructions from causing problems with the data in the upper portion of the registers, i.e., the portion that they cannot directly access. In some embodiments, the upper portion of the registers is saved to temporary storage while the legacy instructions are operating, and restored to the upper portion of the registers when the new instructions are operating. A special instruction may also be used to disable this save/restore operation if the new instructions are not going to use the upper part of the registers.

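The save/restore scheme in this abstract can be modelled roughly as below. The class and method names are hypothetical, and so is the modelling choice that a legacy operation leaves the upper half of the register it touches unreliable (represented as `None`); the abstract only says that legacy instructions must not cause problems with that data.

```python
class RegisterFile:
    def __init__(self, nregs=4):
        self.lower = [0] * nregs
        self.upper = [0] * nregs
        self._saved_upper = None    # temporary storage for the upper halves
        self._legacy_mode = False
        self.preserve_upper = True  # cleared by the special instruction

    def legacy_op(self, reg, lower_value):
        """Legacy instruction: can only access the lower half."""
        if not self._legacy_mode:
            if self.preserve_upper:
                self._saved_upper = list(self.upper)  # save on entry
            self._legacy_mode = True
        self.lower[reg] = lower_value
        self.upper[reg] = None  # modelling assumption: no longer reliable

    def new_op(self, reg, lower_value, upper_value):
        """New instruction: writes the full-width register."""
        self._leave_legacy()
        self.lower[reg], self.upper[reg] = lower_value, upper_value

    def read_new(self, reg):
        """New instruction: reads the full-width register."""
        self._leave_legacy()
        return self.lower[reg], self.upper[reg]

    def discard_upper(self):
        """Special instruction: new code will not use the upper halves."""
        self.preserve_upper = False
        self._saved_upper = None

    def _leave_legacy(self):
        if self._legacy_mode:
            if self._saved_upper is not None:
                self.upper = self._saved_upper  # restore on exit
                self._saved_upper = None
            self._legacy_mode = False


rf = RegisterFile()
rf.new_op(0, 1, 2)
rf.legacy_op(0, 5)
value = rf.read_new(0)  # lower half from the legacy op, upper half restored
```

Calling `discard_upper()` before the legacy code skips both the save and the restore, at the cost of leaving the upper halves undefined in this model.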

    APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS
    9.
    Patent application
    Status: Active

    Publication No.: US20130283021A1

    Publication date: 2013-10-24

    Application No.: US13976992

    Filing date: 2011-12-23

    IPC classification: G06F9/30

    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements into one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements into one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.

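The combination of group insertion and mask granularity can be illustrated with a short sketch. The function `insert_group`, its parameters, and the choice that a masked position keeps its previous value are hypothetical; the model also uses one list as both the vector being updated and the source of unmasked-out values, which is a simplification.

```python
def insert_group(dest, group, section, mask, mask_granularity=1):
    """Insert `group` into one section of `dest`, then apply a write mask.

    dest: list of elements; its length is a multiple of len(group).
    section: which non-overlapping section of the result receives the group.
    mask: one bool per `mask_granularity` elements; False keeps the old value.
    """
    width = len(group)
    if len(dest) % width != 0:
        raise ValueError("sections must tile the destination exactly")
    out = list(dest)
    out[section * width:(section + 1) * width] = group
    for i in range(len(out)):
        if not mask[i // mask_granularity]:
            out[i] = dest[i]  # masked position: keep the previous value
    return out


# Same four-element group, masked at two different granularities.
fine = insert_group([0] * 8, [1, 2, 3, 4], 1,
                    mask=[True] * 5 + [False] + [True] * 2)
coarse = insert_group([0] * 8, [1, 2, 3, 4], 1,
                      mask=[True, True, False, True], mask_granularity=2)
```

The two calls insert the same group into the same section; only the number of elements covered by each mask bit differs.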