Apparatus and method of mask permute instructions

    Publication No.: US09632980B2

    Publication date: 2017-04-25

    Application No.: US13976435

    Filing date: 2011-12-23

    IPC classification: G06F9/30 G06F15/80

    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
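The routing-then-masking behavior in this abstract can be illustrated with a minimal sketch. It is not the patented circuit: the 512-bit vector size, the element widths of 16, 32 and 64 bits, the per-element index vector, and the choice between zeroing and merging for masked-off elements are all assumptions made for illustration, and `mask_permute` is a hypothetical name.

```python
VECTOR_BITS = 512               # assumed vector size; the abstract does not state one
ELEMENT_WIDTHS = (16, 32, 64)   # assumed; the abstract only says "three available bit widths"


def mask_permute(src, indices, mask, dest, width, zeroing=False):
    """Route src elements to output positions, then mask per element (hypothetical sketch)."""
    if width not in ELEMENT_WIDTHS:
        raise ValueError("unsupported element width")
    n = VECTOR_BITS // width    # one mask bit per element, so granularity follows the width
    if not (len(src) == len(indices) == len(mask) == len(dest) == n):
        raise ValueError("all operands must hold exactly %d elements" % n)
    # Routing layer: each output position picks one input position.
    routed = [src[i % n] for i in indices]
    # Masking layer: a clear mask bit either zeroes the element or keeps the old destination value.
    return [r if m else (0 if zeroing else d) for r, m, d in zip(routed, mask, dest)]
```

With `width=64` the vector holds eight elements and the mask has eight bits; with `width=16` the same call would need thirty-two of each, which is the sense in which the mask granularity tracks the element width.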

    Apparatus and method for shuffling floating point or integer values
    68.
    Granted patent
    Status: in force

    Publication No.: US09524168B2

    Publication date: 2016-12-20

    Application No.: US13997244

    Filing date: 2011-12-23

    IPC classification: G06F9/38 G06F9/30

    Abstract: An apparatus and method are described for shuffling data elements from source registers to a destination register. For example, a method according to one embodiment includes the following operations: reading each mask bit stored in a mask data structure, the mask data structure containing mask bits associated with data elements of a destination register, the values usable for determining whether a masking operation or a shuffle operation should be performed on data elements stored within a first source register and a second source register; for each data element of the destination register, if a mask bit associated with the data element indicates that a shuffle operation should be performed, then shuffling data elements from the first source register and the second source register to the specified data element within the destination register; and if the mask bit indicates that a masking operation should be performed, then performing a specified masking operation with respect to the data element of the destination register.
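The per-element decision described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: a set mask bit is assumed to mean "shuffle", the `control` selector is a hypothetical per-element index into the concatenation of the two source registers, and the masking operation is assumed to be either zeroing or keeping the old destination value.

```python
def masked_shuffle(src1, src2, control, mask, dest, zero_masking=True):
    """Shuffle from two sources under a per-element mask (hypothetical sketch)."""
    n = len(dest)
    if not (len(src1) == len(src2) == len(control) == len(mask) == n):
        raise ValueError("all operands must have the same number of elements")
    pool = list(src1) + list(src2)   # assumed selector space: src1 first, then src2
    out = []
    for i in range(n):
        if mask[i]:
            # Mask bit set: perform the shuffle for this destination element.
            out.append(pool[control[i] % len(pool)])
        else:
            # Mask bit clear: perform the masking operation instead.
            out.append(0 if zero_masking else dest[i])
    return out
```

The same routine works for floating point or integer elements, since it only moves values and never does arithmetic on them.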


    APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS
    69.
    Patent application
    Status: under examination (published)

    Publication No.: US20160246597A1

    Publication date: 2016-08-25

    Application No.: US15145748

    Filing date: 2016-05-03

    IPC classification: G06F9/30

    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.
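The command/result register protocol in this abstract can be modeled in software as a rough sketch. Everything named here is hypothetical: `Registers`, `SumAccelerator`, `invoke_accelerator` and `resume_accelerator` are illustrative names, the "sum" command is an invented workload, and an interrupt is simulated by a step `budget`. The blocking function call stands in for the execution logic halting until the accelerator completes or is interrupted.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    command: object = None   # command register
    result: object = None    # result register: ("ok", value) or ("error", reason)


class SumAccelerator:
    """Hypothetical accelerator that sums a list, one element per step."""

    def __init__(self):
        self.saved = None    # (next index, partial sum) stored when interrupted

    def run(self, regs, budget=None):
        cmd = regs.command   # the accelerator reads the command register
        if not (isinstance(cmd, tuple) and len(cmd) == 2 and cmd[0] == "sum"):
            regs.result = ("error", "unsupported command")
            return True      # finished, with a failure reason in the result register
        data = cmd[1]
        index, partial = self.saved if self.saved is not None else (0, 0)
        steps = 0
        while index < len(data):
            if budget is not None and steps >= budget:
                self.saved = (index, partial)   # store state so work can continue later
                return False                    # interrupted before completion
            partial += data[index]
            index += 1
            steps += 1
        self.saved = None
        regs.result = ("ok", partial)
        return True


def invoke_accelerator(regs, accel, command, budget=None):
    """Models the invocation instruction: store the command, then wait for the accelerator."""
    regs.command, regs.result, accel.saved = command, None, None
    return accel.run(regs, budget)


def resume_accelerator(regs, accel, budget=None):
    """Continue an interrupted command from the accelerator's saved state."""
    return accel.run(regs, budget)
```

A caller would check the return value: `False` means the command was interrupted and can be continued with `resume_accelerator`, while `True` means the result register holds either the result or the reason for failure.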
