APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS
    136.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20170017491A1

    Publication date: 2017-01-19

    Application No.: US15281944

    Filing date: 2016-09-30

    IPC classification: G06F9/38, G06F12/0875, G06F9/30

    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.

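The command/result register handshake in this abstract can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions, not the patented design: `ToyCore`, `invoke`, the status strings, and the `popcount` command are hypothetical names, and the interrupt and state-saving path is not modeled. It shows the invocation storing command data in the command register, the core blocking, and the accelerator writing either a result or a failure reason to the result register.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical status codes. The abstract only says the result register holds
# either the result or a reason the command could not be executed.
OK = "ok"
UNSUPPORTED = "unsupported_command"


@dataclass
class ToyCore:
    """Toy model of the command/result register handshake (not real hardware)."""

    accelerators: Dict[str, Callable[[int], int]]
    command_register: Optional[Tuple[str, int]] = None
    result_register: Optional[Tuple[str, Optional[int]]] = None

    def invoke(self, command: str, operand: int) -> Tuple[str, Optional[int]]:
        # The invocation instruction stores the command data in the command register.
        self.command_register = (command, operand)
        # The core "halts" here: in this model the call simply blocks until
        # the accelerator has written the result register.
        self._run_accelerator()
        return self.result_register

    def _run_accelerator(self) -> None:
        # The accelerator reads the command data and attempts the command.
        command, operand = self.command_register
        handler = self.accelerators.get(command)
        if handler is None:
            # Failure: store the reason the command cannot be executed.
            self.result_register = (UNSUPPORTED, None)
        else:
            # Success: store the result of the command.
            self.result_register = (OK, handler(operand))


core = ToyCore(accelerators={"popcount": lambda x: bin(x).count("1")})
print(core.invoke("popcount", 0b1011))  # ('ok', 3)
print(core.invoke("fft", 8))            # ('unsupported_command', None)
```

Returning the failure reason through the same register as the result means the caller checks a single location after the invocation completes.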

    Floating point scaling processors, methods, systems, and instructions
    137.
    Type: granted invention patent
    Status: in force

    Publication No.: US09448765B2

    Publication date: 2016-09-20

    Application No.: US13977086

    Filing date: 2011-12-28

    IPC classification: G06F7/483, G06F9/30

    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.

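Read element by element, the abstract describes result[i] = second[i] * base ** int(first[i]). The sketch below works through that arithmetic under two assumptions the abstract does not state: the base is 2, and the "integer representative" of a value is its floor. The function name `scale_elements` is hypothetical, and special values such as NaN and infinities are not handled.

```python
import math
from typing import List


def scale_elements(first: List[float], second: List[float]) -> List[float]:
    """Return second[i] * 2**floor(first[i]) for each pair of elements.

    Assumptions (not stated in the abstract): the base is 2 and the
    integer representative of a value is its floor.
    """
    if len(first) != len(second):
        raise ValueError("sources must have the same number of elements")
    # math.ldexp(y, n) computes y * 2**n without going through a float power.
    return [math.ldexp(y, math.floor(x)) for x, y in zip(first, second)]


print(scale_elements([3.0, -1.5, 0.9], [1.5, 8.0, 7.0]))  # [12.0, 2.0, 7.0]
```

The three elements scale by 2**3, 2**-2, and 2**0 respectively, which shows how a non-integer first-source element is reduced to an integer exponent before scaling.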

    Instruction execution that broadcasts and masks data values at different levels of granularity
    138.
    Type: granted invention patent
    Status: in force

    Publication No.: US09424327B2

    Publication date: 2016-08-23

    Application No.: US13976433

    Filing date: 2011-12-23

    IPC classification: G06F7/00, G06F17/30, G06F9/30

    Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and to replicate the second data structure when executing the second instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.

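The replicate-then-mask behavior can be sketched with plain lists. This is an illustrative model only: the function name `broadcast_and_mask`, the 512-bit register (8 lanes of 64-bit values or 16 lanes of 32-bit values), and zeroing of masked-off elements are all assumptions, since the abstract does not say what masked-off elements hold. Python integers do not model element widths, so only the lane counts differ between the two calls.

```python
from typing import List, Sequence


def broadcast_and_mask(structure: Sequence[int], lanes: int, mask: int) -> List[int]:
    """Replicate `structure` to fill `lanes` elements, then mask per element.

    Bit i of `mask` keeps element i; a cleared bit zeroes it (assumption).
    """
    if lanes % len(structure) != 0:
        raise ValueError("lanes must be a multiple of the structure length")
    replicated = list(structure) * (lanes // len(structure))
    return [v if (mask >> i) & 1 else 0 for i, v in enumerate(replicated)]


# First instruction: 64-bit values, 8 lanes, so the mask has 8 bits.
print(broadcast_and_mask([10, 20], lanes=8, mask=0b00001111))
# [10, 20, 10, 20, 0, 0, 0, 0]

# Second instruction: 32-bit values, 16 lanes, so the mask is twice as fine.
print(broadcast_and_mask([1, 2], lanes=16, mask=0x00FF))
# [1, 2, 1, 2, 1, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]
```

With a fixed register width, halving the element size doubles the number of lanes, which is why the second mask needs one bit per half-sized element.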

    Instruction execution unit that broadcasts data values at different levels of granularity
    139.
    Type: granted invention patent
    Status: in force

    Publication No.: US09336000B2

    Publication date: 2016-05-10

    Application No.: US13976003

    Filing date: 2011-12-23

    IPC classification: G06F9/30, G06F9/38

    Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The first data structure is four times as large as the second data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and to replicate the second data structure when executing the second instruction to create a second replication data structure.

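One set of sizes consistent with the ratios in this abstract is a first structure of four 64-bit values (32 bytes) and a second structure of two 32-bit values (8 bytes). The sketch below packs both with Python's `struct` module and replicates each into a 64-byte buffer. The concrete sizes, the 512-bit register, and the name `replicate` are assumptions chosen for illustration and are not taken from the patent.

```python
import struct

REGISTER_BYTES = 64  # assumed 512-bit destination register


def replicate(packed: bytes, register_bytes: int = REGISTER_BYTES) -> bytes:
    """Fill a register-sized buffer with back-to-back copies of `packed`."""
    if register_bytes % len(packed) != 0:
        raise ValueError("structure size must divide the register size")
    return packed * (register_bytes // len(packed))


first = struct.pack("<4q", 1, 2, 3, 4)  # four 64-bit values, 32 bytes
second = struct.pack("<2i", 5, 6)       # two 32-bit values, 8 bytes

print(len(first) // len(second))               # 4
print(struct.unpack("<8q", replicate(first)))  # (1, 2, 3, 4, 1, 2, 3, 4)
print(struct.unpack("<16i", replicate(second)))
```

Under these assumed sizes the first structure is copied twice and the second eight times to fill the same register.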

    Instruction for shifting bits left with pulling ones into less significant bits
    140.
    Type: granted invention patent
    Status: in force

    Publication No.: US09122475B2

    Publication date: 2015-09-01

    Application No.: US13630131

    Filing date: 2012-09-28

    IPC classification: G06F9/30, G06F15/80

    Abstract: A mask generating instruction is executed by a processor to improve efficiency of vector operations on an array of data elements. The processor includes vector registers, one of which stores data elements of an array. The processor further includes execution circuitry to receive a mask generating instruction that specifies at least a first operand and a second operand. Responsive to the mask generating instruction, the execution circuitry is to shift bits of the first operand to the left by a number of times defined in the second operand, and pull in a bit of one from the right each time a most significant bit of the first operand is shifted out from the left to generate a result. Each bit in the result corresponds to one of the data elements of the array.

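The shift described in this abstract can be sketched as a bit operation on a fixed-width mask. The code below assumes that a one is pulled in from the right on every shift step, which is one reading of the abstract, and that the mask register is 16 bits wide by default. The function name `shift_left_pull_ones` and the `width` parameter are hypothetical.

```python
def shift_left_pull_ones(value: int, count: int, width: int = 16) -> int:
    """Shift `value` left by `count` bits within `width` bits, filling with 1s.

    Assumption: a 1 enters from the right on every shift step.
    """
    if count < 0 or width <= 0:
        raise ValueError("count must be >= 0 and width must be > 0")
    full = (1 << width) - 1
    count = min(count, width)  # shifting by >= width leaves all ones
    ones = (1 << count) - 1    # the `count` ones pulled in from the right
    return ((value << count) | ones) & full


print(format(shift_left_pull_ones(0, 5), "016b"))            # 0000000000011111
print(format(shift_left_pull_ones(0b1, 3, width=8), "08b"))  # 00001111
```

Starting from zero and shifting by n yields a mask with the n lowest bits set, one bit per array element, which is the kind of mask a vector loop could use to select the remaining elements of an array.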