-
Publication No.: US10209986B2
Publication date: 2019-02-19
Application No.: US13976792
Filing date: 2011-12-22
Applicant(s): Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
Inventor(s): Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
IPC classification: G06F9/30
Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements is to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
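A minimal sketch of the behavior this abstract describes: each source element is rounded to a given number of binary fraction bits, and each result keeps its position. The function names, the use of Python lists for packed data, and the round-to-nearest-even mode are assumptions for illustration; the abstract does not name a rounding mode or an encoding.

```python
import math


def round_to_fraction_bits(x: float, fraction_bits: int) -> float:
    """Round x to the nearest multiple of 2**-fraction_bits.

    Assumption: ties round to even (the behavior of Python's round()).
    NaN and infinities pass through unchanged.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    scaled = math.ldexp(x, fraction_bits)        # x * 2**fraction_bits
    return math.ldexp(float(round(scaled)), -fraction_bits)


def round_packed(source, fraction_bits):
    """Apply the rounding to every element, keeping positions."""
    return [round_to_fraction_bits(x, fraction_bits) for x in source]


if __name__ == "__main__":
    # One fraction bit: results are multiples of 0.5.
    print(round_packed([1.3, 2.75, -0.6], 1))
```

With `fraction_bits = 0` the sketch reduces to ordinary rounding to an integer value.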
-
62.
Publication No.: US10162639B2
Publication date: 2018-12-25
Application No.: US15912498
Filing date: 2018-03-05
Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
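The per-field count described in this abstract can be sketched as follows. The names `leading_zero_count` and `vector_lzcnt` and the default 32-bit field width are assumptions for illustration; the sketch covers only the counting step, not the generation of permute controls or completion masks.

```python
def leading_zero_count(value: int, width: int = 32) -> int:
    """Count the contiguous most significant zero bits of a width-bit field."""
    value &= (1 << width) - 1          # keep only the low `width` bits
    return width - value.bit_length()  # an all-zero field yields `width`


def vector_lzcnt(fields, width: int = 32):
    """Store one count per source data field, in the corresponding position."""
    return [leading_zero_count(v, width) for v in fields]


if __name__ == "__main__":
    print(vector_lzcnt([0, 1, 0x80000000, 0x00FF0000]))  # [32, 31, 0, 8]
```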
-
63.
Publication No.: US10162638B2
Publication date: 2018-12-25
Application No.: US15912486
Filing date: 2018-03-05
Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
-
64.
Publication No.: US10162637B2
Publication date: 2018-12-25
Application No.: US15912468
Filing date: 2018-03-05
Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
-
65.
Publication No.: US09690591B2
Publication date: 2017-06-27
Application No.: US12290395
Filing date: 2008-10-30
Applicant(s): Ido Ouziel, Lihu Rappoport, Robert Valentine, Ron Gabor, Pankaj Raghuvanshi
Inventor(s): Ido Ouziel, Lihu Rappoport, Robert Valentine, Ron Gabor, Pankaj Raghuvanshi
IPC classification: G06F9/30, G06F9/38, G06F12/084, G06F12/0875, G06F13/40
CPC classification: G06F9/3853, G06F9/3016, G06F9/3017, G06F9/30196, G06F9/3836, G06F12/084, G06F12/0875, G06F13/4063, G06F2212/452, G06F2212/62, Y02D10/14, Y02D10/151
Abstract: A technique to enable efficient instruction fusion within a computer system is disclosed. In one embodiment, processor logic delays the processing of a first instruction for a threshold amount of time if the first instruction within an instruction queue is fusible with a second instruction.
-
Publication No.: US09632980B2
Publication date: 2017-04-25
Application No.: US13976435
Filing date: 2011-12-23
CPC classification: G06F9/30032, G06F9/30036, G06F9/30145, G06F15/8092
Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
-
Publication No.: US09600285B2
Publication date: 2017-03-21
Application No.: US13977239
Filing date: 2011-12-22
Applicant(s): Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark Charney
Inventor(s): Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark Charney
CPC classification: G06F9/3017, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3013, G06F13/14
Abstract: A method of an aspect includes receiving a packed data operation mask concatenation instruction. The packed data operation mask concatenation instruction indicates a first source having a first packed data operation mask, indicates a second source having a second packed data operation mask, and indicates a destination. A result is stored in the destination in response to the packed data operation mask concatenation instruction. The result includes the first packed data operation mask concatenated with the second packed data operation mask. Other methods, apparatus, systems, and instructions are disclosed.
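The concatenation in this abstract can be pictured with a short sketch. The function name `concat_masks`, the 8-bit default mask width, and the choice to place the first mask in the upper half of the result are assumptions; the abstract states only that the two masks are concatenated.

```python
def concat_masks(first: int, second: int, mask_bits: int = 8) -> int:
    """Concatenate two packed data operation masks into one wider mask.

    Assumption: `first` fills the upper mask_bits of the result and
    `second` the lower mask_bits. Bits above mask_bits in either
    source are ignored.
    """
    field = (1 << mask_bits) - 1
    return ((first & field) << mask_bits) | (second & field)


if __name__ == "__main__":
    print(hex(concat_masks(0xA5, 0x3C)))  # 0xa53c
```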
-
68.
Publication No.: US09524168B2
Publication date: 2016-12-20
Application No.: US13997244
Filing date: 2011-12-23
CPC classification: G06F9/38, G06F9/30018, G06F9/30032, G06F9/30036
Abstract: An apparatus and method are described for shuffling data elements from source registers to a destination register. For example, a method according to one embodiment includes the following operations: reading each mask bit stored in a mask data structure, the mask data structure containing mask bits associated with data elements of a destination register, the values usable for determining whether a masking operation or a shuffle operation should be performed on data elements stored within a first source register and a second source register; for each data element of the destination register, if a mask bit associated with the data element indicates that a shuffle operation should be performed, then shuffling data elements from the first source register and the second source register to the specified data element within the destination register; and if the mask bit indicates that a masking operation should be performed, then performing a specified masking operation with respect to the data element of the destination register.
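The per-element decision between a shuffle and a masking operation, as described in the abstract of US09524168B2, can be sketched as below. Everything about the encoding is an assumption for illustration: the `control` list that indexes into the two sources laid end to end, the convention that a set mask bit means "shuffle", and the two masking operations shown (keep the old destination value, or write zero).

```python
def masked_shuffle(src1, src2, control, mask, dest, zeroing=False):
    """Return the new destination after a masked two-source shuffle.

    Assumptions (not taken from the patent text):
      - control[i] indexes into src1 followed by src2;
      - mask bit i set   -> shuffle into destination element i;
      - mask bit i clear -> masking operation: zero the element if
        `zeroing` is True, otherwise keep the old destination value.
    """
    pool = list(src1) + list(src2)
    result = []
    for i, old in enumerate(dest):
        if (mask >> i) & 1:
            result.append(pool[control[i] % len(pool)])
        elif zeroing:
            result.append(0)
        else:
            result.append(old)
    return result


if __name__ == "__main__":
    a, b = [10, 11, 12, 13], [20, 21, 22, 23]
    print(masked_shuffle(a, b, [7, 0, 4, 3], 0b0101, [-1] * 4))
```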
-
Publication No.: US20160246597A1
Publication date: 2016-08-25
Application No.: US15145748
Filing date: 2016-05-03
Applicant(s): Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
Inventor(s): Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
IPC classification: G06F9/30
CPC classification: G06F9/3802, G06F9/3004, G06F9/30043, G06F9/30076, G06F9/30101, G06F9/30145, G06F9/3016, G06F9/384, G06F9/3877, G06F9/3879, G06F9/3881, G06F9/54, G06F11/0721, G06F11/0724, G06F11/0772, G06F12/0875, G06F2212/452
Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.
-
Publication No.: US20160188341A1
Publication date: 2016-06-30
Application No.: US14583050
Filing date: 2014-12-24
Applicant(s): Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney, Roger Espasa, Guillem Sole, Manel Fernandez, Brian J. Hickmann
Inventor(s): Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney, Roger Espasa, Guillem Sole, Manel Fernandez, Brian J. Hickmann
IPC classification: G06F9/30
CPC classification: G06F9/30196, G06F9/30014, G06F9/30018, G06F9/30036, G06F9/30167
Abstract: In one embodiment of the invention, a processor including a storage location configured to store a set of source packed-data operands, each of the operands having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also including: a decoder to decode an instruction requiring an input of a plurality of source operands, and an execution unit to receive the decoded instructions and to generate a result that is a sum of the source operands. In one embodiment, the result is stored back into one of the source operands or the result is stored into an operand that is independent of the source operands.
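One possible reading of the abstract of US20160188341A1 is a lane-wise sum in which bits of an immediate choose the sign applied to each source operand. The sketch below illustrates that reading only. The function name `signed_packed_sum`, the mapping of immediate bit k to operand k, and the convention that a set bit means "negate" are all assumptions; the abstract does not specify the encoding.

```python
def signed_packed_sum(operands, imm):
    """Lane-wise sum of packed operands with signs taken from an immediate.

    Assumption: bit k of `imm` set means operand k is subtracted,
    clear means it is added. All operands have the same lane count.
    """
    lanes = len(operands[0])
    result = [0] * lanes
    for k, operand in enumerate(operands):
        sign = -1 if (imm >> k) & 1 else 1
        for lane in range(lanes):
            result[lane] += sign * operand[lane]
    return result


if __name__ == "__main__":
    # Operand 1 is negated (imm bit 1 set); operands 0 and 2 are added.
    print(signed_packed_sum([[1, 2], [10, 20], [100, 200]], 0b010))
```

In this sketch the result is returned as a new list, which corresponds to the abstract's case of a destination independent of the sources.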
-