Efficient implementation of complex vector fused multiply add and complex vector multiply

    Publication No.: US10521226B2

    Publication date: 2019-12-31

    Application No.: US15941531

    Filing date: 2018-03-30

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F17/16 G06F9/38

    Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.
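
The two operations in the abstract can be traced in software. The following is a minimal sketch, not the patented circuit: it assumes complex values are stored interleaved (even index = real part, odd index = imaginary part), and the function name `complex_fma` is hypothetical. Plain Python arithmetic rounds the product and the sum separately, so it does not reproduce the single rounding of a hardware FMA.

```python
def complex_fma(multiplier, multiplicand, summand):
    """Compute multiplier * multiplicand + summand on interleaved complex vectors.

    Assumed layout (not stated in the abstract): [re0, im0, re1, im1, ...].
    """
    n = len(multiplier)

    # Operation 1: duplicate the even (real) elements of the multiplicand,
    # then temp = multiplier * double_even + summand.
    double_even = [multiplicand[i - (i % 2)] for i in range(n)]
    temp = [multiplier[i] * double_even[i] + summand[i] for i in range(n)]

    # Operation 2: duplicate the odd (imaginary) elements of the multiplicand
    # and swap even/odd elements of the multiplier.
    double_odd = [multiplicand[i - (i % 2) + 1] for i in range(n)]
    swapped = [multiplier[i ^ 1] for i in range(n)]

    result = []
    for i in range(n):
        product = swapped[i] * double_odd[i]
        if i % 2 == 0:
            product = -product  # the even product is negated
        result.append(product + temp[i])
    return result


# (1 + 2i) * (3 + 4i) + (5 + 6i) = 0 + 16i
print(complex_fma([1, 2], [3, 4], [5, 6]))  # [0, 16]
```

For each pair this yields ar*br - ai*bi + cr in the even slot and ai*br + ar*bi + ci in the odd slot, which is the usual complex multiply-add.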

    Instruction and Logic for Early Underflow Detection and Rounder Bypass

    Publication No.: US20180088940A1

    Publication date: 2018-03-29

    Application No.: US15280324

    Filing date: 2016-09-29

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: A processor for floating point underflow detection includes circuitry to decode a first instruction and a floating point unit. The decoded instruction, when executed by the processor, may be for performing a fused multiply-add (FMA) operation. The floating point unit includes circuitry to determine a non-normalized result of the first instruction based on a first input, a second input, and a third input. The floating point unit further includes circuitry to determine whether underflow exists in the non-normalized result based on a first exponent of the first input, a second exponent of the second input, and a third exponent of the third input.
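
To illustrate how exponents alone can flag a tiny FMA result, here is a rough sketch. It is not the circuit claimed in the application: it is a conservative, sufficient-only check for IEEE 754 double precision, and the names `exponent_of`, `result_is_certainly_tiny`, and `ZERO_EXPONENT` are hypothetical. It misses underflow caused by cancellation between the product and the addend.

```python
import math
import sys

# Stand-in exponent for zero inputs (an assumption made for this sketch).
ZERO_EXPONENT = -10**6


def exponent_of(x):
    """Exponent e such that |x| = m * 2**e with 0.5 <= m < 1."""
    return math.frexp(x)[1] if x != 0.0 else ZERO_EXPONENT


def result_is_certainly_tiny(a, b, c):
    """True when |a*b + c| is guaranteed to be below the smallest normal double.

    Uses only the three input exponents: |a*b| < 2**(ea + eb) and |c| < 2**ec,
    so |a*b + c| < 2**(max(ea + eb, ec) + 1).
    """
    product_exponent = exponent_of(a) + exponent_of(b)
    largest = max(product_exponent, exponent_of(c))
    # The smallest normal double is 2**(min_exp - 1).
    return largest <= sys.float_info.min_exp - 2


print(result_is_certainly_tiny(1e-200, 1e-200, 0.0))  # True
print(result_is_certainly_tiny(1e-200, 1e-200, 1.0))  # False
```

A True result only says the magnitude falls below the normal range (this includes an exact zero); a False result says nothing either way.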

    Floating point scaling processors, methods, systems, and instructions

    Publication No.: US09921807B2

    Publication date: 2018-03-20

    Application No.: US15262609

    Filing date: 2016-09-12

    Applicant: Intel Corporation

    IPC classification: G06G7/48 G06F7/483 G06F9/30

    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
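
The scaling operation described above can be written element by element. The sketch below rests on two assumptions that the abstract does not fix: the base is 2 (binary floating point), and the "integer representative" of a first-source element is its floor. The function name `scale_elements` is hypothetical.

```python
import math


def scale_elements(first_source, second_source):
    """Return second[i] * 2**floor(first[i]) for each pair of elements.

    Assumptions: base 2 and floor() as the integer representative.
    """
    return [
        math.ldexp(value, math.floor(power))
        for power, value in zip(first_source, second_source)
    ]


# 1.5 * 2**3, 2.0 * 2**1, 8.0 * 2**-2
print(scale_elements([3.0, 1.9, -1.5], [1.5, 2.0, 8.0]))  # [12.0, 4.0, 2.0]
```

`math.ldexp` adjusts the exponent directly, so no separate power-of-two value is computed and multiplied.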

    METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED TUPLE CROSS-COMPARISON FUNCTIONALITY
    Type: Invention application (status: under examination, published)

    Publication No.: US20160188336A1

    Publication date: 2016-06-30

    Application No.: US14588247

    Filing date: 2014-12-31

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
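
A simplified software model may help in reading the cross-comparison. This sketch makes several assumptions: registers are modeled as Python lists, every element is unmasked, the optional shift by a third register is omitted, and the immediate-to-comparison encoding in `COMPARISONS` is invented for illustration. The name `tuple_cross_compare` is hypothetical.

```python
import operator

# Hypothetical encoding of the immediate operand (not taken from the application).
COMPARISONS = {0: operator.eq, 1: operator.lt, 2: operator.le, 3: operator.ne}


def tuple_cross_compare(first, second, tuple_width, imm):
    """Return one bit-mask per element of `first`.

    Bit j of a mask is set when the element compares true against element j
    of the corresponding tuple in `second`.
    """
    compare = COMPARISONS[imm]
    masks = []
    for start in range(0, len(first), tuple_width):
        second_tuple = second[start:start + tuple_width]
        for element in first[start:start + tuple_width]:
            mask = 0
            for j, other in enumerate(second_tuple):
                if compare(element, other):
                    mask |= 1 << j
            masks.append(mask)
    return masks


# Two tuples of width 2, equality comparison.
print(tuple_cross_compare([1, 2, 3, 4], [2, 1, 4, 4], 2, 0))  # [2, 1, 0, 3]
```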

    VECTOR MASK DRIVEN CLOCK GATING FOR POWER EFFICIENCY OF A PROCESSOR
    Type: Invention application (status: under examination, published)

    Publication No.: US20150220345A1

    Publication date: 2015-08-06

    Application No.: US13997791

    Filing date: 2012-12-19

    Applicant: INTEL CORPORATION

    IPC classification: G06F9/38 G06F9/30

    Abstract: A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. The instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated to generate a result written to a destination operand based on a second source operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements configured to process the first data element.
