Processor which implements fused and unfused multiply-add instructions in a pipelined manner
    1.
    Type: Granted patent
    Status: In force

    Publication No.: US08239440B2

    Publication date: 2012-08-07

    Application No.: US12057894

    Filing date: 2008-03-28

    IPC classification: G06F7/38

    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply-add (FUMA) block which may receive a first partial product, a second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply-add operation or a fused multiply-add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
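
    The practical difference between the two modes is the number of roundings: a fused multiply-add rounds once, after the addition, while an unfused multiply-add rounds the product first and the sum second. The Python sketch below illustrates that numerically. It is not the patented FUMA datapath: the function names are hypothetical, exact rational arithmetic stands in for the untruncated partial products, and IEEE 754 double precision with round-to-nearest is assumed.

```python
from fractions import Fraction


def unfused_madd(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again (two roundings)."""
    product = a * b          # first rounding happens here
    return product + c       # second rounding happens here


def fused_madd(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to double (one rounding)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding


# Operands chosen so that the exact product 1 - 2**-60 is not a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused_madd(a, b, c))               # 0.0 (the product rounded to 1.0)
print(fused_madd(a, b, c) == -2.0**-60)    # True (the tiny term survives)
```

    Selecting between the two functions with a flag would play the role that the abstract assigns to the opcode or mode bit.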

    MECHANISM FOR HANDLING UNFUSED MULTIPLY-ACCUMULATE ACCRUED EXCEPTION BITS IN A PROCESSOR
    2.
    Type: Patent application
    Status: In force

    Publication No.: US20100268920A1

    Publication date: 2010-10-21

    Application No.: US12424929

    Filing date: 2009-04-16

    IPC classification: G06F9/302

    Abstract: A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating-point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined within the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state.
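
    Why the intermediate capture matters can be shown with a small software model. The Python sketch below is an illustration under stated assumptions, not the patented logic: the flag encoding, the `ExceptionState` class, and the helper names are hypothetical, underflow is ignored, and quiet NaN inputs are assumed to raise no flag.

```python
import math
from fractions import Fraction

# Hypothetical encoding of sticky (accrued) exception bits.
INEXACT, OVERFLOW, INVALID = 0x1, 0x2, 0x4


def sub_op_flags(operands, result, exact_fn):
    """Flags raised by one sub-operation, given its operands and rounded result."""
    if math.isnan(result):
        # A NaN produced from non-NaN operands (e.g. inf - inf) is invalid.
        return 0 if any(math.isnan(x) for x in operands) else INVALID
    if math.isinf(result):
        if any(math.isinf(x) for x in operands):
            return 0                       # exact arithmetic on infinities
        return OVERFLOW | INEXACT          # finite operands overflowed
    exact = exact_fn(*(Fraction(x) for x in operands))
    return 0 if Fraction(result) == exact else INEXACT


class ExceptionState:
    """Stand-in for the storage that holds accrued exception bits."""

    def __init__(self):
        self.accrued = 0


def unfused_mac(a, b, c, state):
    product = a * b
    # Capture after the multiply sub-operation, before the accumulate finishes.
    state.accrued |= sub_op_flags((a, b), product, lambda x, y: x * y)
    total = product + c
    state.accrued |= sub_op_flags((product, c), total, lambda x, y: x + y)
    return total


state = ExceptionState()
result = unfused_mac(1e200, 1e200, -math.inf, state)
print(math.isnan(result))                                  # True
print(state.accrued == (INEXACT | OVERFLOW | INVALID))     # True
```

    In this example the multiply overflows and the accumulate is invalid; without the capture between the two sub-operations, the overflow raised by the multiply would not appear in the accrued bits.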

    Processor pipeline which implements fused and unfused multiply-add instructions
    3.
    Type: Granted patent
    Status: In force

    Publication No.: US08977670B2

    Publication date: 2015-03-10

    Application No.: US13469212

    Filing date: 2012-05-11

    IPC classification: G06F7/38 G06F7/483 G06F7/544

    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply-add (FUMA) block which may receive a first partial product, a second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply-add operation or a fused multiply-add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.

    DIVISION UNIT WITH MULTIPLE DIVIDE ENGINES
    4.
    Type: Patent application
    Status: In force

    Publication No.: US20130179664A1

    Publication date: 2013-07-11

    Application No.: US13345391

    Filing date: 2012-01-06

    Abstract: Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter, whether to schedule subsequent instructions for issuance to the division unit.
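
    The counter-based issue decision can be modeled in a few lines. The Python sketch below is only one possible reading: the class and method names are hypothetical, and the rule that the unit accepts a new divide while the counter is below the number of engines is an assumption, since the abstract does not state the threshold.

```python
class DivideIssueCounter:
    """Tracks how many instructions the division unit is currently working on."""

    def __init__(self, num_engines: int = 2):
        self.num_engines = num_engines   # assumed capacity: one divide per engine
        self.in_flight = 0               # the counter described in the abstract

    def try_issue(self) -> bool:
        """Issue a divide only if the counter says an engine is free."""
        if self.in_flight >= self.num_engines:
            return False                 # hold the instruction in the scheduler
        self.in_flight += 1
        return True

    def complete(self) -> None:
        """Called when a divide engine finishes an instruction."""
        if self.in_flight == 0:
            raise RuntimeError("no divide in flight")
        self.in_flight -= 1


scheduler = DivideIssueCounter(num_engines=2)
print(scheduler.try_issue())   # True
print(scheduler.try_issue())   # True
print(scheduler.try_issue())   # False (both engines busy)
scheduler.complete()
print(scheduler.try_issue())   # True
```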

    Mechanism for handling unfused multiply-accumulate accrued exception bits in a processor
    5.
    Type: Granted patent
    Status: In force

    Publication No.: US09507656B2

    Publication date: 2016-11-29

    Application No.: US12424929

    Filing date: 2009-04-16

    Abstract: A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating-point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined within the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state.

    System and method of bypassing unrounded results in a multiply-add pipeline unit
    6.
    Type: Granted patent
    Status: In force

    Publication No.: US08671129B2

    Publication date: 2014-03-11

    Application No.: US13043101

    Filing date: 2011-03-08

    IPC classification: G06F7/32

    Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.
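
    The arithmetic idea behind deferring the rounding is that (u + 1) x m = u x m + m: if the bypassed value u should have been rounded up by one unit in the last place, adding one extra row equal to the multiplicand m repairs the product. The Python sketch below shows this on non-negative integers. It is a simplification under stated assumptions: plain shift-and-add rows stand in for Booth-recoded partial products, the bypassed value is assumed to be the multiplier operand, and all names are hypothetical.

```python
def partial_products(multiplier: int, multiplicand: int) -> list[int]:
    """Shift-and-add rows, a simple stand-in for Booth-recoded partial products."""
    return [multiplicand << i
            for i in range(multiplier.bit_length())
            if (multiplier >> i) & 1]


def multiply_bypassed(unrounded: int, round_up: bool, multiplicand: int) -> int:
    """Multiply by a bypassed, not-yet-rounded value and fix the rounding late."""
    rows = partial_products(unrounded, multiplicand)
    # Rounding correction row: +multiplicand if the prior result needed +1 ulp.
    rows.append(multiplicand if round_up else 0)
    return sum(rows)            # models the CSA tree followed by the final adder


u, m = 0b1011, 13               # unrounded prior result 11, multiplicand 13
print(multiply_bypassed(u, False, m) == u * m)         # True
print(multiply_bypassed(u, True, m) == (u + 1) * m)    # True
```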

    Processor Pipeline which Implements Fused and Unfused Multiply-Add Instructions
    7.
    Type: Patent application
    Status: In force

    Publication No.: US20120221614A1

    Publication date: 2012-08-30

    Application No.: US13469212

    Filing date: 2012-05-11

    IPC classification: G06F7/48

    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply-add (FUMA) block which may receive a first partial product, a second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply-add operation or a fused multiply-add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.

    Method for selecting between divide instructions associated with respective threads in a multi-threaded processor
    8.
    Type: Granted patent
    Status: In force

    Publication No.: US07941642B1

    Publication date: 2011-05-10

    Application No.: US10881216

    Filing date: 2004-06-30

    IPC classification: G06F9/30

    CPC classification: G06F9/3001 G06F9/3851

    Abstract: In one embodiment, a multithreaded processor includes a multithreaded instruction source that may provide a plurality of instructions each corresponding to a respective one of a plurality of threads. The multithreaded processor also includes a pick unit coupled to the multithreaded instruction source. The pick unit may select, in a given cycle, a first divide instruction corresponding to one thread of the plurality of threads and a second divide instruction corresponding to another thread of the plurality of threads based upon a thread selection algorithm. Further, the multithreaded processor includes a storage coupled to a functional unit including a divider configured to execute the first divide instruction and the second divide instruction. The storage may store one of the first and the second divide instructions during execution of the other of the first and the second divide instructions.

    PROCESSOR AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR MULTIPLICATION OF LARGE OPERANDS
    9.
    Type: Patent application
    Status: In force

    Publication No.: US20100325188A1

    Publication date: 2010-12-23

    Application No.: US12488372

    Filing date: 2009-06-19

    IPC classification: G06F7/52

    CPC classification: G06F7/4876 G06F2207/382

    Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M. In response to receiving a single instance of a large-operand multiplication instruction defined within the ISA, wherein at least one of the operands of the large-operand multiplication instruction includes more than the maximum number of bits M, the instruction execution unit is configured to multiply operands of the large-operand multiplication instruction within the hardware multiplier datapath circuit to determine a result of the large-operand multiplication instruction without execution of programmer-selected instructions within the ISA other than the large-operand multiplication instruction.
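
    A product of operands wider than M bits can be assembled from M-bit by M-bit multiplications. The Python sketch below uses the schoolbook word-by-word decomposition to illustrate this; that choice is an assumption, since the abstract does not say how the datapath sequences the work, and the function names and the default M = 64 are hypothetical. In the patent the sequencing happens inside the hardware for a single instruction rather than in software.

```python
def split_words(value: int, m: int) -> list[int]:
    """Split a non-negative integer into m-bit words, least significant first."""
    mask = (1 << m) - 1
    words = []
    while value:
        words.append(value & mask)
        value >>= m
    return words or [0]


def large_multiply(x: int, y: int, m: int = 64) -> int:
    """Multiply operands wider than m bits using only m-bit by m-bit products."""
    result = 0
    for i, x_word in enumerate(split_words(x, m)):
        for j, y_word in enumerate(split_words(y, m)):
            # Each word product fits what an m-bit multiplier datapath can
            # produce; the shift places it at its weight in the final result.
            result += (x_word * y_word) << (m * (i + j))
    return result


x = (1 << 200) + 12345
y = (1 << 130) - 1
print(large_multiply(x, y) == x * y)    # True
```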

    Low latency integer divider and integration with floating point divider and method
    10.
    Type: Granted patent
    Status: In force

    Publication No.: US07539720B2

    Publication date: 2009-05-26

    Application No.: US11014026

    Filing date: 2004-12-15

    IPC classification: G06F7/44 G06F7/52

    CPC classification: G06F7/4873 G06F7/5375

    Abstract: A method and device divide a dividend by a divisor, the dividend and the divisor both being integers. The method and device determine a maximum possible number of quotient digits (NDQ) based on a number of significant digits of the divisor and the dividend, normalize the dividend and divisor, and calculate NDQ quotient digits from the normalized divisor and dividend.
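
    The latency saving comes from iterating only as many times as the quotient can have digits. The Python sketch below shows the idea in radix 2, where the quotient has at most (significant bits of the dividend) - (significant bits of the divisor) + 1 bits. It is an illustration, not the patented circuit: the restoring-division loop, the restriction to non-negative dividends, and the function name are assumptions.

```python
def divide_ndq(dividend: int, divisor: int) -> tuple[int, int]:
    """Restoring division that computes only NDQ quotient bits."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch handles dividend >= 0 and divisor > 0 only")

    # Maximum possible number of quotient bits, from the significant bits
    # of the two operands.
    ndq = max(dividend.bit_length() - divisor.bit_length() + 1, 0)

    quotient, remainder = 0, dividend
    # Shifting the divisor left by (ndq - 1) aligns its leading bit with the
    # dividend's leading bit, which plays the role of normalization here.
    for shift in range(ndq - 1, -1, -1):
        trial = divisor << shift
        quotient <<= 1
        if remainder >= trial:
            remainder -= trial
            quotient |= 1
    return quotient, remainder


print(divide_ndq(100, 7))    # (14, 2), computed in 5 iterations
```

    A divider that always ran for the full operand width would spend 32 or 64 iterations on the same inputs.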
