Result normalizer and method of operation
    21.
    Granted invention patent (status: expired)

    Publication number: US5392228A

    Publication date: 1995-02-21

    Application number: US161361

    Filing date: 1993-12-06

    CPC classification: G06F7/485 G06F5/012

    Abstract: A result normalizer (58) for use with an adder (56) generates a mask in two stages that indicates the location of the leading one in the adder result. In the first stage, a leading zero anticipator (68) determines the position to within two digits. In the second stage, a count leading zero indicator (70) determines the position to a single digit. The mask is used to control the number of digits by which each stage of a multiplexer array (66) shifts the adder result. The output of the multiplexer array thereby contains a leading one. The result normalizer may be advantageously used in high performance applications such as in a floating point execution unit in a data processor or in digital signal processing systems.
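
The two-stage idea in this abstract can be illustrated in software. The sketch below is only a behavioral model under stated assumptions, not the patented circuit: the coarse stage is assumed to return the leading-zero count rounded down to an even number (so it is correct to within two positions), and it is derived from the adder result itself, whereas a hardware leading zero anticipator would typically work from the adder operands. The function names and the `width` parameter are hypothetical.

```python
from typing import Tuple


def coarse_leading_zero_estimate(value: int, width: int) -> int:
    """Assumed stand-in for the first stage: the leading-zero count rounded
    down to an even number, so the true count is this value or this value + 1."""
    count = width - value.bit_length()
    return count - (count % 2)


def normalize(value: int, width: int) -> Tuple[int, int]:
    """Shift `value` left until bit (width - 1) is set.

    Returns (normalized_value, total_shift)."""
    if not 0 < value < (1 << width):
        raise ValueError("value must be nonzero and fit in `width` bits")
    mask = (1 << width) - 1
    # Stage 1: coarse shift by the two-position estimate.
    coarse = coarse_leading_zero_estimate(value, width)
    shifted = (value << coarse) & mask
    # Stage 2: fine correction of at most one more position.
    fine = 0 if shifted & (1 << (width - 1)) else 1
    shifted = (shifted << fine) & mask
    return shifted, coarse + fine


normalized, shift = normalize(0b00010110, 8)
print(bin(normalized), shift)
```

The returned shift amount is what a floating-point unit would subtract from the exponent after normalizing the significand.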

    Mechanism for handling unfused multiply-accumulate accrued exception bits in a processor
    22.
    Granted invention patent (status: in force)

    Publication number: US09507656B2

    Publication date: 2016-11-29

    Application number: US12424929

    Filing date: 2009-04-16

    Abstract: A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined with the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state.
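
A software analogy may help show why the exception state is captured between the two sub-operations. The sketch below uses Python's `decimal` contexts, whose flags are sticky in the same way accrued exception bits are. It is not the processor mechanism described in the patent: the function name, the `accrued` set, and the 3-digit precision are assumptions chosen so that the multiply is inexact while the accumulate is exact.

```python
from decimal import Context, Decimal, Inexact, Overflow, Rounded, Underflow

# Signals tracked by this sketch (a hypothetical subset).
TRACKED = (Inexact, Overflow, Rounded, Underflow)


def raised_flags(ctx: Context) -> set:
    """Return the tracked signals currently raised in the context."""
    return {signal for signal in TRACKED if ctx.flags[signal]}


def unfused_multiply_accumulate(a, b, c, accrued: set, precision: int = 3):
    """Compute round(round(a * b) + c) and fold the flags of both
    sub-operations into the caller's accrued set."""
    ctx = Context(prec=precision, traps=[])  # no traps: flags only
    product = ctx.multiply(a, b)             # multiply sub-operation
    multiply_flags = raised_flags(ctx)       # captured before the accumulate
    accrued |= multiply_flags
    ctx.clear_flags()
    total = ctx.add(product, c)              # accumulate sub-operation
    add_flags = raised_flags(ctx)
    accrued |= add_flags
    return total, multiply_flags, add_flags


accrued_bits = set()
total, mul_flags, add_flags = unfused_multiply_accumulate(
    Decimal("1.23"), Decimal("4.56"), Decimal("0.39"), accrued_bits
)
print(total, Inexact in mul_flags, Inexact in add_flags, Inexact in accrued_bits)
```

In this example the inexact condition arises only in the multiply, so a scheme that looked at the accumulate step alone would miss it.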

    System and method of bypassing unrounded results in a multiply-add pipeline unit
    23.
    Granted invention patent (status: in force)

    Publication number: US08671129B2

    Publication date: 2014-03-11

    Application number: US13043101

    Filing date: 2011-03-08

    IPC classification: G06F7/32

    Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.
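
One way to read the arithmetic behind this abstract: if the bypassed value U should have been rounded up by one unit in the last place, then (U + 1) x M = U x M + M, so the correction amounts to one extra partial product. The sketch below illustrates only that identity with integers. It assumes nonnegative operands and a round-up of exactly one unit, and it uses plain shift-and-add rows instead of Booth encoding; the function name and parameters are hypothetical.

```python
def multiply_bypassed(unrounded: int, multiplicand: int, needs_round_up: bool) -> int:
    """Multiply using the unrounded bypassed value, adding a correction row
    when the earlier result should have been rounded up by one unit."""
    # One partial-product row per set bit of the bypassed (unrounded) operand.
    rows = [
        multiplicand << bit
        for bit in range(unrounded.bit_length())
        if (unrounded >> bit) & 1
    ]
    if needs_round_up:
        # Correction row: the extra "+1 x multiplicand" term.
        rows.append(multiplicand)
    # Summing the rows stands in for the carry save adder tree.
    return sum(rows)


print(multiply_bypassed(0b1011, 13, needs_round_up=True))   # same as 12 * 13
print(multiply_bypassed(0b1011, 13, needs_round_up=False))  # same as 11 * 13
```

The point of the design is that the rounding increment never has to propagate through the bypassed operand before the multiply starts.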

    Instruction support for performing montgomery multiplication
    24.
    Granted invention patent (status: in force)

    Publication number: US08583902B2

    Publication date: 2013-11-12

    Application number: US12776172

    Filing date: 2010-05-07

    IPC classification: G06F9/30

    Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, and N spans at least two registers of the general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R^−1.
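
The quantity named in the abstract, A x B x R^-1 mod N, can be computed in software with the standard Montgomery reduction (REDC). The sketch below is a generic textbook version, not the instruction's hardware implementation. It assumes R = 2**r_bits, an odd modulus N < R, and operands already reduced below N; the function and parameter names are hypothetical.

```python
def montgomery_multiply(a: int, b: int, n: int, r_bits: int) -> int:
    """Return a * b * R^-1 mod n, where R = 2**r_bits."""
    r = 1 << r_bits
    if n % 2 == 0 or not 0 < n < r:
        raise ValueError("n must be odd and smaller than R")
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must be reduced modulo n")
    # n_prime satisfies n * n_prime == -1 (mod R).
    n_prime = (-pow(n, -1, r)) % r
    t = a * b
    # m makes t + m * n divisible by R, so the division is an exact shift.
    m = ((t & (r - 1)) * n_prime) & (r - 1)
    u = (t + m * n) >> r_bits
    return u - n if u >= n else u


n, r_bits = 97, 8
result = montgomery_multiply(23, 45, n, r_bits)
reference = (23 * 45 * pow(1 << r_bits, -1, n)) % n
print(result == reference)
```

The reduction uses only multiplications, masks, and shifts, which is why it maps well onto a multiplier datapath; `pow(x, -1, m)` requires Python 3.8 or later.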

    Thread fairness on a multi-threaded processor with multi-cycle cryptographic operations
    25.
    Granted invention patent (status: in force)

    Publication number: US08560814B2

    Publication date: 2013-10-15

    Application number: US12773278

    Filing date: 2010-05-04

    IPC classification: G06F9/30

    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
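
The trade-off in this abstract can be made concrete with a toy cycle count. The model below is purely illustrative: the stall interval, stall length, and the rule "allow stalls only when fewer than two threads have blocking instructions pending" are assumptions for the example, not parameters taken from the patent.

```python
from typing import Sequence


def allow_stalls(threads_with_blocking_ops: int) -> bool:
    """Assumed policy: insert stall cycles only when fewer than two threads
    have blocking instructions scheduled."""
    return threads_with_blocking_ops < 2


def blocking_sequence_cycles(
    op_lengths: Sequence[int],
    stall_interval: int,
    stall_cycles: int,
    stalls_allowed: bool,
) -> int:
    """Total cycles to run the blocking operations back to back."""
    total = 0
    for length in op_lengths:
        total += length
        if stalls_allowed:
            # One stall window after every `stall_interval` busy cycles,
            # except after the final cycle of the operation.
            total += ((length - 1) // stall_interval) * stall_cycles
    return total


ops = [100, 100]  # two threads, one 100-cycle blocking operation each
mode = allow_stalls(threads_with_blocking_ops=len(ops))
print(blocking_sequence_cycles(ops, stall_interval=10, stall_cycles=2, stalls_allowed=True))
print(blocking_sequence_cycles(ops, stall_interval=10, stall_cycles=2, stalls_allowed=mode))
```

With stalls disallowed, the two operations finish sooner in sequence, at the cost of keeping other threads off the shared hardware for longer stretches.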

    Processor Pipeline which Implements Fused and Unfused Multiply-Add Instructions
    26.
    Invention patent application (status: in force)

    Publication number: US20120221614A1

    Publication date: 2012-08-30

    Application number: US13469212

    Filing date: 2012-05-11

    IPC classification: G06F7/48

    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
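
The numerical difference between the two modes is that a fused multiply-add rounds once, while an unfused one rounds after the multiply and again after the add. The sketch below shows that difference in double precision. It models the fused result with exact rational arithmetic followed by a single conversion to `float`, which is an assumption made for illustration and says nothing about how the FUMA block is built; the function names and the `fused` flag are hypothetical.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded to double, then the sum is."""
    product = a * b
    return product + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """One rounding: a * b + c is formed exactly, then rounded to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def multiply_add(a: float, b: float, c: float, fused: bool) -> float:
    """Select the behavior with a mode flag, as an opcode or mode bit might."""
    return fused_multiply_add(a, b, c) if fused else unfused_multiply_add(a, b, c)


a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
# The exact product is 1 - 2**-54, which rounds to 1.0 in double precision.
print(multiply_add(a, b, -1.0, fused=False))
print(multiply_add(a, b, -1.0, fused=True))
```

The unfused path loses the small residual when the product is rounded, while the fused path keeps it.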

    Methods and mechanisms to support multiple features for a number of opcodes
    27.
    Granted invention patent (status: in force)

    Publication number: US08195923B2

    Publication date: 2012-06-05

    Application number: US12420054

    Filing date: 2009-04-07

    IPC classification: G06F9/30

    Abstract: Systems and methods for efficient instruction support of multiple features for opcodes of an instruction set. A processor detects that a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines that the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
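
The decision flow in this abstract (decode the requested feature, check for hardware support, otherwise fall back to software) can be sketched as a small dispatcher. Everything concrete below is hypothetical: the opcode value `0x19`, the function codes, the feature names, and the use of `hashlib` as a stand-in for both the hardware and software paths.

```python
import hashlib
from typing import Callable, Dict, Set, Tuple

# Hypothetical mapping: one opcode, several functions selected by a function code.
FEATURES: Dict[Tuple[int, int], str] = {
    (0x19, 0): "sha1",
    (0x19, 1): "sha256",
}

# Reference implementations; in this sketch they stand in for both paths.
IMPLEMENTATIONS: Dict[str, Callable[[bytes], bytes]] = {
    "sha1": lambda data: hashlib.sha1(data).digest(),
    "sha256": lambda data: hashlib.sha256(data).digest(),
}


def execute(opcode: int, function_code: int, data: bytes,
            hardware_features: Set[str]) -> Tuple[str, bytes]:
    """Return (path_taken, result) for the requested feature."""
    feature = FEATURES[(opcode, function_code)]
    if feature in hardware_features:
        # Hardware support exists: the instruction would run on-chip.
        return "hardware", IMPLEMENTATIONS[feature](data)
    # No hardware support: software performs the same operation.
    return "software", IMPLEMENTATIONS[feature](data)


supported = {"sha256"}
print(execute(0x19, 1, b"abc", supported)[0])
print(execute(0x19, 0, b"abc", supported)[0])
```

Either path must produce the same result, so a program written against the opcode does not need to know which features a given chip implements.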

    INSTRUCTION SUPPORT FOR PERFORMING MONTGOMERY MULTIPLICATION
    28.
    Invention patent application (status: in force)

    Publication number: US20110276790A1

    Publication date: 2011-11-10

    Application number: US12776172

    Filing date: 2010-05-07

    IPC classification: G06F9/302

    Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, and N spans at least two registers of the general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R^−1.

    Method for selecting between divide instructions associated with respective threads in a multi-threaded processor
    29.
    Granted invention patent (status: in force)

    Publication number: US07941642B1

    Publication date: 2011-05-10

    Application number: US10881216

    Filing date: 2004-06-30

    IPC classification: G06F9/30

    CPC classification: G06F9/3001 G06F9/3851

    Abstract: In one embodiment, a multithreaded processor includes a multithreaded instruction source that may provide a plurality of instructions each corresponding to a respective one of a plurality of threads. The multithreaded processor also includes a pick unit coupled to the multithreaded instruction source. The pick unit may select, in a given cycle, a first divide instruction corresponding to one thread of the plurality of threads and a second divide instruction corresponding to another thread of the plurality of threads, based upon a thread selection algorithm. Further, the multithreaded processor includes a storage coupled to a functional unit including a divider configured to execute the first divide instruction and the second divide instruction. The storage may store one of the first and the second divide instructions during execution of the other of the first and the second divide instructions.

    PROCESSOR AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR MULTIPLICATION OF LARGE OPERANDS
    30.
    Invention patent application (status: in force)

    Publication number: US20100325188A1

    Publication date: 2010-12-23

    Application number: US12488372

    Filing date: 2009-06-19

    IPC classification: G06F7/52

    CPC classification: G06F7/4876 G06F2207/382

    Abstract: A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M. In response to receiving a single instance of a large-operand multiplication instruction defined within the ISA, wherein at least one of the operands of the large-operand multiplication instruction includes more than the maximum number of bits M, the instruction execution unit is configured to multiply operands of the large-operand multiplication instruction within the hardware multiplier datapath circuit to determine a result of the large-operand multiplication instruction without execution of programmer-selected instructions within the ISA other than the large-operand multiplication instruction.
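
Multiplying operands wider than the datapath is commonly done by splitting them into M-bit words and accumulating word-by-word products. The sketch below shows that schoolbook decomposition in software. It is a generic illustration, not the sequencing used by the patented instruction: `hardware_multiply` is a hypothetical stand-in for the M-bit multiplier, and M = 64 is an assumed width.

```python
from typing import List


def hardware_multiply(x: int, y: int, m_bits: int) -> int:
    """Stand-in for the datapath: accepts operands of at most m_bits each."""
    if x >> m_bits or y >> m_bits:
        raise ValueError("operand exceeds the multiplier width")
    return x * y


def split_words(value: int, m_bits: int) -> List[int]:
    """Split a nonnegative integer into m_bits-wide words, least significant first."""
    mask = (1 << m_bits) - 1
    words = []
    while value:
        words.append(value & mask)
        value >>= m_bits
    return words


def large_operand_multiply(a: int, b: int, m_bits: int = 64) -> int:
    """Multiply arbitrarily wide nonnegative operands using only M-bit multiplies."""
    result = 0
    for i, a_word in enumerate(split_words(a, m_bits)):
        for j, b_word in enumerate(split_words(b, m_bits)):
            # Each word product is weighted by its position before accumulation.
            result += hardware_multiply(a_word, b_word, m_bits) << (m_bits * (i + j))
    return result


a = (1 << 200) + 12345
b = (1 << 130) + 987
print(large_operand_multiply(a, b) == a * b)
```

A single large-operand instruction hides this loop from the programmer, which is the contrast the abstract draws with issuing the individual multiplies as separate instructions.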
