Apparatus and method for handling tiny numbers using a super sticky bit in a microprocessor
    51.
    Granted invention patent
    Status: in force

    Publication No.: US06374345B1

    Publication date: 2002-04-16

    Application No.: US09359919

    Filing date: 1999-07-22

    IPC classification: G06F7/00

    Abstract: An apparatus and method for handling tiny numbers using a super sticky bit are provided. In response to detecting that a preliminary result of an instruction corresponds to a tiny number and an underflow exception is masked, an execution pipeline can be configured to store a value corresponding to the preliminary result and a super sticky bit in a destination register. Also, a destination register tag corresponding to the destination register and a denormal exception indicator corresponding to the tiny number and masked underflow exception can be stored. A trap handler can be initiated to generate a corrected result for the instruction. The trap handler can detect that the denormal exception indicator has been set and can read the value and the super sticky bit from the destination register using the destination register tag. The trap handler can generate a corrected result for the instruction based on the value and the super sticky bit. An instruction subsequent to the trapping instruction can then be restarted.
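
    A minimal Python sketch of why sticky information matters when a tiny result is denormalized: the bits shifted out of the significand are OR-ed into a single sticky bit, so a later correction step can still tell an exact result from an inexact one. This shows the conventional sticky bit only. The abstract does not define how the super sticky bit differs, and the function name `shift_right_sticky` is hypothetical.

```python
def shift_right_sticky(significand, shift):
    """Shift an unsigned significand right and report a sticky bit.

    The sticky bit is 1 if any of the discarded low-order bits was 1.
    Assumption: this models an ordinary sticky bit, not the patent's
    super sticky bit, whose exact definition is not given in the abstract.
    """
    lost = significand & ((1 << shift) - 1)  # the bits that fall off the end
    return significand >> shift, int(lost != 0)


if __name__ == "__main__":
    # 0b101101 loses 0b101 when shifted by 3, so the sticky bit is set.
    print(shift_right_sticky(0b101101, 3))
    # 0b101000 loses only zeros, so the sticky bit stays clear.
    print(shift_right_sticky(0b101000, 3))
```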

    Method and apparatus for achieving higher frequencies of exactly rounded results
    52.
    Granted invention patent
    Status: lapsed

    Publication No.: US6134574A

    Publication date: 2000-10-17

    Application No.: US75073

    Filing date: 1998-05-08

    Abstract: A multiplier configured to obtain higher frequencies of exactly rounded results by adding an adjustment constant to intermediate products generated during iterative multiplication operations is disclosed. One such iterative multiplication operation is the Newton-Raphson iteration, which may be utilized by the multiplier to perform reciprocal calculations and reciprocal square root calculations. For each iteration, the results converge toward an infinitely precise result. To improve the frequency of the exactly rounded result, the results of the iterative calculations may be studied for a large number of differing input operands to determine the best suited value for the adjustment constant. The multiplier may also be configured to perform scalar and packed vector multiplication using the same hardware.
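
    The Newton-Raphson iterations named in the abstract can be sketched in a few lines of Python. This is an illustration in ordinary double-precision arithmetic, not the patented multiplier: the adjustment constant added to intermediate products is not modeled, and the function names and the choice of initial estimates are assumptions.

```python
def nr_reciprocal(b, x0, steps=5):
    """Approximate 1/b with the iteration x <- x * (2 - b*x).

    x0 is an initial estimate of 1/b; it must be close enough to
    converge (0 < b*x0 < 2).
    """
    x = x0
    for _ in range(steps):
        x = x * (2.0 - b * x)
    return x


def nr_rsqrt(b, x0, steps=5):
    """Approximate 1/sqrt(b) with the iteration x <- x * (3 - b*x*x) / 2."""
    x = x0
    for _ in range(steps):
        x = x * (3.0 - b * x * x) / 2.0
    return x


if __name__ == "__main__":
    print(nr_reciprocal(3.0, 0.3))  # approaches 1/3
    print(nr_rsqrt(2.0, 0.7))       # approaches 1/sqrt(2)
```

    Both iterations converge quadratically, so each step roughly doubles the number of correct bits. That is why a hardware unit needs only a few passes through the multiplier after a table-based initial estimate.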

    Floating point arithmetic unit including an efficient close data path
    53.
    Granted invention patent
    Status: lapsed

    Publication No.: US6094668A

    Publication date: 2000-07-25

    Application No.: US49893

    Filing date: 1998-03-27

    Applicant: Stuart F. Oberman

    Inventor: Stuart F. Oberman

    Abstract: An execution unit configured to execute vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far data path is configured to handle effective addition operations, as well as effective subtraction operations for operands having an absolute exponent difference greater than one. The close data path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close data path includes an adder unit configured to generate a first and second output value. The first output value is equal to the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. The two output values are conveyed to a multiplexer unit, which selects one of the output values as a preliminary subtraction result based on a final selection signal received from a selection unit. The selection unit generates the final selection signal from a plurality of preliminary selection signals based on the carry in signal to the most significant bit of the first adder output value. Selection of the first or second output value in the close data path effectuates the round-to-nearest operation.
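
    The two adder outputs described in the abstract can be illustrated with fixed-width integer arithmetic in Python. The sketch computes A + ~B and A + ~B + 1 and picks one of them through a multiplexer. The selection logic itself (the preliminary selection signals, the carry into the most significant bit, and the rounding decision) is not modeled: the `plus_one` flag is a hypothetical stand-in for the final selection signal, and the 8-bit width is an arbitrary assumption.

```python
def close_path_sums(a, b, width=8):
    """Return (A + ~B, A + ~B + 1), both truncated to `width` bits.

    In two's complement, A + ~B + 1 equals A - B modulo 2**width,
    so the second output is the plain difference and the first is
    the difference minus one.
    """
    mask = (1 << width) - 1
    not_b = ~b & mask             # bitwise inversion of B within the word
    sum0 = (a + not_b) & mask     # first output: A + ~B
    sum1 = (sum0 + 1) & mask      # second output: first output plus one
    return sum0, sum1


def select_result(sum0, sum1, plus_one):
    """Multiplexer: choose one adder output as the preliminary result.

    `plus_one` is a hypothetical stand-in for the final selection signal.
    """
    return sum1 if plus_one else sum0


if __name__ == "__main__":
    s0, s1 = close_path_sums(13, 5)
    print(s0, s1)                         # 7 8
    print(select_result(s0, s1, True))    # 8, which is 13 - 5
```

    Producing both sums at once lets the choice between them stand in for a separate increment step, which is how the abstract describes rounding being folded into the subtraction.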

    Leading one prediction unit for normalizing close path subtraction results within a floating point arithmetic unit
    54.
    Granted invention patent
    Status: lapsed

    Publication No.: US6085208A

    Publication date: 2000-07-04

    Application No.: US49758

    Filing date: 1998-03-27

    Abstract: An optimized multimedia execution unit configured to perform vectored floating point and integer instructions. In one embodiment, the execution unit includes an add/subtract pipeline having far and close data paths. The far data path is configured to handle effective addition operations, as well as effective subtraction operations for operands having an absolute exponent difference greater than one. The close data path, conversely, is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The execution unit may also include a plurality of add/subtract pipelines, allowing vectored add, subtract, and integer/floating point conversion instructions to be performed. The execution unit may also be expanded to handle additional arithmetic instructions (such as reverse subtract and accumulate functions) by appropriate input multiplexing. The execution unit may also be configured with a leading one prediction unit that is configured to predict the position of a leading one value for certain results in order to improve normalization times.
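
    To show what the leading one position is used for, the Python sketch below normalizes a close-path subtraction result by shifting its leading one up to the top bit and adjusting the exponent. It finds the leading one by inspecting the finished result, whereas the unit in the abstract predicts that position ahead of time. The function name, the 24-bit default width, and the zero-result convention are assumptions.

```python
def normalize(significand, exponent, width=24):
    """Left-shift so the leading one sits in the top bit of `width` bits.

    Assumes 0 <= significand < 2**width. Returns the shifted significand
    and the exponent reduced by the shift amount. A zero significand is
    returned as (0, 0) by convention in this sketch.
    """
    if significand == 0:
        return 0, 0
    shift = width - significand.bit_length()  # number of leading zeros
    return significand << shift, exponent - shift


if __name__ == "__main__":
    # Massive cancellation left three leading zeros in an 8-bit result.
    print(normalize(0b00010110, 10, width=8))  # (176, 7)
```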

    Microprocessor including an efficient implemention of an accumulate instruction
    55.
    Granted invention patent
    Status: lapsed

    Publication No.: US5918062A

    Publication date: 1999-06-29

    Application No.: US14507

    Filing date: 1998-01-28

    Abstract: An execution unit configured to perform a plurality of arithmetic operations using the same set of operands. These operands include corresponding input vector values in each of a plurality of input registers. The execution unit is coupled to receive these input vector values, as well as an instruction value indicative of one of the plurality of arithmetic operations. In one embodiment, the plurality of arithmetic operations includes a vectored add instruction, a vectored subtract instruction, a vectored reverse subtract instruction, and an accumulate instruction. The vectored instructions perform arithmetic operations concurrently using corresponding values from each of the plurality of input registers. The accumulate instruction, however, is executable to add together all input values within a single input register. The execution unit further includes a multiplexer unit configured to selectively route the input vector values to a plurality of adder units according to the opcode value. In an embodiment in which the execution unit is configured to perform subtraction operations as well as addition, the multiplexer unit is additionally configured to selectively route negated versions (either one's or two's complement format) to the plurality of adder units. Each of the plurality of adder units is configured to generate a sum based upon the values conveyed from the multiplexer unit. The accumulate instruction advantageously allows important operations such as the matrix multiply to be performed rapidly. Because the matrix multiply is an integral part of many applications (particularly graphics applications), the accumulate instruction may lead to increased overall system performance.
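
    The difference between the vectored instructions and the accumulate instruction can be sketched in Python, with lists standing in for vector registers. The vectored operations combine corresponding lanes of two registers, while accumulate sums the lanes of a single register. All function names are hypothetical, and `vmul` is an added assumption (the abstract lists no multiply) included only to show how accumulate completes each dot product of a matrix multiply.

```python
def vadd(a, b):
    """Vectored add: lane-wise a[i] + b[i]."""
    return [x + y for x, y in zip(a, b)]


def vsub(a, b):
    """Vectored subtract: lane-wise a[i] - b[i]."""
    return [x - y for x, y in zip(a, b)]


def vrsub(a, b):
    """Vectored reverse subtract: lane-wise b[i] - a[i]."""
    return [y - x for x, y in zip(a, b)]


def acc(a):
    """Accumulate: add together all values within one register."""
    return sum(a)


def vmul(a, b):
    """Lane-wise multiply (assumption: not part of the abstract's list)."""
    return [x * y for x, y in zip(a, b)]


def matvec(matrix, vector):
    """Matrix-vector product: one lane-wise multiply plus one accumulate per row."""
    return [acc(vmul(row, vector)) for row in matrix]


if __name__ == "__main__":
    print(vrsub([1, 2], [5, 7]))              # [4, 5]
    print(matvec([[1, 2], [3, 4]], [5, 6]))   # [17, 39]
```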

    Multipurpose arithmetic functional unit
    56.
    Granted invention patent
    Status: in force

    Publication No.: US07640285B1

    Publication date: 2009-12-29

    Application No.: US10970101

    Filing date: 2004-10-20

    IPC classification: G06F7/38 G06G1/02

    CPC classification: G06F7/57 G06F2207/3884

    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh=x−xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
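
    Both formulas in the abstract are sums of products, which is what makes shared multipliers and adders plausible. The Python sketch below evaluates the planar form A*x+B*y+C and a piecewise quadratic F2(xb)*xh^2+F1(xb)*xh+F0(xb) for the reciprocal function. The segment count, the interval [1, 2), and the use of Taylor coefficients as table entries are assumptions made for illustration; the abstract does not say how the coefficients are obtained.

```python
def planar(a, b, c, x, y):
    """Planar interpolation: A*x + B*y + C."""
    return a * x + b * y + c


def build_reciprocal_table(lo=1.0, hi=2.0, segments=64):
    """Per-segment (F2, F1, F0) for f(x) = 1/x on [lo, hi).

    Assumption: Taylor coefficients at each segment base xb stand in
    for the table contents: F0 = 1/xb, F1 = -1/xb**2, F2 = 1/xb**3.
    """
    step = (hi - lo) / segments
    table = []
    for i in range(segments):
        xb = lo + i * step
        table.append((1.0 / xb**3, -1.0 / xb**2, 1.0 / xb))
    return table, lo, step


def approx_unary(x, table, lo, step):
    """Evaluate F2(xb)*xh^2 + F1(xb)*xh + F0(xb) with xh = x - xb.

    x must lie inside the interval the table was built for.
    """
    i = int((x - lo) / step)   # segment index selects the baseline xb
    xb = lo + i * step
    xh = x - xb                # offset from the baseline
    f2, f1, f0 = table[i]
    return f2 * xh * xh + f1 * xh + f0


if __name__ == "__main__":
    table, lo, step = build_reciprocal_table()
    print(planar(2, 3, 1, 4, 5))                         # 24
    print(approx_unary(1.37, table, lo, step), 1 / 1.37)
```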

    Multipurpose functional unit with multiply-add and format conversion pipeline
    57.
    Granted invention patent
    Status: in force

    Publication No.: US07428566B2

    Publication date: 2008-09-23

    Application No.: US10985674

    Filing date: 2004-11-10

    IPC classification: G06F7/38

    CPC classification: G06F9/30014 G06F9/3885

    Abstract: A multipurpose functional unit is configurable to support a number of operations including multiply-add and format conversion operations, as well as other integer and/or floating-point arithmetic operations, Boolean operations, and logical test operations.

    System and method for late-dropping packets in a network switch
    58.
    Granted invention patent
    Status: lapsed

    Publication No.: US07406041B2

    Publication date: 2008-07-29

    Application No.: US10209545

    Filing date: 2002-07-31

    IPC classification: H04L12/26

    Abstract: A system and method for late-dropping packets in a network switch. A network switch may include multiple input ports, multiple output ports, and a shared random access memory coupled to the input ports and output ports by data transport logic. Packets entering the switch may be subject to input thresholding, and may be assigned to a flow within a group. A portion of a packet subject to input thresholding may be accepted into the switch and assigned to a group and flow even if, at the time of arrival of the portion, there are not enough resources available to receive the remainder of the packet. This partial receipt of the packet is allowed because of the possibility of additional resources becoming available between the time of receipt of and resource allocation for the portion of the packet and receipt of subsequent portions of the packet.

    High-speed function approximation
    59.
    Granted invention patent
    Status: in force

    Publication No.: US07366745B1

    Publication date: 2008-04-29

    Application No.: US10861184

    Filing date: 2004-06-03

    IPC classification: G06F1/02

    CPC classification: G06F7/544

    Abstract: Methods and apparatuses are presented for determining coefficients for a polynomial-based approximation of a function, by iteratively estimating a first coefficient, reducing the first coefficient to a lower precision to obtain a first limited-precision coefficient, analytically calculating a second coefficient by taking into account the first limited-precision coefficient, reducing the second coefficient to a lower precision to obtain a second limited-precision coefficient, iteratively estimating a third coefficient by taking into account at least one of the first limited-precision coefficient and the second limited-precision coefficient, and reducing the third coefficient to a lower precision to obtain a third limited-precision coefficient. In one embodiment of the invention, the polynomial-based approximation relates to a minimax approximation of the function approximated, and at least one of the steps for iteratively estimating the first coefficient and iteratively estimating the third coefficient involves use of a Remez exchange algorithm.

    Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria
    60.
    Granted invention patent
    Status: in force

    Publication No.: US06370637B1

    Publication date: 2002-04-09

    Application No.: US09370789

    Filing date: 1999-08-05

    IPC classification: G06F9/38

    Abstract: A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. For example, junk ops are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename), but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline, but instead may pass through any of the microprocessor's execution pipelines in the floating point unit. Multi-pipeline executable instructions are allocated on a per-clock cycle basis using a number of different criteria. For example, the allocation may vary depending upon the number of multi-pipeline executable instructions received by the floating point unit in a single clock cycle.
