Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria
    1.
    Granted invention patent
    Legal status: In force

    Publication number: US06370637B1

    Publication date: 2002-04-09

    Application number: US09370789

    Application date: 1999-08-05

    Abstract: A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. For example, junk ops are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename), but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline, but instead may pass through any of the microprocessor's execution pipelines in the floating point unit. Multi-pipeline executable instructions are allocated on a per-clock cycle basis using a number of different criteria. For example, the allocation may vary depending upon the number of multi-pipeline executable instructions received by the floating point unit in a single clock cycle.
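The abstract does not spell out the allocation criteria, so the following is only a minimal sketch of one plausible per-cycle policy: pipe-specific instructions are placed first, then each multi-pipeline executable instruction (such as a junk op) goes to whichever pipe currently has the fewest instructions. The pipe names, the `allocate` function, and the least-loaded rule are all assumptions made for illustration and are not taken from the patent.

```python
# Hypothetical pipe names; the abstract does not name the execution pipes.
PIPES = ("add", "mul", "store")

def allocate(ops):
    """Assign every op received in one clock cycle to a pipe.

    ops is a list of (name, pipe) pairs. pipe is a pipe name for a
    pipe-specific op, or None for a multi-pipeline executable op.
    """
    load = {p: 0 for p in PIPES}
    result = [None] * len(ops)
    # Pass 1: pipe-specific ops have no choice of pipe.
    for i, (_, pipe) in enumerate(ops):
        if pipe is not None:
            result[i] = pipe
            load[pipe] += 1
    # Pass 2 (assumed criterion): each flexible op takes the least-loaded pipe.
    for i, (_, pipe) in enumerate(ops):
        if pipe is None:
            target = min(PIPES, key=lambda p: load[p])
            result[i] = target
            load[target] += 1
    return result
```

With this policy the placement of junk ops changes with how many of them arrive in the same cycle, which is the kind of dependence the abstract mentions.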


    Method and apparatus for rapid execution of FCOM and FSTSW
    3.
    Granted invention patent
    Legal status: In force

    Publication number: US06425074B1

    Publication date: 2002-07-23

    Application number: US09393524

    Application date: 1999-09-10

    Abstract: A microprocessor configured to rapidly execute floating point store status word (FSTSW) type instructions that are immediately preceded by floating point compare (FCOM) type instructions is disclosed. FCOM-type instructions are modified to store their results to an architectural floating point status word and a temporary destination register. If an FSTSW-type instruction is detected immediately following an FCOM-type instruction, then the FSTSW-type instruction is transformed into a special fast floating point store status word (FSTSWEF) instruction. Unlike the FSTSW-type instruction, which is serializing and negatively impacts performance, the FSTSWEF instruction is not serializing and allows execution to continue without undue serialization. A computer system and method for rapidly executing FSTSW instructions immediately preceded by FCOM-type instructions are also disclosed.
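The decode-time transformation described in the abstract can be sketched as a pass over an instruction stream. This is an illustrative model only: the mnemonic sets below are assumptions about which x87 instructions count as "FCOM-type" and "FSTSW-type", and the function name `transform` is hypothetical.

```python
# Assumed membership of the two instruction classes (not listed in the abstract).
FCOM_TYPE = {"FCOM", "FCOMP", "FCOMPP", "FUCOM", "FUCOMP", "FUCOMPP"}
FSTSW_TYPE = {"FSTSW", "FNSTSW"}

def transform(stream):
    """Rewrite an FSTSW-type op to FSTSWEF when it immediately follows an FCOM-type op."""
    out = []
    prev = None
    for op in stream:
        if op in FSTSW_TYPE and prev in FCOM_TYPE:
            # Non-serializing variant; per the abstract it can use the copy of
            # the compare result held in the temporary destination register.
            out.append("FSTSWEF")
        else:
            out.append(op)
        prev = op  # track the original mnemonic, not the rewritten one
    return out
```

An FSTSW that does not directly follow a compare is left unchanged and remains serializing.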


    Multi-function bipartite look-up table
    4.
    Granted invention patent
    Legal status: Expired

    Publication number: US06256653B1

    Publication date: 2001-07-03

    Application number: US09015084

    Application date: 1998-01-29

    Abstract: A multi-function look-up table for determining output values for predetermined ranges of a first mathematical function and a second mathematical function. In one embodiment, the multi-function look-up table is a bipartite look-up table including a first plurality of storage locations and a second plurality of storage locations. The first plurality of storage locations store base values for the first and second mathematical functions. Each base value is an output value (for either the first or second function) corresponding to an input region which includes the look-up table input value. The second plurality of storage locations, on the other hand, store difference values for both the first and second mathematical functions. These difference values are used for linear interpolation in conjunction with a corresponding base value in order to generate a look-up table output value. The multi-function look-up table further includes an address control unit coupled to receive a first input value and a signal which indicates whether an output value is to be generated for the first or second mathematical function. The address control unit then generates a first address value from these signals which is in turn conveyed to the first and second plurality of storage locations. In response to receiving the first address value, the first and second plurality of storage locations are configured to output a first base value and a first difference value, respectively. The first base value and first difference value are then conveyed to an output unit configured to generate a look-up table output value from the two values.
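A small software model can make the base-plus-difference structure concrete. The sketch below assumes the two functions are the reciprocal and the reciprocal square root on [1, 2), a 9-bit input split into three 3-bit fields, and a function-select bit placed at the top of each table address. None of these choices come from the abstract, and all names (`addresses`, `lookup`, `build_tables`) are hypothetical.

```python
import math

A, B, C = 3, 3, 3      # assumed bit widths of the high, middle, and low input fields
REF = 1 << (C - 1)     # reference value of the low field
MID = 1 << (B - 1)     # representative value of the middle field
FUNCS = (lambda x: 1.0 / x, lambda x: 1.0 / math.sqrt(x))  # assumed function pair

def to_x(x0, x1, x2):
    """Map the three input fields to a real input value in [1, 2)."""
    idx = (x0 << (B + C)) | (x1 << C) | x2
    return 1.0 + idx / (1 << (A + B + C))

def build_tables():
    base = [0.0] * (2 << (A + B))   # base values for both functions
    diff = [0.0] * (2 << (A + C))   # difference values for both functions
    for sel, f in enumerate(FUNCS):
        for x0 in range(1 << A):
            for x1 in range(1 << B):
                base[(sel << (A + B)) | (x0 << B) | x1] = f(to_x(x0, x1, REF))
            for x2 in range(1 << C):
                d = f(to_x(x0, MID, x2)) - f(to_x(x0, MID, REF))
                diff[(sel << (A + C)) | (x0 << C) | x2] = d
    return base, diff

def addresses(sel, idx):
    """Address control: combine the function-select signal with the input fields."""
    x0 = idx >> (B + C)
    x1 = (idx >> C) & ((1 << B) - 1)
    x2 = idx & ((1 << C) - 1)
    return (sel << (A + B)) | (x0 << B) | x1, (sel << (A + C)) | (x0 << C) | x2

BASE, DIFF = build_tables()

def lookup(sel, idx):
    """Output unit: add the selected base value and difference value."""
    base_addr, diff_addr = addresses(sel, idx)
    return BASE[base_addr] + DIFF[diff_addr]
```

Two tables of 128 entries each stand in for a direct table of 1024 entries (512 inputs for each of two functions), at the cost of a small approximation error and one addition per lookup.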


    Floating point addition pipeline including extreme value, comparison and accumulate functions
    5.
    Granted invention patent
    Legal status: Expired

    Publication number: US06298367B1

    Publication date: 2001-10-02

    Application number: US09055916

    Application date: 1998-04-06

    Abstract: A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle effective addition operations and effective subtraction operations for operands having an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close path is configured to generate two output values, wherein one output value is the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. Selection of the first or second output value in the close path effectuates the round-to-nearest operation for the output of the adder. The execution unit may be configured to perform vectored addition and subtraction, integer/floating point conversion, reverse subtraction, accumulate, extreme value (minimum/maximum), and comparison instructions.
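The two close-path outputs can be modeled with fixed-width integer arithmetic. This is a sketch under assumptions: mantissas are treated as unsigned integers of an assumed width `W`, the function name is hypothetical, and the logic that selects between the two outputs to perform rounding is not modeled.

```python
W = 24                  # assumed mantissa width in bits
MASK = (1 << W) - 1

def close_path_sums(a, b):
    """Return the two candidate results the close path produces in parallel."""
    sum0 = (a + (~b & MASK)) & MASK   # first operand plus inverted second operand
    sum1 = (sum0 + 1) & MASK          # first output plus one
    return sum0, sum1
```

Because inverting `b` gives `-b - 1` in two's complement, `sum1` equals `a - b` modulo `2**W` and `sum0` equals `a - b - 1`. Having both available at once means the choice between them can be made late, without a second carry-propagate addition.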


    Bipartite look-up table with output values having minimized absolute error
    6.
    Granted invention patent
    Legal status: Expired

    Publication number: US06223192B1

    Publication date: 2001-04-24

    Application number: US09098482

    Application date: 1998-06-16

    Abstract: A method for generating entries for a bipartite look-up table having base and difference table portions. In one embodiment, these entries are usable to form output values for a mathematical function, f(x), in response to receiving corresponding input values within a predetermined input range. The method first comprises partitioning the input range into I intervals, J subintervals/interval, and K sub-subintervals/subinterval. For a given interval M, the method includes generating K difference table entries and J base table entries. Each of the K difference table entries corresponds to a particular group of sub-subintervals within interval M, each of which has the same relative position within their respective subintervals. Each difference table entry is computed by averaging difference values for the sub-subintervals included in a corresponding group N. Each difference value which makes up this average is equal to f(X1)−f(X2), where X1 is the midpoint of the sub-subinterval within group N, and X2 is the midpoint of a predetermined reference sub-subinterval within the same subinterval as X1. Each of these midpoints is calculated such that maximum absolute error is minimized for all possible input values in the sub-subinterval. Each of the J base table entries, on the other hand, corresponds to a subinterval within interval M. Each entry is equal to f(X2)+adjust, where X2 is the midpoint of the reference sub-subinterval of the subinterval corresponding to the base table entry. The adjust value is calculated so that error introduced by the averaging of the difference table entries is evenly distributed over the entire subinterval.
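The entry-generation steps for a single interval M can be sketched as follows. Two simplifications are assumptions of this sketch rather than the patented method: plain arithmetic midpoints are used instead of error-minimizing midpoints, and `adjust` is taken as the midrange of the averaging error so that the residual is symmetric about zero within each subinterval. The function name and parameters are hypothetical.

```python
def build_interval(f, lo, hi, J, K, ref):
    """Generate J base entries and K difference entries for the interval [lo, hi).

    ref is the index (0..K-1) of the reference sub-subinterval in each subinterval.
    """
    w = (hi - lo) / (J * K)   # width of one sub-subinterval

    def mid(j, n):
        # Arithmetic midpoint of sub-subinterval n in subinterval j (simplification).
        return lo + (j * K + n + 0.5) * w

    # f(X1) - f(X2) for every sub-subinterval, relative to its own subinterval's reference.
    true = [[f(mid(j, n)) - f(mid(j, ref)) for n in range(K)] for j in range(J)]

    # Difference entry N: average over the J sub-subintervals sharing position N.
    diff = [sum(true[j][n] for j in range(J)) / J for n in range(K)]

    base = []
    for j in range(J):
        err = [true[j][n] - diff[n] for n in range(K)]   # error introduced by averaging
        adjust = (max(err) + min(err)) / 2               # assumed: center the error
        base.append(f(mid(j, ref)) + adjust)
    return base, diff
```

A lookup for sub-subinterval `n` of subinterval `j` then returns `base[j] + diff[n]`. With this choice of `adjust`, the largest positive and largest negative residuals at the midpoints of a subinterval have equal magnitude.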


    Graphics processor with memory management unit and cache coherent link
    8.
    Granted invention patent
    Legal status: In force

    Publication number: US08860741B1

    Publication date: 2014-10-14

    Application number: US11608436

    Application date: 2006-12-08

    CPC classification number: G09G5/36 G06F9/50 G06F12/0831 G06F2212/302 G09G5/363

    Abstract: In contrast to a conventional computing system in which the graphics processor (graphics processing unit or GPU) is treated as a slave to one or several CPUs, systems and methods are provided that allow the GPU to be treated as a central processing unit (CPU) from the perspective of the operating system. The GPU can access a memory space shared by other CPUs in the computing system. Caches utilized by the GPU may be coherent with caches utilized by other CPUs in the computing system. The GPU may share execution of general-purpose computations with other CPUs in the computing system.


    Floating point addition pipeline including extreme value, comparison and accumulate functions
    9.
    Granted invention patent
    Legal status: In force

    Publication number: US06397239B2

    Publication date: 2002-05-28

    Application number: US09778352

    Application date: 2001-02-06

    Abstract: A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle effective addition operations and effective subtraction operations for operands having an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close path is configured to generate two output values, wherein one output value is the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. Selection of the first or second output value in the close path effectuates the round-to-nearest operation for the output of the adder.


    Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit
    10.
    Granted invention patent
    Legal status: In force

    Publication number: US06393555B1

    Publication date: 2002-05-21

    Application number: US09370787

    Application date: 1999-08-05

    Abstract: A microprocessor with a floating point unit configured to rapidly execute floating point compare (FCOMI) type instructions that are followed by floating point conditional move (FCMOV) type instructions is disclosed. FCOMI-type instructions, which normally store their results to integer status flag registers, are modified to store a copy of their results to a temporary register located within the floating point unit. If an FCMOV-type instruction is detected following an FCOMI-type instruction, then the FCMOV-type instruction's source for flag information is changed from the integer flag register to the temporary register. FCMOV-type instructions are thereby able to execute earlier because they need not wait for the integer flags to be read from the integer portion of the microprocessor. A computer system and method for rapidly executing FCOMI-type instructions followed by FCMOV-type instructions are also disclosed.
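The source-redirection idea can be sketched as a pass that records, for each FCMOV-type instruction, where it would read its flags from. The abstract does not say how long the temporary copy stays usable, so this sketch assumes it is valid until some other flag-writing instruction appears. The mnemonic sets, the `writes_flags` parameter, and the source labels are all hypothetical.

```python
# Assumed membership of the FCOMI-type class (not listed in the abstract).
FCOMI_TYPE = {"FCOMI", "FCOMIP", "FUCOMI", "FUCOMIP"}

def flag_sources(stream, writes_flags=frozenset({"CMP", "TEST", "ADD", "SUB"})):
    """Pair each op with its flag source, or None if it reads no flags."""
    temp_valid = False   # does the FPU temporary register hold the latest flags?
    out = []
    for op in stream:
        if op.startswith("FCMOV"):
            # Read from the FPU-local copy when it is known to be current.
            out.append((op, "fpu_temp" if temp_valid else "int_flags"))
            continue
        if op in FCOMI_TYPE:
            temp_valid = True    # the compare also writes the temporary register
        elif op in writes_flags:
            temp_valid = False   # assumed: another flag writer makes the copy stale
        out.append((op, None))
    return out
```

An FCMOV that reads `fpu_temp` does not have to wait for the integer flags to arrive from the integer side of the processor, which is the source of the speedup the abstract describes.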

