Method and apparatus for rapid execution of FCOM and FSTSW
    22.
    发明授权
    Method and apparatus for rapid execution of FCOM and FSTSW 有权
    用于快速执行FCOM和FSTSW的方法和装置

    公开(公告)号:US06425074B1

    公开(公告)日:2002-07-23

    申请号:US09393524

    申请日:1999-09-10

    Abstract: A microprocessor configured to rapidly execute floating point store status word (FSTSW) type instructions that are immediately preceded by floating point compare (FCOM) type instructions is disclosed. FCOM-type instructions are modified to store their results to an architectural floating point status word and a temporary destination register. If an FSTSW-type instruction is detected immediately following an FCOM-type instruction, then the FSTSW-type instruction is transformed into a special fast floating point store status word (FSTSWEF) instruction. Unlike the FSTSW-type instruction, which is serializing and negatively impacts performance, the FSTSWEF instruction is not serializing and allows execution to continue without undue serialization. A computer system and method for rapidly executing FSTSW instructions immediately preceded by FCOM-type instructions are also disclosed.

    Abstract translation: 公开了一种被配置为快速执行浮点比较(FCOM)类型指令之前的浮点存储状态字(FSTSW)类型指令的微处理器。 修改FCOM类型的指令以将其结果存储到架构浮点状态字和临时目标寄存器。 如果在FCOM型指令之后立即检测到FSTSW型指令,则FSTSW型指令被转换为特殊的快速浮点存储状态字(FSTSWEF)指令。 与串行化和负面影响性能的FSTSW型指令不同,FSTSWEF指令不是序列化的,允许执行继续,而不会过多的序列化。 还公开了一种用于在紧接在FCOM型指令之前快速执行FSTSW指令的计算机系统和方法。

    Method and apparatus for multi-function arithmetic

    公开(公告)号:US06223198B1

    公开(公告)日:2001-04-24

    申请号:US09134171

    申请日:1998-08-14

    Abstract: A multiplier capable of performing signed and unsigned scalar and vector multiplication is disclosed. The multiplier is configured to receive signed or unsigned multiplier and multiplicand operands in scalar or packed vector form. An effective sign for the multiplier and multiplicand operands may be calculated and used to create and select a number of partial products according to Booth's algorithm. Once the partial products have been created and selected, they may be summed and the results may be output. The results may be signed or unsigned, and may represent vector or scalar quantities. When a vector multiplication is performed, the multiplier may be configured to generate and select partial products so as to effectively isolate the multiplication process for each pair of vector components. The multiplier may also be configured to sum the products of the vector components to form the vector dot product. The final product may be output in segments so as to require fewer bus lines. The segments may be rounded by adding a rounding constant. Rounding and normalization may be performed in two paths, one assuming an overflow will occur, the other assuming no overflow will occur. The multiplier may also be configured to perform iterative calculations to evaluate constant powers of an operand. Intermediate products that are formed may be rounded and normalized in two paths and then compressed and stored for use in the next iteration. An adjustment constant may also be added to increase the frequency of exactly rounded results.

    Computer having multimedia operations executable as two distinct sets of
operations within a single instruction cycle
    24.
    发明授权
    Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle 失效
    具有多媒体操作的计算机在单个指令周期内可执行为两组不同的操作

    公开(公告)号:US6061521A

    公开(公告)日:2000-05-09

    申请号:US759042

    申请日:1996-12-02

    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU may be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers may be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands. In one embodiment, an arithmetic logic unit may be partitioned into at least two logic portions. A first logic portion may be coupled to receive a first operand from a fixed slot of a first register and a second operand from any slot of a second register. A second logic portion may be coupled to receive a third operand from a fixed slot of the first register and a fourth operand from any slot of the second register. The first logic portion may perform an arithmetic operation dissimilar from the second logic portion.

    Abstract translation: 提供多媒体扩展单元(MEU)用于执行各种多媒体类型操作。 MEU可以通过协处理器总线或本地CPU总线耦合到常规处理器。 MEU使用向量寄存器,向量ALU和操作数路由单元(ORU)来尽可能少地执行多媒体操作。 通过根据期望的算法流程图将操作数布置在向量ALU上来容易地执行复杂算法。 ORU使用MAU特有的向量指令对齐向量寄存器的分区插槽或子时隙内的操作数。 在ORU的输出端,矢量源或目标寄存器的操作数对可以很容易地路由和组合在矢量ALU。 向量指令采用特殊的加载/存储指令与许多操作指令相结合,对对齐的操作数执行并发的多媒体操作。 在一个实施例中,算术逻辑单元可以被划分为至少两个逻辑部分。 第一逻辑部分可以被耦合以从第二寄存器的任何时隙的第一寄存器的固定时隙和第二操作数接收第一操作数。 第二逻辑部分可以被耦合以从第一寄存器的固定时隙接收第三操作数,并从第二寄存器的任何时隙接收第四操作数。 第一逻辑部分可以执行与第二逻辑部分不同的算术运算。

    Apparatus for routing one operand to an arithmetic logic unit from a
fixed register slot and another operand from any register slot
    25.
    发明授权
    Apparatus for routing one operand to an arithmetic logic unit from a fixed register slot and another operand from any register slot 有权
    用于从固定寄存器时隙将一个操作数路由到算术逻辑单元的装置,以及来自任何寄存器时隙的另一个操作数的装置

    公开(公告)号:US06047372A

    公开(公告)日:2000-04-04

    申请号:US290837

    申请日:1999-04-13

    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands. In one embodiment multiple ALUs may each receive one operand from a fixed source register slot location, where the fixed slot location may be different for each ALU. The operand routing may provide another operand from any source register slot location for another input to each respective ALU.

    Abstract translation: 提供多媒体扩展单元(MEU)用于执行各种多媒体类型操作。 MEU可以通过协处理器总线或本地CPU总线耦合到常规处理器。 MEU使用向量寄存器,向量ALU和操作数路由单元(ORU)来尽可能少地执行多媒体操作。 通过根据期望的算法流程图将操作数布置在向量ALU上来容易地执行复杂算法。 ORU使用MAU特有的向量指令对齐向量寄存器的分区插槽或子时隙内的操作数。 在ORU的输出端,矢量源或目标寄存器的操作数对可以很容易地在矢量ALU中路由和组合。 向量指令采用特殊的加载/存储指令与许多操作指令相结合,对对齐的操作数执行并发的多媒体操作。 在一个实施例中,多个ALU可以从固定的源寄存器时隙位置接收一个操作数,其中固定的时隙位置对于每个ALU可以是不同的。 操作数路由可以从任何源寄存器时隙位置提供另一个操作数,用于另一个输入到每个相应的ALU。

    Computer modified to perform inverse discrete cosine transform
operations on a one-dimensional matrix of numbers within a minimal
number of instruction cycles
    26.
    发明授权
    Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles 失效
    计算机被修改为对最小数量的指令周期内的数字的一维矩阵执行逆离散余弦变换操作

    公开(公告)号:US5801975A

    公开(公告)日:1998-09-01

    申请号:US759045

    申请日:1996-12-02

    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local central processing unit (CPU) bus to a conventional processor. The MEU employs vector registers, a vector arithmetic logic unit (ALU), and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.

    Abstract translation: 提供多媒体扩展单元(MEU)用于执行各种多媒体类型操作。 MEU可以通过协处理器总线或本地中央处理单元(CPU)总线耦合到常规处理器。 MEU采用向量寄存器,向量算术逻辑单元(ALU)和操作数路由单元(ORU),以尽可能少的指令周期执行最大数量的多媒体操作。 通过根据期望的算法流程图将操作数布置在向量ALU上来容易地执行复杂算法。 ORU使用MAU特有的向量指令对齐向量寄存器的分区插槽或子时隙内的操作数。 在ORU的输出端,矢量源或目标寄存器的操作数对可以很容易地在矢量ALU中路由和组合。 向量指令采用特殊的加载/存储指令与许多操作指令相结合,对对齐的操作数执行并发的多媒体操作。

    Computer system with processor cache that stores remote cache presence information
    28.
    发明授权
    Computer system with processor cache that stores remote cache presence information 有权
    具有处理器缓存的计算机系统,用于存储远程缓存存在信息

    公开(公告)号:US07096323B1

    公开(公告)日:2006-08-22

    申请号:US10256970

    申请日:2002-09-27

    CPC classification number: G06F12/082 G06F2212/2515 Y02D10/13

    Abstract: A computer system with a processor cache that stores remote cache presence information. In one embodiment, a plurality of presence vectors are stored to indicate whether particular blocks of data mapped to another node are being remotely cached. Rather than storing the presence vectors in a dedicated storage, the remote cache presence vectors may be stored in designated locations of a cache memory subsystem, such as an L2 cache, associated with a processor core. For example, a designated way of the cache memory subsystem may be allocated for storing remote cache presence vectors, while the remaining ways of the cache are used to store normal processor data. New data blocks may be remotely cached in response to evictions from the cache memory subsystem. In yet a further embodiment, additional entries of the cache memory subsystem may be used for storing directory entries to filter probe command and response traffic.

    Abstract translation: 具有存储远程缓存存在信息的处理器高速缓存的计算机系统。 在一个实施例中,存储多个存在向量以指示映射到另一节点的特定数据块是否被远程高速缓存。 而不是将存在向量存储在专用存储器中,远程高速缓存存在向量可以存储在与处理器核心相关联的高速缓冲存储器子系统(例如L 2高速缓存)的指定位置。 例如,缓存存储器子系统的指定方式可被分配用于存储远程高速缓存存在向量,而高速缓存的剩余方式用于存储正常的处理器数据。 响应于来自高速缓冲存储器子系统的驱逐,可以远程高速缓存新的数据块。 在又一个实施例中,高速缓存存储器子系统的附加条目可以用于存储目录条目以过滤探测命令和响应流量。

    Method and apparatus for calculating a power of an operand
    29.
    发明授权
    Method and apparatus for calculating a power of an operand 有权
    用于计算操作数的功率的方法和装置

    公开(公告)号:US06381625B2

    公开(公告)日:2002-04-30

    申请号:US09782474

    申请日:2001-02-12

    Abstract: A multiplier capable of performing signed and unsigned scalar and vector multiplication is disclosed. The multiplier is configured to receive signed or unsigned multiplier and multiplicand operands in scalar or packed vector form. An effective sign for the multiplier and multiplicand operands may be calculated and used to create and select a number of partial products according to Booth's algorithm. Once the partial products have been created and selected, they may be summed and the results may be output. The results may be signed or unsigned, and may represent vector or scalar quantities. When a vector multiplication is performed, the multiplier may be configured to generate and select partial products so as to effectively isolate the multiplication process for each pair of vector components. The multiplier may also be configured to sum the products of the vector components to form the vector dot product. The final product may be output in segments so as to require fewer bus lines. The segments may be rounded by adding a rounding constant. Rounding and normalization may be performed in two paths, one assuming an overflow will occur, the other assuming no overflow will occur. The multiplier may also be configured to perform iterative calculations to evaluate constant powers of an operand. Intermediate products that are formed may be rounded and normalized in two paths and then compressed and stored for use in the next iteration. An adjustment constant may also be added to increase the frequency of exactly rounded results.

    Abstract translation: 公开了能够执行有符号和无符号标量和矢量乘法的乘法器。 乘法器配置为以标量或压缩向量形式接收带符号或无符号乘数和被乘数操作数。 可以计算乘数和被乘数操作数的有效符号,并用于根据布斯算法创建和选择多个部分乘积。 一旦创建并选择了部分产品,就可以对它们进行求和并输出结果。 结果可能是有符号或无符号的,可能表示向量或标量。 当执行向量乘法时,乘法器可以被配置为产生和选择部分乘积,以便有效地隔离每对向量分量的乘法过程。 乘法器还可以被配置为对矢量分量的乘积求和以形成向量点积。 最终产品可以分段输出,以便需要更少的总线。 可以通过添加舍入常数来对段进行舍入。 可以在两个路径中执行舍入和归一化,一个假设将发生溢出,另一个假设不会发生溢出。 乘法器还可以被配置为执行迭代计算以评估操作数的恒定功率。 形成的中间产品可以在两个路径中进行圆化和归一化,然后压缩并存储以用于下一次迭代。 还可以添加调整常数以增加精确舍入结果的频率。

    Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria
    30.
    发明授权
    Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria 有权
    根据标准优化多管道可执行和特定管道可执行指令的分配到执行管道

    公开(公告)号:US06370637B1

    公开(公告)日:2002-04-09

    申请号:US09370789

    申请日:1999-08-05

    Abstract: A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. For example, junk ops are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename), but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline, but instead may pass through any of the microprocessor's execution pipelines in the floating point unit. Multi-pipeline executable instructions are allocated on a per-clock cycle basis using a number of different criteria. For example, the allocation may vary depending upon the number of multi-pipeline executable instructions received by the floating point unit in a single clock cycle.

    Abstract translation: 公开了一种具有配置成有效地分配多流水线可执行指令的浮点单元的微处理器。 多管道可执行指令是不强制在特定类型执行管道中执行的指令。 例如,垃圾操作是多管道可执行的。 垃圾操作是在浮点单元的流水线的早期执行的指令(例如,在寄存器重命名期间),但是仍然通过用于异常检查的执行管线。 垃圾操作不限于特定的执行管道,而是可以通过浮点单元中的任何一个微处理器的执行流水线。 使用许多不同的标准,在每个时钟周期的基础上分配多流水线可执行指令。 例如,分配可以根据浮点单元在单个时钟周期中接收的多流水线可执行指令的数量而变化。

Patent Agency Ranking