COMPUTER SYSTEM THAT PROVIDES ATOMICITY BY USING A TLB TO INDICATE WHETHER AN EXPORTABLE INSTRUCTION SHOULD BE EXECUTED USING CACHE COHERENCY OR BY EXPORTING THE EXPORTABLE INSTRUCTION, AND EMULATES INSTRUCTIONS SPECIFYING A BUS LOCK
    83.
    Granted invention patent
    Legal status: in force

    Publication number: US06430657B1

    Publication date: 2002-08-06

    Application number: US09170137

    Filing date: 1998-10-12

    CPC classification number: G06F12/0837 G06F12/1027

    Abstract: Atomic memory operations are provided by using exportable “fetch and add” instructions and by emulating IA-32 instructions prepended with a lock prefix. In accordance with the present invention, a CPU includes a default control register that includes an IA-32 lock check enable bit (LC) that, when set to “1”, causes an IA-32 atomic memory reference to raise an IA-32 intercept lock fault. An IA-32 intercept lock fault handler branches to appropriate code to atomically emulate the instruction. Furthermore, the present invention defines an exportable fetch and add (FETCHADD) instruction that reads a memory location indexed by a first register, places the contents read from the memory location into a second register, increments the value read from the memory location, and stores the sum back to the memory location. Associated with each virtual memory page is a memory attribute that can assume a state of “cacheable using a write-back policy” (WB), “uncacheable” (UC), or “uncacheable and exportable” (UCE). When a FETCHADD instruction is executed and the memory location accessed is in a page having an attribute set to WB, the FETCHADD is atomically executed by the CPU by obtaining exclusive use of the cache line containing the memory location. However, when a FETCHADD instruction is executed and the memory location accessed is in a page having an attribute set to UCE, the FETCHADD is atomically executed by exporting the FETCHADD instruction to a centralized location, such as a memory controller.
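
    Illustrative sketch: the attribute-based dispatch described in the abstract can be modeled roughly as follows. This is a toy, single-CPU simulation and not the patented hardware. The names MemAttr, MemoryController and Cpu, the 4096-byte page size, and the use of a plain dictionary in place of the TLB and the cache are all assumptions made for illustration. The abstract does not say what happens for a UC page, so the sketch refuses that case.

```python
from enum import Enum


class MemAttr(Enum):
    WB = "cacheable, write-back"
    UC = "uncacheable"
    UCE = "uncacheable and exportable"


class MemoryController:
    """Hypothetical centralized agent that owns main memory."""

    def __init__(self):
        self.memory = {}

    def fetchadd(self, addr, increment):
        # The whole read-modify-write happens at the controller.
        old = self.memory.get(addr, 0)
        self.memory[addr] = old + increment
        return old


class Cpu:
    """Toy CPU; page_attrs stands in for the TLB (page number -> MemAttr)."""

    def __init__(self, controller, page_attrs, page_size=4096):
        self.controller = controller
        self.page_attrs = page_attrs
        self.page_size = page_size
        self.exclusive_lines = {}  # simplified cache: addr -> value held exclusively
        self.log = []              # records which path each FETCHADD took

    def fetchadd(self, addr, increment=1):
        attr = self.page_attrs[addr // self.page_size]
        if attr is MemAttr.WB:
            # Take exclusive ownership of the line, then update it locally.
            if addr not in self.exclusive_lines:
                self.exclusive_lines[addr] = self.controller.memory.get(addr, 0)
            old = self.exclusive_lines[addr]
            self.exclusive_lines[addr] = old + increment
            self.log.append("cache")
            return old
        if attr is MemAttr.UCE:
            # Export the instruction to the centralized location.
            self.log.append("export")
            return self.controller.fetchadd(addr, increment)
        raise NotImplementedError("UC behavior is not described in the abstract")
```

    In this model, a Cpu built with page_attrs {0: MemAttr.WB, 1: MemAttr.UCE} handles an access at address 0x10 in its own cache and forwards an access at address 0x1000 to the controller. Both paths return the value that was in memory before the add.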


Method of sorting signed numbers and solving absolute differences using packed instructions
    84.
    Granted invention patent
    Legal status: lapsed

    Publication number: US6036350A

    Publication date: 2000-03-14

    Application number: US859013

    Filing date: 1997-05-20

    CPC classification number: G06F7/544 G06F2207/3828 G06F2207/5442

    Abstract: A technique for sorting packed signed numbers of two operands into maxima and minima operands and solving absolute differences for each pair of corresponding values of maxima and minima. After packing two source operands with a plurality of data elements containing signed values, a greater-than comparison operation is performed on each pair of corresponding numbers in the two operands to determine which is greater. An exclusive-OR mask is generated for use in swapping those values which need to be rearranged so that all maxima are in one operand and all minima are in another operand. Once the sorting of maxima and minima is complete, a packed subtraction operation is then performed by subtracting the minima from corresponding maxima to obtain absolute differences.
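
    Illustrative sketch: the compare, XOR-mask swap, and subtract sequence in the abstract can be imitated in plain Python by treating each list element as one packed lane. The 16-bit lane width and all function names here are assumptions made for illustration, and real packed instructions would process every lane in a single operation rather than in a loop.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def to_lane(value):
    """Encode a signed integer as a 16-bit two's-complement lane."""
    return value & LANE_MASK


def from_lane(bits):
    """Decode a 16-bit two's-complement lane to a signed integer."""
    return bits - (1 << LANE_BITS) if bits & (1 << (LANE_BITS - 1)) else bits


def packed_cmpgt(a, b):
    """Per-lane signed greater-than: all ones where a > b, otherwise zero."""
    return [LANE_MASK if from_lane(x) > from_lane(y) else 0 for x, y in zip(a, b)]


def sort_and_absdiff(a, b):
    """Return (maxima, minima, absolute differences) for two packed operands."""
    swap = packed_cmpgt(b, a)                        # lanes where b holds the larger value
    xor_mask = [(x ^ y) & m for x, y, m in zip(a, b, swap)]
    maxima = [x ^ m for x, m in zip(a, xor_mask)]    # a, or b in swapped lanes
    minima = [y ^ m for y, m in zip(b, xor_mask)]    # b, or a in swapped lanes
    # The difference is non-negative, so it is read as an unsigned lane.
    diffs = [(hi - lo) & LANE_MASK for hi, lo in zip(maxima, minima)]
    return maxima, minima, diffs
```

    XOR-ing both operands with the same masked value (a XOR b) exchanges the two lanes exactly where the mask is all ones and leaves them untouched elsewhere, which is why no per-lane branch is needed.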


Microarchitecture for implementing an instruction to clear the tags of a stack reference register file
    85.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5857096A

    Publication date: 1999-01-05

    Application number: US575686

    Filing date: 1995-12-19

    Abstract: An apparatus (e.g. a microarchitecture of a microprocessor) comprising a plurality of tags associated with a first storage area indicating that locations in the first storage area are either empty or non-empty responsive to execution of floating point instructions which modify data contained in the first storage area. A first circuit is coupled to the plurality of tags which sets only the plurality of tags to an empty state responsive to receipt of a first instruction. The first instruction indicates termination of execution of instructions which operate upon the packed data stored in the first storage area. The apparatus further comprises a second circuit coupled to the plurality of tags for setting the plurality of tags to a non-empty state responsive to receipt of a second instruction (or instructions). The second instruction specifies an operation upon packed data stored in the first storage area. The second circuit further sets the plurality of tags to indicate execution of instructions which operate upon the packed data. This apparatus advantageously provides an architecture (e.g. a microarchitecture for a microprocessor) for clearing the packed data state at the end of executed blocks of packed data instructions to leave the floating point state in a clear condition for subsequent operations (e.g. blocks of executed floating point instructions).
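
    Illustrative sketch: the tag behavior described in the abstract can be modeled with a small class. This is a behavioral toy, not the claimed circuits. The class name, the method names, and the choice of eight registers are assumptions made for illustration.

```python
EMPTY, NON_EMPTY = 0, 1


class TaggedRegisterFile:
    """Toy model of a storage area shared by floating-point and packed-data code."""

    def __init__(self, size=8):
        self.regs = [0] * size
        self.tags = [EMPTY] * size

    def fp_write(self, index, value):
        # Floating-point instruction: only the modified location becomes non-empty.
        self.regs[index] = value
        self.tags[index] = NON_EMPTY

    def packed_write(self, index, value):
        # "Second instruction": a packed-data operation sets all tags to non-empty.
        self.regs[index] = value
        self.tags = [NON_EMPTY] * len(self.tags)

    def clear_packed_state(self):
        # "First instruction": set only the tags to empty; the data is left as-is.
        self.tags = [EMPTY] * len(self.tags)
```

    Calling clear_packed_state() at the end of a block of packed-data operations leaves every tag empty, so a following block of floating-point code starts from a clean tag state while the register contents are not rewritten.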


Method and apparatus for performing multiply-subtract operations on packed data
    88.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5721892A

    Publication date: 1998-02-24

    Application number: US554625

    Filing date: 1995-11-06

    CPC classification number: G06F9/30036 G06F7/5443 G06F2207/3828

    Abstract: A method and apparatus for including in a processor instructions for performing multiply-subtract operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least one of the data elements in this third packed data stores the result of performing a multiply-subtract operation on data elements in the first and second packed data.
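
    Illustrative sketch: one plausible reading of a packed multiply-subtract is shown below. The abstract does not give lane widths or say which elements are paired, so the layout here (signed 16-bit source lanes, adjacent lanes paired, one signed 32-bit result per pair) is an assumption, as are the function names.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def packed_multiply_subtract(a, b):
    """For each adjacent pair of lanes, compute a[i]*b[i] - a[i+1]*b[i+1]."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("operands must have the same even number of lanes")
    result = []
    for i in range(0, len(a), 2):
        first = wrap_signed(a[i], 16) * wrap_signed(b[i], 16)
        second = wrap_signed(a[i + 1], 16) * wrap_signed(b[i + 1], 16)
        result.append(wrap_signed(first - second, 32))
    return result
```

    With this layout, two operands of four lanes each yield a result of two wider lanes, so the result packed data occupies the same number of bits as each source.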


    Performance throttling to reduce IC power consumption
    89.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5719800A

    Publication date: 1998-02-17

    Application number: US497853

    Filing date: 1995-06-30

    Abstract: The power consumed within an integrated circuit (IC) is reduced without substantial impact on its performance for typical applications by throttling the performance of particular functional units within the IC. Artificial worst-case power consumption is reduced by throttling down the activity levels of long-duration sequences of high-power operations. The recent utilization levels of particular functional units within an IC are monitored--for example, by computing each functional unit's average duty cycle over its recent operating history. If this activity level is greater than a threshold, then the functional unit is operated in a reduced-power mode. The threshold value is set large enough to allow short bursts of high utilization to occur without impacting performance. The invention allows an integrated circuit to dynamically make the tradeoff between high-speed operation and low-power operation, by throttling back performance of localized functional units when their utilization exceeds a sustainable level. Additionally, this dynamic power/speed tradeoff can be optimized across multiple functional units within an IC or among multiple ICs within a system. Additionally, this dynamic power/speed tradeoff can be altered by providing software control over throttling parameters.
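
    Illustrative sketch: the monitoring rule in the abstract (average duty cycle over recent history, compared against a threshold) can be sketched as below. The class name, the window length, the threshold value, and the decision to wait for a full window before throttling are assumptions made for illustration, not parameters taken from the patent.

```python
from collections import deque


class ThrottledUnit:
    """Toy model of one functional unit with duty-cycle based throttling."""

    def __init__(self, window=8, threshold=0.75):
        self.history = deque(maxlen=window)  # 1 = busy cycle, 0 = idle cycle
        self.threshold = threshold
        self.reduced_power = False

    def duty_cycle(self):
        """Average utilization over the recorded history."""
        return sum(self.history) / len(self.history) if self.history else 0.0

    def tick(self, busy):
        """Record one cycle and return True if the unit should run throttled."""
        self.history.append(1 if busy else 0)
        self.reduced_power = (
            len(self.history) == self.history.maxlen
            and self.duty_cycle() > self.threshold
        )
        return self.reduced_power
```

    Because the decision uses an average over the whole window, a burst shorter than the window cannot push the duty cycle past the threshold, which matches the abstract's point that short bursts of high utilization should not be penalized. Exposing window and threshold as constructor arguments mirrors the idea of software control over throttling parameters.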


    Processor performing packed data multiplication
    90.
    Granted invention patent
    Legal status: lapsed

    Publication number: US5675526A

    Publication date: 1997-10-07

    Application number: US756708

    Filing date: 1996-11-26

    Abstract: A processor. The processor includes a decoder being coupled to receive a control signal. The control signal has a first source address, a second source address, a destination address, and an operation field. The first source address corresponds to a first location. The second source address corresponds to a second location. The destination address corresponds to a third location. The operation field indicates that a type of packed data multiply operation is to be performed. The processor further includes a circuit being coupled to the decoder. The circuit is for multiplying a first packed data being stored at the first location with a second packed data being stored at the second location. The circuit is further for communicating a corresponding result packed data to the third location.
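
    Illustrative sketch: the control signal and multiply circuit in the abstract can be modeled as a record with two source addresses, a destination address, and an operation field. The ControlSignal and execute names, the "mul_low" and "mul_high" operation values, and the 16-bit lane width are hypothetical; the abstract only says that the field selects a type of packed multiply.

```python
from dataclasses import dataclass


@dataclass
class ControlSignal:
    src1: int   # address of the first packed operand
    src2: int   # address of the second packed operand
    dest: int   # address that receives the packed result
    op: str     # hypothetical operation field: "mul_low" or "mul_high"


def execute(signal, registers):
    """Multiply the packed data at src1 and src2 lane by lane and store it at dest."""
    a = registers[signal.src1]
    b = registers[signal.src2]
    products = [x * y for x, y in zip(a, b)]
    if signal.op == "mul_low":
        result = [p & 0xFFFF for p in products]          # low 16 bits of each product
    elif signal.op == "mul_high":
        result = [(p >> 16) & 0xFFFF for p in products]  # high 16 bits of each product
    else:
        raise ValueError(f"unsupported operation: {signal.op}")
    registers[signal.dest] = result
    return result
```

    Lanes are passed in as signed Python integers and results are returned as raw 16-bit patterns, so a negative product such as -6 appears as 0xFFFA in the low half and 0xFFFF in the high half.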

