Sectored least-recently-used cache replacement
    1.
    Granted invention patent
    Sectored least-recently-used cache replacement (status: in force)

    Publication number: US06823427B1

    Publication date: 2004-11-23

    Application number: US09859271

    Filing date: 2001-05-16

    IPC classification: G06F12/00

    CPC classification: G06F12/128 G06F12/0864

    Abstract: Various methods and systems for implementing a sectored least recently used (LRU) cache replacement algorithm are disclosed. Each set in an N-way set-associative cache is partitioned into several sectors that each include two or more of the N ways. Usage status indicators such as pointers show the relative usage status of the sectors in an associated set. For example, an LRU pointer may point to the LRU sector, an MRU pointer may point to the MRU sector, and so on. When a replacement is performed, a way within the LRU sector identified by the LRU pointer is filled.
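
The replacement policy in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `SectoredLRUSet`, the use of a deque in place of hardware pointers, and the rule for choosing a way inside the LRU sector (first empty way, otherwise round-robin) are all assumptions, since the abstract only says that some way within the LRU sector is filled.

```python
from collections import deque


class SectoredLRUSet:
    """One set of an N-way set-associative cache, partitioned into sectors."""

    def __init__(self, num_ways=4, ways_per_sector=2):
        if ways_per_sector < 2 or num_ways % ways_per_sector:
            raise ValueError("sectors need two or more ways and must divide N")
        self.ways_per_sector = ways_per_sector
        self.tags = [None] * num_ways
        num_sectors = num_ways // ways_per_sector
        # Sector usage order: leftmost is the LRU sector, rightmost the MRU
        # sector. This stands in for the LRU/MRU pointers of the abstract.
        self.order = deque(range(num_sectors))
        # Assumed per-sector round-robin counter for picking a victim way.
        self.next_victim = [0] * num_sectors

    def _mark_mru(self, sector):
        self.order.remove(sector)
        self.order.append(sector)

    def access(self, tag):
        """Return True on a hit; on a miss, fill a way in the LRU sector."""
        if tag in self.tags:
            self._mark_mru(self.tags.index(tag) // self.ways_per_sector)
            return True
        sector = self.order[0]  # sector identified by the LRU pointer
        base = sector * self.ways_per_sector
        ways = range(base, base + self.ways_per_sector)
        empty = [w for w in ways if self.tags[w] is None]
        if empty:
            way = empty[0]
        else:
            way = base + self.next_victim[sector]
            self.next_victim[sector] = (
                self.next_victim[sector] + 1
            ) % self.ways_per_sector
        self.tags[way] = tag
        self._mark_mru(sector)
        return False


cache_set = SectoredLRUSet(num_ways=4, ways_per_sector=2)
for t in ["A", "B", "C", "D"]:
    cache_set.access(t)      # four misses fill the set
hit = cache_set.access("A")  # hit: sector 0 becomes the MRU sector
cache_set.access("E")        # miss: evicts from sector 1, the LRU sector
```

Tracking usage per sector rather than per way needs fewer status bits per set, at the cost of a coarser approximation of true LRU.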

    Method and system for speculatively invalidating lines in a cache
    2.
    Granted invention patent
    Method and system for speculatively invalidating lines in a cache (status: in force)

    Publication number: US06725337B1

    Publication date: 2004-04-20

    Application number: US09859290

    Filing date: 2001-05-16

    IPC classification: G06F12/00

    CPC classification: G06F12/0891

    Abstract: A cache controller configured to speculatively invalidate a cache line may respond to an invalidating request or instruction immediately instead of waiting for error checking to complete. In case the error checking determines that the invalidation is erroneous and thus should not be performed, the cache controller protects the speculatively invalidated cache line from modification until error checking is complete. This way, if the invalidation is later found to be erroneous, the speculative invalidation can be reversed. If error checking completes without detecting any errors, the speculative invalidation becomes non-speculative.
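
A small software model may help illustrate the behavior described here. It is only a sketch under stated assumptions: the class `SpecInvalidatingCache` and its method names are hypothetical, and the dictionary `pending` stands in for whatever mechanism the hardware uses to protect a speculatively invalidated line.

```python
class SpecInvalidatingCache:
    """Toy model of speculative invalidation with later commit or reversal."""

    def __init__(self):
        self.lines = {}    # address -> data for valid lines
        self.pending = {}  # address -> data of speculatively invalidated lines

    def read(self, addr):
        return self.lines.get(addr)

    def fill(self, addr, data):
        """Write a line; refused while the line is protected."""
        if addr in self.pending:
            return False
        self.lines[addr] = data
        return True

    def speculative_invalidate(self, addr):
        """Invalidate at once, keeping the old data protected for a reversal."""
        if addr in self.lines:
            self.pending[addr] = self.lines.pop(addr)

    def error_check_done(self, addr, erroneous):
        """Commit the invalidation, or reverse it if it was erroneous."""
        data = self.pending.pop(addr, None)
        if erroneous and data is not None:
            self.lines[addr] = data


cache = SpecInvalidatingCache()
cache.fill(0x40, "old")
cache.speculative_invalidate(0x40)
blocked = cache.fill(0x40, "new")             # protected, so refused
cache.error_check_done(0x40, erroneous=True)  # invalidation reversed
restored = cache.read(0x40)
```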

    Stride based prefetcher with confidence counter and dynamic prefetch-ahead mechanism
    3.
    Granted invention patent
    Stride based prefetcher with confidence counter and dynamic prefetch-ahead mechanism (status: in force)

    Publication number: US06571318B1

    Publication date: 2003-05-27

    Application number: US09798469

    Filing date: 2001-03-02

    IPC classification: G06F12/00

    CPC classification: G06F12/0862 G06F2212/6026

    Abstract: A processor is described which includes a stride detect table. The stride detect table includes one or more entries, each entry used to track a potential stride pattern. Additionally, each entry includes a confidence counter. The confidence counter may be incremented each time another address in the pattern is detected, and thus may be indicative of the strength of the pattern (e.g., the likelihood of the pattern repeating). At a first threshold of the confidence counter, prefetching of the next address in the pattern (the most recent address plus the stride) may be initiated. At a second, greater threshold, a more aggressive prefetching may be initiated (e.g., the most recent address plus twice the stride). In some implementations, the prefetch mechanism including the stride detect table may replace a prefetch buffer and prefetch logic in the memory controller.
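
The two-threshold scheme can be sketched for a single stride detect table entry as follows. The class name `StrideEntry`, the threshold values, and the policy of resetting the confidence counter when the stride changes are assumptions for illustration; the abstract does not specify them.

```python
class StrideEntry:
    """One stride detect table entry with a confidence counter."""

    def __init__(self, first_threshold=2, second_threshold=4):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.last_addr = None
        self.stride = None
        self.confidence = 0

    def observe(self, addr):
        """Record an access; return an address to prefetch, or None."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride and stride != 0:
                self.confidence += 1  # another address in the pattern
            else:
                self.stride = stride  # assumed policy: restart tracking
                self.confidence = 0
        self.last_addr = addr
        if self.confidence >= self.second_threshold:
            return addr + 2 * self.stride  # more aggressive prefetch-ahead
        if self.confidence >= self.first_threshold:
            return addr + self.stride      # next address in the pattern
        return None


entry = StrideEntry()
prefetches = [entry.observe(a) for a in (100, 164, 228, 292, 356, 420)]
# prefetches == [None, None, None, 356, 420, 548]
```

With a stride of 64, the first prefetch is issued once the counter reaches 2, and the prefetcher switches to two strides ahead once it reaches 4.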

    System and method for scheduling operations using speculative data operands
    4.
    Granted invention patent
    System and method for scheduling operations using speculative data operands (status: in force)

    Publication number: US07937569B1

    Publication date: 2011-05-03

    Application number: US10839471

    Filing date: 2004-05-05

    IPC classification: G06F9/30

    Abstract: A system and method for scheduling operations using speculative data operands. In one embodiment, a system may include a scheduler configured to store a speculative source tag and a non-speculative source tag for an operand of an operation and an execution core configured to execute operations issued by the scheduler and to output result tags identifying operands generated by executing the operations. The scheduler may be configured to determine whether the operation is ready to issue by comparing the speculative source tag, but not the non-speculative source tag, to the result tags output by the execution core unless an incorrect speculation has been detected. If an incorrect speculation has been detected, the scheduler may be configured to determine whether the operation is ready to issue by comparing the non-speculative source tag, but not the speculative source tag, to the result tags output by the execution core.
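
The tag-selection rule reduces to a short function. This is a sketch only: the function name `operand_ready`, its parameters, and the example tag strings are hypothetical, and a real scheduler would perform the comparison in hardware for every operand of every entry.

```python
def operand_ready(spec_tag, nonspec_tag, result_tags, misspeculated):
    """Decide readiness of one operand from the result tags broadcast so far.

    Normally only the speculative source tag is compared; once an incorrect
    speculation has been detected, only the non-speculative tag is compared.
    """
    tag = nonspec_tag if misspeculated else spec_tag
    return tag in result_tags


# Hypothetical tags: "p7" is the speculative producer, "p3" the real one.
ready_normal = operand_ready("p7", "p3", {"p7"}, misspeculated=False)
ready_after_misspec = operand_ready("p7", "p3", {"p7"}, misspeculated=True)
```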

    Apparatus and method for port arbitration in a register file on the basis of functional unit issue slots
    5.
    Granted invention patent
    Apparatus and method for port arbitration in a register file on the basis of functional unit issue slots (status: in force)

    Publication number: US07315935B1

    Publication date: 2008-01-01

    Application number: US10679745

    Filing date: 2003-10-06

    IPC classification: G06F9/30 G06F9/40 G06F15/00

    Abstract: A microprocessor is configured to provide port arbitration in a register file. The microprocessor includes a plurality of functional units configured to collectively operate on a maximum number of operands in a given execution cycle, and a register file providing a number of read ports that is insufficient to provide the maximum number of operands to the plurality of functional units in the given execution cycle. The microprocessor also includes arbitration logic coupled to allocate the read ports of the register file for use by selected functional units during the given execution cycle.
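
One simple way to picture the arbitration is a fixed-priority allocator. The policy below (grant in issue-slot order, skip any unit whose operands cannot all be read this cycle) is an assumption chosen for illustration; the abstract does not state how units are selected, and the function and unit names are hypothetical.

```python
def arbitrate_read_ports(requests, num_ports):
    """Allocate register file read ports for one execution cycle.

    requests: list of (unit_name, operands_needed), ordered by issue-slot
    priority (assumed). Returns the units granted ports this cycle.
    """
    granted = []
    free_ports = num_ports
    for unit, needed in requests:
        if needed <= free_ports:
            granted.append(unit)
            free_ports -= needed
    return granted


# Three units could need five operands in total, but only three ports exist.
granted = arbitrate_read_ports([("alu0", 2), ("alu1", 2), ("agu", 1)], 3)
# granted == ["alu0", "agu"]
```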

    Apparatus and method for implementing a least recently used cache replacement algorithm
    6.
    Granted invention patent
    Apparatus and method for implementing a least recently used cache replacement algorithm (status: lapsed)

    Publication number: US06408364B1

    Publication date: 2002-06-18

    Application number: US09528041

    Filing date: 2000-03-17

    IPC classification: G06F12/00

    CPC classification: G06F12/123

    Abstract: A least recently used (LRU) cache replacement algorithm is implemented with a set of N pointer registers that point to respective ways of an N-way set of memory blocks. One of the pointer registers is an LRU pointer, pointing to a least recently used way, and another of the pointer registers is a most recently used (MRU) pointer, pointing to a most recently used way. For a cache fill operation in which a new memory block is written to one of the N ways, the new memory block is written into the way (way_n) pointed to by the LRU pointer. All the pointers except the MRU pointer are promoted to point to a way pointed to by respective newer neighboring pointers, the newer neighboring pointers being neighbors towards the MRU pointer. The MRU pointer is updated to point to the way_n in which the new memory block was written. For a cache hit in which one of the memory blocks in the set, way_m, is accessed for a write or read operation, all the pointers pointing to way_m and newer, except for the MRU pointer, are promoted to point to a way pointed to by a newer neighboring pointer. The MRU pointer is changed to point to way_m. For an invalidate operation in which one of the ways, way_k, is invalidated, all the pointers pointing to way_k and older are demoted, except for the LRU pointer. The LRU pointer is pointed to the invalidated way.
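
The three pointer updates (fill, hit, invalidate) can be sketched in software as shifts over a list of N pointers. The class name `PointerLRU` and the list representation are assumptions; index 0 plays the role of the LRU pointer and the last index the MRU pointer.

```python
class PointerLRU:
    """N pointer registers ordered from the LRU pointer to the MRU pointer."""

    def __init__(self, num_ways):
        # ptrs[0] is the LRU pointer, ptrs[-1] the MRU pointer; each holds a way.
        self.ptrs = list(range(num_ways))

    def _promote_from(self, i):
        """Pointers i and newer take their newer neighbor's way; MRU gets way i."""
        way = self.ptrs[i]
        for j in range(i, len(self.ptrs) - 1):
            self.ptrs[j] = self.ptrs[j + 1]
        self.ptrs[-1] = way
        return way

    def fill(self):
        """Return the way to fill (the LRU way) and make it the MRU way."""
        return self._promote_from(0)

    def hit(self, way):
        """A hit on `way` makes it the MRU way."""
        self._promote_from(self.ptrs.index(way))

    def invalidate(self, way):
        """Pointers at `way` and older are demoted; LRU points to `way`."""
        i = self.ptrs.index(way)
        for j in range(i, 0, -1):
            self.ptrs[j] = self.ptrs[j - 1]
        self.ptrs[0] = way


lru = PointerLRU(4)        # ptrs == [0, 1, 2, 3]
filled = lru.fill()        # fills way 0; ptrs == [1, 2, 3, 0]
lru.hit(2)                 # ptrs == [1, 3, 0, 2]
lru.invalidate(0)          # ptrs == [0, 1, 3, 2]
```

Pointing the LRU pointer at an invalidated way means the next fill reuses that way before evicting any valid block.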

    PAIRED EXECUTION SCHEDULING OF DEPENDENT MICRO-OPERATIONS
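    7.
    Invention patent application
    PAIRED EXECUTION SCHEDULING OF DEPENDENT MICRO-OPERATIONS (status: under examination, published)

    Publication number: US20120023314A1

    Publication date: 2012-01-26

    Application number: US12840835

    Filing date: 2010-07-21

    IPC classification: G06F9/30 G06F9/38

    CPC classification: G06F9/3838 G06F9/3826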

    Abstract: A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.

    Scheduler for use in a microprocessor that supports data-speculative execution
    8.
    Granted invention patent
    Scheduler for use in a microprocessor that supports data-speculative execution (status: in force)

    Publication number: US06950925B1

    Publication date: 2005-09-27

    Application number: US10229563

    Filing date: 2002-08-28

    IPC classification: G06F9/38 G06F9/30

    CPC classification: G06F9/3842

    Abstract: A microprocessor may include several execution units and a scheduler coupled to issue operations to at least one of the execution units. The scheduler may include several entries. A first entry may be allocated to a first operation. The first entry includes a source status indication for each of the first operation's operands. Each source status indication indicates whether a value of a respective one of the first operation's operands is speculative. The scheduler is configured to update one of the first entry's source status indications to indicate that a value of a respective one of the first operation's operands is non-speculative in response to receiving an indication that a value of a result of a second operation is non-speculative.

    Store aware prefetching for a datastream
    9.
    Granted invention patent
    Store aware prefetching for a datastream (status: in force)

    Publication number: US08667225B2

    Publication date: 2014-03-04

    Application number: US12558465

    Filing date: 2009-09-11

    IPC classification: G06F12/00

    Abstract: A system and method for efficient data prefetching. A data stream stored in lower-level memory comprises a contiguous block of data used in a computer program. A prefetch unit in a processor detects a data stream by identifying a sequence of storage accesses referencing contiguous blocks of data in a monotonically increasing or decreasing manner. After a predetermined training period for a given data stream, the prefetch unit prefetches a portion of the given data stream from memory without write permission, in response to an access that does not request write permission. Also, after the training period, the prefetch unit prefetches a portion of the given data stream from lower-level memory with write permission, in response to determining there has been a prior access to the given data stream that requests write permission subsequent to a number of cache misses reaching a predetermined threshold.
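
The permission decision for one data stream might be modeled as below. This is a deliberately simplified sketch: the class `StreamPrefetcher` is hypothetical, it treats the training period and the cache-miss threshold as a single counter (the abstract may intend them to be distinct), and it lets the triggering store itself switch the stream to write-permission prefetches.

```python
class StreamPrefetcher:
    """Tracks one data stream and chooses the prefetch permission."""

    def __init__(self, miss_threshold=2):
        self.miss_threshold = miss_threshold
        self.misses = 0
        self.store_seen = False  # store observed after the threshold was reached

    def access(self, is_miss, wants_write):
        """Return None while training, else 'read' or 'write' prefetch mode."""
        if is_miss:
            self.misses += 1
        if self.misses < self.miss_threshold:
            return None  # still training; stores seen now are not counted
        if wants_write:
            self.store_seen = True
        return "write" if self.store_seen else "read"


stream = StreamPrefetcher(miss_threshold=2)
modes = [
    stream.access(is_miss=True, wants_write=False),   # training
    stream.access(is_miss=True, wants_write=False),   # trained: read prefetch
    stream.access(is_miss=False, wants_write=True),   # store after threshold
    stream.access(is_miss=True, wants_write=False),   # stays in write mode
]
# modes == [None, "read", "write", "write"]
```

Prefetching with write permission once stores have been seen on the stream avoids a later permission upgrade for each prefetched line.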

    System and method to prevent in-flight instances of operations from disrupting operation replay within a data-speculative microprocessor
    10.
    Granted invention patent
    System and method to prevent in-flight instances of operations from disrupting operation replay within a data-speculative microprocessor (status: in force)

    Publication number: US07363470B2

    Publication date: 2008-04-22

    Application number: US10429082

    Filing date: 2003-05-02

    IPC classification: G06F9/30

    Abstract: A microprocessor may include one or more functional units configured to execute operations, a scheduler configured to issue operations to the functional units for execution, and at least one replay detection unit. The scheduler may be configured to maintain state information for each operation. Such state information may, among other things, indicate whether an associated operation has completed execution. The replay detection unit may be configured to detect that one of the operations in the scheduler should be replayed. If an instance of that operation is currently being executed by one of the functional units when the operation is detected as needing to be replayed, the replay detection unit is configured to inhibit an update to the state information for that operation in response to execution of the in-flight instance of the operation. Various embodiments of computer systems may include such a microprocessor.
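
The inhibit mechanism can be illustrated with a toy scheduler. The class `ReplayAwareScheduler`, its state names, and the `inhibit` set are assumptions made for this sketch; it also ignores the case where an operation is reissued before its stale in-flight instance finishes.

```python
class ReplayAwareScheduler:
    """Keeps per-operation state and drops updates from stale instances."""

    def __init__(self):
        self.state = {}       # op -> "issued", "needs_replay" or "completed"
        self.inhibit = set()  # ops whose in-flight instance must be ignored

    def issue(self, op):
        self.state[op] = "issued"

    def detect_replay(self, op):
        """Mark op for replay; if an instance is in flight, inhibit its update."""
        if self.state.get(op) == "issued":
            self.inhibit.add(op)
        self.state[op] = "needs_replay"

    def instance_finished(self, op):
        """Called when a functional unit finishes executing an instance of op."""
        if op in self.inhibit:
            self.inhibit.discard(op)  # stale instance: leave the state alone
            return
        self.state[op] = "completed"


sched = ReplayAwareScheduler()
sched.issue("load1")
sched.detect_replay("load1")      # replay needed while an instance is in flight
sched.instance_finished("load1")  # update inhibited
state_after_stale = sched.state["load1"]
sched.issue("load1")              # the replayed instance
sched.instance_finished("load1")
```

Without the inhibit step, the stale instance would mark the operation as completed and the replay would be lost.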
