摘要:
Various methods and systems for implementing a sectored least recently used (LRU) cache replacement algorithm are disclosed. Each set in an N-way set-associative cache is partitioned into several sectors that each include two or more of the N ways. Usage status indicators such as pointers show the relative usage status of the sectors in an associated set. For example, an LRU pointer may point to the LRU sector, an MRU pointer may point to the MRU sector, and so on. When a replacement is performed, a way within the LRU sector identified by the LRU pointer is filled.
摘要:
A cache controller configured to speculatively invalidate a cache line may respond to an invalidating request or instruction immediately instead of waiting for error checking to complete. In case the error checking determines that the invalidation is erroneous and thus should not be performed, the cache controller protects the speculatively invalidated cache line from modification until error checking is complete. This way, if the invalidation is later found to be erroneous, the speculative invalidation can be reversed. If error checking completes without detecting any errors, the speculative invalidation becomes non-speculative.
摘要:
A processor is described which includes a stride detect table. The stride detect table includes one or more entries, each entry used to track a potential stride pattern. Additionally, each entry includes a confidence counter. The confidence counter may be incremented each time another address in the pattern is detected, and thus may be indicative of the strength of the pattern (e.g., the likelihood of the pattern repeating). At a first threshold of the confidence counter, prefetching of the next address in the pattern (the most recent address plus the stride) may be initiated. At a second, greater threshold, a more aggressive prefetching may be initiated (e.g. the most recent address plus twice the stride). In some implementations, the prefetch mechanism including the stride detect table may replace a prefetch buffer and prefetch logic in the memory controller.
摘要:
A system and method for scheduling operations using speculative data operands. In one embodiment, a system may include a scheduler configured to store a speculative source tag and a non-speculative source tag for an operand of an operation and an execution core configured to execute operations issued by the scheduler and to output result tags identifying operands generated by executing the operations. The scheduler may be configured to determine whether the operation is ready to issue by comparing the speculative source tag, but not the non-speculative source tag, to the result tags output by the execution core unless an incorrect speculation has been detected. If an incorrect speculation has been detected, the scheduler may be configured to determine whether the operation is ready to issue by comparing the non-speculative source tag, but not the speculative source tag, to the result tags output by the execution core.
摘要:
A microprocessor is configured to provide port arbitration in a register file. The microprocessor includes a plurality of functional units configured to collectively operate on a maximum number of operands in a given execution cycle, and a register file providing a number of read ports that is insufficient to provide the maximum number of operands to the plurality of functional units in the given execution cycle. The microprocessor also includes an arbitration logic coupled to allocate the read ports of the register file for use by selected functional units during the given execution cycle.
摘要:
A least recently used (LRU) cache replacement algorithm is implemented with a set of N pointer registers that point to respective ways of an N-way set of memory blocks. One of the pointer registers is an LRU pointer, pointing to a least recently used way and another of the pointer registers is a most recently used (MRU) pointer, pointing to a most recently used way. For a cache fill operation in which a new memory block is written to one of the N ways, the new memory block is written into the way (wayn), pointed to by the LRU pointer. All the pointers except the MRU pointer are promoted to point to a way pointed to by respective newer neighboring pointers, the newer neighboring pointers being neighbors towards the MRU pointer. The MRU pointer is updated to point to the wayn in which the new memory block was written. For a cache hit in which one of the memory blocks in the set, waym, is accessed for a write or read operation, all the pointers waym and newer, except for the MRU pointer, are promoted to point to a way pointed to by a newer neighboring pointer. The MRU pointer is changed to point to waym. For an invalidate operation in which one of the ways, wayk is invalidated, all the pointers pointing to wayk and older are demoted, except for the LRU pointer. The LRU pointer is pointed to the invalidated way.
摘要:
A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.
摘要:
A microprocessor may include several execution units and a scheduler coupled to issue operations to at least one of the execution units. The scheduler may include several entries. A first entry may be allocated to a first operation. The first entry includes a source status indication for each of the first operation's operands. Each source status indication indicates whether a value of a respective one of the first operation's operands is speculative. The scheduler is configured to update one of the first entry's source status indications to indicate that a value of a respective one of the first operation's operands is non-speculative in response to receiving an indication that a value of a result of a second operation is non-speculative.
摘要:
A system and method for efficient data prefetching. A data stream stored in lower-level memory comprises a contiguous block of data used in a computer program. A prefetch unit in a processor detects a data stream by identifying a sequence of storage accesses referencing a contiguous blocks of data in a monotonically increasing or decreasing manner. After a predetermined training period for a given data stream, the prefetch unit prefetches a portion of the given data stream from memory without write permission, in response to an access that does not request write permission. Also, after the training period, the prefetch unit prefetches a portion of the given data stream from lower-level memory with write permission, in response to determining there has been a prior access to the given data stream that requests write permission subsequent to a number of cache misses reaching a predetermined threshold.
摘要:
A microprocessor may include one or more functional units configured to execute operations, a scheduler configured to issue operations to the functional units for execution, and at least one replay detection unit. The scheduler may be configured to maintain state information for each operation. Such state information may, among other things, indicate whether an associated operation has completed execution. The replay detection unit may be configured to detect that one of the operations in the scheduler should be replayed. If an instance of that operation is currently being executed by one of the functional units when operation is detected as needing to be replayed, the replay detection unit is configured to inhibit an update to the state information for that operation in response to execution of the in-flight instance of the operation. Various embodiments of computer systems may include such a microprocessor.