Abstract:
A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.
Abstract:
Systems, methods, and apparatuses for reducing writes to the data array of a cache. A cache hierarchy includes one or more L1 caches and a L2 cache inclusive of the L2 cache(s). When a request from the L1 cache misses in the L2 cache, the L2 cache sends a fill request to memory. When the fill data returns from memory, the L2 cache delays writing the fill data to its data array. Instead, this cache line is written to the L1 cache and a clean-evict bit corresponding to the cache line is set in the L1 cache. When the L1 cache evicts this cache line, the L1 cache will write back the cache line to the L2 cache even if the cache line has not been modified.
Abstract:
In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetcher in which patterns may include wild cards for some cache blocks. The wild card may match any access for the corresponding cache block (e.g. no access, demand access, prefetch, successful prefetch, etc.). Furthermore, patterns with irregular strides and/or irregular access patterns may be included in the matching patterns and may be detected for prefetch generation. In an embodiment, the AMPM prefetcher may implement a chained access map for large streaming prefetches. If a stream is detected, the AMPM prefetcher may allocate a pair of map entries for the stream and may reuse the pair for subsequent access map regions within the stream. In some embodiments, a quality factor may be associated with each access map and may control the rate of prefetch generation.
Abstract:
Processors and methods for preventing lower level prefetch units from stalling at page boundaries. An upper level prefetch unit closest to the processor core issues a preemptive request for a translation of the next page in a given prefetch stream. The upper level prefetch unit sends the translation to the lower level prefetch units prior to the lower level prefetch units reaching the end of the current page for the given prefetch stream. When the lower level prefetch units reach the boundary of the current page, instead of stopping, these prefetch units can continue to prefetch by jumping to the next physical page number provided in the translation.
Abstract:
A circular queue implementing a scheme for prioritized reads is disclosed. In one embodiment, a circular queue (or buffer) includes a number of storage locations each configured to store a data value. A multiplexer tree is coupled between the storage locations and a read port. A priority circuit is configured to generate and provide selection signals to each multiplexer of the multiplexer tree, based on a priority scheme. Based on the states of the selection signals, one of the storage locations is coupled to the read port via the multiplexers of the multiplexer tree.
Abstract:
A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.
Abstract:
A method and apparatus for evicting cache lines from a cache memory includes receiving a request from one of a plurality of processors. The cache memory is configured to store a plurality of cache lines, and a given cache line includes an identifier indicating a processor that performed a most recent access of the given cache line. The method further includes selecting a cache line for eviction from a group of least recently used cache lines, where each cache line of the group of least recently used cache lines occupy a priority position less that a predetermined value, and then evicting the selected cache line.
Abstract:
A circular queue implementing a scheme for prioritized reads is disclosed. In one embodiment, a circular queue (or buffer) includes a number of storage locations each configured to store a data value. A multiplexer tree is coupled between the storage locations and a read port. A priority circuit is configured to generate and provide selection signals to each multiplexer of the multiplexer tree, based on a priority scheme. Based on the states of the selection signals, one of the storage locations is coupled to the read port via the multiplexers of the multiplexer tree.
Abstract:
A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.