Abstract:
Systems, apparatuses, and methods for committing store instructions out of order from a store queue are described. A processor may store a first store instruction and a second store instruction in the store queue, where the first store instruction is older than the second. When the second store instruction is ready to commit to the memory hierarchy, the processor may allow it to commit before the first store instruction if all store instructions in the store queue that are older than the second store instruction are non-speculative. If at least one older store instruction in the store queue is speculative, however, the processor may prevent the second store instruction from committing to the memory hierarchy before the first store instruction.
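The commit rule above can be sketched as a simple predicate. This is a minimal illustration, not the claimed hardware: the `StoreEntry` fields (`age`, `speculative`, `ready`) and the `can_commit` function are hypothetical names, and program order is assumed to be encoded as a numeric age tag where a smaller value means older.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    age: int           # hypothetical age tag: smaller value = older in program order
    speculative: bool  # True while the store could still be squashed
    ready: bool        # True once the store is ready to commit


def can_commit(queue: list[StoreEntry], entry: StoreEntry) -> bool:
    """Return True if `entry` may commit to the memory hierarchy now,
    possibly ahead of older stores still waiting in the queue."""
    if not entry.ready:
        return False
    # Out-of-order commit is allowed only if every older store is non-speculative.
    return all(not other.speculative for other in queue if other.age < entry.age)


first = StoreEntry(age=1, speculative=False, ready=False)
second = StoreEntry(age=2, speculative=False, ready=True)
queue = [first, second]

print(can_commit(queue, second))  # True: the older store is non-speculative
first.speculative = True
print(can_commit(queue, second))  # False: an older store is still speculative
```

The check only looks at older entries, so a speculative store that is younger than `second` never blocks it.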
Abstract:
A processor includes a mechanism that checks for and flushes only those speculative loads (and any instructions dependent on them) that are younger than an executed wait for event (WEV) instruction and that also match the address of a store instruction determined to have been executed by a different processor before that processor executed the paired send event (SEV) instruction. The mechanism may allow speculative loads that do not match the address of any such store instruction to proceed without being flushed.
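The selective flush described above can be sketched as a filter over in-flight speculative loads. This is an illustrative model under stated assumptions: `SpeculativeLoad`, `select_flush`, and the `remote_store_addrs` set are hypothetical names, and addresses are compared exactly rather than at any particular hardware granularity.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeLoad:
    uid: int
    address: int
    younger_than_wev: bool  # issued after the executed WEV instruction
    dependents: list[int] = field(default_factory=list)  # uids of dependent instructions


def select_flush(loads: list[SpeculativeLoad], remote_store_addrs: set[int]) -> set[int]:
    """Return uids of the loads (plus their dependents) that must be flushed.

    remote_store_addrs is a hypothetical input: addresses of stores that the
    other processor executed before its paired SEV instruction.
    """
    to_flush: set[int] = set()
    for load in loads:
        if load.younger_than_wev and load.address in remote_store_addrs:
            to_flush.add(load.uid)
            to_flush.update(load.dependents)
    # Every other speculative load is left in flight.
    return to_flush


loads = [
    SpeculativeLoad(uid=1, address=0x100, younger_than_wev=True, dependents=[10, 11]),
    SpeculativeLoad(uid=2, address=0x200, younger_than_wev=True, dependents=[12]),
    SpeculativeLoad(uid=3, address=0x100, younger_than_wev=False),
]
print(sorted(select_flush(loads, {0x100})))  # [1, 10, 11]
```

Load 2 survives because its address does not match a remote store, and load 3 survives because it is not younger than the WEV instruction.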
Abstract:
Techniques are disclosed relating to ordering of load instructions in a weakly-ordered memory model. In one embodiment, a processor includes a cache with multiple cache lines and a store queue configured to maintain status information associated with a store instruction that targets a location in one of the cache lines. In this embodiment, the processor is configured to set an indicator in the status information in response to migration of the targeted cache line. The indicator may be used to order the performance of load instructions that are younger than the store instruction. For example, the processor may be configured to wait, based on the indicator, to perform a younger load instruction that targets the same location as the store instruction until the store instruction is removed from the store queue. This may prevent the store's value from being forwarded to the younger load, thereby preserving load-load ordering.
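The role of the migration indicator can be sketched with a toy store queue. This is a minimal model under assumptions: the 64-byte `LINE_SIZE`, the `StoreQueue` class, and its `on_line_migration`, `retire_oldest`, and `younger_load` methods are hypothetical, and every queued store is assumed to be older than the load being checked.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache-line size in bytes


@dataclass
class StoreQueueEntry:
    address: int
    value: int
    line_migrated: bool = False  # indicator kept in the entry's status information


class StoreQueue:
    def __init__(self) -> None:
        self.entries: list[StoreQueueEntry] = []  # oldest first

    def on_line_migration(self, line_address: int) -> None:
        """Set the indicator on every store whose target line has migrated."""
        for entry in self.entries:
            if entry.address // LINE_SIZE == line_address // LINE_SIZE:
                entry.line_migrated = True

    def retire_oldest(self) -> None:
        """Remove the oldest store from the queue."""
        self.entries.pop(0)

    def younger_load(self, address: int) -> tuple:
        """Decide how a load younger than all queued stores is performed."""
        for entry in reversed(self.entries):  # youngest matching store wins
            if entry.address == address:
                if entry.line_migrated:
                    return ("wait", None)  # no forwarding; preserves load-load order
                return ("forward", entry.value)
        return ("cache", None)  # no matching store: read from the cache


sq = StoreQueue()
sq.entries.append(StoreQueueEntry(address=0x1008, value=7))
print(sq.younger_load(0x1008))  # ('forward', 7)
sq.on_line_migration(0x1000)
print(sq.younger_load(0x1008))  # ('wait', None)
sq.retire_oldest()
print(sq.younger_load(0x1008))  # ('cache', None)
```

Once the indicator is set, the load waits instead of forwarding and is performed only after the store leaves the queue.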
Abstract:
Systems, methods, and apparatuses for reducing writes to the data array of a cache. A cache hierarchy includes one or more L1 caches and an L2 cache that is inclusive of the L1 cache(s). When a request from an L1 cache misses in the L2 cache, the L2 cache sends a fill request to memory. When the fill data returns from memory, the L2 cache delays writing the fill data to its data array. Instead, the cache line is written to the L1 cache, and a clean-evict bit corresponding to the cache line is set in the L1 cache. When the L1 cache evicts this cache line, it writes the cache line back to the L2 cache even if the cache line has not been modified.
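The deferred fill and clean-evict bit can be sketched with two toy cache classes. This is an illustrative model, not the described hardware: the `L2Cache` and `L1Cache` classes, their methods, and the dictionary-based storage are hypothetical, and only a single L1 is modeled.

```python
class L2Cache:
    """Inclusive L2 whose data-array write for a fill is deferred."""

    def __init__(self, memory: dict[int, str]) -> None:
        self.memory = memory
        self.tags: set[int] = set()           # lines present (inclusion tracking)
        self.data_array: dict[int, str] = {}
        self.data_array_writes = 0

    def fill(self, addr: int) -> str:
        # Allocate the tag but do NOT write the fill data into the data array.
        self.tags.add(addr)
        return self.memory[addr]

    def write_back(self, addr: int, data: str) -> None:
        self.data_array[addr] = data
        self.data_array_writes += 1


class L1Cache:
    def __init__(self, l2: L2Cache) -> None:
        self.l2 = l2
        self.lines: dict[int, dict] = {}

    def read(self, addr: int) -> str:
        if addr not in self.lines:
            if addr in self.l2.data_array:  # L2 hit with valid data
                data, clean_evict = self.l2.data_array[addr], False
            else:                           # L2 miss: fill from memory
                data, clean_evict = self.l2.fill(addr), True
            self.lines[addr] = {"data": data, "dirty": False, "clean_evict": clean_evict}
        return self.lines[addr]["data"]

    def write(self, addr: int, data: str) -> None:
        self.read(addr)
        self.lines[addr]["data"] = data
        self.lines[addr]["dirty"] = True

    def evict(self, addr: int) -> None:
        line = self.lines.pop(addr)
        # Write back dirty lines, and clean lines whose clean-evict bit is set.
        if line["dirty"] or line["clean_evict"]:
            self.l2.write_back(addr, line["data"])


l2 = L2Cache({0x40: "A", 0x80: "B"})
l1 = L1Cache(l2)
l1.read(0x40)
print(l2.data_array_writes)  # 0: fill data not yet in the L2 data array
l1.evict(0x40)
print(l2.data_array_writes)  # 1: written back because clean-evict was set
l1.write(0x80, "B2")
l1.evict(0x80)
print(l2.data_array_writes)  # 2
```

In this model a line that is modified in the L1 reaches the L2 data array once, on eviction, rather than once on fill and again on write-back.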
Abstract:
Systems, processors, and methods for keeping uncacheable data coherent. A processor includes a multi-level cache hierarchy, and uncacheable load memory operations can be cached at any level of the cache hierarchy. If an uncacheable load misses in the L2 cache, then allocation of the uncacheable load will be restricted to a subset of the ways of the L2 cache. If an uncacheable store memory operation hits in the L1 cache, then the hit cache line can be updated with the data from the memory operation. If the uncacheable store misses in the L1 cache, then the uncacheable store is sent to a core interface unit. Multiple contiguous store misses are merged into larger blocks of data in the core interface unit before being sent to the L2 cache.
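Two of the steps above, restricting uncacheable allocations to a subset of L2 ways and merging contiguous store misses, can be sketched as follows. The associativity (`NUM_WAYS`), the reserved ways (`UNCACHEABLE_WAYS`), the merge cap (`MAX_BLOCK`), and both function names are hypothetical. The merge policy shown, which joins a store only to the immediately preceding block, is one simple choice and not necessarily the one the core interface unit uses.

```python
NUM_WAYS = 8               # assumed L2 associativity
UNCACHEABLE_WAYS = (0, 1)  # hypothetical subset reserved for uncacheable fills
MAX_BLOCK = 64             # assumed maximum merged block size in bytes


def candidate_ways(uncacheable: bool) -> list[int]:
    """Ways an L2 miss may allocate into."""
    return list(UNCACHEABLE_WAYS) if uncacheable else list(range(NUM_WAYS))


def merge_store_misses(stores: list[tuple[int, bytes]]) -> list[tuple[int, bytes]]:
    """Merge contiguous uncacheable store misses (in arrival order) into larger
    blocks before they are forwarded to the L2 cache."""
    blocks: list[tuple[int, bytes]] = []
    for addr, data in stores:
        if blocks:
            start, buf = blocks[-1]
            # Merge only if the store begins exactly where the previous block ends
            # and the combined block stays within MAX_BLOCK bytes.
            if start + len(buf) == addr and len(buf) + len(data) <= MAX_BLOCK:
                blocks[-1] = (start, buf + data)
                continue
        blocks.append((addr, data))
    return blocks


stores = [(0x100, b"\x01" * 8), (0x108, b"\x02" * 8), (0x200, b"\x03" * 4)]
for start, data in merge_store_misses(stores):
    print(hex(start), len(data))  # 0x100 16, then 0x200 4
print(candidate_ways(uncacheable=True))  # [0, 1]
```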