摘要:
A processor includes a processing unit, a cache memory, and a central request queue. The central request queue is operable to receive a prefetch load request for a cache line to be loaded into the cache memory, receive a demand load request for the cache line from the processing unit, merge the prefetch load request and the demand load request to generate a promoted load request specifying the processing unit as a requestor, receive the cache line associated with the promoted load request, and forward the cache line to the processing unit.
摘要:
Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
摘要:
Systems and methods for coordinated memory-side cache prefetching and dynamic interleaving configuration modification involve modifying one or both of the prefetch distance or the prefetch degree used by prefetcher modules of one or more memory-side caches by modifying interleaving configuration data following detection of an interleaving reconfiguration trigger condition indicative, for example, of low prefetch accuracy, low prefetch coverage, high prefetch lateness, or a combination of these. In response an interleaving reconfiguration trigger condition, a processor modifies the interleaving configuration data for the processing system based on the prefetch performance characteristics associated with the interleaving reconfiguration trigger condition. In some embodiments, the interleaving configuration data is modified by changing which physical memory address indices are used to determine the bits that define the channel identification number to which that physical memory address is to be mapped.
摘要:
A processor includes a cache memory, a first core including an instruction execution unit, and a memory bus coupling the cache memory to the first core. The memory bus is operable to receive a first portion of a cache line of data for the cache memory, the first core is operable to identify a plurality of data requests targeting the cache line and the first portion and select one of the identified plurality of data requests for execution, and the memory bus is operable to forward the first portion to the instruction execution unit and to the cache memory in parallel.
摘要:
A processor includes a cache memory, a first core including an instruction execution unit, and a memory bus coupling the cache memory to the first core. The memory bus is operable to receive a first portion of a cache line of data for the cache memory, the first core is operable to identify a plurality of data requests targeting the cache line and the first portion and select one of the identified plurality of data requests for execution, and the memory bus is operable to forward the first portion to the instruction execution unit and to the cache memory in parallel.
摘要:
The present invention provides a method and apparatus for allocating store queue entries to store instructions for early store-to-load forwarding. Some embodiments of the method include allocating an entry in a store queue to a store instruction in response to the store instruction being dispatched and prior to receiving a translation of a virtual address to a physical address associated with the store instruction. The entry includes storage for data to be written to the physical address by the store instruction.
摘要:
A processor includes a processing unit, a cache memory, and a central request queue. The central request queue is operable to receive a prefetch load request for a cache line to be loaded into the cache memory, receive a demand load request for the cache line from the processing unit, merge the prefetch load request and the demand load request to generate a promoted load request specifying the processing unit as a requestor, receive the cache line associated with the promoted load request, and forward the cache line to the processing unit.
摘要:
A method is provided for executing a cacheable store. The method includes determining whether to replay a store instruction to re-acquire one or more cache lines based upon a state of the cache line(s) and an execution phase of the store instruction. The store instruction is replayed in response to determining to replay the store instruction. An apparatus is provided that includes a store queue (SQ) configurable to determine whether to replay a store instruction to re-acquire one or more cache lines based upon a state of the cache line(s) and an execution phase of the store instruction. Computer readable storage devices for adapting a fabrication facility to manufacture the apparatus are provided.
摘要:
Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
摘要:
In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.