摘要:
According to a method of data processing, a memory controller receives a plurality of data prefetch requests from multiple processor cores in the data processing system, where the plurality of prefetch load requests include a data prefetch request issued by a particular processor core among the multiple processor cores. In response to receipt of the data prefetch request, the memory controller provides a coherency response indicating an excess number of data prefetch requests. In response to the coherency response, the particular processor core reduces a rate of issuance of data prefetch requests.
摘要:
In response to a memory access request of a processor core that targets a target cache line, the lower level cache of a vertical cache hierarchy associated with the processor core supplies a copy of the target cache line to an upper level cache in the vertical cache hierarchy and retains a copy in a shared coherence state. The upper level cache holds the copy of the target cache line in a private shared ownership coherence state indicating that each cached copy of the target memory block is cached within the vertical cache hierarchy associated with the processor core. In response to the upper level cache signaling replacement of the copy of the target cache line in the private shared ownership coherence state, the lower level cache updates its copy of the target cache line to the exclusive ownership coherence state without coherency messaging with other vertical cache hierarchies.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
Multiple write buffers are provided within each memory module and are utilized to buffer multiple received write data forwarded to the chip via a write-to-buffer data operation. When a write is received at the memory controller, the memory controller first issues the write-to-buffer (data) operation and the data is forwarded to one of the write buffers. Multiple writes targeting the same DIMM are thus buffered. When all of the available buffers at a memory module are full, the memory controller issues the set of address only write commands to the memory module. The control logic of the DIMM streams all of the buffered write data to the memory device(s) in one continuous burst. By buffering multiple writes and then writing all buffered write data within the DIMM in a single burst, the write-to-read turnaround penalty of the memory module's data bus is substantially minimized.
摘要:
In response to a memory access request of a processor core that targets a target cache line, the lower level cache of a vertical cache hierarchy associated with the processor core supplies a copy of the target cache line to an upper level cache in the vertical cache hierarchy and retains a copy in a shared coherence state. The upper level cache holds the copy of the target cache line in a private shared ownership coherence state indicating that each cached copy of the target memory block is cached within the vertical cache hierarchy associated with the processor core. In response to the upper level cache signaling replacement of the copy of the target cache line in the private shared ownership coherence state, the lower level cache updates its copy of the target cache line to the exclusive ownership coherence state without coherency messaging with other vertical cache hierarchies.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
A set of N copies of bank control logic are provided for tracking the banks within the memory modules (DRAMS). When the total number of banks within the memory module(s) is greater than N, the addresses of particular banks are folded into a single grouping. The banks are represented by the N copies of the bank control logic even when the total number of banks is greater than N. Each bank within the group is tagged as being busy when any one of the banks in the group is the target of a memory access request. The algorithm folds the addresses of the banks in an order that substantially minimizes the likelihood that a bank that is in a busy or false busy state will be the target of another memory access request. Power and logic savings are recognized as only N copies of bank control logic have to be supported.
摘要:
A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.
摘要:
According to a method of data processing, a memory controller receives a plurality of data prefetch requests from multiple processor cores in the data processing system, where the plurality of prefetch load requests include a data prefetch request issued by a particular processor core among the multiple processor cores. In response to receipt of the data prefetch request, the memory controller provides a coherency response indicating an excess number of data prefetch requests. In response to the coherency response, the particular processor core reduces a rate of issuance of data prefetch requests.
摘要:
A cache coherent data processing system includes a plurality of processing units each having at least an associated cache, a system memory, and a memory controller that is coupled to and controls access to the system memory. The system memory includes a plurality of storage locations for storing a memory block of data, where each of the plurality of storage locations is sized to store a sub-block of data. The system memory further includes metadata storage for storing metadata, such as a domain indicator, describing the memory block. In response to a failure of a storage location for a particular sub-block among the plurality of sub-blocks, the memory controller overwrites at least a portion of the metadata in the metadata storage with the particular sub-block of data.