Abstract:
A data processing system includes an interconnect fabric, a protected resource having a plurality of banks each associated with a respective one of a plurality of address sets, a snooper that controls access to the resource, one or more masters that initiate requests, and interconnect logic coupled to the one or more masters and to the interconnect fabric. The interconnect logic regulates a rate of delivery to the snooper, via the interconnect fabric, of requests that target any one of the plurality of banks of the protected resource.
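As a rough illustration of per-bank rate regulation, the sketch below models the interconnect logic as a token bucket per bank. The abstract does not specify the regulation mechanism, so the token-bucket policy, the bank-selection hash, and all names here are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS 8

/* Hypothetical token bucket per bank: delivering a request to the snooper
 * consumes a token; tokens refill at a fixed rate each cycle. */
typedef struct {
    uint32_t tokens[NUM_BANKS];
    uint32_t max_tokens;        /* burst capacity per bank */
    uint32_t refill_per_cycle;  /* tokens restored to each bank per cycle */
} rate_limiter;

static void rate_limiter_tick(rate_limiter *rl) {
    for (int b = 0; b < NUM_BANKS; b++) {
        rl->tokens[b] += rl->refill_per_cycle;
        if (rl->tokens[b] > rl->max_tokens)
            rl->tokens[b] = rl->max_tokens;
    }
}

/* Map the target address to its bank (assumed: low-order address bits
 * select the bank) and deliver only if a token is available. */
static bool try_deliver(rate_limiter *rl, uint64_t addr) {
    unsigned bank = (unsigned)(addr & (NUM_BANKS - 1));
    if (rl->tokens[bank] == 0)
        return false;   /* interconnect logic holds the request back */
    rl->tokens[bank]--;
    return true;        /* request forwarded to the snooper */
}
```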
Abstract:
A cache coherent data processing system includes at least first and second coherency domains. In a first cache memory within the first coherency domain, a coherency state field associated with a storage location and an address tag is set to a first data-invalid coherency state that indicates that the address tag is valid and that the storage location does not contain valid data. In response to snooping an exclusive access operation that specifies a target address matching the address tag and indicates the relative domain location of the requester that initiated the operation, the first cache memory updates the coherency state field from the first data-invalid coherency state to a second data-invalid coherency state. Based upon the relative domain location of the requester, the second data-invalid coherency state indicates that the address tag is valid, that the storage location does not contain valid data, and whether the target memory block associated with the address tag is cached within the first coherency domain upon successful completion of the exclusive access operation.
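A minimal C model of the described transition is sketched below. The state names and the local/remote test are assumptions for illustration, not the patent's terminology.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative state names: the second data-invalid state records whether
 * the block remains cached inside this domain after the exclusive access. */
typedef enum {
    DI_TAG_VALID,   /* first data-invalid state: tag valid, no valid data */
    DI_IN_DOMAIN,   /* block cached within the first coherency domain */
    DI_OUT_DOMAIN   /* block cached outside the first coherency domain */
} coh_state;

typedef struct {
    uint64_t  tag;
    coh_state state;
} dir_entry;

/* On snooping an exclusive access that hits a first-state entry, choose
 * the second data-invalid state from the requester's relative location. */
static void snoop_exclusive(dir_entry *e, uint64_t target, bool requester_local) {
    if (e->state != DI_TAG_VALID || e->tag != target)
        return;
    e->state = requester_local ? DI_IN_DOMAIN : DI_OUT_DOMAIN;
}
```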
Abstract:
A data processing system includes at least first and second coherency domains coupled by an interconnect fabric. A memory coupled to the interconnect fabric includes an address translation table having a translation table entry utilized to translate virtual memory addresses to real memory addresses. The translation table entry also includes scope information for broadcast operations targeting addresses within a memory region associated with the translation table entry. Scope prediction logic within the first coherency domain predictively selects a scope of broadcast of an operation on the interconnect fabric of the data processing system by reference to the scope information within the translation table entry.
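A minimal sketch of scope prediction driven by a translation-table entry, assuming a two-scope (local/global) fabric and a retry-on-miss policy; field and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SCOPE_LOCAL, SCOPE_GLOBAL } bcast_scope;

/* Illustrative translation-table entry: the translation plus a per-region
 * scope hint stored alongside it. */
typedef struct {
    uint64_t    virt_page;
    uint64_t    real_page;
    bcast_scope scope_hint;  /* scope information for this memory region */
} tte;

/* Stub for the fabric: reports whether coherence was resolved within the
 * requested scope (here, only a global broadcast always succeeds). */
static bool broadcast(uint64_t real_addr, bcast_scope scope) {
    (void)real_addr;
    return scope == SCOPE_GLOBAL;
}

/* Predict the scope from the entry; if a local broadcast fails to resolve
 * the operation, widen to global and retry. */
static bool issue_operation(const tte *entry, uint64_t real_addr) {
    bcast_scope scope = entry->scope_hint;   /* predictive selection */
    if (broadcast(real_addr, scope))
        return true;
    return scope == SCOPE_LOCAL && broadcast(real_addr, SCOPE_GLOBAL);
}
```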
Abstract:
In response to a master receiving a memory access request indicating a target address, the master accesses a first cache directory of an upper level cache of a cache hierarchy. In response to the target address being associated in the first cache directory with an entry having a valid address tag and a first invalid coherency state, the master issues a request specifying the target address on an interconnect fabric without regard to a coherency state associated with the target address in a second cache directory of a lower level cache of the cache hierarchy. In response to the target address having a second invalid coherency state with respect to the first cache directory, the master issues a request specifying the target address on the interconnect fabric after determining a coherency state associated with the target address in the second cache directory of the lower level cache.
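The sketch below models the two invalid states in the upper-level directory and the resulting issue paths. The state names, the L2/L3 labels, and the stub functions are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Two illustrative invalid states in the upper-level (e.g. L2) directory:
 * one precise enough to let the master skip the lower-level (e.g. L3)
 * lookup, and one that requires resolving the L3 state first. */
typedef enum { L2_I_SKIP_L3, L2_I_CHECK_L3, L2_HIT } l2_state;

static void issue_on_fabric(uint64_t target) {
    printf("fabric request for %#llx\n", (unsigned long long)target);
}

static void l3_lookup(uint64_t target) {
    printf("resolving L3 state for %#llx\n", (unsigned long long)target);
}

static void memory_access(uint64_t target, l2_state s) {
    switch (s) {
    case L2_I_SKIP_L3:
        issue_on_fabric(target);  /* no regard to the L3 directory */
        break;
    case L2_I_CHECK_L3:
        l3_lookup(target);        /* determine the L3 state first */
        issue_on_fabric(target);
        break;
    case L2_HIT:
        break;                    /* serviced from the upper-level cache */
    }
}
```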
Abstract:
A method and apparatus for enabling protection of a particular member of a cache during LRU victim selection. The LRU state array includes additional “protection” bits in addition to the state bits. The protection bits serve as a pointer to identify the location of the member of the congruence class that is to be protected. A protected member is not removed from the cache during standard LRU victim selection, unless that member is invalid. The protection bits are pipelined to MRU update logic, where they are used to generate an MRU vector. The particular member identified by the MRU vector (and pointer) is protected from selection as the next LRU victim, unless the member is invalid. The make MRU operation affects only the lower level LRU state bits arranged in a tree-based structure and thus only negates the selection of the protected member, without affecting LRU victim selection of the other members.
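A single-tree simplification of the mechanism, assuming an 8-way congruence class tracked by 7 tree-PLRU bits: making the protected member MRU flips the bits on its path so the victim walk cannot reach it, leaving the ordering of the other members untouched. The bit encoding and names are assumptions.

```c
#include <stdint.h>

#define WAYS 8   /* assumed 8-way congruence class: 7 tree-PLRU bits */

/* Flip the tree bits along `way`'s path so every node points away from it,
 * i.e. make the member MRU. Node k's children are nodes 2k+1 and 2k+2. */
static void make_mru(uint8_t *bits, int way) {
    int node = 0;
    for (int level = 2; level >= 0; level--) {
        int dir = (way >> level) & 1;        /* 0 = left, 1 = right */
        if (dir)
            *bits &= (uint8_t)~(1u << node); /* went right: point left */
        else
            *bits |= (uint8_t)(1u << node);  /* went left: point right */
        node = 2 * node + 1 + dir;
    }
}

/* Walk the tree bits to the LRU leaf. The protected member is first made
 * MRU on a scratch copy of the state (mirroring the pipelined MRU vector),
 * so it cannot be reached; an invalid protected member gets no exemption. */
static int select_victim(uint8_t bits, int protected_way, uint8_t valid_mask) {
    if (protected_way >= 0 && (valid_mask & (1u << protected_way)))
        make_mru(&bits, protected_way);
    int node = 0, way = 0;
    for (int level = 0; level < 3; level++) {
        int dir = (bits >> node) & 1;        /* follow the LRU pointer */
        way = (way << 1) | dir;
        node = 2 * node + 1 + dir;
    }
    return way;
}
```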
Abstract:
A method and apparatus for preventing selection of Deleted (D) members during LRU victim selection. During each cache access targeting the particular congruence class, the deleted cache line is identified from information in the cache directory. The location of a deleted cache line is pipelined through the cache architecture during LRU victim selection. The information is latched and then passed to MRU vector generation logic. An MRU vector is generated and passed to the MRU update logic, which selects/tags the deleted member as an MRU member. The make MRU operation affects only the lower level LRU state bits arranged in a tree-based structure, so that it only negates selection of the specific member in the D state, without affecting LRU victim selection of the other members.
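A short variant of the same make-MRU trick, reusing make_mru and WAYS (and the stdint.h include) from the sketch above; the line_state encoding is an assumption.

```c
/* Reuses make_mru and WAYS from the previous sketch. */
typedef enum { L_VALID, L_INVALID, L_DELETED } line_state;

/* Per access, scan the congruence class's directory states and make every
 * Deleted (D) member MRU, negating its selection as the next victim while
 * leaving the LRU ordering of the remaining members intact. */
static void protect_deleted(uint8_t *bits, const line_state st[WAYS]) {
    for (int w = 0; w < WAYS; w++)
        if (st[w] == L_DELETED)
            make_mru(bits, w);
}
```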
Abstract:
In an entry of a first cache memory within a first coherency domain of a data processing system including at least first and second coherency domains, a coherency state field is set to a first state that indicates that an associated address tag is valid, an associated storage location does not contain valid data, and a memory block identified by the address tag is likely cached outside the first coherency domain. In response to snooping a castout operation, the first cache memory determines if the castout operation hits in the entry and, if so, updates the coherency state field from the first state to a second state indicating that the associated address tag is invalid.
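A minimal model of the castout snoop path, with illustrative state and field names: a hit on an entry held in the first state simply invalidates the address tag.

```c
#include <stdint.h>

typedef enum {
    ST_FIRST,   /* tag valid, data invalid, block likely cached outside domain */
    ST_SECOND   /* address tag invalid */
} entry_state;

typedef struct {
    uint64_t    tag;
    entry_state state;
} cache_entry;

/* If a snooped castout hits an entry in the first state, the hint the
 * entry carries is now stale, so the address tag is invalidated. */
static void snoop_castout(cache_entry *e, uint64_t castout_addr) {
    if (e->state == ST_FIRST && e->tag == castout_addr)
        e->state = ST_SECOND;
}
```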
Abstract:
In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing existing entries to a window of addresses derived from the received address and encompassing 2M contiguous cache blocks. The prefetch engine suppresses allocation of a new entry when any entry in the prefetch queue falls within the address window, and further suppresses allocation when the data address of the store instruction equals an address in a border area of the address window.
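A sketch of the allocation filter. The abstract does not say how the window is derived from the address, so this version assumes the window is the aligned region of cache blocks containing the store's block, and the sizes are illustrative placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

#define CBLK_SHIFT    7     /* assumed 128-byte cache blocks */
#define WINDOW_BLOCKS 8     /* assumed window size (the abstract's 2M blocks) */
#define QUEUE_DEPTH   16    /* assumed prefetch queue depth */

typedef struct { uint64_t block; bool valid; } pq_entry;

/* Decide whether a store may allocate a new stream entry. */
static bool should_allocate(const pq_entry q[QUEUE_DEPTH], uint64_t store_addr) {
    uint64_t blk  = store_addr >> CBLK_SHIFT;
    uint64_t base = blk & ~(uint64_t)(WINDOW_BLOCKS - 1);  /* aligned window */
    /* Suppress when any allocated stream already lies inside the window. */
    for (int i = 0; i < QUEUE_DEPTH; i++)
        if (q[i].valid && q[i].block - base < WINDOW_BLOCKS)
            return false;
    /* Suppress when the store falls in the window's border blocks. */
    uint64_t off = blk - base;
    if (off == 0 || off == WINDOW_BLOCKS - 1)
        return false;
    return true;
}
```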
Abstract:
A cache memory loads two memory values into two cache lines by receiving separate portions of a first requested memory value from a first data bus over a first time span of successive clock cycles and receiving separate portions of a second requested memory value from a second data bus over a second time span of successive clock cycles which overlaps with the first time span. In the illustrative embodiment, a first input line is used for loading both a first byte array of the first cache line and a first byte array of the second cache line, a second input line is used for loading both a second byte array of the first cache line and a second byte array of the second cache line, and the transmission of the separate portions of the first and second memory values is interleaved between the first and second data busses. The first data bus can be one of a plurality of data busses in a first data bus set, and the second data bus can be one of a plurality of data busses in a second data bus set. Two address busses (one for each data bus set) are used to receive successive address tags that identify which portions of the requested memory values are being received from each data bus set. For example, the requested memory values may be 32 bytes each, with the separate portions received over four successive cycles as an 8-byte portion of each value per cycle. The cache lines are spread across different cache sectors of the cache memory, wherein the cache sectors have different output latencies, and the separate portions of a given requested memory value are loaded sequentially into the corresponding cache sectors based on their respective output latencies. Merge flow circuits responsive to the cache controller receive the portions of a requested memory value and input those bytes into the corresponding cache sector.
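A toy model of the interleaved fill, assuming in-order 8-byte beats on two bus sets over four overlapping cycles, with the per-beat address tag steering each beat to its line. The structures stand in for the merge-flow circuits and sector timing, and are not the patent's.

```c
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32   /* the abstract's example: 32-byte memory values */
#define BEAT_BYTES 8    /* one 8-byte portion per cycle per bus set */
#define BEATS (LINE_BYTES / BEAT_BYTES)

/* One 8-byte portion plus the address tag received on the bus set's
 * address bus, identifying which requested value the portion belongs to. */
typedef struct {
    uint8_t  data[BEAT_BYTES];
    uint64_t tag;
} beat;

/* Over four overlapping cycles, each bus set delivers one beat per cycle;
 * the tag steers each beat into the matching line. Beats are assumed to
 * arrive in sequential order within each value. */
static void fill_two_lines(uint8_t line_a[LINE_BYTES], uint64_t tag_a,
                           uint8_t line_b[LINE_BYTES],
                           const beat bus0[BEATS], const beat bus1[BEATS]) {
    size_t off_a = 0, off_b = 0;
    for (int cycle = 0; cycle < BEATS; cycle++) {
        for (int set = 0; set < 2; set++) {
            const beat *bt = (set == 0) ? &bus0[cycle] : &bus1[cycle];
            int to_a      = (bt->tag == tag_a);   /* tag selects the line */
            uint8_t *dst  = to_a ? line_a : line_b;
            size_t  *off  = to_a ? &off_a : &off_b;
            memcpy(dst + *off, bt->data, BEAT_BYTES);
            *off += BEAT_BYTES;
        }
    }
}
```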
Abstract:
A cache coherent data processing system includes a plurality of processing units each having at least an associated cache, a system memory, and a memory controller that is coupled to and controls access to the system memory. The system memory includes a plurality of storage locations for storing a memory block of data, where each of the plurality of storage locations is sized to store a sub-block of data. The system memory further includes metadata storage for storing metadata, such as a domain indicator, describing the memory block. In response to a failure of a storage location for a particular sub-block among the plurality of sub-blocks, the memory controller overwrites at least a portion of the metadata in the metadata storage with the particular sub-block of data.
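A small model of the repair action, with assumed sizes and layout: writes to a failed sub-block are redirected into the metadata storage, sacrificing the domain indicator for capacity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SUB_BLOCKS 4    /* assumed sub-blocks per memory block */
#define SUB_BYTES  32   /* assumed sub-block size */

/* Assumed layout: data storage for each sub-block plus metadata storage
 * (e.g. a domain indicator) describing the whole memory block. */
typedef struct {
    uint8_t sub[SUB_BLOCKS][SUB_BYTES];
    bool    sub_failed[SUB_BLOCKS];   /* storage failure per sub-block */
    uint8_t metadata[SUB_BYTES];
    bool    metadata_repurposed;
} mem_block;

/* On a write, a failed sub-block's data is steered into the metadata
 * storage, overwriting at least a portion of the metadata. */
static void write_sub_block(mem_block *m, int idx, const uint8_t data[SUB_BYTES]) {
    if (m->sub_failed[idx]) {
        memcpy(m->metadata, data, SUB_BYTES);
        m->metadata_repurposed = true;  /* domain indicator no longer held */
    } else {
        memcpy(m->sub[idx], data, SUB_BYTES);
    }
}
```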