Abstract:
In response to executing a deallocate instruction, a deallocation request specifying a target address of a target cache line is sent from a processor core to a lower level cache. In response, a determination is made whether the target address hits in the lower level cache. If so, the target cache line is retained in a data array of the lower level cache, and a replacement order field of the lower level cache is updated such that the target cache line is more likely to be evicted in response to a subsequent cache miss in a congruence class including the target cache line. In response to the subsequent cache miss, the target cache line is cast out to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.
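Below is a minimal C sketch of the flow this abstract describes: on a deallocation hit the line's data is retained but the line is demoted to the least-recently-used position, and when a later miss evicts it, the castout carries a hint that it was previously deallocated. The structure and field names (cache_line, repl_order, was_deallocated) are hypothetical illustrations, not the patented implementation.

```c
#include <stdbool.h>
#include <stdio.h>

#define WAYS 4                      /* associativity of one congruence class */

struct cache_line {
    unsigned long tag;
    bool valid;
    bool was_deallocated;           /* set when a deallocation request hit the line */
    int  repl_order;                /* 0 = most recently used, WAYS-1 = next victim */
};

struct congruence_class {
    struct cache_line way[WAYS];
};

/* Deallocation request hits: the data is retained, but the line is moved to
 * the least-recently-used position so it becomes the preferred victim. */
static void handle_deallocate_hit(struct congruence_class *cc, int hit_way)
{
    int old = cc->way[hit_way].repl_order;
    for (int w = 0; w < WAYS; w++)               /* close the gap left behind */
        if (cc->way[w].valid && cc->way[w].repl_order > old)
            cc->way[w].repl_order--;
    cc->way[hit_way].repl_order = WAYS - 1;      /* demote to the LRU position */
    cc->way[hit_way].was_deallocated = true;
}

/* A subsequent miss in the same congruence class evicts the LRU line, which is
 * then cast out together with its deallocation indication. */
static int select_victim(const struct congruence_class *cc)
{
    for (int w = 0; w < WAYS; w++)
        if (!cc->way[w].valid || cc->way[w].repl_order == WAYS - 1)
            return w;
    return 0;
}

int main(void)
{
    struct congruence_class cc = { .way = {
        { .tag = 0x10, .valid = true, .repl_order = 0 },
        { .tag = 0x20, .valid = true, .repl_order = 1 },
        { .tag = 0x30, .valid = true, .repl_order = 2 },
        { .tag = 0x40, .valid = true, .repl_order = 3 },
    } };

    handle_deallocate_hit(&cc, 0);               /* core deallocates tag 0x10   */
    int victim = select_victim(&cc);             /* a later miss needs a victim */
    printf("castout tag 0x%lx, previously deallocated = %d\n",
           cc.way[victim].tag, cc.way[victim].was_deallocated);
    return 0;
}
```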
Abstract:
A data processing system includes a processor core supported by upper and lower level caches. In response to executing a deallocate instruction in the processor core, a deallocation request is sent from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line. In response to receipt of the deallocation request at the lower level cache, a determination is made whether the target address hits in the lower level cache. In response to determining that the target address hits in the lower level cache, the target cache line is retained in a data array of the lower level cache and a replacement order field in a directory of the lower level cache is updated such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss.
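As a complement to the previous sketch, the hypothetical C fragment below emphasizes the split this abstract draws between the lower level cache's directory and its data array: a deallocation request that hits updates only the directory's replacement-order field and leaves the data array untouched. The names (llc_directory_entry, llc_deallocate) and the cache geometry are assumptions for illustration; the adjustment of the other entries' replacement positions, shown in the previous sketch, is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SETS      1024
#define WAYS      8
#define LINE_SIZE 128

/* The lower level cache is modeled as a directory plus a separate data array. */
struct llc_directory_entry {
    uint64_t tag;
    bool     valid;
    uint8_t  repl_order;            /* 0 = most recently used, WAYS-1 = next victim */
};

static struct llc_directory_entry directory[SETS][WAYS];
static uint8_t data_array[SETS][WAYS][LINE_SIZE];

/* Returns true if the target address hits in the lower level cache. On a hit,
 * only the directory is changed so the line becomes the preferred eviction
 * candidate; the cached data itself is retained. */
static bool llc_deallocate(uint64_t target_addr)
{
    uint64_t set = (target_addr / LINE_SIZE) % SETS;
    uint64_t tag = (target_addr / LINE_SIZE) / SETS;

    for (int w = 0; w < WAYS; w++) {
        struct llc_directory_entry *e = &directory[set][w];
        if (e->valid && e->tag == tag) {
            e->repl_order = WAYS - 1;   /* retain the data, demote the line */
            return true;
        }
    }
    return false;                       /* miss: the request is simply dropped */
}

int main(void)
{
    memset(directory, 0, sizeof directory);
    directory[0][3] = (struct llc_directory_entry){ .tag = 5, .valid = true };
    data_array[0][3][0] = 0xAB;                     /* cached line contents  */
    bool hit = llc_deallocate(5ull * SETS * LINE_SIZE);
    return (hit && data_array[0][3][0] == 0xAB) ? 0 : 1;   /* data retained */
}
```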
Abstract:
A data processing system includes at least first through third processing nodes coupled by an interconnect fabric. The first processing node includes a master, a plurality of snoopers capable of participating in interconnect operations, and a node interface that receives a request of the master and transmits the request of the master to the second processing node with a nodal scope of transmission limited to the second processing node. The second processing node includes a node interface having a directory. The node interface of the second processing node permits the request to proceed with the nodal scope of transmission if the directory does not indicate that a target memory block of the request is cached other than in the second processing node and prevents the request from succeeding if the directory indicates that the target memory block of the request is cached other than in the second processing node.
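A minimal C sketch of that gating decision follows, assuming a toy node-interface directory keyed by memory block address. The names (node_dir_entry, SCOPE_NODAL, gate_request) are illustrative only; the abstract does not specify how the directory is actually organized or how a rejected request is retried.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum scope   { SCOPE_NODAL, SCOPE_GLOBAL };
enum verdict { PROCEED_NODAL, RETRY_WIDER_SCOPE };

struct node_dir_entry {
    uint64_t block_addr;            /* target memory block address           */
    bool     cached_elsewhere;      /* cached outside this processing node?  */
};

/* Node interface of the home processing node: a nodally scoped request may
 * proceed only if the directory shows the target block is not cached in any
 * other processing node; otherwise it must be reissued with a wider scope. */
static enum verdict gate_request(const struct node_dir_entry *dir, size_t n,
                                 uint64_t target, enum scope scope)
{
    if (scope != SCOPE_NODAL)
        return PROCEED_NODAL;       /* only nodal-scope requests are gated   */
    for (size_t i = 0; i < n; i++)
        if (dir[i].block_addr == target && dir[i].cached_elsewhere)
            return RETRY_WIDER_SCOPE;
    return PROCEED_NODAL;
}

int main(void)
{
    struct node_dir_entry dir[] = {
        { .block_addr = 0x1000, .cached_elsewhere = false },
        { .block_addr = 0x2000, .cached_elsewhere = true  },
    };
    printf("%d %d\n",
           gate_request(dir, 2, 0x1000, SCOPE_NODAL),    /* 0: may proceed   */
           gate_request(dir, 2, 0x2000, SCOPE_NODAL));   /* 1: must go wider */
    return 0;
}
```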
Abstract:
A victim cache memory includes a cache array, a cache directory of contents of the cache array, and a cache controller that controls operation of the victim cache memory. The cache controller, responsive to receiving a castout command identifying a victim cache line cast out from another cache memory, causes the victim cache line to be held in the cache array. If the other cache memory is a higher level cache in the cache hierarchy of a processor core, the cache controller marks the victim cache line in the cache directory so that it is less likely to be evicted by a replacement policy of the victim cache, and otherwise, marks the victim cache line in the cache directory so that it is more likely to be evicted by the replacement policy of the victim cache.
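One way to picture this source-aware insertion policy is sketched below: castouts from the core's own higher level cache are inserted at the most-recently-used end of the replacement order, while other castouts are inserted at the least-recently-used end. The structures, names, and the strict LRU-stack model are assumptions for illustration, not the patent's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 8

enum castout_source {
    CASTOUT_FROM_HIGHER_LEVEL,      /* e.g., from the core's own L2            */
    CASTOUT_FROM_PEER               /* e.g., a lateral castout from another L3 */
};

struct victim_dir_entry {
    uint64_t tag;
    bool     valid;
    uint8_t  repl_order;            /* 0 = kept longest, WAYS-1 = evicted next */
};

/* Install a cast-out line into a free (invalid) way of one congruence class,
 * biasing its replacement position by where it came from: MRU for castouts
 * from the core's own higher level cache, LRU for all other castouts. */
static void install_victim(struct victim_dir_entry set[WAYS], int free_way,
                           uint64_t tag, enum castout_source src)
{
    uint8_t pos = (src == CASTOUT_FROM_HIGHER_LEVEL) ? 0 : WAYS - 1;

    for (int w = 0; w < WAYS; w++)              /* shift entries to make room */
        if (set[w].valid && set[w].repl_order >= pos)
            set[w].repl_order++;

    set[free_way] = (struct victim_dir_entry){ .tag = tag, .valid = true,
                                               .repl_order = pos };
}

int main(void)
{
    struct victim_dir_entry set[WAYS] = {
        { .tag = 0xA, .valid = true, .repl_order = 0 },
        { .tag = 0xB, .valid = true, .repl_order = 1 },
    };
    install_victim(set, 2, 0xC, CASTOUT_FROM_HIGHER_LEVEL);  /* retained longer */
    install_victim(set, 3, 0xD, CASTOUT_FROM_PEER);          /* evicted sooner  */
    return (set[2].repl_order == 0 && set[3].repl_order == WAYS - 1) ? 0 : 1;
}
```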
Abstract:
A processing unit includes a store-in lower level cache having reservation logic that determines presence or absence of a reservation and a processor core including a store-through upper level cache, an instruction execution unit, a load unit that, responsive to a hit in the upper level cache on a load-reserve operation generated through execution of a load-reserve instruction by the instruction execution unit, temporarily buffers a load target address of the load-reserve operation, and a flag indicating that the load-reserve operation bound to a value in the upper level cache. If a storage-modifying operation is received that conflicts with the load target address of the load-reserve operation, the processor core sets the flag to a particular state, and, responsive to execution of a store-conditional instruction, transmits an associated store-conditional operation to the lower level cache with a fail indication if the flag is set to the particular state.
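The core-side bookkeeping might look roughly like the following sketch, in which the buffered load target address and flag are modeled as a small tracker: a conflicting storage-modifying operation sets the flag, and a later store-conditional is forwarded to the lower level cache with a fail indication. All names (lr_tracker, on_load_reserve_hit, and so on) are hypothetical, and the snooping and cache-line granularity details are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_MASK (~(uint64_t)127)  /* assume 128-byte cache lines */

struct lr_tracker {
    uint64_t load_target;           /* address buffered by the load unit         */
    bool     active;                /* a load-reserve bound to a value in the L1 */
    bool     fail;                  /* set when a conflicting store was observed */
};

/* Load-reserve operation hits in the store-through upper level (L1) cache. */
static void on_load_reserve_hit(struct lr_tracker *t, uint64_t addr)
{
    t->load_target = addr & LINE_MASK;
    t->active = true;
    t->fail = false;
}

/* A storage-modifying operation (the core's own store or a snooped
 * invalidation) that conflicts with the buffered load target address. */
static void on_storage_modifying_op(struct lr_tracker *t, uint64_t addr)
{
    if (t->active && (addr & LINE_MASK) == t->load_target)
        t->fail = true;
}

/* Store-conditional execution: the operation is still forwarded to the
 * store-in lower level cache, but carries a fail indication if the flag is
 * set, so the reservation logic there can fail it immediately. */
static bool store_conditional_fail_indication(struct lr_tracker *t)
{
    bool fail = t->fail;
    t->active = false;
    return fail;
}

int main(void)
{
    struct lr_tracker t = { 0 };
    on_load_reserve_hit(&t, 0x1000);         /* load-reserve hits in the L1   */
    on_storage_modifying_op(&t, 0x1008);     /* conflicting store to the line */
    return store_conditional_fail_indication(&t) ? 0 : 1;   /* fail indicated */
}
```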
Abstract:
A method of processing store requests in a data processing system includes enqueuing a store request in a store queue of a cache memory of the data processing system. The store request identifies a target memory block by a target address and specifies store data. While the store request and a barrier request older than the store request are enqueued in the store queue, a read-claim machine of the cache memory is dispatched to acquire coherence ownership of the target memory block of the store request. After coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, a cache array of the cache memory is updated with the store data.
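One way to illustrate the decoupling of ownership acquisition from store performance is the sketch below: the read-claim machine may obtain coherence ownership while an older barrier is still enqueued, but the cache array is written only after that barrier has retired. The queue model and the names (stq_entry, rc_acquire_ownership, try_perform_store) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct stq_entry {
    bool     is_barrier;
    bool     retired;
    uint64_t addr;
    uint64_t data;
    bool     ownership_acquired;    /* set by the read-claim (RC) machine  */
    bool     written;               /* cache array updated with store data */
};

/* Dispatch an RC machine for the store: permitted even while an older barrier
 * is still enqueued, because acquiring ownership does not yet make the store
 * visible to other threads. */
static void rc_acquire_ownership(struct stq_entry *store)
{
    store->ownership_acquired = true;   /* stands in for the coherence protocol */
}

/* Attempt to perform the store: allowed only once ownership is held and every
 * older barrier in the queue has retired. */
static bool try_perform_store(struct stq_entry q[], int store_idx)
{
    for (int i = 0; i < store_idx; i++)
        if (q[i].is_barrier && !q[i].retired)
            return false;               /* ordering not yet satisfied */
    if (!q[store_idx].ownership_acquired)
        return false;
    q[store_idx].written = true;        /* update the cache array */
    return true;
}

int main(void)
{
    struct stq_entry q[2] = {
        { .is_barrier = true },                  /* older barrier request    */
        { .addr = 0x80, .data = 42 },            /* younger store request    */
    };
    rc_acquire_ownership(&q[1]);                 /* ownership obtained early */
    printf("%d", try_perform_store(q, 1));       /* 0: barrier still pending */
    q[0].retired = true;                         /* barrier retires          */
    printf("%d\n", try_perform_store(q, 1));     /* 1: store now performed   */
    return 0;
}
```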
Abstract:
A data processing system includes an interconnect fabric, a system memory coupled to the interconnect fabric and including a virtual barrier synchronization region allocated to storage of virtual barrier synchronization registers (VBSRs), and a plurality of processing units coupled to the interconnect fabric and operable to access the virtual barrier synchronization region of the system memory. Each of the plurality of processing units includes a processor core and a cache memory including a cache controller and a cache array that caches VBSR lines from the virtual barrier synchronization region of the system memory. The cache controller, responsive to a store request from the processor core to update a particular VBSR line, performs a non-blocking update of the cache array in each other of the plurality of processing units contemporaneously holding a copy of the particular VBSR line by transmitting a VBSR update command on the interconnect fabric.
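This update-based (rather than invalidate-based) behavior can be sketched as follows, with the interconnect fabric reduced to a loop over the units that hold the VBSR line. Names such as vbsr_store and vbsr_update_cmd are illustrative assumptions; the real fabric command format is not specified by the abstract.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_UNITS  4
#define VBSR_BYTES 128

struct vbsr_cache {
    bool    holds_line;             /* this unit currently caches the VBSR line */
    uint8_t line[VBSR_BYTES];       /* cached copy of the VBSR line             */
};

static struct vbsr_cache units[NUM_UNITS];

/* "VBSR update" command as it would travel on the interconnect fabric. */
struct vbsr_update_cmd {
    unsigned offset;
    uint8_t  value;
};

/* Store from the core of unit src: update the local copy and transmit a VBSR
 * update command so every other holder updates its copy in place. No copy is
 * invalidated and no exclusive ownership is obtained first, which is the
 * non-blocking behavior described in the abstract above. */
static void vbsr_store(int src, unsigned offset, uint8_t value)
{
    struct vbsr_update_cmd cmd = { .offset = offset, .value = value };

    units[src].line[cmd.offset] = cmd.value;            /* local update           */
    for (int u = 0; u < NUM_UNITS; u++)                 /* broadcast on fabric    */
        if (u != src && units[u].holds_line)
            units[u].line[cmd.offset] = cmd.value;      /* snooping holders apply */
}

int main(void)
{
    units[0].holds_line = units[2].holds_line = true;
    vbsr_store(0, 5, 1);            /* unit 0 signals arrival at the barrier */
    return units[2].line[5] == 1 ? 0 : 1;
}
```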
Abstract:
In an entry of a first cache memory within a first coherency domain of a data processing system including at least first and second coherency domains, a coherency state field is set to a first state that indicates that an associated address tag is valid, an associated storage location does not contain valid data, and a memory block identified by the address tag is likely cached outside the first coherency domain. In response to snooping a castout operation, the first cache memory determines whether the castout operation hits in the entry and, if so, updates the coherency state field from the first state to a second state indicating that the associated address tag is invalid.
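A compact sketch of this transition, using hypothetical state names (STATE_TAG_ONLY standing in for the first state, STATE_INVALID for the second), is given below; the actual protocol states and encodings are not specified by the abstract.

```c
#include <stdint.h>

enum coh_state {
    STATE_INVALID,                  /* second state: address tag invalid          */
    STATE_TAG_ONLY,                 /* first state: tag valid, no valid data,
                                       block likely cached outside this domain    */
    STATE_SHARED,
    STATE_MODIFIED,
};

struct dir_entry {
    uint64_t       tag;
    enum coh_state state;
};

/* Snooped castout: if it hits an entry held in the tag-only state, the entry
 * is updated to indicate that its address tag is no longer valid. */
static void on_snoop_castout(struct dir_entry *e, uint64_t castout_tag)
{
    if (e->state == STATE_TAG_ONLY && e->tag == castout_tag)
        e->state = STATE_INVALID;
}

int main(void)
{
    struct dir_entry e = { .tag = 0x7, .state = STATE_TAG_ONLY };
    on_snoop_castout(&e, 0x7);
    return e.state == STATE_INVALID ? 0 : 1;
}
```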
Abstract:
A cache coherent data processing system includes at least first and second coherency domains. The first coherency domain includes a system memory controller for a system memory and a first processing unit having a first cache memory. The second coherency domain includes a second processing unit having a second cache memory. In the first cache memory, a coherency state field associated with a storage location and an address tag is set to a first coherency state. In response to snooping an exclusive access request specifying a target address matching the address tag, the first cache memory provides a first partial response to the exclusive access request based at least in part upon the first coherency state. In response to snooping the exclusive access request, the memory controller determines whether it is responsible for the target address and provides a second partial response to the exclusive access request based at least in part upon an outcome of the determination. At least the first and second partial responses are accumulated to obtain a combined response for the exclusive access request. The combined response includes an indication of whether or not a highest point of coherency and a memory controller of a home system memory for the target address reside within a same coherency domain. The first cache memory updates the coherency state field from the first coherency state to a second coherency state in response to the indication in the combined response.
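The response accumulation might be modeled as in the sketch below: partial responses are OR-accumulated bit flags, the combined response carries a single same-domain indication, and the first cache updates its coherency state when that indication is present. The flag names and the particular rule used to derive the indication are assumptions for illustration, not the patent's actual response encoding.

```c
#include <stdbool.h>

/* Partial-response flags contributed by individual snoopers. */
#define PR_CACHE_IS_HPC   0x1u      /* first cache: holds the highest point of coherency */
#define PR_MC_IS_HOME_LPC 0x2u      /* memory controller: responsible for the address    */

enum coh_state { STATE_FIRST, STATE_SECOND };

struct combined_response {
    bool hpc_and_home_mc_same_domain;
};

/* Accumulate the partial responses of the first cache and the system memory
 * controller. Because both snoopers sit in the first coherency domain in this
 * toy model, seeing both flags means the highest point of coherency and the
 * home memory controller share a coherency domain. */
static struct combined_response accumulate(unsigned presp_cache, unsigned presp_mc)
{
    unsigned acc = presp_cache | presp_mc;
    struct combined_response cr = {
        .hpc_and_home_mc_same_domain =
            (acc & PR_CACHE_IS_HPC) && (acc & PR_MC_IS_HOME_LPC),
    };
    return cr;
}

/* The first cache observes the combined response and moves from the first to
 * the second coherency state when the same-domain indication is present. */
static enum coh_state update_state(enum coh_state s, struct combined_response cr)
{
    if (s == STATE_FIRST && cr.hpc_and_home_mc_same_domain)
        return STATE_SECOND;
    return s;
}

int main(void)
{
    struct combined_response cr = accumulate(PR_CACHE_IS_HPC, PR_MC_IS_HOME_LPC);
    return update_state(STATE_FIRST, cr) == STATE_SECOND ? 0 : 1;
}
```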