摘要:
A data processing system includes a processor core having an associated upper level cache and a lower level victim cache. In response to a memory access request of the processor core that specifies a non-modifying access to a target coherency granule, a determination is made whether the memory access request hits or misses in a directory of the lower level victim cache. In response to determining that the memory access request hits in the lower level victim cache in a data-valid coherence state, the lower level victim cache provides the target coherency granule of the memory access request to the upper level cache. The lower level victim cache preserves the target coherency granule in the lower level victim cache in a shared coherence state if the memory access request is of a first type and invalidates the target coherency granule if the memory access request is of a second type.
摘要:
In response to a data request, a victim cache line is selected for castout from a lower level cache, and a target lower level cache of one of the plurality of processing units is selected. A determination is made whether the selected target lower level cache has provided more than a threshold number of retry responses to lateral castout (LCO) commands of the first lower level cache, and if so, a different target lower level cache is selected. The first processing unit thereafter issues a LCO command on the interconnect fabric. The LCO command identifies the victim cache line to be castout and indicates that the target lower level cache is an intended destination of the victim cache line. In response to a successful coherence response to the LCO command, the victim cache line is removed from the first lower level cache and held in the second lower level cache.
摘要:
A second lower level cache receives an LCO command issued by a first lower level cache on an interconnect fabric. The LCO command indicates an address of a victim cache line to be castout from the first lower level cache and indicates that the second lower level cache is an intended destination of the victim cache line. The second lower level cache determines whether to accept the victim cache line from the first lower level cache based at least in part on the address of the victim cache line indicated by the LCO command. In response to determining not to accept the victim cache line, the second lower level cache provides a coherence response to the LCO command refusing the identified victim cache line. In response to determining to accept the victim cache line, the second lower level cache updates an entry corresponding to the identified victim cache line.
摘要:
A data processing system includes at least a first through third processing nodes coupled by an interconnect fabric. The first processing node includes a master, a plurality of snoopers capable of participating in interconnect operations, and a node interface that receives a request of the master and transmits the request of the master to the second processing unit with a nodal scope of transmission limited to the second processing node. The second processing node includes a node interface having a directory. The node interface of the second processing node permits the request to proceed with the nodal scope of transmission if the directory does not indicate that a target memory block of the request is cached other than in the second processing node and prevents the request from succeeding if the directory indicates that the target memory block of the request is cached other than in the second processing node.
摘要:
A method and apparatus for preventing selection of Deleted (D) members as an LRU victim during LRU victim selection. During each cache access targeting the particular congruence class, the deleted cache line is identified from information in the cache directory. A location of a deleted cache line is pipelined through the cache architecture during LRU victim selection. The information is latched and then passed to MRU vector generation logic. An MRU vector is generated and passed to the MRU update logic, which is selects/tags the deleted member as a MRU member. The make MRU operation affects only the lower level LRU state bits arranged in a tree-based structure state bits so that the make MRU operation only negates selection of the specific member in the D state, without affecting LRU victim selection of the other members.
摘要:
A cache memory includes a cache array including a plurality of congruence classes each containing a plurality of cache lines, where each cache line belongs to one of multiple classes which include at least a first class and a second class. The cache memory also includes a cache directory of the cache array that indicates class membership. The cache memory further includes a cache controller that selects a victim cache line for eviction from a congruence class. If the congruence class contains a cache line belonging to the second class, the cache controller preferentially selects as the victim cache line a cache line of the congruence class belonging to the second class based upon access order. If the congruence class contains no cache line belonging to the second class, the cache controller selects as the victim cache line a cache line belonging to the first class based upon access order.
摘要:
A data processing system includes at least a first processing node having an input/output (I/O) controller and a second processing including a memory controller for a memory. The memory controller receives, in order, pipelined first and second DMA write operations from the I/O controller, where the first and second DMA write operations target first and second addresses, respectively. In response to the second DMA write operation, the memory controller establishes a state of a domain indicator associated with the second address to indicate an operation scope including the first processing node. In response to the memory controller receiving a data access request specifying the second address and having a scope excluding the first processing node, the memory controller forces the data access request to be reissued with a scope including the first processing node based upon the state of the domain indicator associated with the second address.
摘要:
A data processing system includes a plurality of processing units each having a respective point-to-point communication link with each of multiple others of the plurality of processing units but fewer than all of the plurality of processing units. Each of the plurality of processing units includes interconnect logic, coupled to each point-to-point communication link of that processing unit, that broadcasts requests received from one of the multiple others of the plurality of processing units to one or more of the plurality of processing units. The interconnect logic includes a partial response data structure including a plurality of entries each associating a partial response field with a plurality of flags respectively associated with each processing unit containing a snooper from which that processing unit will receive a partial response. The interconnect logic accumulates partial responses of processing units by reference to the partial response field to obtain an accumulated partial response, and when the plurality of flags indicate that all processing units from which partial responses are expected have returned a partial response, outputs the accumulated partial response.
摘要:
A memory lock mechanism within a multi-processor system is disclosed. A lock control section is initially assigned to a data block within a system memory of the multiprocessor system. In response to a request for accessing the data block by a processing unit within the multiprocessor system, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the request for accessing the data block is denied. Otherwise, if the lock control section of the data block has not been set, the lock control section of the data block is set, and the request for accessing the data block is allowed.
摘要:
A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.