Abstract:
At a first cache memory affiliated with a first processor core, an exclusive memory access operation is received via an interconnect fabric coupling the first cache memory to second and third cache memories respectively affiliated with second and third processor cores. The exclusive memory access operation specifies a target address. In response to receipt of the exclusive memory access operation, the first cache memory detects presence or absence of a source indication indicating that the exclusive memory access operation originated from the second cache memory to which the first cache memory is coupled by a private communication network to which the third cache memory is not coupled. In response to detecting presence of the source indication, a coherency state field of the first cache memory that is associated with the target address is updated to a first data-invalid state. In response to detecting absence of the source indication, the coherency state field of the first cache memory is updated to a different second data-invalid state.
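As a rough illustration of the snoop-side decision just described, here is a minimal C sketch. The state names, the snoop_exclusive_op function, and the flag encoding are illustrative assumptions, not the patent's actual coherency states.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical coherency states; the two distinct data-invalid states
 * the abstract describes are modeled as separate enumerators. */
typedef enum {
    STATE_SHARED,
    STATE_INVALID_PRIVATE_SRC,  /* op arrived from the private-network peer */
    STATE_INVALID_FABRIC_SRC    /* op arrived over the general interconnect */
} coherency_state_t;

typedef struct {
    unsigned long tag;
    coherency_state_t state;
} dir_entry_t;

/* Handle a snooped exclusive memory access operation.  The source
 * indication tells this cache whether the requester is the peer cache
 * on the shared private communication network. */
static void snoop_exclusive_op(dir_entry_t *entry, bool has_source_indication)
{
    if (has_source_indication)
        entry->state = STATE_INVALID_PRIVATE_SRC;   /* first data-invalid state */
    else
        entry->state = STATE_INVALID_FABRIC_SRC;    /* second data-invalid state */
}

int main(void)
{
    dir_entry_t e = { 0x1000, STATE_SHARED };
    snoop_exclusive_op(&e, true);
    printf("state after private-peer snoop: %d\n", e.state);
    return 0;
}
```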
Abstract:
A cache, system and method for improving the snoop bandwidth of a cache directory. A cache directory may be sliced into two smaller cache directories, each with its own snooping logic. Because the two cache directories can be accessed simultaneously, the bandwidth can be essentially doubled. Furthermore, a “frequency matcher” may shift snoop addresses received from the interconnect to a lower cycle speed, thereby slowing the rate at which requests are transmitted to the dispatch pipelines. Each dispatch pipeline is coupled to one of the sliced cache directories and is configured to search that directory to determine whether data at the received addresses is stored in the cache memory. As a result of slowing the rate at which requests are transmitted to the dispatch pipelines while accessing the two sliced cache directories simultaneously, the bandwidth or throughput of the cache directory may be improved.
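A minimal C sketch of the slicing idea, assuming a single address bit selects the slice and that one interconnect beat delivers two snoop addresses. The names (slice_of, snoop_beat) and sizes are invented, and the case where both addresses map to the same slice is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ENTRIES_PER_SLICE 512

/* Two directory slices, each standing in for a smaller cache directory
 * with its own snooping logic and dispatch pipeline. */
static uint64_t slice_tag[2][ENTRIES_PER_SLICE];

/* Hypothetical slice-select function: one address bit steers each
 * snoop address to slice 0 or slice 1. */
static unsigned slice_of(uint64_t addr) { return (addr >> 6) & 1; }

static void directory_insert(uint64_t addr)
{
    slice_tag[slice_of(addr)][(addr >> 7) % ENTRIES_PER_SLICE] = addr >> 7;
}

static bool slice_lookup(uint64_t addr)
{
    return slice_tag[slice_of(addr)][(addr >> 7) % ENTRIES_PER_SLICE] == addr >> 7;
}

/* Model of one interconnect beat carrying two snoop addresses: the
 * frequency matcher hands the pair off at the slower dispatch clock,
 * and the two slice lookups proceed simultaneously, roughly doubling
 * directory snoop bandwidth. */
static void snoop_beat(uint64_t a, uint64_t b, bool hit[2])
{
    hit[0] = slice_lookup(a);
    hit[1] = slice_lookup(b);
}

int main(void)
{
    bool hit[2];
    directory_insert(0x1040);
    snoop_beat(0x1040, 0x2000, hit);
    printf("hits: %d %d\n", hit[0], hit[1]);
    return 0;
}
```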
Abstract:
A data processing system includes a memory system, a plurality of masters that issue requests for access to memory blocks within the memory system, a plurality of snoopers that provide partial responses to requests by the masters, and response logic that generates combined responses for the requests in response to the partial responses provided by the plurality of snoopers. The plurality of masters includes a winning master that issues a request for a particular memory block, and the plurality of snoopers includes a protecting snooper that, in response to receipt of the request, provides a partial response and protects a transfer of coherency ownership of the particular memory block to the winning master until expiration of a protection window extension following receipt from the response logic of a combined response for the request.
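The protection-window mechanism might be modeled in C roughly as follows; the state machine, the cycle counting, and names such as snoop_winning_request are assumptions for illustration, not the patent's design.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snooper state for one protected memory block. */
typedef struct {
    unsigned long protected_addr;
    bool protecting;
    int window_cycles_left;   /* protection window extension countdown */
} snooper_t;

enum presp { PRESP_ACK, PRESP_RETRY };

/* On snooping the winning master's request, begin protecting the block. */
static void snoop_winning_request(snooper_t *s, unsigned long addr)
{
    s->protected_addr = addr;
    s->protecting = true;
    s->window_cycles_left = -1;   /* open-ended until combined response seen */
}

/* When the combined response arrives, keep protecting for an extension. */
static void observe_combined_response(snooper_t *s, int extension_cycles)
{
    s->window_cycles_left = extension_cycles;
}

/* Competing requests for the block are retried while it is protected,
 * guaranteeing the winning master receives coherency ownership. */
static enum presp snoop_competing_request(snooper_t *s, unsigned long addr)
{
    if (s->protecting && addr == s->protected_addr)
        return PRESP_RETRY;
    return PRESP_ACK;
}

/* Called each cycle after the combined response; protection ends only
 * when the extension expires. */
static void tick(snooper_t *s)
{
    if (s->protecting && s->window_cycles_left > 0 && --s->window_cycles_left == 0)
        s->protecting = false;
}

int main(void)
{
    snooper_t s = {0};
    snoop_winning_request(&s, 0x80);
    observe_combined_response(&s, 2);
    printf("%s\n", snoop_competing_request(&s, 0x80) == PRESP_RETRY ? "retry" : "ack");
    tick(&s); tick(&s);
    printf("%s\n", snoop_competing_request(&s, 0x80) == PRESP_RETRY ? "retry" : "ack");
    return 0;
}
```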
Abstract:
A system and method for cache management in a data processing system having a memory hierarchy of upper and lower memory caches. A lower memory cache controller accesses a coherency state table to determine replacement policies of coherency states for cache lines present in the lower memory cache when receiving a cast-in request from one of the upper memory caches. The coherency state table implements a replacement policy that retains the more valuable cache coherency state information between the upper and lower memory caches for a particular cache line contained in both levels of memory at the time of cast-out from the upper memory cache.
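A sketch of one possible coherency state table, using simplified MESI-style states ranked I < S < E < M. The actual states and their relative value ordering in the patent are not reproduced here; this table merely shows the "retain the more valuable state" idea.

```c
#include <stdio.h>

/* Simplified MESI-style states, ordered by "value" for illustration. */
typedef enum { I = 0, S = 1, E = 2, M = 3 } state_t;

/* Coherency state table: indexed by [cast-in state][existing L3 state],
 * it returns the state the lower memory cache keeps for the line.
 * This particular table simply retains the more valuable of the two. */
static const state_t cast_in_table[4][4] = {
    /* existing:  I  S  E  M      cast-in: */
    /* I */     { I, S, E, M },
    /* S */     { S, S, E, M },
    /* E */     { E, E, E, M },
    /* M */     { M, M, M, M },
};

static state_t handle_cast_in(state_t cast_in, state_t existing)
{
    return cast_in_table[cast_in][existing];
}

int main(void)
{
    /* Cast-in arrives as S while the L3 already holds E: keep E. */
    printf("resulting state: %d\n", handle_cast_in(S, E));
    return 0;
}
```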
Abstract:
A method, system, and processor chip design for reducing the latency between completing a LARX operation and receiving the associated STCX operation that completes the update to the cache line. Each entry of the store queue of the issuing processor is provided with an additional tracking bit (a priority bit). The priority bit is set whenever a STCX operation is placed within the entry. During selection of an entry for dispatch, the arbitration logic scans the priority bit of each eligible entry. An entry with the priority bit set is given priority in the selection process within architectural rules, and that entry is then selected for dispatch as early as possible within the established rules.
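A minimal C sketch of the priority-bit arbitration, assuming a small queue and a simple scan order; the entry fields and the select_for_dispatch name are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

#define STQ_ENTRIES 8

/* Hypothetical store queue entry: the priority bit is set when a STCX
 * (store-conditional) operation is placed in the entry. */
typedef struct {
    bool valid;
    bool eligible;   /* dispatchable under the architectural rules */
    bool priority;   /* set for STCX operations */
} stq_entry_t;

/* Arbitration: scan the priority bits of all eligible entries and
 * favor an entry holding a STCX; otherwise pick any eligible entry. */
static int select_for_dispatch(const stq_entry_t q[STQ_ENTRIES])
{
    int fallback = -1;
    for (int i = 0; i < STQ_ENTRIES; i++) {
        if (!q[i].valid || !q[i].eligible)
            continue;
        if (q[i].priority)
            return i;            /* STCX dispatched as early as possible */
        if (fallback < 0)
            fallback = i;
    }
    return fallback;
}

int main(void)
{
    stq_entry_t q[STQ_ENTRIES] = {0};
    q[2] = (stq_entry_t){ .valid = true, .eligible = true, .priority = false };
    q[5] = (stq_entry_t){ .valid = true, .eligible = true, .priority = true };
    printf("dispatch entry %d\n", select_for_dispatch(q));  /* picks 5 */
    return 0;
}
```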
Abstract:
A method and processor system that substantially eliminate data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting the individual byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When a full entry is selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line and then overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when the data goes stale before the RC machine obtains write permission.
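In software, the AND-gate tree over an entry's byte enable bits reduces to checking that every enable is set. The sketch below assumes a 128-byte line and invented names; it only illustrates the full-line detection and the resulting dispatch decision.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 128   /* hypothetical cache line size */

/* Software analogue of the AND-gate tree over an entry's byte enable
 * bits: the result is 1 only when every byte of the line is written. */
static bool entry_is_full(const uint8_t byte_enable[LINE_BYTES / 8])
{
    for (int i = 0; i < LINE_BYTES / 8; i++)
        if (byte_enable[i] != 0xFF)
            return false;
    return true;
}

/* Dispatch to an RC machine: a full entry lets the RC machine obtain
 * write permission and overwrite the whole line without ever fetching
 * the old data, eliminating the data bus operation. */
static void dispatch_to_rc(const uint8_t byte_enable[LINE_BYTES / 8])
{
    if (entry_is_full(byte_enable))
        printf("full line: acquire write permission, skip data fetch\n");
    else
        printf("partial line: fetch data, then merge stores\n");
}

int main(void)
{
    uint8_t be[LINE_BYTES / 8];
    for (int i = 0; i < LINE_BYTES / 8; i++) be[i] = 0xFF;
    dispatch_to_rc(be);   /* full entry */
    be[3] = 0x7F;
    dispatch_to_rc(be);   /* one byte unwritten */
    return 0;
}
```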
Abstract:
In response to execution of program code, a control register within scrubbing logic in a local coherency domain is initialized with at least a target address of a target memory block. In response to the initialization, the scrubbing logic issues, to at least one cache hierarchy in a remote coherency domain, a domain indication scrubbing request targeting the target memory block, which may be cached by that cache hierarchy. In response to receipt of a coherency response indicating that the target memory block is not cached in the remote coherency domain, a domain indication in the local coherency domain is updated to indicate that the target memory block is cached, if at all, only within the local coherency domain.
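A rough C model of the scrub flow under stated assumptions: the control register layout, the issue_scrub_request stub, and the one-bit-per-block domain indication are all invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical control register written by program code to start a scrub. */
typedef struct {
    unsigned long target_addr;
    bool start;
} scrub_ctrl_t;

/* Stand-in for the coherency response from the remote domain's caches. */
typedef enum { RESP_CACHED_REMOTE, RESP_NOT_CACHED_REMOTE } cresp_t;

/* Hypothetical probe of the remote coherency domain; a real design
 * would issue a scrubbing request on the interconnect. */
static cresp_t issue_scrub_request(unsigned long addr)
{
    (void)addr;
    return RESP_NOT_CACHED_REMOTE;   /* assume remote copies were purged */
}

/* Local domain indication: true means the block may be cached outside
 * the local domain, so operations on it must be broadcast globally. */
static bool domain_global[1024];

static void scrub(scrub_ctrl_t *ctrl)
{
    if (!ctrl->start)
        return;
    if (issue_scrub_request(ctrl->target_addr) == RESP_NOT_CACHED_REMOTE) {
        /* Block lives, if at all, only locally: later operations on it
         * can stay within the local domain. */
        domain_global[ctrl->target_addr % 1024] = false;
    }
    ctrl->start = false;
}

int main(void)
{
    scrub_ctrl_t c = { .target_addr = 0x2000, .start = true };
    domain_global[0x2000 % 1024] = true;
    scrub(&c);
    printf("global indication now: %d\n", domain_global[0x2000 % 1024]);
    return 0;
}
```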
Abstract:
A processing unit for a multiprocessor data processing system includes a store-through upper level cache, an instruction sequencing unit that fetches instructions for execution, at least one instruction execution unit that executes a store-conditional instruction to determine a store target address, a store queue that, following execution of the store-conditional instruction, buffers a corresponding store operation, and sequencer logic associated with the store queue. The sequencer logic, responsive to receipt of a latency indication indicating that resolution of the store-conditional operation as passing or failing is subject to significant latency, invalidates, prior to resolution of the store-conditional operation, a cache line in the store-through upper level cache to which a load-reserve operation previously bound.
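One way to picture the sequencer behavior in C, with the latency indication reduced to a boolean and all names assumed; a real design would track the reservation and the L1 line state in hardware.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal model of the store-through L1 and the reservation made by a
 * preceding load-reserve (LARX). */
typedef struct {
    unsigned long reserved_addr;
    bool l1_line_valid;      /* L1 copy of the reserved line */
} core_state_t;

/* Sequencer logic: when the queue is told that resolving the buffered
 * store-conditional as pass/fail will take a long time (e.g., the line
 * must be fetched from far away), invalidate the L1 line the LARX bound
 * to, so intervening loads cannot keep reading the possibly stale copy. */
static void on_stcx_latency_indication(core_state_t *c, bool high_latency)
{
    if (high_latency && c->l1_line_valid) {
        c->l1_line_valid = false;
        printf("invalidated L1 line 0x%lx before STCX resolution\n",
               c->reserved_addr);
    }
}

int main(void)
{
    core_state_t c = { .reserved_addr = 0x3000, .l1_line_valid = true };
    on_stcx_latency_indication(&c, true);
    return 0;
}
```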
Abstract:
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. Information about the snoop request, such as its address, is stored in a queue of the stall/reorder unit. The stall/reorder unit forwards the snoop request to a selector, which also receives a request from a processor. An arbitration mechanism selects either the snoop request or the request from the processor. If the snoop request is denied by the arbitration mechanism, information about it (e.g., its address) is maintained in the stall/reorder unit, and the request may later be resent to the selector. This process may be repeated for up to “n” clock cycles. By giving the snoop request additional opportunities (n clock cycles) to be accepted by the arbitration mechanism, fewer snoop requests may ultimately be denied.
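A minimal C sketch of the retry loop, collapsing the queue and selector into a single function; MAX_RETRIES stands in for the “n” clock cycles, and the arbiter model is arbitrary.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_RETRIES 4   /* the "n" clock cycles of retry opportunity */

/* Hypothetical arbiter: the processor request wins in some cycles;
 * this periodic pattern merely stands in for real arbitration pressure. */
static bool arbiter_accepts_snoop(int cycle)
{
    return cycle % 3 == 2;
}

/* Stall/reorder unit: hold the snoop address in a queue and re-present
 * it to the selector for up to MAX_RETRIES cycles before rejecting. */
static bool present_snoop(unsigned long addr)
{
    for (int cycle = 0; cycle < MAX_RETRIES; cycle++) {
        if (arbiter_accepts_snoop(cycle)) {
            printf("snoop 0x%lx accepted on retry %d\n", addr, cycle);
            return true;
        }
        /* denied this cycle: address stays queued, resend next cycle */
    }
    printf("snoop 0x%lx rejected after %d attempts\n", addr, MAX_RETRIES);
    return false;
}

int main(void)
{
    present_snoop(0x4000);
    return 0;
}
```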