摘要:
Disclosed is a processor, which reduces issuing of unnecessary barrier operations during instruction processing. The processor comprises an instruction sequencing unit and a load store unit (LSU) that issues a group of memory access requests that precede a barrier instruction in an instruction sequence. The processor also includes a controller, which in response to a determination that all of the memory access requests hit in a cache affiliated with the processor, withholds issuing on an interconnect a barrier operation associated with the barrier instruction. The controller further directs the load store unit to ignore the barrier instruction and complete processing of a next group of memory access requests following the barrier instruction in the instruction sequence without receiving an acknowledgment.
摘要:
A multiprocessor data processing system requires careful management to maintain cache coherency. In conventional systems using a MESI approach, two or more processors will often compete for ownership of a common cache line. As a result, ownership of the cache line will frequently “bounce” between multiple processors, which causes a significant reduction in cache efficiency. The preferred embodiment provides a modified MESI state which holds the status of the cache line static for a fixed period of time, which eliminates the bounce effect from contention between multiple processors.
摘要:
Disclosed is a method of operating a processor, by which a speculatively issued load request, which fetches incorrect data, is recycled. An instruction sequence, which includes a barrier instruction and a load instruction that follows the barrier instruction in program order, is received for execution. In response to the barrier instruction, a barrier operation is issued on an interconnect. Following, in response to the load instruction and while the barrier operation is pending, a load request is issued to memory. When a pre-determined type of invalidate, which is affiliated with the load request, is received before the receipt of an acknowledgment for the barrier operation, data that is returned by memory in response to the load request is discarded and the load request is re-issued. The pre-determined type of invalidate includes, for example, a snoop invalidate.
摘要:
A multiprocessor data processing system requires careful management to maintain cache coherency. Conventional systems using a MESI approach sacrifice some performance with inefficient lock-acquisition and lock-retention techniques. The disclosed system provides additional cache states, indicator bits, and lock-acquisition routines to improve cache performance. In particular, as multiple processors compete for the same cache line, a significant amount of processor time is lost determining if another processor's cache line lock has been released and attempting to reserve that cache line while it is still owned by the other processor. The preferred embodiment provides an additional cache state which specifically indicates that a processor has released its lock on a cache line after it has performed any necessary modifications.
摘要:
A novel cache coherency protocol provides a modified-unsolicited (MU) cache state to indicate that a value held in a cache line has been modified (i.e., is not currently consistent with system memory), but was modified by another processing unit, not by the processing unit associated with the cache that currently contains the value in the MU state, and that the value is held exclusive of any other horizontally adjacent caches. Because the value is exclusively held, it may be modified in that cache without the necessity of issuing a bus transaction to other horizontal caches in the memory hierarchy. The MU state may be applied as a result of a snoop response to a read request. The read request can include a flag to indicate that the requesting cache is capable of utilizing the MU state. Alternatively, a flag may be provided with intervention data to indicate that the requesting cache should utilize the modified-unsolicited state.
摘要:
An effectively “conditional”, cast out operation or cast out portion of a combined operation including a related data access may be cancelled by the combined response to the operation. The combined response logic receives coherency state and/or LRU position information for cache lines corresponding to the cast out victim within snoopers and vertically in-line storage. The combined response logic may also receive information regarding the presence of shared or invalid cache lines in snoopers or lower level storage within the congruence class for the victim, or information regarding the read-once nature of the data access target. Based on these responses, the combined response logic determines whether the cast out should be cancelled and, if so, selects and drives the appropriate combined response code.
摘要:
A cache coherency protocol uses a “Exclusive-Deallocate” (ED) coherency state to indicate that a particular value is currently held in an upper level cache in an exclusive, unmodified form (not shared with any other caches of the computer system, including caches associated with the same processing unit), so that the value can conveniently be modified without any lower level bus transactions since no lower level caches have allocated a line for the value. If the value is subsequently modified in the upper level cache, its coherency state is simply switched to “modified” without the need for any bus transactions. Conversely, if the value is evicted from the upper level cache without ever having been modified, it can be loaded into the lower level cache with a coherency state indicating that the lower level cache contains the unmodified value exclusive of all other caches in other processing units of the computer system. If the value is initially loaded into the upper level cache from a cache of another processing unit, or from a lower level cache of the same processing unit, then the upper level cache may be selectively programmed to mark the cache line with the ED state.
摘要:
Upon snooping a combined data access and cast out/deallocate operation initiating by a horizontal storage device, snoop logic determines, from LRU position information appended to the combined response to the combined operation, whether the coherency state and/or LRU position of the victim may be upgraded within the subject storage device. If so, the coherency state or LRU position is upgraded to improve global data storage management. For instance, a cache line within a snooping storage device may be altered to assume the coherency state of the victim within the storage device initiating the combined operation to improve data storage management under a given replacement policy.
摘要:
Combined response logic for a bus receives a combined data access and cast out/deallocate operation initiating by a storage device within a specific level of a storage hierarchy with a coherency state of the cast out/deallocate victim appended. Snoopers on the bus drive snoop responses to the combined operation with the coherency state and/or LRU position of locally-stored cache lines corresponding to the victim appended. The combined response logic determines, from the coherency state information appended to the combined operation and the snoop responses, whether a coherency upgrade is possible. If so, the combined response logic selects a snooper storage device to upgrade the coherency state of a respective cache line corresponding to the victim, and appends an upgrade directive to the combined response. The snooper selected to upgrade the coherency state of a cache line corresponding the victim may be randomly chosen or, as an optimization, be chosen for having the highest LRU position for the respective cache line.
摘要:
An optimizing compiler for generating STORE instructions having memory hierarchy control bits is disclosed. The compiler first converts a first STORE instruction to a second STORE instruction. The compiler then provides an operation code field within the second instruction for indicating an updating operation. The compiler further provides a vertical write-through level field within the second instruction for indicating a vertical memory level and a horizontal memory level within a multi-level memory hierarchy to which the updating operation should be applied.