摘要:
A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.
摘要:
A method for preventing inadvertent invalidation of data elements in a system having a separate probe queue and fill queue for each central processing unit, is provided wherein a central processing unit stores a clean data element, that would otherwise have been discarded, in a victim data buffer when it is evicted from cache. The central processing unit subsequently issues a clean-victim command to the system control logic when the readmiss or read-miss-modify command, targeting the data element that maps to the same location in cache as the clean data element, is issued. The clean-victim command causes the duplicate tag store to indicate that the clean data element is no longer stored in that central processing unit's cache. While the data is stored therein, the central processing unit cannot issue a probe message that targets that data until the victim data buffer has been deallocated. The central processing unit cannot modify the data element and therefore, if a probe invalidate has previously been issued for the clean version of the data element, it will not be able to inadvertently invalidate a modified version of the data element.
摘要:
In accordance with the present invention, a method and apparatus is provided for storing victim data evicted from a cache and for satisfying pending requests or probe messages that target victim data, using a set of victim data buffers coupled to a central processing unit of a computer system. Storage locations referred to as a "victim valid bit" and a "probe valid bit" are associated with each victim data buffer in the computer system to indicate a release condition for the coupled victim data buffer. With such an arrangement, the victim data buffer can be deallocated when the victim valid bit and the probe valid bit have both been cleared.
摘要:
A multiprocessor computer system releases a victim data buffer storing victim data, when system control logic determines that a count of the number of probe messages pending at a specified time equals the number of such probe messages that have had an address comparison performed after the specified time. The specified time occurs when a command to write the victim data element to main memory passes a serialization point of the computer system.The address comparison compares a target address of a probe message with addresses of data stored in the victim data buffer and the associated cache of a CPU of the computer system.
摘要:
In accordance with the present invention, a method and apparatus is provided for maintaining the coherency of victim data from a time when the data is stored in a victim data buffer until a time when the data is written into a main memory. Alternatively, the coherency of the victim data is preserved until a determination is made that pending probe messages do not target the victim data. At that time the victim data buffer can be deallocated.With both arrangements, a central processing unit can release a victim data buffer at a point in time other than when the data that is stored therein is read from the buffer. Thus, the central processor unit can perform the release or deallocation of the buffer when it is most efficient and when no further access to the data is required.
摘要:
A computer system supports a first set of processors configured to operate in a dirty-shared mode and a second set of processors configured to operate in a non dirty-shared mode. The computer system may include a portion of shared memory that stores data in terms of memory blocks. Upon receiving a snoop read requesting shared access to a memory block held in a dirty state, a dirty-shared processor sends a copy of the memory block to the originator of the snoop read and retains a valid a copy of the block in its cache. Non dirty-shared processors additionally write the block back to main memory in response to snoop reads and may also send a copy to the originator. Until the write back is completed at main memory or another processor is granted write access to the block, the dirty-shared and non dirty-shared processors preferably continue to satisfy subsequent snoop reads targeting the memory block.
摘要:
A system may comprise a first node that includes an ordering point for data, the first node being operative to employ a write-back transaction associated with writing the data back to memory. The first node broadcasts a write-back message to at least one other node in the system in response to an acknowledgement provided by the memory indicating that the ordering point for the data has migrated from the first node to the memory.
摘要:
A technique to share cache lines among a plurality of bus agents. Embodiments of the invention comprise at least one technique to allow a number of agents, such as a processor or software program being executed by a processor, within a computer system or computer network to access a locked (“owned”) cache line, under certain circumstances, without incurring as much of the operational overhead and resulting performance degradation of many prior art techniques.
摘要:
Systems and methods are disclosed for interaction between different cache coherency protocols. One system may comprise a home node that receives a request for data from a first node in a first cache coherency protocol. A second node provides a conflict response to a request for the data from the home node. The conflict response indicates that an ordering point for the data is migrating according to a second cache coherency protocol, which is different from the first cache coherency protocol.
摘要:
Data coherency in a multiprocessor system is improved and data latency minimized through the use of data mapping “fill” requests from any one of the multiprocessor CPUs such that the information requested is acquired through the crossbar switch from the same memory module to which the “victim” data in that CPU's cache must be rewritten. With such an arrangement rewrite latency periods for victim data within the crossbar switch is minimized and the 'ships crossing in the night' problem is avoided.