摘要:
An adaptive cache coherent purging protocol includes recognizing system performance, especially latency, is affected by when cache is purged. The occurrence of performance enhancing and degrading events regarding a cache are counted and compared to a threshold. When the threshold is triggered the cache becomes a candidate for purging. In an embodiment, a time out delay is implemented before actual purging occurs. When the threshold is not triggered but a cache event occurs, a fake time out delay is triggered and the count is adaptively either raised, lowered or set to zero in response to performance enhancing and/or degrading events. The effect is to make the actual purging more likely if the history of cache events indicates that the performance would be enhanced thereby or less likely if the history indicates that the performance would be degraded thereby.
摘要:
A multi-node computer network includes a plurality of nodes coupled together via a data link. Each of the nodes includes a local memory, which further comprises a shared memory. Certain items of data that are to be shared by the nodes are stored in the shared portion of memory. Associated with each of the shared data items is a data structure. When a node sharing data with other nodes in the system seeks to modify the data, it transmits the modifications over the data link to the other nodes in the network. Each update is received in order by each node in the cluster. As part of the last transmission by the modifying node, an acknowledgement request is sent to the receiving nodes in the cluster. Each node that receives the acknowledgment request returns an acknowledgement to the sending node. The returned acknowledgement is written to the data structure associated with the shared data item. If there is an error during the transmission of the message, the receiving node does not transmit an acknowledgement, and the sending node is thereby notified that an error has occurred.
摘要:
An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows for a number of multi-processor nodes to be coupled to the switch to operate at an optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory. Memory performance is additionally improved by including, at each memory, a delayed write buffer which is used in conjunction with the directory to identify victims that are to be written to memory. An arb bus coupled to the output of the directory of each node provides a central ordering point for all messages that are transferred through the SMP. The messages comprise a number of transactions, and each transaction is assigned to a number of different virtual channels, depending upon the processing stage of the message. The use of virtual channels thus helps to maintain data coherency by providing a straightforward method for maintaining system order. Using the virtual channels and the directory structure, cache coherency problems that would previously result in deadlock may be avoided.
摘要:
A method for preventing inadvertent invalidation of data elements in a system having a separate probe queue and fill queue for each central processing unit, is provided wherein a central processing unit stores a clean data element, that would otherwise have been discarded, in a victim data buffer when it is evicted from cache. The central processing unit subsequently issues a clean-victim command to the system control logic when the readmiss or read-miss-modify command, targeting the data element that maps to the same location in cache as the clean data element, is issued. The clean-victim command causes the duplicate tag store to indicate that the clean data element is no longer stored in that central processing unit's cache. While the data is stored therein, the central processing unit cannot issue a probe message that targets that data until the victim data buffer has been deallocated. The central processing unit cannot modify the data element and therefore, if a probe invalidate has previously been issued for the clean version of the data element, it will not be able to inadvertently invalidate a modified version of the data element.
摘要:
A mechanism reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory. The mechanism comprises a commit-signal that is generated by control logic of the multiprocessor system in response to an issued memory reference operation. The commit-signal facilitates inter-reference ordering; moreover, the commit signal indicates the apparent completion of the memory reference operation, rather than actual completion of the operation. The apparent completion of an operation occurs substantially sooner than the actual completion of an operation, thereby improving performance of the multiprocessor system.
摘要:
In accordance with the present invention, a method and apparatus is provided for storing victim data evicted from a cache and for satisfying pending requests or probe messages that target victim data, using a set of victim data buffers coupled to a central processing unit of a computer system. Storage locations referred to as a "victim valid bit" and a "probe valid bit" are associated with each victim data buffer in the computer system to indicate a release condition for the coupled victim data buffer. With such an arrangement, the victim data buffer can be deallocated when the victim valid bit and the probe valid bit have both been cleared.
摘要:
A predictor which chooses between two or more predictors is described. The predictor includes a first component predictor which operates according to a first algorithm to produce a prediction of an action and a second component predictor which operates according to a second algorithm to produce a prediction of said action. The predictor also includes means, coupled to each of said first and second predictors, for choosing between predictions provided from said predictors to provide a prediction of the action from the predictor. The predictor can be used to predict outcomes of branches, cache hits, prefetched instruction sequences, and so forth.