摘要:
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. Information, such as the address, of the snoop request is stored in a queue of the stall/reorder unit. The stall/reorder unit forwards the snoop request to a selector which also receives a request from a processor. An arbitration mechanism selects either the snoop request or the request from the processor. If the snoop request is denied by the arbitration mechanism, information, e.g., address, about the snoop request may be maintained in the stall/reorder unit. The request may be later resent to the selector. This process may be repeated up to “n” clock cycles. By providing the snoop request additional opportunities (n clock cycles) to be accepted by the arbitration mechanism, fewer snoop requests may ultimately be denied.
摘要:
Fine-grained detection of data modification of original data is provided by associating separate guard bits with granules of memory storing original data from which translated data has been obtained. The guard bits indicating whether the original data stored in the associated granule is protected for data coherency. The guard bits are set and cleared by special-purpose instructions. Responsive to attempting access to translated data obtained from the original data, the guard bit(s) associated with the original data is checked to determine whether the guard bit(s) fail to indicate coherency of the original data, and if so, discarding of the translated data is initiated to facilitate maintaining data coherency between the original data and the translated data.
摘要:
A method is provided for fine-grained detection of data modification of original data by associating separate guard bits with granules of memory storing original data from which translated data has been obtained. The guard bits indicating whether the original data stored in the associated granule is protected for data coherency. The guard bits are set and cleared by special-purpose instructions. Responsive to attempting access to translated data obtained from the original data, the guard bit(s) associated with the original data is checked to determine whether the guard bit(s) fail to indicate coherency of the original data, and if so, discarding of the translated data is initiated to facilitate maintaining data coherency between the original data and the translated data.
摘要:
A data processing system includes a plurality of processing units coupled by an interconnect fabric. In response to a data request, a victim cache line is selected for castout from a first lower level cache of a first processing unit, and a target lower level cache of one of the plurality of processing units is selected based upon architectural proximity of the target lower level cache to a home system memory to which the address of the victim cache line is assigned. The first processing unit issues on the interconnect fabric a lateral castout (LCO) command that identifies the victim cache line to be castout from the first lower level cache and indicates that the target lower level cache is an intended destination. In response to a coherence response indicating success of the LCO command, the victim cache line is removed from the first lower level cache and held in the second lower level cache.
摘要:
A store request in enqueued in a store queue of a cache memory of the data processing system. The store request identifies a target memory block by a target address and specifies store data. While the store request and a barrier request older than the store request are enqueued in the store queue, a read-claim machine of the cache memory is dispatched to acquire coherence ownership of target memory block of the store request. After coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, a cache array of the cache memory is updated with the store data.
摘要:
A first SMP computer has first and second processing units and a first system memory pool, a second SMP computer has third and fourth processing units and a second system memory pool, and a third SMP computer has at least fifth and sixth processing units and third, fourth and fifth system memory pools. The fourth system memory pool is inaccessible to the third, fourth and sixth processing units and accessible to at least the second and fifth processing units, and the fifth system memory pool is inaccessible to the first, second and sixth processing units and accessible to at least the fourth and fifth processing units. A first interconnect couples the second processing unit for load-store coherent, ordered access to the fourth system memory pool, and a second interconnect couples the fourth processing unit for load-store coherent, ordered access to the fifth system memory pool.
摘要:
A data processing system includes at least a first through third processing nodes coupled by an interconnect fabric. The first processing node includes a master, a plurality of snoopers capable of participating in interconnect operations, and a node interface that receives a request of the master and transmits the request of the master to the second processing unit with a nodal scope of transmission limited to the second processing node. The second processing node includes a node interface having a directory. The node interface of the second processing node permits the request to proceed with the nodal scope of transmission if the directory does not indicate that a target memory block of the request is cached other than in the second processing node and prevents the request from succeeding if the directory indicates that the target memory block of the request is cached other than in the second processing node.
摘要:
A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.
摘要:
In at least one embodiment, a method of data processing in a data processing system having a memory hierarchy includes a processor core executing a storage-modifying memory access instruction to determine a memory address. The processor core transmits to a cache memory within the memory hierarchy a storage-modifying memory access request including the memory address, an indication of a memory access type, and, if present, a partial cache line hint signaling access to less than all granules of a target cache line of data associated with the memory address. In response to the storage-modifying memory access request, the cache memory performs a storage-modifying access to all granules of the target cache line of data if the partial cache line hint is not present and performs a storage-modifying access to less than all granules of the target cache line of data if the partial cache line hint is present.
摘要:
A cache, system and method for improving the snoop bandwidth of a cache directory. A cache directory may be sliced into two smaller cache directories each with its own snooping logic. By having two cache directories that can be accessed simultaneously, the bandwidth can be essentially doubled. Furthermore, a “frequency matcher” may shift the cycle speed to a lower speed upon receiving snoop addresses from the interconnect thereby slowing down the rate at which requests are transmitted to the dispatch pipelines. Each dispatch pipeline is coupled to a sliced cache directory and is configured to search the cache directory to determine if data at the received addresses is stored in the cache memory. As a result of slowing down the rate at which requests are transmitted to the dispatch pipelines and accessing the two sliced cache directories simultaneously, the bandwidth or throughput of the cache directory may be improved.