Abstract:
Disclosed is a processor that reduces the issuing of unnecessary barrier operations during instruction processing. The processor comprises an instruction sequencing unit and a load store unit (LSU) that issues a group of memory access requests preceding a barrier instruction in an instruction sequence. The processor also includes a controller which, in response to a determination that all of the memory access requests hit in a cache affiliated with the processor, withholds issuing on an interconnect a barrier operation associated with the barrier instruction. The controller further directs the load store unit to ignore the barrier instruction and to complete processing of a next group of memory access requests, following the barrier instruction in the instruction sequence, without receiving an acknowledgment.
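The following is a minimal sketch in C of the suppression decision described above; the abstract does not specify an implementation, and all names here (barrier_group_t, barrier_broadcast_needed, and the hit/issued counters) are hypothetical:

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical controller state: tracks whether every memory access
     * request in the group preceding the barrier hit in the local cache. */
    typedef struct {
        int requests_issued;
        int requests_hit;
    } barrier_group_t;

    /* Returns true if the barrier operation must be broadcast on the
     * interconnect; false means the LSU may ignore the barrier and start
     * the next group without waiting for an acknowledgment. */
    static bool barrier_broadcast_needed(const barrier_group_t *g)
    {
        return g->requests_hit != g->requests_issued; /* any miss => broadcast */
    }

    int main(void)
    {
        barrier_group_t g = { .requests_issued = 4, .requests_hit = 4 };
        if (barrier_broadcast_needed(&g))
            printf("issue barrier op on interconnect, wait for ACK\n");
        else
            printf("all hits: withhold barrier op, proceed with next group\n");
        return 0;
    }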
Abstract:
A cache controller for a processor in a remote node of a system bus in a multiway multiprocessor link sends out a cache deallocate address transaction (CDAT) for a given cache line when that cache line is flushed and information from memory in a home node is no longer deemed valid for that cache line of that remote node's processor. A local snoop of the CDAT transaction is then performed as a background function by other processors in the same remote node. If the snoop results indicate that the same information is valid in another cache, and that cache decides it is better to keep it valid in that remote node, then the information remains there. If the snoop results indicate that the information is not valid among caches in that remote node, or will be flushed due to the CDAT, the system memory directory in the home node of the multiprocessor link is notified and changes state in response. The system achieves higher performance because the cache line maintenance functions are performed in the background rather than on mainstream demand.
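A small C sketch of the snoop-outcome handling described above, under the assumption that the background snoop reduces to one of three results; the enum and function names are illustrative, not from the patent text:

    #include <stdio.h>

    /* Possible outcomes of the background local snoop of a CDAT
     * (names are illustrative). */
    typedef enum { SNOOP_VALID_KEPT, SNOOP_NOT_VALID, SNOOP_WILL_FLUSH } snoop_t;

    /* Decide whether the home node's memory directory must be notified. */
    static void handle_cdat_snoop(snoop_t result)
    {
        switch (result) {
        case SNOOP_VALID_KEPT:
            /* Another cache in the remote node keeps the line valid:
             * the information stays in the node, no directory change. */
            printf("line retained in remote node\n");
            break;
        case SNOOP_NOT_VALID:
        case SNOOP_WILL_FLUSH:
            /* No valid copy remains: notify the home node so its
             * directory can change state for this line. */
            printf("notify home-node directory: line gone from remote node\n");
            break;
        }
    }

    int main(void)
    {
        handle_cdat_snoop(SNOOP_NOT_VALID);
        return 0;
    }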
Abstract:
A multiprocessor data processing system requires careful management to maintain cache coherency. In conventional systems using a MESI approach, two or more processors often compete for ownership of a common cache line. As a result, ownership of the cache line frequently “bounces” between multiple processors, causing a significant reduction in cache efficiency. The preferred embodiment provides a modified MESI state that holds the status of the cache line static for a fixed period of time, thereby eliminating the bounce effect caused by contention between multiple processors.
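One plausible reading of the hold mechanism, sketched in C: a competing ownership request is simply deferred while a per-line hold window is open. The hold_until field and grant_ownership function are assumptions for illustration; the abstract does not name them:

    #include <stdbool.h>
    #include <stdio.h>

    /* Classic MESI states plus a hypothetical timer that keeps the
     * line's status static for a fixed window. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

    typedef struct {
        mesi_t   state;
        unsigned hold_until;   /* cycle count until which state is pinned */
    } cache_line_t;

    /* A competing processor's ownership request is deferred while the
     * hold window is open, suppressing ping-pong transfers. */
    static bool grant_ownership(cache_line_t *l, unsigned now)
    {
        if (now < l->hold_until)
            return false;          /* state held static: retry later */
        l->state = INVALID;        /* surrender the line */
        return true;
    }

    int main(void)
    {
        cache_line_t l = { MODIFIED, /*hold_until=*/100 };
        printf("request at cycle 50: %s\n",
               grant_ownership(&l, 50) ? "granted" : "deferred");
        printf("request at cycle 150: %s\n",
               grant_ownership(&l, 150) ? "granted" : "deferred");
        return 0;
    }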
Abstract:
Disclosed is a method of operating a processor by which a speculatively issued load request that fetches incorrect data is recycled. An instruction sequence, which includes a barrier instruction and a load instruction that follows the barrier instruction in program order, is received for execution. In response to the barrier instruction, a barrier operation is issued on an interconnect. Then, in response to the load instruction and while the barrier operation is pending, a load request is issued to memory. When a predetermined type of invalidate affiliated with the load request is received before an acknowledgment for the barrier operation, the data returned by memory in response to the load request is discarded and the load request is re-issued. The predetermined type of invalidate includes, for example, a snoop invalidate.
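A minimal C sketch of the discard-and-reissue decision, assuming two tracking flags per in-flight load (field and function names are hypothetical):

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative flags for a load issued speculatively past a barrier. */
    typedef struct {
        bool barrier_ack_received;
        bool snoop_invalidate_seen;   /* the predetermined invalidate type */
    } load_tracking_t;

    /* On data return: discard and re-issue if a qualifying invalidate
     * arrived before the barrier acknowledgment. */
    static void on_data_return(load_tracking_t *t)
    {
        if (t->snoop_invalidate_seen && !t->barrier_ack_received) {
            printf("discard returned data, re-issue load request\n");
            t->snoop_invalidate_seen = false;   /* reset for the retry */
        } else {
            printf("forward returned data to the register file\n");
        }
    }

    int main(void)
    {
        load_tracking_t t = { .barrier_ack_received = false,
                              .snoop_invalidate_seen = true };
        on_data_return(&t);
        return 0;
    }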
Abstract:
A multiprocessor data processing system requires careful management to maintain cache coherency. Conventional systems using a MESI approach sacrifice some performance through inefficient lock-acquisition and lock-retention techniques. The disclosed system provides additional cache states, indicator bits, and lock-acquisition routines to improve cache performance. In particular, as multiple processors compete for the same cache line, a significant amount of processor time is lost determining whether another processor's lock on the cache line has been released and attempting to reserve that cache line while it is still owned by the other processor. The preferred embodiment provides an additional cache state that specifically indicates that a processor has released its lock on a cache line after performing any necessary modifications.
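A C sketch of how a waiter might exploit such a released-lock state; the RELEASED name and try_acquire routine are assumptions for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    /* MESI extended with a hypothetical "released" state signalling that
     * the owning processor has finished its modifications and dropped
     * its lock, so waiters need not keep polling the old owner. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED, RELEASED } lstate_t;

    /* A waiting processor checks the snooped state before attempting
     * to reserve the line. */
    static bool try_acquire(lstate_t snooped)
    {
        switch (snooped) {
        case RELEASED:
        case INVALID:
        case SHARED:
            return true;    /* no live lock: safe to attempt reservation */
        default:
            return false;   /* still owned: spinning would waste cycles */
        }
    }

    int main(void)
    {
        printf("acquire on MODIFIED: %d\n", try_acquire(MODIFIED));
        printf("acquire on RELEASED: %d\n", try_acquire(RELEASED));
        return 0;
    }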
Abstract:
A method of operating a processing unit of a computer system by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used: the first prefetch unit is hardware independent and dynamically monitors one or more active streams associated with operations carried out by a core of the processing unit, while the second prefetch unit is aware of the lower level storage subsystem and sends with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the new prefetch value. The prefetch limit of cache usage may be established as a maximum number of sets in a congruence class usable by the requesting processing unit. A flag in a directory of the cache may be set to indicate that the prefetch value was retrieved as the result of a prefetch operation. In an implementation wherein the cache is a multi-level cache, a second flag in the cache directory may be set to indicate that the prefetch value has been sourced to an upstream cache. A cache line containing prefetch data can be automatically invalidated after a preset amount of time has passed since the prefetch value was requested.
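A C sketch of the tagged request and the congruence-class prefetch limit; the struct layout, PREFETCH_SET_LIMIT value, and choose_way helper are all hypothetical:

    #include <stdio.h>

    /* An explicit prefetch request tagged with a stream ID and the
     * requesting processor's ID (names are illustrative). */
    typedef struct {
        unsigned stream_id;
        unsigned processor_id;   /* useful when the cache is shared */
        unsigned addr;
    } prefetch_req_t;

    #define PREFETCH_SET_LIMIT 2   /* max sets per congruence class usable
                                      for prefetches by one processor */

    /* If the limit is met, pick a line holding an earlier prefetch value
     * as the victim for the new one. */
    static int choose_way(unsigned prefetch_ways_in_use,
                          int oldest_prefetch_way, int free_way)
    {
        if (prefetch_ways_in_use >= PREFETCH_SET_LIMIT)
            return oldest_prefetch_way;   /* reuse an earlier prefetch line */
        return free_way;
    }

    int main(void)
    {
        prefetch_req_t r = { .stream_id = 3, .processor_id = 1, .addr = 0x1000 };
        printf("stream %u from cpu %u -> way %d\n", r.stream_id,
               r.processor_id, choose_way(2, /*oldest=*/5, /*free=*/7));
        return 0;
    }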
Abstract:
A method of improving memory access for a computer system by sending load requests to a lower level storage subsystem along with associated information pertaining to the requesting processor's intended use of the requested information, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information) or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data, and translation caches.
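A C sketch of a load request annotated with a use hint, under the assumption that the hint decides whether the cache load bus also fills L1; the enum values and deliver function are illustrative only:

    #include <stdio.h>

    /* A load request annotated with the requester's intended use, so the
     * returned data can be placed directly without a reload buffer.
     * Field names are illustrative. */
    typedef enum { USE_REGISTER_ONLY, USE_L1_AND_REGISTER } use_hint_t;

    typedef struct {
        unsigned   addr;
        use_hint_t hint;
    } load_req_t;

    static void deliver(const load_req_t *r, unsigned data)
    {
        /* The register load bus (narrower granularity) always returns
         * the value; the cache load bus fills L1 only when hinted. */
        printf("register bus: deliver %u for addr %#x\n", data, r->addr);
        if (r->hint == USE_L1_AND_REGISTER)
            printf("cache bus: reload L1 line for addr %#x\n", r->addr);
    }

    int main(void)
    {
        load_req_t r = { 0x40, USE_REGISTER_ONLY };
        deliver(&r, 42);
        return 0;
    }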
Abstract:
A method of operating a multi-level memory hierarchy of a computer system, and an apparatus embodying the method, wherein instructions having an explicit prefetch request issue directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions and treats instructions differently when they are loaded speculatively. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions but a branch prediction or stream association predicts that they might be needed. The load requests are sent to the lower level cache when the upper level cache does not contain the value required by the load. If a speculative request is for an instruction that is likewise not present in the lower level cache, the request is ignored, keeping both the lower level and upper level caches free of speculative values that are infrequently used. If the value is present in the lower level cache, it is loaded into the upper level cache. If a speculative request is for operand data, the value is loaded only into the lower level cache if it is not already present, keeping the upper level cache free of speculative operand data.
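The placement policy above reduces to a small decision table; here it is as a C sketch, with the enums and handle_l1_miss function names assumed for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    /* Placement policy from the abstract: speculative instruction misses
     * in L2 are dropped; speculative operand data fills L2 only. */
    typedef enum { DEMAND, SPECULATIVE } kind_t;
    typedef enum { INSTR, OPERAND }      value_t;

    static void handle_l1_miss(kind_t k, value_t v, bool in_l2)
    {
        if (k == DEMAND) {
            printf("demand miss: fill L2 (if needed) and L1\n");
        } else if (v == INSTR) {
            /* Speculative instruction: only promote if L2 already has it. */
            printf(in_l2 ? "load instruction into L1\n"
                         : "ignore request: keep caches speculation-free\n");
        } else {
            /* Speculative operand data: fill L2 only, never L1. */
            printf(in_l2 ? "already in L2: leave L1 untouched\n"
                         : "fill L2 only\n");
        }
    }

    int main(void)
    {
        handle_l1_miss(SPECULATIVE, INSTR, false);
        handle_l1_miss(SPECULATIVE, OPERAND, false);
        return 0;
    }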
Abstract:
A novel cache coherency protocol provides a modified-unsolicited (MU) cache state to indicate that a value held in a cache line has been modified (i.e., is not currently consistent with system memory), that it was modified by a processing unit other than the one associated with the cache that currently contains the value in the MU state, and that the value is held exclusive of any other horizontally adjacent caches. Because the value is exclusively held, it may be modified in that cache without issuing a bus transaction to other horizontal caches in the memory hierarchy. The MU state may be applied as a result of a snoop response to a read request. The read request can include a flag indicating that the requesting cache is capable of utilizing the MU state. Alternatively, a flag may be provided with the intervention data to indicate that the requesting cache should utilize the modified-unsolicited state.
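A C sketch of the intervention outcome when the requester advertises MU support; the enum, flag, and fallback to SHARED are assumptions layered on the abstract:

    #include <stdbool.h>
    #include <stdio.h>

    /* MESI plus the modified-unsolicited (MU) state from the abstract. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED, MU } mu_state_t;

    /* Intervention outcome for a read request whose requester flagged
     * that it can accept the MU state (names are illustrative). */
    static mu_state_t state_after_intervention(bool requester_supports_mu)
    {
        /* The snooper sources its modified data; if the requester can use
         * MU, it receives the dirty value exclusively and may later write
         * it with no further bus transaction. Otherwise fall back to a
         * conventional shared copy. */
        return requester_supports_mu ? MU : SHARED;
    }

    int main(void)
    {
        printf("requester state: %s\n",
               state_after_intervention(true) == MU ? "MU" : "SHARED");
        return 0;
    }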
Abstract:
A cache coherency protocol uses an “Exclusive-Deallocate” (ED) coherency state to indicate that a particular value is currently held in an upper level cache in an exclusive, unmodified form (not shared with any other caches of the computer system, including caches associated with the same processing unit), so that the value can conveniently be modified without any lower level bus transactions, since no lower level cache has allocated a line for the value. If the value is subsequently modified in the upper level cache, its coherency state is simply switched to “modified” without the need for any bus transactions. Conversely, if the value is evicted from the upper level cache without ever having been modified, it can be loaded into the lower level cache with a coherency state indicating that the lower level cache contains the unmodified value exclusive of all other caches in other processing units of the computer system. If the value is initially loaded into the upper level cache from a cache of another processing unit, or from a lower level cache of the same processing unit, then the upper level cache may be selectively programmed to mark the cache line with the ED state.
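The silent ED-to-modified upgrade is the key saving; a C sketch of that transition, with the enum and bus-operation counter assumed for illustration:

    #include <stdio.h>

    /* Upper-level cache states including the Exclusive-Deallocate (ED)
     * state described above (enum is illustrative). */
    typedef enum { INVALID, SHARED, EXCLUSIVE, ED, MODIFIED } ed_state_t;

    /* A store to an ED line upgrades silently: no lower-level cache holds
     * the line, so no bus transaction is required. */
    static ed_state_t on_store(ed_state_t s, int *bus_ops)
    {
        if (s == ED)
            return MODIFIED;    /* silent upgrade, *bus_ops unchanged */
        ++*bus_ops;             /* other states must signal downstream */
        return MODIFIED;
    }

    int main(void)
    {
        int bus_ops = 0;
        ed_state_t s = on_store(ED, &bus_ops);
        printf("state=%d bus_ops=%d\n", s, bus_ops);
        return 0;
    }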