Abstract:
A set associative cache includes a cache controller, a directory, and an array including at least one congruence class containing a plurality of sets. The plurality of sets are partitioned into multiple groups according to which of a plurality of information types each set can store. The sets are partitioned so that at least two of the groups include the same set and at least one of the sets can store fewer than all of the information types. To optimize cache operation, the cache controller dynamically modifies a cache policy of a first group while retaining a cache policy of a second group, thus permitting the operation of the cache to be individually optimized for different information types. The dynamic modification of cache policy can be performed in response to either a hardware-generated or software-generated input.
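The partitioning described above can be illustrated with a minimal Python sketch. The class name `CongruenceClass`, the information types `instr` and `data`, and the two replacement policies `lru` and `mru` are all assumptions made for illustration; the abstract does not say which cache policies are modified or how. The sketch shows only that one way can belong to two groups and that one group's policy can change while the other's is retained.

```python
class CongruenceClass:
    """One congruence class whose ways are partitioned by information type."""

    def __init__(self, groups, policies):
        self.groups = groups            # info type -> tuple of way indices
        self.policies = dict(policies)  # info type -> replacement policy name
        ways = {w for members in groups.values() for w in members}
        self.tags = {w: None for w in ways}
        self.last_use = {w: 0 for w in ways}
        self.clock = 0

    def set_policy(self, info_type, policy):
        """Change one group's policy; the other groups keep theirs."""
        if policy not in ("lru", "mru"):
            raise ValueError(policy)
        self.policies[info_type] = policy

    def access(self, info_type, tag):
        """Look up tag in the ways that may hold info_type; fill on a miss."""
        self.clock += 1
        ways = self.groups[info_type]
        for w in ways:
            if self.tags[w] == tag:
                self.last_use[w] = self.clock
                return "hit", w
        empty = [w for w in ways if self.tags[w] is None]
        if empty:
            victim = empty[0]
        elif self.policies[info_type] == "lru":
            victim = min(ways, key=lambda w: self.last_use[w])
        else:  # "mru" (hypothetical alternative policy)
            victim = max(ways, key=lambda w: self.last_use[w])
        self.tags[victim] = tag
        self.last_use[victim] = self.clock
        return "miss", victim
```

For example, with `groups = {"instr": (0, 1), "data": (1, 2)}`, way 1 is shared by both groups while ways 0 and 2 each store only one information type.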
Abstract:
A system for time-ordered execution of load instructions. More specifically, the system enables just-in-time delivery of data requested by a load instruction. The system consists of a processor, an L1 data cache with a corresponding L1 cache controller, and an instruction processor. The instruction processor manipulates a plurality of architected time dependency fields of a load instruction to create a plurality of dependency fields. Each dependency field holds a relative dependency value which is utilized to order the load instruction in a Relative Time-Ordered Queue (RTOQ) of the L1 cache controller. The load instruction is sent from the RTOQ to the L1 data cache at a particular time so that the data requested is loaded from the L1 data cache at the time specified by one of the dependency fields. The dependency fields are prioritized so that the cycle corresponding to the highest priority field which is available is utilized.
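The prioritized choice among dependency fields can be sketched as follows. This is a simplified model under stated assumptions, not the patented mechanism: the function name `build_rtoq` is hypothetical, each load is assumed to carry its candidate cycles already listed from highest to lowest priority, and a cycle is treated as available when no other load has claimed it.

```python
import heapq


def build_rtoq(loads, unavailable=()):
    """Order loads by the cycle chosen from their dependency fields.

    loads: list of (name, [candidate cycles, highest priority first]).
    unavailable: cycles that are already occupied (assumption: one load per cycle).
    Returns a list of (cycle, name) tuples in dispatch order.
    """
    taken = set(unavailable)
    queue = []
    for name, fields in loads:
        for cycle in fields:
            if cycle not in taken:
                # Use the highest-priority field whose cycle is still free.
                taken.add(cycle)
                heapq.heappush(queue, (cycle, name))
                break
        # A load with no free cycle is simply left out of this sketch.
    return [heapq.heappop(queue) for _ in range(len(queue))]
```

Here `ld2` in `build_rtoq([("ld1", [3, 5]), ("ld2", [3, 4])])` falls back to its second field because cycle 3 is already claimed by `ld1`.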
Abstract:
A method and system for allocating lower level cache entries for data castout from an upper level cache provides improved computer system performance by adjusting the ordering of least-recently-used (LRU) information within a cache. Data that is cast out from a higher level cache can be written after a read is satisfied, and the castout entry will not be labeled as most-recently-used. This improves performance under certain operating conditions of a computing system, as castout data is often less important to keep in the lower level cache than data that is also present in the higher level cache.
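The LRU adjustment described above can be sketched with an ordered dictionary. The abstract says only that a castout entry is not labeled most-recently-used; placing it at the LRU end, as done here, is one assumed choice, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class LowerLevelSet:
    """One congruence class of a lower-level cache, ordered LRU -> MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # first key is LRU, last key is MRU

    def _evict_if_full(self):
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # drop the LRU entry

    def read_fill(self, tag):
        """Demand read: the line becomes most-recently-used."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return
        self._evict_if_full()
        self.lines[tag] = "demand"

    def castout_fill(self, tag):
        """Castout from the upper level: allocate without marking MRU."""
        if tag in self.lines:
            return  # assumption: an existing line keeps its LRU position
        self._evict_if_full()
        self.lines[tag] = "castout"
        self.lines.move_to_end(tag, last=False)  # assumed: place at LRU end

    def order(self):
        return list(self.lines)
```

With this ordering, a castout line is the first candidate for replacement unless a later demand read promotes it.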
Abstract:
A method of operating a computer system is disclosed in which an instruction having an explicit prefetch request is issued directly from an instruction sequence unit to a prefetch unit of a processing unit. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value.
Abstract:
Logically in-line caches within a multilevel cache hierarchy are jointly controlled by a single cache controller. By combining the cache controller and snoop logic for different levels within the cache hierarchy, separate queues are not required for each level. During a cache access, cache directories are looked up in parallel. Data is retrieved from an upper cache if hit, or from the lower cache if the upper cache misses and the lower cache hits. LRU units may be updated in parallel based on cache directory hits. Alternatively, the lower cache LRU unit may be updated based on cache memory accesses rather than cache directory hits, or the cache hierarchy may be provided with user selectable modes of operation for both LRU unit update schemes. The merged vertical cache controller mechanism does not require the lower cache memory to be inclusive of the upper cache memory. A novel deallocation scheme and update protocol may be implemented in conjunction with the merged vertical cache controller mechanism.
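The data-source selection described above reduces to a small decision, sketched here in Python. The function name `merged_lookup` and the use of dictionaries as cache directories are assumptions for illustration; real hardware would consult both directories in the same cycle, which this sequential model only imitates by evaluating both hit signals before choosing.

```python
def merged_lookup(address, upper, lower):
    """Pick the data source after consulting both directories.

    upper, lower: dicts mapping address -> data, standing in for the two
    cache levels. The lower level need not contain the upper level's lines.
    Returns (source, data), where source is "upper", "lower", or "miss".
    """
    # Both hit signals are computed before any decision is made,
    # modelling the parallel directory lookup.
    upper_hit = address in upper
    lower_hit = address in lower
    if upper_hit:
        return "upper", upper[address]
    if lower_hit:
        return "lower", lower[address]
    return "miss", None
```

Because the lower level is not required to be inclusive, an address may hit in the upper cache only, as with `0x10` in `merged_lookup(0x10, {0x10: "a"}, {0x20: "b"})`.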
Abstract:
A novel cache coherency protocol provides a modified-unsolicited (Mu) cache state to indicate that a value held in a cache line has been modified (i.e., is not currently consistent with system memory), but was modified by another processing unit, not by the processing unit associated with the cache that currently contains the value in the Mu state, and that the value is held exclusive of any other horizontally adjacent caches. Because the value is exclusively held, it may be modified in that cache without the necessity of issuing a bus transaction to other horizontal caches in the memory hierarchy. The Mu state may be applied as a result of a snoop response to a read request. The read request can include a flag to indicate that the requesting cache is capable of utilizing the Mu state. Alternatively, a flag may be provided with intervention data to indicate that the requesting cache should utilize the modified-unsolicited state.
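The role of the Mu state can be sketched as two small transition functions. This is an illustrative model only: the function names are hypothetical, the previous owner is assumed to drop to Invalid when it hands over the line, a requester without the capability flag is assumed to fall back to conventional MESI-style sharing, and a write in Mu is assumed to move the line to Modified.

```python
def snoop_read(owner_state, requester_mu_capable):
    """Resolve a read that hits a Modified line in another cache.

    Returns (requester_state, owner_state) after the intervention.
    """
    if owner_state != "M":
        raise ValueError("this sketch only covers a Modified owner")
    if requester_mu_capable:
        # The requester holds the modified value exclusively; the owner
        # gives up its copy (assumed here to become Invalid).
        return "Mu", "I"
    # Assumed fallback: both caches end up sharing the line.
    return "S", "S"


# States in which a local write needs no bus transaction to other caches.
BUS_FREE_WRITE_STATES = {"M", "Mu", "E"}


def local_write(state):
    """Return (new_state, needs_bus_transaction) for a processor write."""
    needs_bus = state not in BUS_FREE_WRITE_STATES
    return "M", needs_bus
```

The point of the sketch is the second function: a line in Mu is already held exclusively, so a later write proceeds without a bus transaction, unlike a write to a Shared line.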
Abstract:
Combined response logic for a bus receives a combined data access and cast out/deallocate operation initiated by a storage device within a specific level of a storage hierarchy, with a coherency state and LRU position of the cast out/deallocate victim appended. Snoopers on the bus drive snoop responses to the combined operation with the coherency state and/or LRU position of locally-stored cache lines corresponding to the victim appended. The combined response logic determines, from the coherency state and LRU position information appended to the combined operation and the snoop responses, whether an update of the LRU position and/or coherency state of a cache line corresponding to the victim within one of the snoopers is required. If so, the combined response logic selects a snooper storage device to have at least the LRU position of a respective cache line corresponding to the victim updated, and appends an update command identifying the selected snooper to the combined response. The snooper selected to be updated may be randomly chosen, selected based on LRU position of the cache line corresponding to the victim within respective storage, or selected based on other criteria.
Abstract:
A novel cache coherency protocol provides a modified-unsolicited (MU) cache state to indicate that a value held in a cache line has been modified (i.e., is not currently consistent with system memory), but was modified by another processing unit, not by the processing unit associated with the cache that currently contains the value in the MU state, and that the value is held exclusive of any other horizontally adjacent caches. Because the value is exclusively held, it may be modified in that cache without the necessity of issuing a bus transaction to other horizontal caches in the memory hierarchy. The MU state may be applied as a result of a snoop response to a read request. The read request can include a flag to indicate that the requesting cache is capable of utilizing the MU state. Alternatively, a flag may be provided with intervention data to indicate that the requesting cache should utilize the modified-unsolicited state.
Abstract:
In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs the storage device initiating the combined operation not to allocate storage for the target of the data access. Instead, the target of the data access may be passed directly to an in-line processor core without storage, may be stored in a horizontal storage device, or may be stored in an in-line, noninclusive, lower level storage device. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.
Abstract:
An apparatus and method for monitoring an internal communication path, i.e., an internal bus, of an integrated circuit is described. The internal bus operates at a particular frequency, fb. An image of the internal bus is produced, operating at a lower frequency of operations, fo, which is more amenable to monitoring by test equipment. Signals are received from and driven to the bus using driver/receiver circuitry. The signals may be input-only, output-only, or bi-directional signals. The signals to be monitored are tapped in the driver/receiver circuitry. Depending on the placement of the signal taps in the driver/receiver logic, the signals may be “out-of-phase” with respect to one another. A buffer/align unit processes the signals in order to produce a time delayed version of the signals. The buffer/align unit is used to bring each of the monitored signals back in phase relative to one another. Encoding circuitry encodes the time delayed version of the bus in a manner that produces an image of the bus at the lower frequency of operations, fo. The encoding circuitry considers the values of the monitored signals over an encoding window, and produces an encoded value for each signal at the lower frequency of operations, fo.