Abstract:
A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit and a cache memory. The cache memory includes a cache controller, a data array including a data storage location for caching a memory block, and a cache directory. The cache directory includes a tag field for storing an address tag in association with the memory block and a coherency state field associated with the tag field and the data storage location. The coherency state field has a plurality of possible states including a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is possibly cached outside of the first coherency domain.
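The state described last (tag valid, data invalid, block possibly cached outside the first coherency domain) is the distinctive one. The following minimal C sketch models such a directory entry; the enum and field names are illustrative inventions, not taken from the abstract.

```c
/* Minimal sketch of the described cache directory entry.  The state
 * names (STATE_IG in particular) and field names are illustrative. */
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    STATE_INVALID,   /* tag and data both invalid                        */
    STATE_SHARED,    /* valid data, possibly cached elsewhere            */
    STATE_MODIFIED,  /* valid data, exclusively owned                    */
    STATE_IG         /* tag valid, data storage location invalid, block
                        possibly cached outside the first coherency domain */
} coh_state_t;

typedef struct {
    uint64_t    tag;    /* address tag for the cached memory block */
    coh_state_t state;  /* coherency state field                   */
} dir_entry_t;

/* True when the entry records only residency information: the address
 * tag is valid but the local data storage location holds no valid copy. */
static bool tag_only(const dir_entry_t *e)
{
    return e->state == STATE_IG;
}
```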
Abstract:
A method, apparatus, and computer for identifying selection of a bad victim during victim selection at a cache and recovering from such bad victim selection without causing the system to crash or suspend forward progress of the victim selection process. Among the bad victim selections addressed are recovery from selection of a deleted member and recovery from use of LRU state bits that do not map to a member within the congruence class. When the LRU victim selection logic generates an output vector identifying a victim, the output vector is checked to ensure that it is a valid vector (non-null) and that it is not pointing to a deleted member. When the output vector is not valid or points to a deleted member, the LRU victim selection logic is triggered to restart the victim selection process.
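A minimal C sketch of the validity check described above, assuming an 8-way congruence class and a one-hot victim vector; the function name and masks are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check a one-hot victim vector over an 8-way congruence class:
 * reject a null vector, a vector with more than one bit set, and a
 * vector naming a deleted member -- each case forces a restart of
 * victim selection rather than a crash or a stall. */
static bool victim_vector_ok(uint8_t vec, uint8_t deleted_mask)
{
    if (vec == 0)                 return false;  /* null vector           */
    if (vec & (uint8_t)(vec - 1)) return false;  /* not one-hot           */
    if (vec & deleted_mask)       return false;  /* points at a D member  */
    return true;
}

int main(void)
{
    /* A vector naming way 2 is fine while way 2 is not deleted... */
    assert( victim_vector_ok(0x04, 0x00));
    /* ...but is rejected, restarting selection, once way 2 is deleted
       or the vector is null. */
    assert(!victim_vector_ok(0x04, 0x04));
    assert(!victim_vector_ok(0x00, 0x00));
    return 0;
}
```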
Abstract:
A cache memory logically partitions a cache array having a single access/command port into at least two slices, and uses a first cache directory to access the first cache array slice while using a second cache directory to access the second cache array slice, but accesses from the cache directories are managed using a single cache arbiter which controls the single access/command port. In the illustrative embodiment, each cache directory has its own directory arbiter to handle conflicting internal requests, and the directory arbiters communicate with the cache arbiter. An address tag associated with a load request is transmitted from the processor core with a designated bit that associates the address tag with only one of the cache array slices whose corresponding directory determines whether the address tag matches a currently valid cache entry. The cache array may be arranged with rows and columns of cache sectors wherein a given cache line is spread across sectors in different rows and columns, with at least one portion of the given cache line being located in a first column having a first latency and another portion of the given cache line being located in a second column having a second latency greater than the first latency. The cache array outputs different sectors of the given cache line in successive clock cycles based on the latency of a given sector.
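As a rough illustration of the slice-routing step, the C sketch below uses one designated bit of the address tag to pick a directory slice. The bit position SLICE_SELECT_BIT is an assumption for the example, not something the abstract specifies.

```c
#include <stdint.h>
#include <stdio.h>

#define SLICE_SELECT_BIT 6   /* illustrative choice of the designated bit */

/* Route an address tag to exactly one of the two directory slices. */
static unsigned slice_of(uint64_t addr_tag)
{
    return (addr_tag >> SLICE_SELECT_BIT) & 1u;  /* 0 or 1 */
}

int main(void)
{
    uint64_t tags[] = { 0x1000, 0x1040 };
    for (int i = 0; i < 2; i++)
        printf("tag 0x%llx -> directory slice %u\n",
               (unsigned long long)tags[i], slice_of(tags[i]));
    return 0;
}
```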
Abstract:
A method and apparatus for enabling protection of a particular member of a cache during LRU victim selection. The LRU state array includes additional "protection" bits in addition to the state bits. The protection bits serve as a pointer to identify the location of the member of the congruence class that is to be protected. A protected member is not removed from the cache during standard LRU victim selection, unless that member is invalid. The protection bits are pipelined to the MRU update logic, where they are used to generate an MRU vector. The particular member identified by the MRU vector (and pointer) is protected from selection as the next LRU victim, unless the member is invalid. The make MRU operation affects only the lower level LRU state bits arranged in a tree-based structure and thus only negates the selection of the protected member, without affecting LRU victim selection of the other members.
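To make the "lower level bits only" idea concrete, here is a hedged C sketch of a 4-way tree pseudo-LRU. The bit layout, conventions, and function names are invented for illustration; the point is that make_mru_protect touches only the leaf bit covering the protected member, so the root choice between pairs, and hence the other members' ordering, is unaffected.

```c
#include <stdio.h>

/* bits[0] = root (0: left pair {0,1} older, 1: right pair {2,3} older);
 * bits[1], bits[2] = leaf bits for the two pairs (0: first member older). */
typedef struct { int bits[3]; } plru4_t;

static int select_victim(const plru4_t *s)
{
    int pair = s->bits[0];          /* older pair   */
    int leaf = s->bits[1 + pair];   /* older member */
    return 2 * pair + leaf;
}

/* Ordinary make MRU after an access: point both levels away from `way`. */
static void make_mru(plru4_t *s, int way)
{
    s->bits[0]           = 1 - (way / 2);
    s->bits[1 + way / 2] = 1 - (way % 2);
}

/* Protection-style make MRU: touch only the leaf (lower level) bit, so
 * the protected member cannot be picked within its pair while the root
 * bit -- the other members' relative ordering -- stays untouched. */
static void make_mru_protect(plru4_t *s, int way)
{
    s->bits[1 + way / 2] = 1 - (way % 2);
}

int main(void)
{
    plru4_t s = { {0, 0, 0} };      /* victim would be way 0        */
    make_mru_protect(&s, 0);        /* protect way 0                */
    printf("victim after protecting way 0: %d\n", select_victim(&s));
    make_mru(&s, 1);                /* ordinary access to way 1     */
    printf("victim after accessing way 1:  %d\n", select_victim(&s));
    return 0;
}
```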
Abstract:
A method and apparatus for preventing selection of Deleted (D) members as an LRU victim during LRU victim selection. During each cache access targeting the particular congruence class, the deleted cache line is identified from information in the cache directory. The location of the deleted cache line is pipelined through the cache architecture during LRU victim selection. The information is latched and then passed to the MRU vector generation logic. An MRU vector is generated and passed to the MRU update logic, which selects/tags the deleted member as an MRU member. The make MRU operation affects only the lower level LRU state bits arranged in a tree-based structure, so that the make MRU operation only negates selection of the specific member in the D state, without affecting LRU victim selection of the other members.
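Using the same illustrative 4-way tree layout as the previous sketch, the code below applies the leaf-only make MRU to every member the directory flags as deleted; the names and the mask encoding are again invented.

```c
#include <stdio.h>

typedef struct { int bits[3]; } plru4_t;  /* root bit + two leaf bits */

static int select_victim(const plru4_t *s)
{
    int pair = s->bits[0];
    return 2 * pair + s->bits[1 + pair];
}

/* Leaf-level make MRU applied to every member flagged deleted, biasing
 * the tree away from D members without reordering the other members. */
static void make_mru_deleted(plru4_t *s, unsigned deleted_mask)
{
    for (int way = 0; way < 4; way++)
        if (deleted_mask & (1u << way))
            s->bits[1 + way / 2] = 1 - (way % 2);
}

int main(void)
{
    plru4_t s = { {0, 0, 0} };       /* victim would be way 0   */
    make_mru_deleted(&s, 1u << 0);   /* way 0 is in the D state */
    printf("victim avoids deleted way: %d\n", select_victim(&s));
    return 0;
}
```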
Abstract:
A cache memory logically partitions a cache array into at least two slices each having a plurality of cache lines, with a given cache line spread across two or more cache ways of contiguous bytes and a given cache way shared between the two cache slices. If a cache way is defective that is part of a first cache line in the first cache slice and part of a second cache line in the second cache slice, it is disabled while continuing to use at least one other cache way which is also part of the first cache line and part of the second cache line. In the illustrative embodiment the cache array is set associative and at least two different cache ways for a given cache line contain different congruence classes for that cache line. The defective cache way can be disabled by preventing an eviction mechanism from allocating any congruence class in the defective way. For example, half of the cache way can be disabled (i.e., half of the congruence classes). The cache array may be arranged with rows and columns of cache sectors (rows corresponding to the cache ways) wherein a given cache line is further spread across sectors in different rows and columns, with at least one portion of the given cache line being located in a first column having a first latency and another portion of the given cache line being located in a second column having a second latency greater than the first latency. The cache array can also output different sectors of the given cache line in successive clock cycles based on the latency of a given sector.
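A minimal C sketch of the disablement mechanism: a per-way enable mask keeps the eviction logic from ever allocating into the defective way, while the other ways of the same cache line stay in use. The names and the 8-way configuration are assumptions for the example.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_WAYS 8

/* Allocate the first enabled way from an LRU-ordered candidate list;
 * a cleared bit in enabled_mask removes that way from consideration. */
static int allocate_way(const int lru_order[NUM_WAYS], uint8_t enabled_mask)
{
    for (int i = 0; i < NUM_WAYS; i++)
        if (enabled_mask & (1u << lru_order[i]))
            return lru_order[i];
    return -1;  /* no usable way (all disabled) */
}

int main(void)
{
    int lru_order[NUM_WAYS] = { 3, 5, 0, 1, 2, 4, 6, 7 };
    uint8_t enabled = 0xFF & ~(1u << 3);  /* way 3 found defective */
    printf("allocated way: %d\n", allocate_way(lru_order, enabled));
    return 0;
}
```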
Abstract:
A method and system are disclosed for managing saved process states in a memory of a data processing system that has multiple partitions executing independent operating systems. A hypervisor manager affords access to any processor in the data processing system for the purpose of storing process states for that processor to the memory, independent of the operating system running on the processor.
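As a loose illustration of the idea, the C sketch below gives the hypervisor one save area per processor, outside any partition's address space, so a processor can store its process state with no involvement from the operating system it is running. The structure layout and names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define MAX_CPUS 64

/* Illustrative per-process hard state; field choice is an assumption. */
typedef struct {
    uint64_t gpr[32];  /* general-purpose registers */
    uint64_t pc;       /* program counter           */
    uint64_t msr;      /* machine state register    */
} proc_state_t;

/* Hypervisor-owned memory, independent of any partition's OS. */
static proc_state_t save_area[MAX_CPUS];

/* Store a processor's process state; any processor may call this,
 * regardless of which operating system its partition runs. */
void hv_save_state(unsigned cpu, const proc_state_t *st)
{
    if (cpu < MAX_CPUS)
        memcpy(&save_area[cpu], st, sizeof *st);
}
```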
Abstract:
A method and system are disclosed for pre-loading the hard architected state of a next process from a pool of idle processes awaiting execution. When an executing process is interrupted on the processor, the hard architected state of a next process, which has been pre-stored in the processor, is loaded into architected storage locations in the processor. The next process to be executed, and thus its corresponding hard architected state pre-stored in the processor, are determined based on priorities assigned to the waiting processes.
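A small C sketch of the selection step: among the idle processes, the highest-priority one is picked so its hard architected state can be pre-stored before the current process is interrupted. The types and the "larger value wins" convention are assumptions.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int id;
    int priority;  /* assumed convention: larger value runs sooner */
} idle_proc_t;

/* Return the index of the highest-priority idle process, or -1. */
static int next_to_preload(const idle_proc_t *pool, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || pool[i].priority > pool[best].priority)
            best = (int)i;
    return best;
}

int main(void)
{
    idle_proc_t pool[] = { {10, 3}, {11, 7}, {12, 5} };
    int i = next_to_preload(pool, 3);
    printf("pre-load hard architected state of process %d\n", pool[i].id);
    return 0;
}
```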
Abstract:
A method of operating a multi-level memory hierarchy of a computer system, and apparatus embodying the method, wherein instructions having an explicit prefetch request issue directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions and treats instructions in a different manner when they are loaded speculatively. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions but a branch prediction or stream association predicts that they might be needed. The load requests are sent to the lower level cache when the upper level cache does not contain the value required by the load. If a speculative request is for an instruction which is likewise not present in the lower level cache, that request is ignored, keeping both the lower level and upper level caches free of speculative values that are infrequently used. If the value is present in the lower level cache, it is loaded into the upper level cache. If a speculative request is for operand data, the value is loaded only into the lower level cache if it is not already present, keeping the upper level cache free of speculative operand data.
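The policy above reduces to a small decision table. The C sketch below encodes it directly; the enum and function names are illustrative, and "upper"/"lower" follow the abstract's usage.

```c
#include <stdio.h>
#include <stdbool.h>

typedef enum { DEMAND, SPECULATIVE } req_kind_t;
typedef enum { INSTRUCTION, OPERAND } val_kind_t;

typedef struct {
    bool fill_upper;    /* install value in the upper-level cache */
    bool fill_lower;    /* install value in the lower-level cache */
    bool fetch_memory;  /* go to memory on a lower-level miss     */
} action_t;

/* Decide what to do when the upper-level cache misses. */
static action_t on_upper_miss(req_kind_t rk, val_kind_t vk, bool lower_hit)
{
    action_t a = { false, false, false };
    if (rk == DEMAND) {                    /* demand loads always fill   */
        a.fill_upper = true;
        a.fill_lower = a.fetch_memory = !lower_hit;
    } else if (vk == INSTRUCTION) {
        a.fill_upper = lower_hit;          /* lower hit: promote upward  */
        /* lower miss: request is ignored, neither cache is polluted     */
    } else {                               /* speculative operand data   */
        a.fill_lower = a.fetch_memory = !lower_hit;
        /* upper level stays free of speculative operand data            */
    }
    return a;
}

int main(void)
{
    action_t a = on_upper_miss(SPECULATIVE, OPERAND, false);
    printf("spec data miss: fill_lower=%d fill_upper=%d\n",
           a.fill_lower, a.fill_upper);
    return 0;
}
```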
Abstract:
In a cache coherent data processing system including at least first and second coherency domains, a master performs a first broadcast of an operation within the cache coherent data processing system that is limited in scope of transmission to the first coherency domain. The master receives a response of the first coherency domain to the first broadcast of the operation. If the response indicates the operation cannot be serviced in the first coherency domain alone, the master increases the scope of transmission by performing a second broadcast of the operation in both the first and second coherency domains. If the response indicates the operation can be serviced in the first coherency domain, the master refrains from performing the second broadcast.
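A minimal C sketch of the two-step broadcast: the master tries the local coherency domain first and widens to a global broadcast only when the local response says the operation cannot be serviced there. The scope names and the stand-in response function are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { SCOPE_LOCAL, SCOPE_GLOBAL } scope_t;

/* Stand-in for the combined response of a broadcast at a given scope;
 * here we pretend only the global broadcast can be serviced. */
static bool serviced(scope_t scope)
{
    return scope == SCOPE_GLOBAL;
}

static void master_issue(void)
{
    if (serviced(SCOPE_LOCAL)) {
        puts("serviced within first coherency domain; no second broadcast");
    } else {
        puts("local response: cannot service; retrying at global scope");
        (void)serviced(SCOPE_GLOBAL);  /* second, wider broadcast */
    }
}

int main(void) { master_issue(); return 0; }
```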