Abstract:
An apparatus and method for optimizing a non-inclusive hierarchical cache memory system that includes a first and second cache for storing information. The first and second cache are arranged in a hierarchical manner, such as a level two and level three cache in a cache system having three levels of cache. The level two and level three caches hold information non-inclusively, while a dual directory holds tags and states that are duplicates of the tags and states held for the level two cache. All snoop requests (snoops) are passed to the dual directory by a snoop queue. The dual directory is used to determine whether a snoop request sent by the snoop queue is relevant to the contents of the level two cache, avoiding the need to send the snoop request to the level two cache if there is a "miss" in the dual directory. This increases the cache bandwidth made available by the second cache, since the number of snoops appropriating the cache bandwidth of the second cache is reduced by the filtering effect of the dual directory. Also, the third cache is limited to holding read-only information and receiving write-invalidation snoop requests. Only snoops relating to write-invalidation requests are passed to a directory holding tags and state information corresponding to the third cache. Limiting snoop requests to write-invalidation requests minimizes snoop requests to the third cache, increasing the amount of cache memory bandwidth available for servicing cache fetches from the third cache. In the event that a cache hit occurs in the third cache, the information found in the third cache must be transferred to the second cache before a modification can be made to that information.
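
To make the snoop-filtering idea concrete, the following C++ sketch models a dual directory as a duplicate set of level-two tags and forwards a snoop to the level two cache only on a directory hit. The type and function names (DualDirectory, filter_snoop) are illustrative assumptions, not terms from the abstract, and coherence states are omitted for brevity.

    // Hypothetical model of the dual-directory snoop filter described above.
    #include <cstdint>
    #include <iostream>
    #include <unordered_set>

    struct DualDirectory {
        // Duplicate copy of the level-two cache tags (states omitted for brevity).
        std::unordered_set<uint64_t> l2_tags;

        // Returns true only when the snooped address may be present in the
        // level-two cache; misses are filtered and never consume L2 bandwidth.
        bool filter_snoop(uint64_t tag) const {
            return l2_tags.count(tag) != 0;
        }
    };

    int main() {
        DualDirectory dir;
        dir.l2_tags.insert(0x1000);          // line assumed cached in L2

        for (uint64_t tag : {0x1000ULL, 0x2000ULL}) {
            if (dir.filter_snoop(tag))
                std::cout << std::hex << tag << ": forward snoop to L2\n";
            else
                std::cout << std::hex << tag << ": filtered (dual-directory miss)\n";
        }
    }
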
Abstract:
A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents stored in the second cache are limited to read-only attributes, so that if any copies of the cache contents in the second cache exist in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available, if required, in the first cache by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, rendering it available for write access. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing the silicon real estate used when the second cache is placed on-chip to improve cache bandwidth.
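
The following C++ sketch is a minimal, assumed model of the movement described above: a line evicted from the first cache is held read-only in the second cache and must be promoted back into the first cache before it can be written. The names (TwoLevelCache, evict_to_second, write) are hypothetical.

    #include <cstdint>
    #include <unordered_map>
    #include <cassert>

    struct TwoLevelCache {
        std::unordered_map<uint64_t, int> first;   // writable lines (tag -> data)
        std::unordered_map<uint64_t, int> second;  // read-only victims; a single
                                                   // status bit per tag would suffice

        void evict_to_second(uint64_t tag) {       // make room in the first cache
            auto it = first.find(tag);
            if (it == first.end()) return;
            second[tag] = it->second;              // stored with read-only attributes
            first.erase(it);
        }

        // A write is never performed in the second cache: the line is first
        // promoted back into the first cache, then modified there.
        void write(uint64_t tag, int value) {
            if (!first.count(tag) && second.count(tag)) {
                first[tag] = second[tag];
                second.erase(tag);
            }
            first[tag] = value;
        }
    };

    int main() {
        TwoLevelCache c;
        c.first[0x40] = 7;
        c.evict_to_second(0x40);
        c.write(0x40, 9);                          // promotes, then writes
        assert(c.first.at(0x40) == 9 && !c.second.count(0x40));
    }
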
Abstract:
In a data cache unit that exchanges data signal groups with at least two execution units, the operation of the data cache unit is implemented as a three-stage pipeline in order to access data at the speed of the system clock. For a READ operation, virtual address components are applied to a storage cell bank unit implemented in SAM technology to begin access of the storage cells holding the data signal group identified by the virtual address components. The virtual address components are also applied to a microtag unit, the microtag unit identifying a subgroup of the data signal group identified by the address components. Simultaneously, the virtual address is formed from the two virtual address components and applied to a translation table unit, to a valid-bit array unit, and to a tag unit. The translation table unit and the tag unit determine whether the correct data signal subgroup identified by the address signal group is stored in the data cache memory unit. The selected data signal subgroup and a HIT/MISS signal are transmitted to the execution unit during the same cycle. For a WRITE operation, only two pipeline stages are required. In addition, the WRITE operation can involve the storage in the data cache memory of a single data signal group or a plurality of data signal groups. Because the storage cells are arranged in banks, simultaneous interaction by the two execution units is possible.
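
As a rough illustration only, the sketch below labels the three READ pipeline stages described above as separate C++ functions; the stage boundaries, field names, and placeholder checks are assumed simplifications, not the patent's.

    #include <cstdint>
    #include <iostream>

    struct ReadAccess {
        uint32_t va_lo, va_hi;    // the two virtual address components
        uint32_t bank_data[4];    // candidate data signal group from the banks
        unsigned subgroup;        // subgroup selected by the microtag unit
        bool     hit;             // result of translation-table / tag compare
    };

    // Stage 1: address components start the bank access and index the microtag.
    void stage1(ReadAccess& a) {
        for (unsigned i = 0; i < 4; ++i) a.bank_data[i] = 100 + i;  // modeled read
        a.subgroup = a.va_lo & 3;                                   // microtag pick
    }

    // Stage 2: the full virtual address is formed and checked against the
    // translation table, the valid-bit array and the tag unit.
    void stage2(ReadAccess& a) {
        uint64_t va = (uint64_t(a.va_hi) << 32) | a.va_lo;
        a.hit = (va != 0);                                          // placeholder check
    }

    // Stage 3: the selected subgroup and HIT/MISS reach the execution unit together.
    void stage3(const ReadAccess& a) {
        std::cout << (a.hit ? "HIT " : "MISS ") << a.bank_data[a.subgroup] << "\n";
    }

    int main() {
        ReadAccess a{0x1002, 0x1, {}, 0, false};
        stage1(a); stage2(a); stage3(a);
    }
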
Abstract:
Order indication logic can be reused for at least two different data hazards, thus reducing the amount of processor real estate consumed by data hazard resolution logic. The logic also allows a single priority picker to be utilized for coloring without the cost of additional pipeline stages. A single priority picker can be utilized to identify memory operations for performing RAW bypass and for resolving OERs. For instance, a data hazard resolution unit resolves at least two different data hazards between resident memory operations and incoming memory operations with a set of logic that indicates the order of the resident memory operations relative to the incoming memory operations. The indicated order corresponds to the data hazard being resolved. The data hazard resolution unit includes a priority picker to select one of the indicated resident memory operations for either data hazard.
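
The sketch below illustrates, under assumptions, how one priority picker can serve both hazard checks: each hazard supplies a bit vector indicating the relevant resident memory operations, and the same picker selects an entry from either vector. The bit-vector encoding and the pick-lowest-index policy are assumptions made for illustration.

    #include <cstddef>
    #include <bitset>
    #include <iostream>

    constexpr std::size_t kEntries = 8;          // resident memory operations

    // Picks the first (highest-priority) candidate among the indicated entries.
    // The same picker serves RAW-bypass selection and ordering-hazard resolution.
    int priority_pick(std::bitset<kEntries> candidates) {
        for (std::size_t i = 0; i < kEntries; ++i)
            if (candidates.test(i)) return int(i);
        return -1;                               // no hazard to resolve
    }

    int main() {
        std::bitset<kEntries> raw_matches("00010100");   // older stores matching an incoming load
        std::bitset<kEntries> order_hazards("01000000"); // ops that must complete first

        std::cout << "RAW bypass source entry: " << priority_pick(raw_matches) << "\n";
        std::cout << "Ordering hazard entry:   " << priority_pick(order_hazards) << "\n";
    }
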
Abstract:
In a data cache unit that exchanges data signal groups with at least two execution units, the operation of the data cache unit is implemented as a three-stage pipeline in order to access data at the speed of the system clock. The data cache unit has a plurality of storage cell banks. Each storage cell bank has a valid bit array unit and a tag unit for each execution unit incorporated therein. Each valid bit array unit has a valid/invalid storage cell associated with each data group stored in the storage cell bank. The valid bit array units have a read/write address port and a snoop address port. During a read operation, the associated valid/invalid signal is retrieved to determine whether the data signal group should be processed by the associated execution unit. In a write operation, a valid bit is set in the valid/invalid bit location(s) associated with the storage of a data signal group (or groups) during memory access. The valid bit array unit responds to a snoop address and a control signal from the tag unit to set an invalid bit in a valid/invalid bit address location associated with the snoop address. The tag unit can be divided into a plurality of tag subunits to expedite processing.
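
A minimal C++ model of the per-bank valid-bit array described above is sketched below, assuming one valid/invalid cell per stored data group, a read/write port used during normal accesses, and a snoop port that clears a bit only when the tag unit signals a match. All identifiers are illustrative.

    #include <cstddef>
    #include <vector>
    #include <iostream>

    struct ValidBitArray {
        std::vector<bool> valid;                 // one valid/invalid cell per data group
        explicit ValidBitArray(std::size_t groups) : valid(groups, false) {}

        // Read/write address port.
        bool read(std::size_t index) const { return valid[index]; }     // read op: is the group usable?
        void set_on_write(std::size_t index) { valid[index] = true; }   // write op: mark group valid

        // Snoop address port: invalidate only when the tag unit reports a match.
        void snoop(std::size_t index, bool tag_match) {
            if (tag_match) valid[index] = false;
        }
    };

    int main() {
        ValidBitArray vba(16);
        vba.set_on_write(3);                       // data group stored at entry 3
        std::cout << vba.read(3) << "\n";          // 1: valid
        vba.snoop(3, /*tag_match=*/true);          // external write observed
        std::cout << vba.read(3) << "\n";          // 0: invalidated
    }
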