摘要:
An apparatus and method for optimizing a non-inclusive hierarchical cache memory system that includes a first and second cache for storing information. The first and second cache are arranged in an hierarchical manner such as a level two and level three cache in a cache system having three levels of cache. The level two and level three cache hold information non-inclusively, while a dual directory holds tags and states that are duplicates of the tags and states held for the level two cache. All snoop requests (snoops) are passed to the dual directory by a snoop queue. The dual directory is used to determine whether a snoop request sent by snoop queue is relevant to the contents of level two cache, avoiding the need to send the snoop request to level two cache if there is a "miss" in the dual directory. This increases the available cache bandwidth that can be made available by second cache since the number of snoops appropriating the cache bandwidth of second cache are reduced by the filtering effect of dual directory. Also, the third cache is limited to holding read-only information and receiving write-invalidation snoop requests. Only snoops relating to write-invalidation requests are passed to a directory holding tags and state information corresponding to the third cache. Limiting snoop requests to write invalidation requests minimizes snoop requests to third cache, increasing the amount of cache memory bandwidth available for servicing catch fetches from third cache. In the event that a cache hit occurs in third cache, the information found in third cache must be transferred to second cache before a modification can be made to that information.
摘要:
A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents that are stored in the second cache are limited to have read-only attributes so that if any copies of the cache contents in the second cache exist in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available, if required, in the first cache by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, rendering the first cache available for write access. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing silicon real estate used when placed on-chip to improve cache bandwidth.
摘要:
Two independently accessible subdivided cache tag arrays and a cache control logic is provided to a set associative cache system. Each tag entry is stored in two subdivided cache tag arrays, a physical and a set tag array such that each physical tag array entry has a corresponding set tag array entry. Each physical tag array entry stores the tag addresses and control bits for a set of cache lines. The control bits comprise at least one validity bit indicating whether the data stored in the corresponding cache line is valid. Each set tag array entry stores the descriptive bits for a set of cache lines which consists of the most recently used (MRU) field identifying the most recently used cache lines of the cache set. Each subdivided tag array is provided with its own interface to enable each array to be accessed concurrently but independently by the cache control logic which performs read and write operations against the cache. The cache control logic makes concurrent and independent accesses to the separate tag arrays to read and write the control and descriptive information in the tag entries. The accesses are grouped by type of operation to be performed and each type of accesses is made during predesignated time slots in an optimized manner to enable the cache control logic to perform certain selected read/write accesses to the physical tag array while performing other selected independent read/write accesses to the set tag array concurrently.
摘要:
A method and apparatus for removing a page table entry from a plurality of translation lookaside buffers ("TLBs") in a multiprocessor computer system. The multiprocessor computer system includes at least two processors coupled to a packet-switched bus. Page table entries are removed from a plurality of TLBs in the multiprocessor computer system by first broadcasting a demap request packet on the packet-switched bus in response to one of the processors requesting that a page table entry be removed from its associated TLB. The demap request packet includes a virtual address and context information specifying this page table entry. Controllers reply to the demap request packet by sending a first reply packet to the controller that sent the original demap request packet to indicate receipt of the demap request packet. If a controller removes the page table entry from its associated TLB, that controller sends a second demap reply packet to indicate that the page table entry has been removed from its associated TLB.
摘要:
A caching arrangement which can work efficiently in a superscaler and multiprocessing environment includes separate caches for instructions and data and a single translation lookaside buffer (TLB) shared by them. During each clock cycle, retrievals from both the instruction cache and data cache may be performed, one on the rising edge of the clock cycle and one on the falling edge. The TLB is capable of translating two addresses per clock cycle. Because the TLB is faster than accessing the tag arrays which in turn are faster than addressing the cache data arrays, virtual addresses may be concurrently supplied to all three components and the retrieval made in one phase of a clock cycle. When an instruction retrieval is being performed, snooping for snoop broadcasts may be performed for the data cache and vice versa. Thus, for every clock cycle, an instruction and data cache retrieval may be performed as well as snooping.
摘要:
A cache array, a cache tag and comparator unit and a cache multiplexor are provided to a cache memory. Each cache operation performed against the cache array, read or write, takes only half a clock cycle. The cache tag and comparator unit comprises a cache tag array, a cache miss buffer and control logic. Each cache operation performed against the cache tag array, read or write, also takes only half a clock cycle. The cache miss buffer comprises cache miss descriptive information identifying the current state of a cache fill in progress. The control logic comprises a plurality of combinatorial logics for performing tag match operations. In addition to standard tag match operations, the control logic also conditionally tag matches an accessing address against an address tag stored in the cache miss buffer. Depending on the results of the tag match operations, and further depending on the state of the current cache fill if the accessing address is part of the memory block frame of the current cache fill, the control logic provides appropriate signals to the cache array, the cache multiplexor, the main memory and the instruction/data destination. As a result, subsequent instruction/data requests that are part of a current cache fill in progress can be satisfied without having to wait for the completion of the current cache fill, thereby further reducing cache miss penalties and function unit idle time.
摘要:
In a memory system having a main memory and a faster cache memory, a cache memory replacement scheme with a locking feature is provided. Locking bits associated with each line in the cache are supplied in the tag table. These locking bits are preferably set and reset by the application program/process executing and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in the cache to replace. The lock bits and replacement bits for a cache line are "ORed" to create a composite bit for the cache line. If the composite bit is set the cache line is not removed from the cache. When deadlock due to all composite bits being set will result, all replacement bits are cleared. One cache line is always maintained as non-lockable. The locking bits "lock" the line of data in the cache until such time when the process resets the lock bit. By providing that the process controls the state of the lock bits, the intelligence and knowledge the process contains regarding the frequency of use of certain memory locations can be utilized to provide a more efficient cache.
摘要:
A high performance microprocessor bus protocol for improving system throughput. The bus protocol enables overlapping read burst and write burst bus transactions to a cache, and interleaved bus transactions during external fetch cycles for missed cache lines. The bus protocol is implemented in a system comprising a CPU, and a secondary cache. The secondary cache comprises an SRAM array cache, and a cache controller. The CPU contains an instruction pipeline and a primary cache system.
摘要:
An apparatus and method for enabling a cache controller and address and data buses of a microprocessor with an on-board cache to provide a SRAM test mode for testing the on-board cache. Upon assertion of a SRAM test signal to a SRAM test pin on the microprocessor chip, the cache and bus controllers cease normal functionality and permit data to be written to, and read from, individual addresses within the on-board cache as though the on-board cache is simple SRAM. After the chip is reset, standard SRAM tests can then be implemented by reading and writing data to selected cache memory addresses as though the cache memory were SRAM. Upon completion of the tests, the SRAM test signal is deasserted and the cache and bus controllers resume normal operating functionality. A reset signal is then applied to the microprocessor to reinitialize control logic employed within the microprocessor. In this way, cache memory on-board a microprocessor can be tested using standard SRAM testing algorithms and equipment thereby eliminating a need for specialized test equipment to test cache memory contained on a microprocessor chip.
摘要:
A method and apparatus for maintaining cache coherency in a multiprocessor system having a plurality of processors and a shared main memory. Each of the plurality of processors is coupled to at least one cache unit and a store buffer. The method comprises the steps of writing by a first cache unit to its first store buffer a dirty line when the first cache unit experiences a cache miss; gaining control of the bus by the first cache unit; reading a new line from the share main memory by the first cache unit through the bus; writing the dirty line to the shared main memory if the bus is available to the first cache unit and if not available, the first cache unit checking snooping by a second cache unit from a second processor; comparing an address from the second cache unit with the tag of the dirty line, wherein the tag is stored in content-addressable memory coupled to the store buffer and if there is a hit, then supplying the dirty line to the second cache unit for updating.