摘要:
A plurality of indexes are provided for a multi-way set-associate cache of a computer system. The cache is organized as a plurality of blocks for storing data which are a copies of main memory data. Each block has an associated tag for uniquely identifying the block. The blocks and the tags are addressed by indexes. The indexes are generated by a Boolean hashing function which converts a memory address to cache indexes by combining the bits of the memory address using an exclusive OR function. Different combination of bits are used to generate a plurality of different indexes to address the tags and the associated blocks to transfer data between the cache and the central processing unit of the computer system.
摘要:
The set-prediction cache memory system comprises an extension of a set-associative cache memory system which operates in parallel to the set-associative structure to increase the overall speed of the cache memory while maintaining its performance. The set prediction cache memory system includes a plurality of data RAMs and a plurality of tag RAMs to store data and data tags, respectively. Also included in the system are tag store comparators to compare the tag data contained in a specific tag RAM location with a second index comprising a predetermined second portion of a main memory address. The elements of the set prediction cache memory system which operate in parallel to the set-associative cache memory include: a set-prediction RAM which receives at least one third index comprising a predetermined third portion of the main memory address, and stores such third index to essentially predict the data cache RAM holding the data indexed by the third index; a data-select multiplexer which receives the prediction index and selects a data output from the data cache RAM indexed by the prediction index; and a mispredict logic device to determine if the set prediction RAM predicted the correct data cache RAM and if not, issue a mispredict signal which may comprise a write data signal, the write data signal containing information intended to correct the prediction index contained in the set prediction RAM.
摘要:
An apparatus may comprise a cache file having a plurality of cache lines and a hit predictor. The hit predictor may contain a table of counter values indexed with signatures that are associated with the plurality of cache lines. The apparatus may fill cache lines into the cache file with either low or high priority. Low priority lines may be chosen to be replaced by a replacement algorithm before high priority lines. In this way, the cache naturally may contain more high priority lines than low priority ones. This priority filling process may improve the performance of most replacement schemes including the best known schemes which are already doing better than LRU.
摘要:
One disclosed embodiment may comprise a system that includes a home node that provides a transaction reference to a requester in response to a request from the requester. The requester provides an acknowledgement message to the home node in response to the transaction reference, the transaction reference enabling the requester to determine an order of requests at the home node relative to the request from the requester.
摘要:
Multiprocessor systems and methods are disclosed. One embodiment may comprise a plurality of processor cores. A given processor core may be operative to generate a request for desired data in response to a cache miss at a local cache. A shared cache structure may provide at least one speculative data fill and a coherent data fill of the desired data to at least one of the plurality of processor cores in response to a request from the at least one processor core. A processor scoreboard arbitrates the requests for the desired data. A speculative data fill of the desired data is provided to the at least one processor core. The coherent data fill of the desired data may be provided to the at least one processor core in a determined order.
摘要:
A system comprises a first node including data having an associated D-state and a second node operative to provide a source broadcast requesting the data. The first node is operative in response to the source broadcast to provide the data to the second node and transition the state associated with the data at the first node from the D-state to an O-state without concurrently updating memory. An S-state is associated with the data at the second node.
摘要:
A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level. Since the address of the invalidated cache line was left in the respective tag store, a match will be detected at one of the cache levels, although the corresponding state of this cache line is invalid. This condition is specifically detected and is considered to be an “inval_miss” occurrence. In response, to an inval_miss, the cache hierarchy calls off searching any higher levels, and instead, issues a memory reference request for the desired cache line. In a further embodiment, the entity that sourced an invalidate request is stored, and a subsequent memory reference request for the same cache line is sent directly to the source entity.
摘要:
A technique for predicting the result of a conditional branch instruction for use with a processor having instruction pipeline. A stored predictor is connected to the front end of the pipeline and is trained from a truth based predictor connected to the back end of the pipeline. The stored predictor is accessible in one instruction cycle, and therefore provides minimum predictor latency. Update latency is minimized by storing multiple predictions in the front end stored predictor which are indexed by an index counter. The multiple predictions, as provided by the back end, are indexed by the index counter to select a particular one as current prediction on a given instruction pipeline cycle. The front end stored predictor also passes along to the back end predictor, such as through the instruction pipeline, a position value used to generate the predictions. This further structure accommodates ghost branch instructions that turn out to be flushed out of the pipeline when it must be backed up. As a result, the front end always provides an accurate prediction with minimum update latency.
摘要:
A next line prediction mechanism for predicting a next instruction index to an instruction cache of a computer pipeline, has a latency equal to the cycle time of the instruction cache to maximize the instruction bandwidth out of the instruction cache. The instruction cache outputs a block of instructions with each fetch initiated by a next instruction index provided by the line prediction mechanism. The instructions of the block are processed in parallel for instruction decode and branch prediction to maintain a high rate of instruction flow through the pipeline.
摘要:
A register map having a free list of available physical locations in a register file, a log containing a sequential listing of logical registers changed during a predetermined number of cycles, a back-up map associating the logical registers with corresponding physical homes at a back-up point in a computer pipeline operation and a predicted map associating the logical registers with corresponding physical homes at a current point in the computer pipeline operation. A set of valid bits is associated with the maps to indicate whether a particular logical register is to be taken from the back-up map or the predicted map indication of a corresponding physical home. The valid bits can be "flash cleared" in a single cycle to back-up the computer pipeline to the back-up point during a trap event.