Abstract:
A method and apparatus for performing load operations in a computer system. The present invention includes a method and apparatus for dispatching the load operation to be executed. The present invention halts the execution of the load operation when a dependency exists between the load operation and another memory operation currently pending in the system. When the dependency no longer exists, the present invention redispatches the load operation so that it completes.
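As a rough illustration of the dispatch/block/redispatch flow described above, the following Python sketch models a load that is halted because it conflicts with a still-pending store and is redispatched once that store commits. The class and method names are illustrative only and are not taken from the abstract.

```python
from collections import deque

class MemorySubsystem:
    def __init__(self):
        self.memory = {}
        self.pending_stores = {}      # address -> value not yet committed
        self.blocked_loads = deque()  # loads halted on a dependency

    def issue_store(self, addr, value):
        self.pending_stores[addr] = value          # store is now pending

    def commit_store(self, addr):
        self.memory[addr] = self.pending_stores.pop(addr)
        self.redispatch_loads()                    # a dependency may have cleared

    def dispatch_load(self, addr):
        if addr in self.pending_stores:            # dependency detected:
            self.blocked_loads.append(addr)        # halt the load
            return None
        return self.memory.get(addr, 0)            # no dependency: complete

    def redispatch_loads(self):
        waiting, self.blocked_loads = self.blocked_loads, deque()
        for addr in waiting:
            value = self.dispatch_load(addr)       # re-blocks itself if still dependent
            if value is not None:
                print(f"load [{addr:#x}] -> {value}")

mem = MemorySubsystem()
mem.issue_store(0x100, 42)
assert mem.dispatch_load(0x100) is None   # halted: conflicts with the pending store
mem.commit_store(0x100)                   # dependency gone: load redispatched and completes
```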
Abstract:
A translation lookaside buffer is described for use with a microprocessor capable of speculative and out-of-order processing of memory instructions. The translation lookaside buffer is non-blocking in response to translation lookaside buffer misses requiring page table walks. Once a translation lookaside buffer miss is detected, a page table walk is initiated to satisfy the miss. During the page table walk, additional memory instructions are processed by the translation lookaside buffer. Any additional instructions which cause translation lookaside buffer hits are merely processed by the translation lookaside buffer. However, instructions causing translation lookaside buffer misses while the page table walk is being performed are blocked pending completion of the page table walk. Once the page table walk is completed the blocked instructions are reawakened and are again processed by the translation lookaside buffer. Global and selective wakeup mechanisms are described. An implementation wherein the non-blocking translation lookaside buffer is provided within a microprocessor capable of speculative and out-of-order processing is also described.
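The following is a simplified software model, under assumed simplifications (a single outstanding walk and a global wakeup), of the non-blocking behavior described above: hits proceed during an outstanding page table walk, while further misses are parked and reawakened when the walk completes.

```python
class NonBlockingTLB:
    def __init__(self, page_table):
        self.page_table = page_table    # virtual page -> physical page
        self.entries = {}               # cached translations
        self.walk_in_progress = None    # virtual page being walked, if any
        self.sleeping = []              # instructions blocked during the walk

    def translate(self, vpage, instr_id):
        if vpage in self.entries:                       # TLB hit: processed normally
            return self.entries[vpage]
        if self.walk_in_progress is None:               # first miss: start a page walk
            self.walk_in_progress = vpage
        self.sleeping.append((instr_id, vpage))         # block pending walk completion
        return None

    def complete_walk(self):
        vpage = self.walk_in_progress
        self.entries[vpage] = self.page_table[vpage]    # install the translation
        self.walk_in_progress = None
        woken, self.sleeping = self.sleeping, []
        return woken                                    # global wakeup: retry all sleepers

tlb = NonBlockingTLB({0x1: 0xA, 0x2: 0xB})
assert tlb.translate(0x1, instr_id=0) is None   # miss starts a walk
assert tlb.translate(0x2, instr_id=1) is None   # second miss blocks during the walk
for instr_id, vpage in tlb.complete_walk():     # walk done: reawaken blocked instructions
    tlb.translate(vpage, instr_id)              # instr 0 now hits; instr 1 starts a new walk
```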
Abstract:
A number of data misalignment detection circuits are provided to selected ones of an execution unit and the memory order interface components of an out-of-order (OOO) execution computer system. These circuits aid the buffering and fault generation circuits of the memory order interface components in buffering and dispatching load and store operations depending on whether misalignments are detected and on their nature. As a result, load and store operations on misaligned data against a memory subsystem of the OOO execution computer system are supported, where the data addressed by the load and store operations may be chunk split, cache split, or page split misaligned.
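The three kinds of misalignment named above can be illustrated with a small classifier. The sizes below (8-byte chunks, 32-byte cache lines, 4 KB pages) are assumed for illustration and are not taken from the abstract.

```python
CHUNK, LINE, PAGE = 8, 32, 4096

def classify_split(addr, size):
    """Return which boundary, if any, the access [addr, addr+size) crosses."""
    last = addr + size - 1
    if addr // PAGE != last // PAGE:
        return "page split"     # crosses a page boundary
    if addr // LINE != last // LINE:
        return "cache split"    # crosses a cache-line boundary within a page
    if addr // CHUNK != last // CHUNK:
        return "chunk split"    # crosses a chunk boundary within a cache line
    return "aligned"

print(classify_split(0x0FFE, 4))   # page split
print(classify_split(0x001E, 4))   # cache split
print(classify_split(0x0006, 4))   # chunk split
print(classify_split(0x0008, 4))   # aligned
```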
Abstract:
The data cache unit includes a separate fill buffer and a separate write-back buffer. The fill buffer stores one or more cache lines for transference into data cache banks of the data cache unit. The write-back buffer stores a single cache line evicted from the data cache banks prior to write-back to main memory. Circuitry is provided for transferring a cache line from the fill buffer into the data cache banks while simultaneously transferring a victim cache line from the data cache banks into the write-back buffer. This allows the overall replace operation to be performed in only a single clock cycle. In a particular implementation, the data cache unit is employed within a microprocessor capable of speculative and out-of-order processing of memory instructions. Moreover, the microprocessor is incorporated within a multiprocessor computer system wherein each microprocessor is capable of snooping the cache lines of the data cache units of each other microprocessor. The data cache unit is also a non-blocking cache.
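The single-cycle replace operation can be sketched behaviorally as follows; this is a software model of the dataflow, not the hardware datapath, and the structure names are illustrative.

```python
class DataCacheUnit:
    def __init__(self):
        self.banks = {}               # set index -> (tag, line data)
        self.fill_buffer = []         # completed cache lines awaiting placement
        self.writeback_buffer = None  # single evicted line awaiting write-back

    def replace(self, set_index):
        """One step: fill line into the bank, victim line into the write-back buffer."""
        tag, line = self.fill_buffer.pop(0)
        victim = self.banks.get(set_index)       # line being evicted, if any
        self.banks[set_index] = (tag, line)      # fill buffer -> cache bank
        if victim is not None:
            self.writeback_buffer = victim       # cache bank -> write-back buffer

    def drain_writeback(self, main_memory):
        if self.writeback_buffer is not None:
            tag, line = self.writeback_buffer
            main_memory[tag] = line              # later write-back to main memory
            self.writeback_buffer = None

dcu = DataCacheUnit()
dcu.banks[3] = (0x40, b"old dirty line")
dcu.fill_buffer.append((0x80, b"new line from memory"))
dcu.replace(set_index=3)          # both transfers happen in the same step
dcu.drain_writeback(main_memory={})
```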
Abstract:
A number of identical matching circuits are integrated into the store address buffer, one matching circuit to each buffer slot, for generating a number of match signals, one for each detected match, using at most the entire source address of an instruction being fetched and the corresponding portions of the store destination addresses of the buffered store instructions. Additionally, a stall signal generator complementary to the store address buffer is provided for generating a single stall signal for the bus controller, using the match signals, thereby stalling an instruction fetch from a source address that is potentially a store destination of one of the buffered store instructions with minimal performance cost.
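A software sketch of the per-slot matching and stall generation is given below; the slot contents, the address width, and the partial-comparison parameter are assumptions for illustration.

```python
def match_signals(store_addr_buffer, fetch_addr, compare_bits=32):
    """One match signal per buffer slot; empty (None) slots never match."""
    mask = (1 << compare_bits) - 1          # compare up to the entire address
    return [slot is not None and (slot & mask) == (fetch_addr & mask)
            for slot in store_addr_buffer]

def stall_signal(store_addr_buffer, fetch_addr):
    """Single stall line to the bus controller: asserted if any slot matches."""
    return any(match_signals(store_addr_buffer, fetch_addr))

buffer_slots = [0x4000, None, 0x5008, None]     # buffered store destination addresses
print(stall_signal(buffer_slots, 0x5008))       # True: stall the instruction fetch
print(stall_signal(buffer_slots, 0x6000))       # False: fetch proceeds
```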
Abstract:
The present invention provides for executing store instructions with a processor. The present invention executes each of the store instructions by producing the data that is to be stored and by calculating the destination address to which the data is to be stored. In the present invention, the store instructions are executed so as to produce the destination address of the store instruction earlier than in the prior art.
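One way to picture producing the destination address early is to split each store into an address micro-operation and a data micro-operation, as sketched below; the STA/STD naming and the buffer-entry structure are illustrative assumptions, not details taken from the abstract.

```python
class StoreBufferEntry:
    def __init__(self):
        self.address = None    # filled by the address micro-op (STA, illustrative name)
        self.data = None       # filled by the data micro-op (STD, illustrative name)

    def execute_address(self, base, offset):
        self.address = base + offset     # address known early, before the data
        return self.address              # is produced, so later operations can use it

    def execute_data(self, value):
        self.data = value

    def ready_to_commit(self):
        return self.address is not None and self.data is not None

entry = StoreBufferEntry()
entry.execute_address(base=0x1000, offset=0x24)  # destination address resolved first
assert not entry.ready_to_commit()               # store data still outstanding
entry.execute_data(0xDEADBEEF)
assert entry.ready_to_commit()
```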
Abstract:
A memory-type value identifying the type of memory contained within a range of memory locations is explicitly stored within a microprocessor. Prior to processing a memory micro-instruction such as a load or store, the memory-type is determined for the memory location identified by the memory micro-instruction. Once the memory-type is known, the memory micro-instruction is processed in accordance with any one of a number of processing protocols including write-through processing, write-back processing, write-protect processing, restricted-cacheability processing, uncacheable speculatable write-combining processing, or uncacheable processing. By providing memory-type information explicitly within the microprocessor, the type of memory identified by a micro-instruction is known before the micro-instruction is processed. Accordingly, the protocol by which the micro-instruction is processed may be efficiently tailored to the memory-type. For example, if the memory location identified by the micro-instruction is known to be uncacheable, a data cache unit is bypassed and external memory is accessed directly. In an exemplary embodiment, the microprocessor is an out-of-order microprocessor capable of generating speculative memory micro-instructions. Also, the microprocessor may be only one of a number of microprocessors within a multiprocessor system.
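The lookup-then-dispatch idea can be sketched as follows; the address ranges, the set of memory types, and the load-handling details are invented for illustration and cover only the write-back and uncacheable cases.

```python
MEMORY_TYPE_RANGES = [          # (start, end, memory type) -- illustrative values
    (0x0000_0000, 0x7FFF_FFFF, "write-back"),
    (0x8000_0000, 0xBFFF_FFFF, "write-through"),
    (0xC000_0000, 0xFFFF_FFFF, "uncacheable"),
]

def memory_type(addr):
    for start, end, mtype in MEMORY_TYPE_RANGES:
        if start <= addr <= end:
            return mtype
    return "uncacheable"        # conservative default

def process_load(addr, cache, external_memory):
    mtype = memory_type(addr)               # memory-type known before processing
    if mtype == "uncacheable":
        return external_memory.get(addr, 0) # bypass the data cache unit entirely
    if addr in cache:
        return cache[addr]                  # cacheable types may hit the cache
    value = external_memory.get(addr, 0)
    cache[addr] = value                     # fill on miss for cacheable types
    return value

cache, memory = {}, {0xC000_0010: 7, 0x0000_0040: 9}
assert process_load(0xC000_0010, cache, memory) == 7 and 0xC000_0010 not in cache
assert process_load(0x0000_0040, cache, memory) == 9 and 0x0000_0040 in cache
```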
Abstract:
A data cache and a plurality of companion fill buffers having corresponding tag matching circuitry are provided to a computer system. Each fill buffer independently stores and tracks a replacement cache line being filled with data returning from main memory in response to a cache miss. When the cache fill is completed, the replacement cache line is output for the cache tag and data arrays of the data cache if the memory locations are cacheable and the cache line has not been snoop hit while the cache fill was in progress. Additionally, the fill buffers are organized and provided with sufficient address and data ports as well as selectors to allow the fill buffers to respond to subsequent processor loads and stores, and external snoops that hit their cache lines while the cache fills are in progress. As a result, the cache tag and data arrays of the data cache can continue to serve subsequent processor loads and stores, and external snoops, while one or more cache fills are in progress, without ever having to stall the processor.
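A simplified behavioral sketch of a fill buffer entry that tracks an in-progress cache fill and answers a subsequent load that hits it is shown below; the line and chunk sizes and the chunk-granularity valid bits are assumptions for illustration.

```python
LINE_BYTES, CHUNK_BYTES = 32, 8

class FillBufferEntry:
    def __init__(self, line_addr):
        self.line_addr = line_addr
        self.data = bytearray(LINE_BYTES)
        self.chunk_valid = [False] * (LINE_BYTES // CHUNK_BYTES)

    def fill_chunk(self, chunk_index, chunk):
        start = chunk_index * CHUNK_BYTES
        self.data[start:start + CHUNK_BYTES] = chunk     # data returning from main memory
        self.chunk_valid[chunk_index] = True

    def load(self, offset, size):
        """Serve a load that hit this in-progress line, if its bytes have arrived."""
        first, last = offset // CHUNK_BYTES, (offset + size - 1) // CHUNK_BYTES
        if all(self.chunk_valid[first:last + 1]):
            return bytes(self.data[offset:offset + size])
        return None                                      # requested bytes not back yet

    def fill_complete(self):
        return all(self.chunk_valid)

entry = FillBufferEntry(line_addr=0x1000)
entry.fill_chunk(0, b"\x01" * 8)
assert entry.load(offset=0, size=4) == b"\x01" * 4   # load hits bytes that have arrived
assert entry.load(offset=16, size=4) is None          # those bytes are still pending
```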
Abstract:
A bypass mechanism within a register alias table (RAT) unit for handling source-destination dependencies between operands of a given set of operations issued simultaneously within a superscalar microprocessor. Operations of the given set are presented in program order, and a data dependency occurs when a source register of a particular operation is also utilized as a destination register of a preceding operation within the given set of operations. When this occurs, the initial read of the RAT unit will not have supplied the most current rename of the source register. The present invention includes a comparison mechanism to detect this condition. Also included is a bypass mechanism for replacing the physical source register output by the initial read of the RAT unit with the recently allocated physical destination register assigned to the matching preceding operation. In general, the RAT unit performs register renaming to provide a larger physical register set than would ordinarily be available within a given macroarchitecture's logical register set (such as the Intel architecture, PowerPC, or Alpha designs, for instance), thereby eliminating false data dependencies that reduce overall superscalar processing performance of the microprocessor. The bypass mechanism of the present invention handles both floating point and integer registers and, in addition, a second bypass mechanism is included in the RAT priority write operation.
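The intra-group bypass can be illustrated with a small rename routine: each source is first looked up in the RAT, then overridden by the newly allocated physical destination of any earlier operation in the same group whose logical destination matches. This is a simplified sketch, not the patented circuitry; the register names, free list, and function are invented for illustration.

```python
def rename_group(rat, free_list, group):
    """group: list of (dest_logical, [src_logicals]); rat: logical -> physical."""
    renamed = []
    new_dests = {}                              # logical -> physical, within this group
    for dest, srcs in group:                    # operations in program order
        phys_srcs = []
        for src in srcs:
            if src in new_dests:                # intra-group dependency detected:
                phys_srcs.append(new_dests[src])   # bypass the initial RAT read
            else:
                phys_srcs.append(rat[src])      # initial RAT read is already current
        phys_dest = free_list.pop(0)            # allocate a new physical register
        new_dests[dest] = phys_dest             # later operations in the group see this
        renamed.append((phys_dest, phys_srcs))
    rat.update(new_dests)                       # priority write: youngest rename wins
    return renamed

rat = {"eax": "p1", "ebx": "p2"}
free_list = ["p7", "p8"]
group = [("eax", ["ebx"]),          # eax <- f(ebx)
         ("ecx", ["eax"])]          # ecx <- g(eax): must see the new eax (p7)
print(rename_group(rat, free_list, group))   # [('p7', ['p2']), ('p8', ['p7'])]
```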