摘要:
A computer system having a mechanism for maintaining processor ordering during out-of-order instruction execution is disclosed wherein load memory instructions are accessed according to program order and executed out-of-order in relation to the program order where appropriate. Processors in the system snoop an external bus for bus transactions that conflict with completed load memory instructions before committing results of the completed load memory instructions to an architectural state.
摘要:
A data cache and a plurality of companion fill buffers having corresponding tag matching circuitry are provided to a computer system. Each fill buffer independently stores and tracks a replacement cache line being filled with data returning from main memory in response to a cache miss. When the cache fill is completed, the replacement cache line is output for the cache tag and data arrays of the data cache if the memory locations are cacheable and the cache line has not been snoop hit while the cache fill was in progress. Additionally, the fill buffers are organized and provided with sufficient address and data ports as well as selectors to allow the fill buffers to respond to subsequent processor loads and stores, and external snoops that hit their cache lines while the cache fills are in progress. As a result, the cache tag and data arrays of the data cache can continue to serve subsequent processor loads and stores, and external snoops, while one or more cache fills are in progress, without ever having to stall the processor.
摘要:
A data cache and a plurality of companion fill buffers having corresponding tag matching circuitry are provided to a computer system. Each fill buffer independently stores and tracks a replacement cache line being filled with data returning from main memory in response to a cache miss. When the cache fill is completed, the replacement cache line is output for the cache tag and data arrays of the data cache if the memory locations are cacheable and the cache line has not been snoop hit while the cache fill was in progress. Additionally, the fill buffers are organized and provided with sufficient address and data ports as well as selectors to allow the fill buffers to respond to subsequent processor loads and stores, and external snoops that hit their cache lines while the cache fills are in progress. As a result, the cache tag and data arrays of the data cache can continue to serve subsequent processor loads and stores, and external snoops, while one or more cache fills are in progress, without ever having to stall the processor.
摘要:
A macro instruction is provided for a microprocessor which allows a programmer to specify a base value, index, scale factor and displacement value for calculating an effective address and returning that result in a single clock cycle. The macro instruction is converted into a micro operation which is provided to the single-cycle execution unit with the required source operands for performing the calculation. Within the single-cycle execution unit, the index and scale factor are provided to a left shifter for multiplying the two values. The result of the left shift operation is added to the sum of the base and displacement. This results in the effective address which is then returned from the single-cycle execution unit to a predetermined destination. This provides for the calculation of an effective address in a single cycle pipeline execution unit that is independent of the memory system execution units.
摘要:
A method and apparatus for performing operations with a processor in a computer system. Load operations are performed by use of a dispatch pipeline and a memory execution pipeline. The dispatch pipeline dispatches the load operation for execution by the processor, while the memory execution pipeline controls the execution of the load operation to memory. The present invention reduces the latency involved in executing a load operation by coupling the execution of the two pipelines during execution of the load operation.
摘要:
A method and apparatus for performing store operations that includes calculating the address and obtaining the data for the store operation. The address represents the memory location to which the data is to be stored. Once the address is calculated and the data obtained, the store operation is committed to processor state. The store operation may be dispatched to memory to complete the execution of the store operation.
摘要:
In an out-of-order execution computer system, a store buffer is conditionally signaled to output buffered store data of buffered memory store operations, when a buffered memory load operation is being executed. The determination on whether to signal the store buffer or not is made using control information that includes the allocation state of the store buffer at the time the memory load operation being executed was issued. The allocation state includes the identification of the buffer slot storing the last memory store operation stored into the store buffer, and the wraparound state of a circular wraparound allocation approach employed to allocate buffer slots to the memory store operations, at the time the memory load operation being executed was issued.
摘要:
Store forwarding circuitry is provided to an out-of-order execution processor having a store buffer of buffered memory store operations. The store forwarding circuitry conditionally forwards store data for a memory load operation from a variable subset of the buffered memory store operations that is functionally dependent on the time the memory load operation is issued, taking into account the execution states of these buffered memory store operations. The memory load operation may be issued speculatively and/or executed out-of-order. The execution states of the buffered memory store operations may be speculatively executed or committed. The data and address aspects of the memory store operations may be executed separately.
摘要:
A number of data misalignment detection circuits are provided to select ones of an execution unit and memory order interface components, of an out-of-order (OOO) execution computer system, this aids the buffering and fault generation circuits of the memory order interface components, to buffer and dispatch load and store operations depending on if the misalignments are detected and their nature, resulting in load and store operations of misaligned data against a memory subsystem of the OOO execution computer system are available, the data addressed by the load and store operations are of the following types: chunk split, cache split, or page split misaligned.
摘要:
A number of identical matching circuits are integrated into the store address buffer, one matching circuit to each buffer slot, for generating a number of match signals, one for each detected match, using at most the entire source address of an instruction being fetched and the corresponding portions of the store destination addresses of the buffered store instructions. Additionally, a stall signal generator complimentary to the store address buffer is provided for generating a single stall signal for the bus controller, using the match signals, thereby stalling an instruction fetch from a source address that is potentially a store destination of one of the buffered store instructions with minimal performance cost.