Abstract:
A method and apparatus for speculatively dispatching and/or executing LOADs in a computer system includes a memory subsystem of an out-of-order processor that handles LOAD and STORE operations by dispatching them to respective LOAD and STORE buffers in the memory subsystem. When a LOAD is subsequently dispatched for execution, the store buffer is searched for STOREs having unknown addresses. If any STOREs are found which are older than the dispatched LOAD and which have an unknown address, the LOAD is tagged with an unknown STORE address identification (USAID). When a STORE is dispatched for execution, the LOAD buffer is searched for LOADs that are to be denoted as mis-speculated LOADs. Mis-speculated LOADs are prevented from corrupting the architectural state of the machine with invalid data.
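The tagging scheme described above can be illustrated with a minimal C++ sketch. The structure and field names (StoreEntry, LoadEntry, usaid, and the sequence numbers used to express program order) are illustrative assumptions, not identifiers from the patent:

    #include <cstdint>
    #include <optional>
    #include <vector>

    // Hypothetical store-buffer entry: the address may still be unknown
    // when a younger LOAD is dispatched for execution.
    struct StoreEntry {
        uint64_t seq;                    // program-order sequence number
        std::optional<uint64_t> address; // empty until the address is computed
    };

    // Hypothetical load-buffer entry, tagged if it was dispatched past an
    // older STORE whose address was unknown at the time.
    struct LoadEntry {
        uint64_t seq;
        uint64_t address;
        std::optional<uint64_t> usaid;   // seq of an older unknown-address STORE
        bool misSpeculated = false;
    };

    // LOAD dispatch: search the store buffer for older STOREs with unknown
    // addresses and tag the LOAD with an identification of such a STORE.
    void dispatchLoad(LoadEntry& ld, const std::vector<StoreEntry>& storeBuf) {
        for (const StoreEntry& st : storeBuf)
            if (st.seq < ld.seq && !st.address.has_value())
                ld.usaid = st.seq;
    }

    // STORE dispatch (its address is now known): search the load buffer for
    // tagged younger LOADs to the same address and denote them mis-speculated
    // so their data never reaches architectural state.
    void dispatchStore(const StoreEntry& st, std::vector<LoadEntry>& loadBuf) {
        for (LoadEntry& ld : loadBuf)
            if (ld.usaid && ld.seq > st.seq &&
                st.address && *st.address == ld.address)
                ld.misSpeculated = true;
    }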
Abstract:
A method and apparatus for performing operations with a processor in a computer system. Load operations are performed by use of a dispatch pipeline and a memory execution pipeline. The dispatch pipeline dispatches the load operation for execution by the processor, while the memory execution pipeline controls the execution of the load operation to memory. The present invention reduces the latency involved in executing a load operation by coupling the two pipelines during execution of the load operation.
Abstract:
In an out-of-order execution computer system, a store buffer is conditionally signaled to output buffered store data of buffered memory store operations, when a buffered memory load operation is being executed. The determination of whether to signal the store buffer is made using control information that includes the allocation state of the store buffer at the time the memory load operation being executed was issued. The allocation state includes the identification of the buffer slot storing the last memory store operation stored into the store buffer, and the wraparound state of a circular wraparound allocation approach employed to allocate buffer slots to the memory store operations, at the time the memory load operation being executed was issued.
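A minimal C++ sketch of the age determination implied above, assuming the allocation state is captured as the slot of the last buffered STORE plus a single wraparound bit, and that every buffered STORE records the wraparound bit that was current when it was allocated; the names are illustrative:

    #include <cstdint>

    // Allocation state of the circular store buffer, captured when the load
    // was issued: the slot holding the last buffered STORE and the wraparound
    // state of the circular allocation approach.
    struct AllocationSnapshot {
        uint32_t tailSlot;
        bool     wrapBit;
    };

    // Each buffered STORE carries the wraparound bit that was current when its
    // slot was allocated, so age can be compared across a wraparound.
    struct BufferedStore {
        uint32_t slot;
        bool     wrapBit;
    };

    // Signal the store buffer to output data only for STOREs that were already
    // buffered when the load issued, i.e. STOREs older than the load.
    bool olderThanLoad(const BufferedStore& st, const AllocationSnapshot& snap) {
        if (st.wrapBit == snap.wrapBit)
            return st.slot <= snap.tailSlot;  // same lap: plain index compare
        return st.slot > snap.tailSlot;       // opposite lap: store predates the wrap
    }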
Abstract:
Store forwarding circuitry is provided to an out-of-order execution processor having a store buffer of buffered memory store operations. The store forwarding circuitry conditionally forwards store data for a memory load operation from a variable subset of the buffered memory store operations that is functionally dependent on the time the memory load operation is issued, taking into account the execution states of these buffered memory store operations. The memory load operation may be issued speculatively and/or executed out-of-order. The buffered memory store operations may be in a speculatively executed or committed state. The data and address aspects of the memory store operations may be executed separately.
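The forwarding search itself can be sketched as follows. The StoreOp structure, with separately optional address and data fields to reflect the separately executed address and data aspects, and the youngest-to-oldest scan over the issue-time subset are assumptions made for illustration:

    #include <cstdint>
    #include <optional>
    #include <vector>

    // Illustrative buffered STORE whose address and data portions may have
    // executed separately; it remains speculative until committed.
    struct StoreOp {
        std::optional<uint64_t> address;
        std::optional<uint64_t> data;
        bool committed = false;
    };

    // Forward store data to a load: scan only the subset of buffered STOREs
    // older than the load (the subset fixed by the issue-time snapshot), from
    // youngest to oldest, and take the first address match. The result is
    // empty if no older STORE matches, or if the matching STORE's data has
    // not executed yet (in which case the load would have to wait).
    std::optional<uint64_t> forward(uint64_t loadAddr,
                                    const std::vector<StoreOp>& olderStores) {
        for (auto it = olderStores.rbegin(); it != olderStores.rend(); ++it)
            if (it->address && *it->address == loadAddr)
                return it->data;
        return std::nullopt;
    }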
Abstract:
A computer system having a mechanism for maintaining processor ordering during out-of-order instruction execution is disclosed wherein load memory instructions are accessed according to program order and executed out-of-order in relation to the program order where appropriate. Processors in the system snoop an external bus for bus transactions that conflict with completed load memory instructions before committing results of the completed load memory instructions to an architectural state.
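A minimal sketch of that snoop check, assuming each completed but uncommitted load keeps a record of the cache line it read; the names (CompletedLoad, mustRestart) are illustrative:

    #include <cstdint>
    #include <vector>

    // Illustrative record of a load that completed out of order but whose
    // result has not yet been committed to the architectural state.
    struct CompletedLoad {
        uint64_t lineAddress;        // cache line the load read from
        bool     mustRestart = false;
    };

    // On an external bus transaction (e.g. another processor's store), snoop
    // the completed-but-uncommitted loads; a conflict means the value read may
    // violate processor ordering, so the load is re-executed instead of being
    // committed.
    void snoopBusTransaction(uint64_t busLineAddress,
                             std::vector<CompletedLoad>& loads) {
        for (CompletedLoad& ld : loads)
            if (ld.lineAddress == busLineAddress)
                ld.mustRestart = true;
    }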
Abstract:
A data cache and a plurality of companion fill buffers having corresponding tag matching circuitry are provided to a computer system. Each fill buffer independently stores and tracks a replacement cache line being filled with data returning from main memory in response to a cache miss. When the cache fill is completed, the replacement cache line is output for the cache tag and data arrays of the data cache if the memory locations are cacheable and the cache line has not been snoop hit while the cache fill was in progress. Additionally, the fill buffers are organized and provided with sufficient address and data ports as well as selectors to allow the fill buffers to respond to subsequent processor loads and stores, and external snoops that hit their cache lines while the cache fills are in progress. As a result, the cache tag and data arrays of the data cache can continue to serve subsequent processor loads and stores, and external snoops, while one or more cache fills are in progress, without ever having to stall the processor.
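One way to picture a single fill buffer is the sketch below; the 64-byte line size, the chunk counting, and all names are assumptions made for illustration:

    #include <array>
    #include <cstdint>

    // Illustrative fill-buffer entry: independently stores and tracks one
    // replacement cache line being filled from main memory after a miss.
    struct FillBuffer {
        bool     inUse = false;
        uint64_t lineAddress = 0;          // line-aligned address
        std::array<uint8_t, 64> line{};    // assumed 64-byte cache line
        uint32_t chunksReturned = 0;       // progress of the fill
        bool     snoopHit = false;         // an external snoop hit this line
        bool     cacheable = true;

        // Tag match so subsequent loads, stores, and snoops that hit this
        // line can be serviced while the fill is still in progress.
        bool matches(uint64_t addr) const {
            return inUse && (addr & ~63ull) == lineAddress;
        }

        // When the fill completes, the line is written into the cache tag and
        // data arrays only if it is cacheable and was not snoop-hit meanwhile.
        bool shouldWriteToCache(uint32_t totalChunks) const {
            return inUse && chunksReturned == totalChunks &&
                   cacheable && !snoopHit;
        }
    };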
Abstract:
A macro instruction is provided for a microprocessor which allows a programmer to specify a base value, index, scale factor and displacement value for calculating an effective address and returning that result in a single clock cycle. The macro instruction is converted into a micro operation which is provided to the single-cycle execution unit with the required source operands for performing the calculation. Within the single-cycle execution unit, the index and scale factor are provided to a left shifter for multiplying the two values. The result of the left shift operation is added to the sum of the base and displacement. This results in the effective address which is then returned from the single-cycle execution unit to a predetermined destination. This provides for the calculation of an effective address in a single cycle pipeline execution unit that is independent of the memory system execution units.
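The address arithmetic itself reduces to a left shift and two additions, as in the sketch below (the function name and the power-of-two encoding of the scale factor are assumptions):

    #include <cstdint>

    // Effective address = base + displacement + (index shifted left by the
    // scale), i.e. the shifted index is added to the sum of base and
    // displacement.
    uint64_t effectiveAddress(uint64_t base, uint64_t index,
                              unsigned scale, int64_t disp) {
        uint64_t scaledIndex = index << scale;   // index multiplied by the scale factor
        return (base + static_cast<uint64_t>(disp)) + scaledIndex;
    }

For example, with a base of 0x1000, an index of 3, a scale factor of 4 (scale = 2), and a displacement of 8, the effective address is 0x1000 + 8 + 12 = 0x1014.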
Abstract:
A method and apparatus for performing store operations that includes calculating the address and obtaining the data for the store operation. The address represents the memory location to which the data is to be stored. Once the address is calculated and the data obtained, the store operation is committed to processor state. The store operation may be dispatched to memory to complete the execution of the store operation.
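A minimal sketch of that ordering of steps, using an illustrative PendingStore structure and a deliberately simplified flat memory indexed directly by address:

    #include <cstdint>
    #include <optional>

    // Illustrative decomposition of a store: the address and the data are
    // produced first, the store is then committed to processor state, and the
    // write to memory itself may be dispatched later.
    struct PendingStore {
        std::optional<uint64_t> address;  // memory location to be written
        std::optional<uint64_t> data;     // value to be written
        bool committed = false;
    };

    // The store can be committed once both its address and data are known,
    // even though the memory update has not happened yet.
    bool tryCommit(PendingStore& st) {
        if (st.address && st.data) {
            st.committed = true;
            return true;
        }
        return false;
    }

    // Dispatching the committed store to memory completes its execution.
    void dispatchToMemory(const PendingStore& st, uint64_t* memory) {
        if (st.committed)
            memory[*st.address] = *st.data;   // toy flat memory model
    }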
Abstract:
The present invention provides for executing load instructions with a processor having a non-blocking cache memory, wherein individual load operations are dispatched to the cache memory and the cache memory signals to prevent the load operation from being sent to external memory when the load operation misses the cache memory and there is already a pending bus cycle to the same cache line. This helps reduce traffic on the external bus.
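The squash condition can be sketched as a search of the currently pending bus cycles; the 64-byte line size and the names are assumptions:

    #include <cstdint>
    #include <vector>

    // One outstanding bus cycle of the non-blocking cache, identified by the
    // cache line it is fetching.
    struct PendingBusCycle {
        uint64_t lineAddress;   // line-aligned address
    };

    // For a load that missed the cache: if a bus cycle to the same cache line
    // is already pending, do not send this miss to external memory; the load
    // will be satisfied when the outstanding fill returns.
    bool suppressExternalRequest(uint64_t missAddress,
                                 const std::vector<PendingBusCycle>& pending) {
        uint64_t line = missAddress & ~63ull;   // assumed 64-byte line
        for (const PendingBusCycle& c : pending)
            if (c.lineAddress == line)
                return true;
        return false;
    }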