摘要:
A method and apparatus for performing operations with a processor in a computer system. Load operations are performed by use of a dispatch pipeline and a memory execution pipeline. The dispatch pipeline dispatches the load operation for execution by the processor, while the memory execution pipeline controls the execution of the load operation to memory. The present invention reduces the latency involved in executing a load operation by coupling the execution of the two pipelines during execution of the load operation.
摘要:
In an out-of-order execution computer system, a store buffer is conditionally signaled to output buffered store data of buffered memory store operations, when a buffered memory load operation is being executed. The determination on whether to signal the store buffer or not is made using control information that includes the allocation state of the store buffer at the time the memory load operation being executed was issued. The allocation state includes the identification of the buffer slot storing the last memory store operation stored into the store buffer, and the wraparound state of a circular wraparound allocation approach employed to allocate buffer slots to the memory store operations, at the time the memory load operation being executed was issued.
摘要:
Store forwarding circuitry is provided to an out-of-order execution processor having a store buffer of buffered memory store operations. The store forwarding circuitry conditionally forwards store data for a memory load operation from a variable subset of the buffered memory store operations that is functionally dependent on the time the memory load operation is issued, taking into account the execution states of these buffered memory store operations. The memory load operation may be issued speculatively and/or executed out-of-order. The execution states of the buffered memory store operations may be speculatively executed or committed. The data and address aspects of the memory store operations may be executed separately.
摘要:
The present invention provides for executing load instructions with a processor having a non-blocking cache memory, wherein individual load operations are dispatched to the cache memory and the cache memory signals the prevent the load operation from being sent to external memory when the load operation misses the cache memory and there is already a currently pending bus cycle to the same cache line. This helps reduce bus traffic on the external bus.
摘要:
A method and apparatus for speculatively dispatching and/or executing LOADs in a computer system includes a memory subsystem of a out-of-order processor that handles LOAD and STORE operations by dispatching them to respective LOAD and STORE buffers in the memory subsystem. When a LOAD is subsequently dispatched for execution, the store buffer is searched for STOREs having unknown addresses. If any STOREs are found which are older than the dispatched LOAD, and which have an unknown address, the LOAD is tagged with an unknown STORE address identification (USAID). When a STORE is dispatched for execution, the LOAD buffer is searched for loads that have been denoted as mis-speculated loads. Mis-speculated loads are prevented from corrupting the architectural state of the machine with invalid data.
摘要:
A non-blocking translation lookaside buffer is described for use in a microprocessor capable of processing speculative and out-of-order instructions. Upon the detection of a fault, either during a translation lookaside buffer hit or a page table walk performed in response to a translation lookaside buffer miss, information associated with the faulting instruction is stored within a fault register within the translation lookaside buffer. The stored information includes the linear address of the instruction and information identifying the age of instruction. In addition to storing the information within the fault register, a portion of the information is transmitted to a reordering buffer of the microprocessor for storage therein pending retirement of the faulting instruction. Prior to retirement of the faulting instruction, the translation lookaside buffer continues to process further instructions. Upon retirement of each instruction, the reordering buffer determines whether a fault had been detected for that instruction and, if so, the microprocessor is flushed. Then, a branch is taken into microcode. The microcode accesses the linear address and other information stored within the fault register of the translation lookaside buffer and handles the fault. The system is flushed and the microcode is executed only for faulting instructions which actually retire. As such, faults detected while processing speculative instructions based upon mispredicted branches do not prevent further address translations and do not cause the system to be flushed. Method and apparatus implementations are described herein.
摘要:
An allocator assigns entries for a circular buffer. The allocator receives requests for storing data in entries of the circular buffer, and generates a head pointer to identify a starting entry in the circular buffer for which circular buffer entries are not allocated. In addition to pointing to an entry in the circular buffer, the head pointer includes a wrap bit. The allocator toggles the wrap bit each time the allocator traverses the linear queue of the circular buffer. A tail pointer is generated, including the wrap bit, to identify an ending entry in the circular buffer for which circular buffer entries are allocated. In response to the request for entries, the allocator sequentially assigns entries for the requests located between the head pointer and the tail pointer. The allocator has application for use in a microprocessor performing out-of-order dispatch and speculative execution. The allocator is coupled to a reorder buffer, configured as a circular buffer, to permit allocation of entries. The allocator utilizes an all or nothing allocation policy, such that either all or no incoming instructions are allocated during an allocation period.
摘要:
A register alias table unit (RAT) with an idiom recognition mechanism for overriding partial width conditions stalls is described. A partial width stall condition occurs during the RAT renaming process when a logical source register being renamed is larger than the corresponding physical source register pointed to by a renaming table. An idiom recognizer detects uops that zero their logical destination register and sets and clears zero bits in an iRAT array accordingly. The zero bits indicate which portions of an entry's physical source register are known to be zeros. A partial width stall override mechanism overrides a partial width stall condition when the zero bits for the physical source register causing the partial width stall indicate that the "missing" portion of the physical source register contains zeros. The performance of a microprocessor implementing such a RAT renaming mechanism with an idiom recognizer is improved because common partial width stalls are avoided.
摘要:
A method and circuitry for coordinating exceptions in a processor. The processor generates a result data value and an exception data value for each instruction wherein the exception data value specifies whether the corresponding instruction causes an exception. The processor commits the result data values to an architectural state of the processor in the sequential program order, and fetches an exception handler to processes the exception if the exception is indicated by one of the exception data values. The processor fetches an asynchronous event handler to processes an asynchronous event if the asynchronous event is detected while the result data values are committed to the architectural state of the processor.
摘要:
A speculative execution out of order processor comprising a reorder circuit containing a plurality of physical registers that buffer speculative execution results for integer and floating-point operations, and a real register circuit containing a plurality of committed state registers that buffer committed execution results for either integer or floating-point operations, depending on the register. The reorder and real register circuits read the speculative and committed source data values for incoming micro-ops, and transfer the speculative and committed source data values over to a micro-op dispatch circuit over a common data path. A retire logic circuit commits the speculative execution results to an architectural state by transferring the speculative execution results from the reorder circuit to the real register circuit.