摘要:
A microprocessor for providing an extended linear address of more than 32 bits. The extended linear address may be provided by concatenating a linear address with a segment selector extension, or by concatenating the values from two registers. Hierarchical translation of a linear address to a physical address is performed in which the number of levels in the hierarchy depends upon whether the linear address is an extended linear address.
摘要:
An allocator assigns entries for a circular buffer. The allocator receives requests for storing data in entries of the circular buffer, and generates a head pointer to identify a starting entry in the circular buffer for which circular buffer entries are not allocated. In addition to pointing to an entry in the circular buffer, the head pointer includes a wrap bit. The allocator toggles the wrap bit each time the allocator traverses the linear queue of the circular buffer. A tail pointer is generated, including the wrap bit, to identify an ending entry in the circular buffer for which circular buffer entries are allocated. In response to the request for entries, the allocator sequentially assigns entries for the requests located between the head pointer and the tail pointer. The allocator has application for use in a microprocessor performing out-of-order dispatch and speculative execution. The allocator is coupled to a reorder buffer, configured as a circular buffer, to permit allocation of entries. The allocator utilizes an all or nothing allocation policy, such that either all or no incoming instructions are allocated during an allocation period.
摘要:
A non-blocking translation lookaside buffer is described for use in a microprocessor capable of processing speculative and out-of-order instructions. Upon the detection of a fault, either during a translation lookaside buffer hit or a page table walk performed in response to a translation lookaside buffer miss, information associated with the faulting instruction is stored within a fault register within the translation lookaside buffer. The stored information includes the linear address of the instruction and information identifying the age of instruction. In addition to storing the information within the fault register, a portion of the information is transmitted to a reordering buffer of the microprocessor for storage therein pending retirement of the faulting instruction. Prior to retirement of the faulting instruction, the translation lookaside buffer continues to process further instructions. Upon retirement of each instruction, the reordering buffer determines whether a fault had been detected for that instruction and, if so, the microprocessor is flushed. Then, a branch is taken into microcode. The microcode accesses the linear address and other information stored within the fault register of the translation lookaside buffer and handles the fault. The system is flushed and the microcode is executed only for faulting instructions which actually retire. As such, faults detected while processing speculative instructions based upon mispredicted branches do not prevent further address translations and do not cause the system to be flushed. Method and apparatus implementations are described herein.
摘要:
A register alias table unit (RAT) with an idiom recognition mechanism for overriding partial width conditions stalls is described. A partial width stall condition occurs during the RAT renaming process when a logical source register being renamed is larger than the corresponding physical source register pointed to by a renaming table. An idiom recognizer detects uops that zero their logical destination register and sets and clears zero bits in an iRAT array accordingly. The zero bits indicate which portions of an entry's physical source register are known to be zeros. A partial width stall override mechanism overrides a partial width stall condition when the zero bits for the physical source register causing the partial width stall indicate that the "missing" portion of the physical source register contains zeros. The performance of a microprocessor implementing such a RAT renaming mechanism with an idiom recognizer is improved because common partial width stalls are avoided.
摘要:
A method and apparatus for storing an instruction word in a compacted form on a storage media, the instruction word having a plurality of instruction fields, features associating with each instruction word, a mask word having a length in bits at least equal to the number of instruction fields in the instruction word. Each instruction field is associated with a bit of the mask word and accordingly, using the mask word, only non-zero instruction fields need to be stored in memory. The instruction compaction method is advantageously used in a high speed cache miss engine for refilling portions of instruction cache after a cache miss occurs.
摘要:
Maximum throughput or "back-to-back" scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency in which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the prospective determination of the availability of a source operand before the operand itself is actually computed as a result of the execution of another instruction. Storage addresses of the source operands of an instruction are stored in a content addressable memory (CAM). Before an instruction is executed and its result data written back, the storage location address of the result is provided to the CAM and associatively compared with the source operand addresses stored therein. A CAM match and its accompanying match bit indicate that the result of the instruction to be executed will provide a source operand to the dependent instruction waiting in the reservation station. Using a bypass mechanism, if the operand is computed after dispatch of the dependent instruction, then the source operand is provided directly from the execution unit computing the source operand to a source operand input of the execution unit executing the dependent instruction.
摘要:
A method and circuitry for coordinating exceptions in a processor. The processor generates a result data value and an exception data value for each instruction wherein the exception data value specifies whether the corresponding instruction causes an exception. The processor commits the result data values to an architectural state of the processor in the sequential program order, and fetches an exception handler to processes the exception if the exception is indicated by one of the exception data values. The processor fetches an asynchronous event handler to processes an asynchronous event if the asynchronous event is detected while the result data values are committed to the architectural state of the processor.
摘要:
A speculative execution out of order processor comprising a reorder circuit containing a plurality of physical registers that buffer speculative execution results for integer and floating-point operations, and a real register circuit containing a plurality of committed state registers that buffer committed execution results for either integer or floating-point operations, depending on the register. The reorder and real register circuits read the speculative and committed source data values for incoming micro-ops, and transfer the speculative and committed source data values over to a micro-op dispatch circuit over a common data path. A retire logic circuit commits the speculative execution results to an architectural state by transferring the speculative execution results from the reorder circuit to the real register circuit.
摘要:
Pipeline lengthening in functional units likely to be involved in a writeback conflict is implemented to avoid conflicts. Logic circuitry is provided for comparing the depths of two concurrently executing execution unit pipelines to determine if a conflict will develop. When it appears that two execution units will attempt to write back at the same time, the execution unit having a shorter pipeline will be instructed to add a stage to its pipeline, storing its result in a delaying buffer for one clock cycle. After the conflict has been resolved, the instruction to lengthen the pipeline of a given functional unit will be rescinded. Multistage execution units are designed to signal a reservation station to delay the dispatch of various instructions to avoid conflicts between execution units.
摘要:
A mechanism for indicating within a register alias table (RAT) that certain data has become architecturally visible so that the RAT contains the most recent location of the certain data. Upon receiving the indication that data associated with a particular register is architecturally visible, if a subsequent operation uses the particular register as a source, the data will be supplied from the architecturally visible buffer instead of from an internal buffer (not architecturally visible). The internal buffer is implemented by a reorder buffer (ROB) which contains information associated with instructions that have not yet retired. The architecturally visible buffer is a retirement register file (RRF) which contains information associated with retired instructions. When an instruction retires, the register alias table is searched for the retiring physical register and will indicate within the register alias table that the data associated with the retiring physical register is located within the RRF only if the register alias table has not already (or concurrently) reassigned a new physical register to the logical register associated with the retiring physical register. If the logical register associated with the retiring physical register as been reassigned by subsequent instructions, then no update of the register alias table is required. Also provided is an embodiment for providing the above features in a system wherein the register ordering of the buffers can be altered via register exchange operations.