Abstract:
An instruction dispatch circuit is disclosed that improves instruction execution throughput for a processor. The instruction dispatch circuit comprises an instruction buffer with a plurality of instruction entries and a content addressable memory (CAM) array having at least one CAM entry corresponding to each instruction entry. Each CAM entry stores at least one source tag for the corresponding instruction entry. The CAM array matches the stored source tags against a result tag received from an execution circuit over a result bus, wherein the execution circuit transfers the result tag over the result bus at least one clock cycle before transferring a corresponding result data value over the result bus. Each CAM entry generates a CAM match signal used to determine whether data-dependent instructions are ready for dispatch.
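The tag-matching step described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `CamEntry`, `match`, `dispatchable`, and `broadcast` are hypothetical, tags are modeled as plain integers, and the one-cycle lead of the tag over the data is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CamEntry:
    """Hypothetical model of one CAM entry: the source tags an instruction waits on."""
    source_tags: list
    ready: list = field(default_factory=list)  # one match flag per source tag

    def __post_init__(self):
        if not self.ready:
            self.ready = [False] * len(self.source_tags)

    def match(self, result_tag):
        """Compare a broadcast result tag against every stored source tag."""
        hit = False
        for i, tag in enumerate(self.source_tags):
            if tag == result_tag:
                self.ready[i] = True
                hit = True
        return hit

    def dispatchable(self):
        """An instruction is ready once all of its source tags have matched."""
        return all(self.ready)


def broadcast(entries, result_tag):
    """Return the indices of entries that become dispatchable on this result tag."""
    woken = []
    for idx, entry in enumerate(entries):
        if entry.match(result_tag) and entry.dispatchable():
            woken.append(idx)
    return woken


entries = [CamEntry([3]), CamEntry([3, 5]), CamEntry([7])]
first = broadcast(entries, 3)   # entry 0 has its only source satisfied
second = broadcast(entries, 5)  # entry 1 now has both sources satisfied
```

Because the tag arrives ahead of the data, a real circuit could select the woken instruction for dispatch in the cycle before the result value appears on the bus; the model above captures only the matching logic.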
Abstract:
A mechanism for coordinating source data in a processor, wherein a decode circuit issues instructions comprising at least one immediate valid flag and at least one logical register source. The immediate valid flag indicates whether an immediate operand for the instruction is available on an immediate data bus, and the logical register source specifies a physical register or a committed state register. A speculative result data value and a speculative source valid flag are read from the physical register, and a committed result data value is read from the committed state register. The speculative result data value and the speculative source valid flag or the committed result data value and the committed source valid flag provide a source data value and a source data valid flag for scheduling an execution of the instruction.
Abstract:
A register alias table unit (RAT) with an idiom recognition mechanism for overriding partial width stall conditions is described. A partial width stall condition occurs during the RAT renaming process when a logical source register being renamed is larger than the corresponding physical source register pointed to by a renaming table. An idiom recognizer detects uops that zero their logical destination register and sets and clears zero bits in an iRAT array accordingly. The zero bits indicate which portions of an entry's physical source register are known to be zeros. A partial width stall override mechanism overrides a partial width stall condition when the zero bits for the physical source register causing the partial width stall indicate that the "missing" portion of the physical source register contains zeros. The performance of a microprocessor implementing such a RAT renaming mechanism with an idiom recognizer is improved because common partial width stalls are avoided.
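The stall override can be sketched in a few lines. The model below is an assumption-laden simplification, not the circuit described above: it keeps a single zero bit per entry (the abstract speaks of several), the names `IratEntry`, `rename_dest`, and `must_stall` are hypothetical, and the policy for when the bit is set or cleared is a guess at reasonable behavior.

```python
class IratEntry:
    """Hypothetical iRAT entry for one logical register."""

    def __init__(self):
        self.width = 32          # bits written by the most recent producer
        self.upper_zero = False  # zero bit: the missing upper portion is known to be zero


def rename_dest(entry, width, is_zero_idiom):
    """Update an entry when a uop writes the register as a destination."""
    if is_zero_idiom:
        # Assumed policy: a zeroing idiom marks the whole register as known zero.
        entry.upper_zero = True
    elif width >= 32:
        # Assumed policy: a full-width write of arbitrary data clears the zero bit.
        entry.upper_zero = False
    # A narrow write leaves the zero bit alone: the upper portion is unchanged.
    entry.width = width


def must_stall(entry, source_width):
    """Partial width stall, unless the zero bit says the missing part is zero."""
    partial = source_width > entry.width
    return partial and not entry.upper_zero


# Zero idiom, then an 8-bit write, then a 32-bit read: the stall is overridden.
cleared = IratEntry()
rename_dest(cleared, 32, is_zero_idiom=True)
rename_dest(cleared, 8, is_zero_idiom=False)

# The same 8-bit write with no preceding zero idiom: the 32-bit read must stall.
plain = IratEntry()
rename_dest(plain, 8, is_zero_idiom=False)
```

In the first sequence the upper 24 bits are known to be zero, so the narrow physical register can be used zero-extended and renaming proceeds.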
Abstract:
A circuit comprising a number of range registers and complementary decoding/matching circuits is provided to a processor for determining the memory type of a physical address, thereby allowing memory operating characteristics to be determined as soon as the physical address is available in an execution stage preceding cache access. Additionally, a memory type field is provided to each address translation lookaside buffer entry of the data and instruction memory subsystem for storing the determined memory type, and the memory type determination circuit is disposed in the page miss handler, thereby allowing memory type to be determined at the same time as the physical address is determined.
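A rough software analogue of the range-register lookup and the TLB memory type field follows. Everything named here is an assumption for illustration: `RangeRegister`, `memory_type`, `page_miss_handler`, the 4 KiB page size, the memory type strings, and the default type for unmatched addresses do not come from the text.

```python
from typing import NamedTuple


class RangeRegister(NamedTuple):
    """Hypothetical range register: an inclusive physical address range and its type."""
    base: int
    limit: int
    mem_type: str


def memory_type(ranges, phys_addr, default="uncacheable"):
    """Return the type of the first range containing phys_addr, else the default."""
    for r in ranges:
        if r.base <= phys_addr <= r.limit:
            return r.mem_type
    return default


def page_miss_handler(tlb, page_table, ranges, vpage):
    """On a TLB miss, resolve the physical page and its memory type together."""
    ppage = page_table[vpage]
    mem_type = memory_type(ranges, ppage << 12)  # assumes 4 KiB pages
    tlb[vpage] = (ppage, mem_type)               # the type is cached in the TLB entry
    return tlb[vpage]


ranges = [
    RangeRegister(0x0000_0000, 0x0009_FFFF, "write-back"),
    RangeRegister(0x000A_0000, 0x000B_FFFF, "uncacheable"),
]
page_table = {0x10: 0x00005, 0x11: 0x000A0}
tlb = {}
hit_wb = page_miss_handler(tlb, page_table, ranges, 0x10)
hit_uc = page_miss_handler(tlb, page_table, ranges, 0x11)
```

Storing the type alongside the translation means later accesses that hit in the TLB get the memory type without consulting the range registers again.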
Abstract:
A mechanism and method for renaming flags within a register alias table ("RAT") to increase processor parallelism, and for providing and using flag masks associated with individual instructions. In order to reduce the number of data dependencies between instructions that are concurrently processed, the flags used by these instructions are renamed. In general, a RAT unit provides register renaming to provide a larger physical register set than would ordinarily be available within a given macroarchitecture's logical register set (such as the Intel architecture or PowerPC or Alpha designs, for instance) to eliminate false data dependencies between instructions that reduce overall superscalar processing performance for the microprocessor. The renamed flag registers contain several flag bits, and various flag bits may be updated or read by different instructions. Also, static and dynamic flag masks are associated with particular instructions: static flag masks indicate which flags are capable of being updated by a particular instruction, and dynamic flag masks indicate which flags are actually updated by the instruction. Static flag masks are used in flag renaming and dynamic flag masks are used at retirement. The invention also detects cases in which a flag register is required that is a superset of the previously renamed flag register portion.
Abstract:
Maximum throughput or "back-to-back" scheduling of dependent instructions in a pipelined processor is achieved by maximizing the efficiency with which the processor determines the availability of the source operands of a dependent instruction and provides those operands to an execution unit executing the dependent instruction. These two operations are implemented through a number of mechanisms. One mechanism for determining the availability of source operands, and hence the readiness of a dependent instruction for dispatch to an available execution unit, relies on the prospective determination of the availability of a source operand before the operand itself is actually computed as a result of the execution of another instruction. Storage addresses of the source operands of an instruction are stored in a content addressable memory (CAM). Before an instruction is executed and its result data written back, the storage location address of the result is provided to the CAM and associatively compared with the source operand addresses stored therein. A CAM match and its accompanying match bit indicate that the result of the instruction to be executed will provide a source operand to the dependent instruction waiting in the reservation station. Using a bypass mechanism, if the operand is computed after dispatch of the dependent instruction, then the source operand is provided directly from the execution unit computing the source operand to a source operand input of the execution unit executing the dependent instruction.
Abstract:
A high byte right-shift detection mechanism with a register alias table unit (RAT) for selectively causing right-shifting of high byte physical source register data before operations are executed within a microprocessor is described. A high byte right-shift condition occurs when a logical source register that is presented to the RAT for renaming is a high byte register and the corresponding physical source register selected by the RAT is not right-adjusted. A non-right-adjusted physical source register is detected when either the physical source register is an architectural state register or the physical source register is a larger width register that includes the renamed high byte register. The high byte right-shift detection mechanism detects a high byte right-shift condition when a logical source register is renamed and generates shift bits and zero extend bits to control the right-shifting and zero extending of the data in the correspondingly renamed physical source register before execution by an execution unit that assumes right-adjusted input data. Right-adjusted result data from the execution unit is stored in a physical destination register (a speculative state register) in the re-order buffer (ROB) until retirement. If the RAT renames another high byte logical source register to source that physical destination register before the register retires, right-shifting of the physical destination register data is not required because the data is already right-adjusted. At retirement, physical destination register data corresponding to a high byte logical destination register is left-shifted and stored in the high byte register of a non-speculative state register in the retirement register file (RRF).
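The right-shift before execution and the left-shift at retirement amount to simple bit manipulation. The sketch below assumes a 32-bit register whose high byte occupies bits 8 to 15 (as with AH inside EAX); the function names `right_adjust_high_byte` and `retire_high_byte` are hypothetical.

```python
HIGH_BYTE_MASK = 0xFF00
REG_MASK = 0xFFFFFFFF


def right_adjust_high_byte(reg32):
    """Shift the high byte down to bits 0-7 and zero extend, as the execution unit expects."""
    return (reg32 >> 8) & 0xFF


def retire_high_byte(rrf_value, result):
    """Left-shift a right-adjusted result back into the high byte of the retired register."""
    kept = rrf_value & ~HIGH_BYTE_MASK & REG_MASK  # keep every bit except the high byte
    return kept | ((result & 0xFF) << 8)


source = right_adjust_high_byte(0x12345678)      # high byte 0x56, right-adjusted
retired = retire_high_byte(0x12345678, 0xAB)     # merge 0xAB into bits 8-15
```

A result that is still in the ROB is already right-adjusted, so a consumer reading it as a high byte source would skip `right_adjust_high_byte` entirely.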
Abstract:
A bypass mechanism within a register alias table unit (RAT) for handling source-destination dependencies between operands of a given set of operations issued simultaneously within a superscalar microprocessor. Operations of the given set are presented in program order, and data dependencies occur when a source register of a particular operation is also utilized as a destination register of a preceding operation within the given set of operations. At this occurrence, the initial read of the RAT unit will not have supplied the most current rename of the source register. The present invention includes a comparison mechanism to detect this condition. Also included is a bypass mechanism for bypassing the physical source register output by the initial read of the RAT unit with a recently allocated physical destination register assigned to the preceding operation having the matched physical destination register. In general, the RAT unit provides register renaming to provide a larger physical register set than would ordinarily be available within a given macroarchitecture's logical register set (such as the Intel architecture or PowerPC or Alpha designs, for instance) to eliminate false data dependencies that reduce overall superscalar processing performance for the microprocessor. The bypass mechanism of the present invention handles both floating point and integer registers and, in addition, a second bypass mechanism is included in the RAT priority write operation.
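The compare-and-bypass step for a group of simultaneously renamed operations can be sketched as follows. This is a simplified model under assumptions: the function `rename_group`, the tuple layout of each operation, and the register names are all hypothetical, and floating point handling is not modeled.

```python
def rename_group(rat, uops, alloc):
    """Rename a group of operations presented in program order.

    rat:   dict mapping logical register -> physical register
    uops:  list of (logical_dest, [logical_sources])
    alloc: newly allocated physical destination registers, one per operation
    """
    # Initial read: every source is looked up before any operation updates the table.
    initial = [[rat[s] for s in srcs] for _, srcs in uops]

    renamed = []
    for i, (_, srcs) in enumerate(uops):
        phys_srcs = list(initial[i])
        for k, src in enumerate(srcs):
            # Compare against destinations of preceding operations in the group;
            # iterating in order means the most recent matching writer wins.
            for j in range(i):
                if uops[j][0] == src:
                    phys_srcs[k] = alloc[j]  # bypass the stale initial read
        renamed.append((alloc[i], phys_srcs))

    # Priority write: when two operations write the same logical register,
    # the later one in program order determines the final table entry.
    for i, (dest, _) in enumerate(uops):
        rat[dest] = alloc[i]
    return renamed


rat = {"eax": "RRF_eax", "ebx": "RRF_ebx", "ecx": "RRF_ecx"}
uops = [("eax", ["ebx"]), ("ecx", ["eax"]), ("eax", ["eax", "ecx"])]
result = rename_group(rat, uops, ["P0", "P1", "P2"])
```

Here the second and third operations read `eax` and `ecx` from the newly allocated registers of the earlier operations rather than from the stale table contents.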
Abstract:
A partial width stall mechanism within a register alias table unit (RAT) for handling partial width data dependencies of a given set of operations issued simultaneously within a superscalar microprocessor. Operations of the given set are presented to the RAT in program order and partial width data dependencies occur when the size of a logical source register that is presented to the RAT for renaming to a corresponding physical source register is larger than the corresponding physical source register selected by the RAT. At this occurrence, the data required by the logical source register to be renamed does not reside in any one physical source register. Therefore, renaming of that logical register must be stalled until the data for that logical register is accumulated into one location. The data will be so accumulated when the last operation to have written the physical source register is retired and is, therefore, nonspeculative. The present invention includes a size comparison mechanism to detect the partial width stall condition. Also included is a partial width stall mechanism for preventing the renaming process from operating when the partial width stall condition is detected. In general the RAT unit provides register renaming to provide a larger physical register set than would ordinarily be available within a given macroarchitecture's logical register set to eliminate false data dependencies that would otherwise reduce overall superscalar processing performance for the microprocessor.
Abstract:
A data processor includes a plurality of physical registers and a decoder that decodes a stream of instructions into micro-operations which include speculative operations specifying associated logical registers. The data processor further includes a register-alias table having a plurality of addressable entries corresponding to logical registers specified by the speculative operations. Each entry of the register-alias table contains a register pointer to a corresponding physical register. The processor further includes a retirement register file that maintains register values of non-speculative operations, and a retirement array that maintains a retirement ordering for the retirement register file. Both the register-alias table and the retirement array are updated by circuitry that is responsive to a register exchange operation; the circuitry swaps register pointers associated with first and second entries, respectively.
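The pointer swap can be modeled briefly. The class `RenameState` and its method names are hypothetical, and the split into a speculative swap at rename time and a second swap at retirement is an assumption about how the two tables would be kept consistent.

```python
class RenameState:
    """Hypothetical model: logical registers map to physical storage through pointers."""

    def __init__(self, values):
        n = len(values)
        self.rat = list(range(n))           # speculative register pointers
        self.retire_array = list(range(n))  # retirement ordering for the RRF
        self.rrf = list(values)             # retired register values; never moved

    def exchange_speculative(self, a, b):
        """Rename-time exchange: swap the two register-alias table pointers."""
        self.rat[a], self.rat[b] = self.rat[b], self.rat[a]

    def retire_exchange(self, a, b):
        """Retirement-time exchange: swap the two retirement array pointers."""
        self.retire_array[a], self.retire_array[b] = (
            self.retire_array[b],
            self.retire_array[a],
        )

    def read_retired(self, logical):
        """Read a retired logical register through the retirement array."""
        return self.rrf[self.retire_array[logical]]


state = RenameState([10, 11, 12, 13])
state.exchange_speculative(0, 1)
state.retire_exchange(0, 1)
```

The exchange touches only pointers: the stored values stay where they are, which is why such an operation can complete without occupying an execution unit.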