Abstract:
A superscalar microprocessor configured to speculatively generate register values associated with a particular register is provided. Multiple register values are generated in parallel, wherein each speculatively generated register value accounts for modifications of the register value by each of the instructions prior to the instruction for which the register value is generated. Instructions which are dependent upon each other for the register values thus generated may be executed concurrently. In one specific embodiment, the present microprocessor generates register values for the ESP register. The speculatively generated register value resulting from the modifications performed by the instructions decoded during a clock cycle is stored in a speculative register file along with constants used to generate the register value associated with each individual instruction. When a mispredicted branch instruction is detected, the register value generated during the decode of the mispredicted branch instruction may be adjusted using the stored constants. The adjustment performed reflects the value of the register at the execution of the mispredicted branch instruction.
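As a rough illustration of the mechanism described above, the following Python sketch models per-instruction ESP constants accumulated at decode and backed out on a misprediction; the class, method, and tag names are assumptions for illustration, not the patent's design.

# Sketch: per-instruction ESP constants (e.g. -4 for PUSH, +4 for POP) are
# accumulated for a decode group, stored, and replayed to repair ESP after a
# mispredicted branch. All names are illustrative.

class SpeculativeEspFile:
    def __init__(self, initial_esp):
        self.esp = initial_esp     # current speculative ESP
        self.groups = {}           # tag -> (ESP after the group, constants)

    def decode_group(self, deltas, tag):
        """deltas: ESP constants of the instructions decoded this cycle,
        in program order. Returns the ESP value seen by each instruction."""
        per_insn_esp = []
        running = self.esp
        for d in deltas:           # conceptually a parallel prefix sum
            per_insn_esp.append(running)
            running += d
        self.groups[tag] = (running, list(deltas))
        self.esp = running
        return per_insn_esp

    def repair(self, tag, branch_index):
        """On a mispredicted branch at position branch_index within the group,
        back out the constants of the younger instructions so ESP reflects its
        value when the branch executes."""
        end_esp, deltas = self.groups[tag]
        self.esp = end_esp - sum(deltas[branch_index + 1:])
        return self.esp

# Example: PUSH, PUSH, JNE (mispredicted), POP decoded together.
rf = SpeculativeEspFile(0x1000)
rf.decode_group([-4, -4, 0, +4], tag=7)   # speculative ESP becomes 0xFFC
rf.repair(tag=7, branch_index=2)          # restores ESP to 0xFF8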
Abstract:
A superscalar microprocessor predecodes instruction data to identify the boundaries of instructions and the type of instruction. In one embodiment, to expedite the dispatch of instructions, the first microcode instruction of the cache line is identified during predecode and stored as a microcode pointer. When the cache line is scanned for dispatch, the microcode pointer is used to identify the first microcode instruction, which is conveyed to the MROM unit. In another embodiment, the first scanned instruction is predicted to be a microcode instruction and is dispatched to the MROM unit. A microcode scan circuit uses the microcode pointer and the functional bits of the predecode data to multiplex instruction specific bytes of the first microcode instruction to the MROM unit. If the predicted first microcode instruction is not the actual first microcode instruction, then in a subsequent clock cycle, the actual microcode instruction is dispatched to the MROM unit and the incorrectly predicted microcode instruction is canceled.
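The following sketch illustrates, under simplifying assumptions, how a stored microcode pointer might steer the first MROM dispatch and how a wrong prediction could be canceled; the function names and the action-list representation are hypothetical, not the patent's circuitry.

# Sketch: predecode records the index of the first MROM instruction in a
# line; dispatch predicts with that pointer, verifies against the functional
# bits, and cancels/redispatches if the prediction was wrong.

def predecode_line(is_mrom_flags):
    """is_mrom_flags: per-instruction MROM flags derived from predecode.
    Returns the microcode pointer (index of the first MROM instruction)."""
    return next((i for i, m in enumerate(is_mrom_flags) if m), None)

def dispatch_cycle(is_mrom_flags, mrom_pointer, scan_window):
    """scan_window: indices of the instructions scanned this cycle."""
    actions = []
    # Predict the instruction named by the pointer as the first MROM
    # instruction and route its bytes to the MROM unit immediately.
    predicted = mrom_pointer if mrom_pointer in scan_window else None
    if predicted is not None:
        actions.append(('dispatch_to_mrom', predicted))
    # Verify against the functional bits: the actual first MROM instruction.
    actual = next((i for i in scan_window if is_mrom_flags[i]), None)
    if actual != predicted:
        if predicted is not None:
            actions.append(('cancel', predicted))
        if actual is not None:
            # Redispatch the correct instruction in a subsequent cycle.
            actions.append(('dispatch_to_mrom_next_cycle', actual))
    return actions

flags = [False, True, False, True]        # instructions 1 and 3 are MROM
ptr = predecode_line(flags)               # ptr == 1
dispatch_cycle(flags, ptr, scan_window=[0, 1, 2])
# -> [('dispatch_to_mrom', 1)]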
Abstract:
A return prediction unit is provided which is configured to predict return addresses for return instructions according to a return stack storage included therein. The return stack storage is a stack structure configured to store return addresses associated with previously detected call instructions. Return addresses may be predicted for return instructions early in the instruction processing pipeline of the microprocessor. In one embodiment, the return stack storage additionally stores a call tag and a return tag with each return address. The call tag and return tag respectively identify call and return instructions associated with the return address. These tags may be compared to a branch tag conveyed to the return prediction unit upon detection of a branch misprediction. The results of the comparisons may be used to adjust the contents of the return stack storage with respect to the misprediction. The return prediction unit may continue to predict return addresses correctly following a mispredicted branch instruction.
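A minimal software sketch of such a tagged return stack follows; the comparison of stored call and return tags against the mispredicted branch tag is reduced to an assumed is_younger helper, and all names are illustrative rather than the patent's.

# Sketch: a return stack whose entries carry a call tag and a return tag so
# that speculative pushes and pops can be undone after a misprediction.

class ReturnStack:
    def __init__(self, depth=16):
        self.entries = []   # [return_address, call_tag, return_tag]
        self.top = 0        # entries[:top] are valid and not yet consumed
        self.depth = depth

    def on_call(self, next_seq_addr, call_tag):
        # A detected call pushes the address of the instruction after it.
        del self.entries[self.top:]              # drop consumed slots
        if len(self.entries) == self.depth:
            del self.entries[0]                  # overflow: lose the oldest
            self.top -= 1
        self.entries.append([next_seq_addr, call_tag, None])
        self.top += 1

    def predict_return(self, return_tag):
        # A detected return predicts the top-of-stack address early in the
        # pipeline; the entry is retained so a misprediction can restore it.
        if self.top == 0:
            return None
        entry = self.entries[self.top - 1]
        entry[2] = return_tag
        self.top -= 1
        return entry[0]

    def on_mispredict(self, is_younger):
        """is_younger(tag): True if the tagged instruction is younger than
        the mispredicted branch, i.e. squashed."""
        # Restore entries consumed by squashed (younger) return instructions.
        while (self.top < len(self.entries)
               and self.entries[self.top][2] is not None
               and is_younger(self.entries[self.top][2])):
            self.entries[self.top][2] = None
            self.top += 1
        # Discard entries pushed by squashed (younger) call instructions.
        while self.entries and is_younger(self.entries[-1][1]):
            self.entries.pop()
        self.top = min(self.top, len(self.entries))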
Abstract:
A microprocessor is provided which is configured to predict return addresses for return instructions according to a return stack storage included therein. The return stack storage is a stack structure configured to store return addresses associated with previously detected call instructions. Return addresses may be predicted for return instructions early in the instruction processing pipeline of the microprocessor. In one embodiment, the return stack storage additionally stores a call tag and a return tag with each return address. The call tag and return tag respectively identify call and return instructions associated with the return address. These tags may be compared to a branch tag conveyed to the return prediction unit upon detection of a branch misprediction. The results of the comparisons may be used to adjust the contents of the return stack storage with respect to the misprediction. The microprocessor may continue to predict return addresses correctly following a mispredicted branch instruction.
Abstract:
An instruction scanning unit for a superscalar microprocessor is disclosed. The instruction scanning unit processes start, end, and functional byte information (or predecode data) associated with a plurality of contiguous instruction bytes. The processing of start byte information and end byte information is performed independently and in parallel, and the instruction scanning unit produces a plurality of scan values which identify valid instructions within the plurality of contiguous instruction bytes. Additionally, the instruction scanning unit is scaleable. Multiple instruction scanning units may be operated in parallel to process a larger plurality of contiguous instruction bytes. Furthermore, the instruction scanning unit detects error conditions in the predecode data in parallel with scanning to locate instructions. Moreover, in parallel with the error checking and scanning to locate instructions, MROM instructions are located for dispatch to an MROM unit.
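The sketch below models the scan in a simplified, byte-serial form (the hardware performs it in parallel); the start, end, and MROM bit arrays stand in for the predecode data and are assumptions for illustration.

# Sketch: derive instruction boundaries from start/end predecode bits,
# flag inconsistent predecode data, and locate MROM instructions in the
# same pass.

def scan(start_bits, end_bits, mrom_bits):
    starts = [i for i, s in enumerate(start_bits) if s]
    ends   = [i for i, e in enumerate(end_bits) if e]
    # Error check, done alongside the scan: starts and ends must pair up
    # one-to-one, with each end byte at or after its start byte.
    error = (len(starts) != len(ends)
             or any(s > e for s, e in zip(starts, ends)))
    insns = list(zip(starts, ends))          # (first byte, last byte) per insn
    # MROM instructions are located for dispatch to the MROM unit.
    mrom = [(s, e) for s, e in insns if mrom_bits[s]]
    return insns, mrom, error

# Example: bytes 0-2 form one instruction, bytes 3-4 a second (MROM) one.
insns, mrom, err = scan([1, 0, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0])
# insns == [(0, 2), (3, 4)], mrom == [(3, 4)], err == False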
Abstract:
A branch prediction storage apparatus is provided which stores branch predictions and prediction information corresponding to branch instructions outstanding within an instruction processing pipeline of a microprocessor. A branch tag is assigned to each branch instruction, and the corresponding branch prediction and prediction information are stored into the prediction storage. The branch tag is routed through the instruction processing pipeline with the branch instruction. Branch prediction information corresponding to the instruction remains within the branch prediction storage apparatus, which may be integrated into a branch predictor or coupled nearby. The branch tag may be more easily routed through the pipeline since it may include fewer bits than the corresponding branch prediction information. The branch prediction information may be updated after a correct or incorrect prediction by conveying an indication of the prediction or misprediction, together with the branch tag of the branch instruction, to the branch prediction storage apparatus.
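A small sketch of such tag-indexed prediction storage follows; the tag allocation scheme and the two-bit counter update are assumptions added for illustration, not the patent's update policy.

# Sketch: wide prediction information stays in the storage; only the small
# branch tag travels down the pipeline and returns at resolution.

class BranchPredictionStorage:
    def __init__(self, num_tags=16):
        self.slots = [None] * num_tags    # prediction info per outstanding branch
        self.next_tag = 0

    def allocate(self, prediction_info):
        # Assign a branch tag; only the tag accompanies the branch instruction.
        tag = self.next_tag
        self.next_tag = (self.next_tag + 1) % len(self.slots)
        self.slots[tag] = dict(prediction_info)
        return tag

    def resolve(self, tag, taken, mispredicted):
        # Execution reports just the tag and the outcome; the stored
        # information is looked up here and used to update the predictor.
        info = self.slots[tag]
        info['counter'] = (min(3, info['counter'] + 1) if taken
                           else max(0, info['counter'] - 1))
        info['mispredicted'] = mispredicted
        return info

store = BranchPredictionStorage()
t = store.allocate({'counter': 2, 'target': 0x4010})
store.resolve(t, taken=True, mispredicted=False)   # counter -> 3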
Abstract:
A data memory unit having a load/store unit and a data cache is provided which allows store instructions that are part of a load-op-store instruction to be executed with one access to a data cache. The load/store unit is configured with a load/store buffer having a checked bit and a way field for each buffer storage location. For load-op-store instructions, the checked bit associated with the store portion of the instruction is set when the load portion of the instruction accesses and hits the data cache. Also, the way field associated with the store portion is set to the way of the data cache in which the load portion hits. The data cache is configured with a locking mechanism for each cache line stored in the data cache. When the load portion of a load-op-store instruction is executed, the associated line is locked such that the line will remain in the data cache until a store instruction executes. In this way, the store portion of the load-op-store instruction is guaranteed to hit the data cache. The store may then store its data into the data cache without first performing a read cycle to determine if the store address hits the data cache.
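The following conceptual model shows how a checked bit, way field, and line lock could interact; the cache geometry and buffer-entry layout are illustrative, not the patent's implementation.

# Sketch: the load portion records its hit way, sets the checked bit of the
# store portion, and locks the line; the store portion then writes the
# recorded way with no preceding read cycle.

class DataCache:
    def __init__(self, sets=64, ways=4):
        self.tags  = [[None]  * ways for _ in range(sets)]
        self.locks = [[False] * ways for _ in range(sets)]

    def lookup(self, set_idx, tag):
        for way, t in enumerate(self.tags[set_idx]):
            if t == tag:
                return way
        return None            # miss

def execute_load_portion(cache, buffer_entry, set_idx, tag):
    way = cache.lookup(set_idx, tag)
    if way is not None:
        # Load hit: set the checked bit and way field of the store portion,
        # and lock the line so it stays resident until the store executes.
        buffer_entry['checked'] = True
        buffer_entry['way'] = way
        cache.locks[set_idx][way] = True
    return way

def execute_store_portion(cache, buffer_entry, set_idx, data):
    if buffer_entry.get('checked'):
        # Guaranteed hit: write straight into the recorded way.
        way = buffer_entry['way']
        cache.locks[set_idx][way] = False       # release the line lock
        return ('write', set_idx, way, data)
    return ('probe_then_write', set_idx, data)  # ordinary store path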
Abstract:
A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing the free list and the architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
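A rough functional sketch of points (i) through (iii) follows; it models the rename map, free list, and mapping-table snapshots in software, while the combined-array storage with common pointer indexing is not modeled and all names are assumptions.

# Sketch: every destination is renamed, checkpoints snapshot the mapping
# table (not register contents), and mappings circulate through a free list.

from collections import deque

class RenameMap:
    def __init__(self, num_arch=32, num_phys=80):
        self.table = list(range(num_arch))               # arch -> phys
        self.free  = deque(range(num_arch, num_phys))    # unmapped phys regs
        self.checkpoints = {}                            # id -> table snapshot

    def rename_dest(self, arch_reg):
        # (i) every destination register target gets a fresh physical register.
        new_phys = self.free.popleft()
        old_phys = self.table[arch_reg]
        self.table[arch_reg] = new_phys
        return new_phys, old_phys    # old mapping freed when the insn retires

    def retire(self, old_phys):
        self.free.append(old_phys)   # mapping returns to the free list

    def checkpoint(self, cp_id):
        # (iii) a checkpoint is a snapshot of the mapping table only.
        self.checkpoints[cp_id] = list(self.table)

    def restore(self, cp_id):
        self.table = self.checkpoints.pop(cp_id)
        # Physical registers allocated after the checkpoint are reclaimed by
        # the reorder-buffer walk, which is not modeled here.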
Abstract:
A data processing device maintains register map information that maps accesses to architectural registers, as identified by instructions being executed, to physical registers of the data processing device. In response to determining that an instruction, such as a speculatively-executing conditional branch, indicates a checkpoint, the data processing device stores the register map information for subsequent retrieval depending on the resolution of the instruction. In addition, in response to the checkpoint indication, the data processing device generates new register map information such that accesses to the architectural registers are mapped to different physical registers. The data processing device maintains a list, referred to as a free register list, of physical registers available to be mapped to an architectural register.
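As a short event-flow sketch under similar assumptions, the following code stores the register map at a speculative branch, remaps subsequent writes to different physical registers drawn from the free register list, and either retrieves or discards the checkpoint at resolution; all names are illustrative.

# Sketch: checkpoint on a speculative conditional branch, restore the map on
# misprediction, discard the saved map on correct resolution.

class CheckpointingMapper:
    def __init__(self, num_arch, free_physical):
        self.reg_map = {a: a for a in range(num_arch)}   # arch -> phys
        self.free_list = list(free_physical)             # available phys regs
        self.saved_maps = {}                             # branch id -> map copy

    def on_conditional_branch(self, branch_id):
        # The speculative branch indicates a checkpoint: store the current
        # register map for possible later retrieval.
        self.saved_maps[branch_id] = dict(self.reg_map)

    def on_write_after_checkpoint(self, arch_reg):
        # New map information: the architectural register is remapped to a
        # different physical register from the free register list.
        self.reg_map[arch_reg] = self.free_list.pop(0)
        return self.reg_map[arch_reg]

    def on_branch_resolved(self, branch_id, mispredicted):
        saved = self.saved_maps.pop(branch_id)
        if mispredicted:
            self.reg_map = saved      # retrieve the checkpointed mapping
        # If predicted correctly, the checkpoint is simply discarded.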
Abstract:
A method includes, in one implementation, receiving a first set of instructions of a first thread, receiving a second set of instructions of a second thread, and allocating queues to the instructions from the first and second sets. During a time when the first and second threads are simultaneously being processed, a changeable number of queues can be allocated to the first thread based on factors such as the first and/or second thread's requirements or priorities, while maintaining a minimum specified number of queues allocated to the first and/or second thread. When needed, one thread may be stalled so that at least the minimum number of queues remains reserved for another thread while attempting to satisfy thread-priority requests or queue-requirement requests.
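A simplified allocation sketch for two simultaneously processed threads follows; the parameters (total queues, per-thread minimum, need, priority) and the stall policy shown are assumptions for illustration, not the device's actual interface.

# Sketch: reserve a minimum number of queues per thread, hand out the spare
# queues by priority and need, and stall a thread whose demand still exceeds
# its allocation so the other thread's minimum stays reserved.

def allocate_queues(total, min_per_thread, need, priority):
    """need, priority: dicts keyed by thread id 0 and 1."""
    alloc = {0: min_per_thread, 1: min_per_thread}   # reserved minimums
    spare = total - 2 * min_per_thread
    for t in sorted((0, 1), key=lambda t: (priority[t], need[t]), reverse=True):
        extra = min(spare, max(0, need[t] - alloc[t]))
        alloc[t] += extra
        spare -= extra
    stall = {t: need[t] > alloc[t] for t in (0, 1)}
    return alloc, stall

# Example: 8 queues, minimum 2 each; thread 0 needs 5, thread 1 needs 4 and
# has higher priority.
allocate_queues(8, 2, need={0: 5, 1: 4}, priority={0: 1, 1: 2})
# -> ({0: 4, 1: 4}, {0: True, 1: False})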