Abstract:
A branch prediction unit includes a branch prediction entry corresponding to a group of contiguous instruction bytes. The branch prediction entry stores branch predictions corresponding to branch instructions within the group of contiguous instruction bytes. Additionally, the branch prediction entry stores a set of branch selectors corresponding to the group of contiguous instruction bytes. The branch selectors identify which branch prediction is to be selected if the corresponding byte (or bytes) is selected by the offset portion of the fetch address. Still further, a predicted branch selector is stored. The predicted branch selector is used to select a branch prediction for forming the fetch address. In parallel, a selected branch selector is selected from the set of branch selectors. The predicted branch selector is verified using the selected branch selector. If the selected branch selector and the predicted branch selector mismatch, the correct branch prediction is generated and the predicted branch selector is updated to indicate the selected branch selector.
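The predict-then-verify flow described above can be sketched as follows. This is a minimal illustration, not the patented design: the 16-byte group size, the selector encoding (0 for the sequential address, 1 and 2 for the stored branch predictions), and all names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

GROUP_BYTES = 16   # assumed size of the group of contiguous instruction bytes
SEQUENTIAL = 0     # assumed encoding: selector 0 means "no taken branch, fetch next group"


@dataclass
class BranchPredictionEntry:
    targets: List[int]                  # one predicted target per stored branch prediction
    selectors: List[int]                # one branch selector per byte of the group
    predicted_selector: int = SEQUENTIAL


def select_target(entry, fetch_address, selector):
    """Map a selector value to the next fetch address."""
    if selector == SEQUENTIAL:
        return (fetch_address & ~(GROUP_BYTES - 1)) + GROUP_BYTES
    return entry.targets[selector - 1]


def predict(entry, fetch_address):
    """Return (next_fetch_address, selector_was_wrong)."""
    # Fast path: the predicted selector picks a branch prediction immediately.
    fast = select_target(entry, fetch_address, entry.predicted_selector)
    # In hardware this lookup happens in parallel with the fast path.
    selected = entry.selectors[fetch_address % GROUP_BYTES]
    if selected == entry.predicted_selector:
        return fast, False
    # Mismatch: update the predicted selector and produce the correct prediction.
    entry.predicted_selector = selected
    return select_target(entry, fetch_address, selected), True
```

Calling `predict` twice with the same fetch address reports a mismatch only the first time, because the first call trains the predicted selector.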
Abstract:
A microprocessor including a reorder buffer configured to store speculative register values regarding a particular register is provided. One value is stored for each set of concurrently decoded instructions which are outstanding within the microprocessor, reflecting the updates of each instruction within the set which updates the register. Additionally, the reorder buffer stores a set of constants indicative of the modification of the register by each instruction within the set of concurrently decoded instructions. Recovery from a mispredicted branch instruction (or from an instruction which causes an exception, a TRAP instruction, or an interrupt) may be achieved by utilizing the constants to adjust the result generated for the set of concurrently decoded instructions including the mispredicted branch instruction. The constants generated to indicate the modifications of the particular register may additionally allow multiple instructions having a dependency for the particular register to execute in parallel.
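One way to picture the stored constants is as running offsets applied to a register such as a stack pointer. The sketch below assumes that each instruction in the set changes the register by a known constant and that the constants are kept as cumulative sums; the function names and this representation are illustrative assumptions, not details taken from the abstract.

```python
from itertools import accumulate


def decode_set(start_value, deltas):
    """Model one set of concurrently decoded instructions.

    deltas[i] is the (assumed) constant by which instruction i changes the register.
    Returns the single speculative value stored for the set plus the constants.
    """
    cumulative = list(accumulate(deltas))
    final_value = start_value + (cumulative[-1] if cumulative else 0)
    return final_value, cumulative


def value_before(start_value, cumulative, index):
    """Register value seen by instruction `index`, available without waiting
    for the earlier instructions in the set to execute."""
    return start_value + (cumulative[index - 1] if index > 0 else 0)


def recover(final_value, cumulative, index):
    """Register value just after instruction `index`, e.g. a mispredicted branch.

    The stored result for the whole set is adjusted by removing the updates
    of the instructions that follow `index`.
    """
    return final_value - (cumulative[-1] - cumulative[index])
```

For example, with a starting value of 0x1000 and deltas of -4, -4, 0, +4, recovering at the second instruction yields 0xFF8, the value after the first two updates only.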
Abstract:
A microprocessor including a pair of caches is provided. One of the pair of caches is accessed by stack-relative memory accesses from the decode stage of the instruction processing pipeline. The second of the pair of caches is accessed by memory accesses from the execute stage of the instruction processing pipeline. When a miss is detected in the first of the pair of caches, the stack-relative memory access which misses is conveyed to the execute stage of the instruction processing pipeline. When the stack-relative memory access accesses the second of the pair of caches, the cache line containing the access is transmitted to the first of the pair of caches for storage. The first of the pair of caches selects a victim line for replacement when the data is transferred from the second of the pair of caches. If the victim line has been modified while stored in the first cache, then the victim line is stored in a copyback buffer. A signal is asserted by the first cache to inform the second cache of the need to perform a victim line copyback. Requests from the execute stage of the instruction processing pipeline are stalled to allow the copyback to occur.
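The miss, fill, and victim-copyback sequence can be modeled in a few lines. The sketch below is a behavioral toy: the class names, the whole-line data granularity, and the oldest-first victim choice are assumptions, since the abstract does not specify a replacement policy.

```python
class DataCache:
    """Stands in for the second cache, accessed from the execute stage."""

    def __init__(self, contents):
        self.lines = dict(contents)      # line address -> line data

    def read_line(self, addr):
        return self.lines[addr]

    def copyback(self, addr, data):
        self.lines[addr] = data


class StackCache:
    """Stands in for the first cache, accessed from the decode stage."""

    def __init__(self, capacity, data_cache):
        self.capacity = capacity
        self.data_cache = data_cache
        self.lines = {}                  # addr -> [data, dirty]; dict order = fill order
        self.copyback_buffer = None
        self.copyback_request = False    # models the signal sent to the second cache

    def access(self, addr, write_data=None):
        """Return (hit, data). A miss is forwarded to the second cache."""
        hit = addr in self.lines
        if not hit:
            self._fill(addr, self.data_cache.read_line(addr))
        if write_data is not None:
            self.lines[addr] = [write_data, True]
        return hit, self.lines[addr][0]

    def _fill(self, addr, data):
        if len(self.lines) >= self.capacity:
            victim = next(iter(self.lines))          # assumed policy: oldest fill
            victim_data, dirty = self.lines.pop(victim)
            if dirty:
                self.copyback_buffer = (victim, victim_data)
                self.copyback_request = True
        self.lines[addr] = [data, False]

    def service_copyback(self):
        """Execute-stage requests would be stalled while this runs."""
        if self.copyback_request:
            self.data_cache.copyback(*self.copyback_buffer)
            self.copyback_buffer = None
            self.copyback_request = False
```

A clean victim is simply dropped; only a modified victim goes through the copyback buffer and raises the request signal.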
Abstract:
A superscalar microprocessor is provided having functional units which receive a pointer (a reorder buffer tag) which is compared to the reorder buffer tags of the instructions currently being executed. The pointer identifies the oldest outstanding branch instruction. If a functional unit's reorder buffer tag matches the pointer, then that functional unit conveys its corrected fetch address to the instruction fetching mechanism of the superscalar microprocessor (i.e. the branch prediction unit). The superscalar microprocessor also includes a load/store unit which receives a pair of pointers identifying the oldest outstanding instructions which are not in condition for retirement. The load/store unit compares these pointers with the reorder buffer tags of load instructions that miss the data cache and store instructions. A match must be found before the associated instruction is presented to the data cache and the main memory system. The pointer-compare mechanism provides an ordering mechanism for load instructions that miss the data cache and store instructions.
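The two tag comparisons can be sketched as plain functions. The tuple layout, the explicit "mispredicted" flag, and the function names are assumptions introduced for illustration; the abstract only states that a matching tag lets a unit convey its corrected fetch address and lets a memory operation proceed.

```python
def select_redirect(oldest_branch_tag, units):
    """Pick the corrected fetch address to send to the branch prediction unit.

    units: iterable of (rob_tag, mispredicted, corrected_fetch_address),
    one tuple per functional unit (an assumed representation).
    """
    for rob_tag, mispredicted, corrected_address in units:
        if rob_tag == oldest_branch_tag and mispredicted:
            return corrected_address
    return None


def may_access_memory(op_tag, oldest_pointers):
    """A store, or a load that missed the data cache, may proceed only when its
    reorder buffer tag matches one of the pair of oldest-instruction pointers."""
    return op_tag in oldest_pointers
```

Because only the oldest outstanding branch can redirect fetch, a younger branch that also mispredicted is ignored until it becomes the oldest.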
Abstract:
An apparatus including a banked instruction cache and a branch prediction unit is provided. The banked instruction cache allows multiple instruction fetch addresses (comprising consecutive instruction blocks from the predicted instruction stream being executed by the microprocessor) to be fetched concurrently. The instruction cache provides an instruction block corresponding to one of the multiple fetch addresses to the instruction processing pipeline of the microprocessor during each consecutive clock cycle, while additional instruction fetch addresses from the predicted instruction stream are fetched. Preferably, the instruction cache includes at least a number of banks equal to the number of clock cycles consumed by an instruction cache access. In this manner, instructions may be provided during each consecutive clock cycle even though instruction cache access time is greater than the clock cycle time of the microprocessor. Because consecutive instruction blocks from the instruction stream are fetched concurrently, the branch prediction unit stores a prediction for a non-consecutive instruction block with each instruction block. For example, for an instruction cache having a cache access time which is twice the clock cycle time, a prediction for the second consecutive instruction block following a particular instruction block within the predicted instruction stream is stored. When a pair of consecutive instruction blocks are fetched, predictions for a second pair of consecutive instruction blocks within the instruction stream subsequent to the pair of consecutive instruction blocks are formed from the branch prediction information stored with respect to the pair of consecutive instruction blocks.
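The two-cycle example above can be illustrated with a small table that maps each instruction block to the block two positions later in the predicted stream. The function names, the use of a dictionary as the prediction storage, and the requirement that block addresses in the stream be unique are assumptions of this sketch.

```python
def build_skip_predictions(stream, distance=2):
    """For each block, store a prediction for the block `distance` positions
    later in the predicted instruction stream (2 for a two-cycle cache access)."""
    return {block: stream[i + distance] for i, block in enumerate(stream[:-distance])}


def fetch_groups(start_group, predictions, count):
    """Starting from a group of consecutive blocks fetched concurrently, form up
    to `count` groups; each new group comes from the predictions stored with
    the blocks of the previous group."""
    groups = [tuple(start_group)]
    for _ in range(count - 1):
        next_group = tuple(predictions.get(block) for block in groups[-1])
        if None in next_group:       # no stored prediction: stop in this sketch
            break
        groups.append(next_group)
    return groups
```

With the stream 0x100, 0x110, 0x240, 0x250, 0x300, 0x310, fetching the pair (0x100, 0x110) yields the pair (0x240, 0x250) as the next fetch addresses, without waiting for either access to complete.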
Abstract:
A memory including first storage circuits for storing first values and second storage circuits for storing second values is provided. The first value may be retired branch prediction information, while the second value may be speculative branch prediction information. The speculative branch prediction information is updated when the corresponding instructions are fetched, and the retired branch prediction value is updated when the corresponding branch instruction is retired. The speculative branch prediction information is used to form branch predictions. Therefore, the speculatively fetched and executed branches influence subsequent branch predictions. Upon detection of a mispredicted branch or an instruction which causes an exception, the speculative branch prediction information is updated to the corresponding retired branch prediction information. An update circuit is coupled between the first and second storage circuits for transmitting the updated information upon assertion of a control signal. The control signal may be asserted to cause the update of each speculative branch prediction by the corresponding retired branch prediction. The updates occur substantially simultaneously, repairing any corruption of speculative branch predictions due to speculatively fetched branch instructions which were flushed from the instruction processing pipeline. Although discussed herein in terms of a branch prediction array, the memory may be adapted to many other applications.
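The relationship between the two storage arrays can be sketched as below. The abstract does not say what form the branch prediction information takes; the 4-bit taken/not-taken history per entry, the class name, and the method names are assumptions chosen to keep the example concrete.

```python
HISTORY_BITS = 4                       # assumed width of each prediction entry
MASK = (1 << HISTORY_BITS) - 1


class DualPredictionArray:
    def __init__(self, entries):
        self.retired = [0] * entries       # first storage: updated at retirement
        self.speculative = [0] * entries   # second storage: updated at fetch

    def on_fetch(self, index, predicted_taken):
        """Speculative update when the branch is fetched and predicted."""
        shifted = (self.speculative[index] << 1) | int(predicted_taken)
        self.speculative[index] = shifted & MASK

    def on_retire(self, index, taken):
        """Non-speculative update when the branch retires."""
        shifted = (self.retired[index] << 1) | int(taken)
        self.retired[index] = shifted & MASK

    def restore(self):
        """Models assertion of the control signal: every speculative entry is
        overwritten by its retired counterpart in a single step."""
        self.speculative[:] = self.retired
```

After a misprediction or exception, `restore` discards the history contributed by flushed branches in one operation rather than entry by entry.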
Abstract:
A RAM array circuit is provided which includes a memory array formed by several RAM cell columns. A particular cell within each column and row may be selected for access (either read or write) by an address decode circuit. The RAM array circuit employs a self-time column having a delay characteristic which is approximately equal to that of each of the RAM cell columns. The rising edge of a single-phase clock is used to precharge each RAM cell column as well as the self-time column. As the self-time column is precharged to a high level, the self-time control circuit disables the precharge and enables the array access for read or write. When a particular row is selected by the address decoding mechanism, the self-time column is discharged. Once the self-time column has discharged, a sense amplifier is enabled to read data from the array. Access is then disabled and precharge is again enabled upon the next rising edge of the clock.
Abstract:
A processor includes a first execution queue configured to provide instructions of a first instruction type to a first execution unit, and a second execution queue configured to provide instructions of a second instruction type to a second execution unit. A first instruction of the second instruction type is received. The first instruction is decoded by a decode/issue unit to determine operands of the first instruction. The operands of the first instruction are determined to include a dependency on a second instruction of the first instruction type stored in a first entry of the first execution queue. The first instruction is stored in a first entry of the second execution queue. A synchronization indicator corresponding to the first instruction is set in a second entry of the first execution queue, immediately adjacent to the first entry of the first execution queue; the indicator shows that the first instruction is stored in another execution queue.
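The placement of the synchronization indicator can be sketched with two Python lists standing in for the execution queues. The `Entry` fields, the `issue` function, and the use of list insertion to model "immediately adjacent" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    name: str
    dest: Optional[str] = None        # register written by the instruction
    sources: Tuple[str, ...] = ()     # registers read by the instruction
    sync: bool = False                # True: marker for an instruction held in another queue


def issue(instr, own_queue, other_queue):
    """Store `instr` in its own queue; if it depends on an instruction in the
    other queue, place a synchronization entry right after that producer."""
    # Search from youngest to oldest so the most recent producer is found.
    for pos in range(len(other_queue) - 1, -1, -1):
        producer = other_queue[pos]
        if not producer.sync and producer.dest in instr.sources:
            other_queue.insert(pos + 1, Entry(name=instr.name, sync=True))
            break
    own_queue.append(instr)
```

The synchronization entry carries no work of its own; it only marks the point in the first queue at which the dependent instruction in the second queue may proceed.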
Abstract:
In a processing system capable of single and multi-thread execution, a branch prediction unit can be configured to detect hard-to-predict branches and loop instructions. In a dual-threading (simultaneous multi-threading) configuration, one instruction queue (IQ) is used for each thread and instructions are alternately sent from each IQ to decode units. In single thread mode, the second IQ can be used to store the “not predicted path” of the hard-to-predict branch or the “fall-through” path of the loop. On mis-prediction, the mis-prediction penalty is reduced by fetching the instructions from the second IQ instead of from the instruction cache.
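The single-thread use of the second instruction queue can be sketched as a simple swap on mis-prediction. The class name, the method names, and the cycle counts (zero for a refill from the second IQ, three for an instruction cache refetch) are illustrative assumptions, not figures from the abstract.

```python
from collections import deque


class FrontEnd:
    def __init__(self, icache_latency=3):
        self.iq = [deque(), deque()]          # two IQs; the second is spare in single-thread mode
        self.icache_latency = icache_latency  # assumed refetch cost in cycles

    def fetch_hard_branch(self, predicted_path, alternate_path):
        """Queue the predicted path in IQ 0 and the not-predicted path in IQ 1."""
        self.iq[0].extend(predicted_path)
        self.iq[1].extend(alternate_path)

    def on_mispredict(self):
        """Discard the wrong path and return the assumed refill penalty in cycles."""
        self.iq[0].clear()
        if self.iq[1]:
            # The alternate path is already buffered: switch to it.
            self.iq[0], self.iq[1] = self.iq[1], deque()
            return 0
        return self.icache_latency
```

A branch that was not flagged as hard to predict leaves the second IQ empty, so its mis-prediction still pays the full instruction cache latency in this model.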
Abstract:
In a processor, a decode unit identifies instructions needing a checkpoint and enables selected checkpoints. A register file unit includes a plurality of architectural registers. A first set of checkpoint registers corresponds to a first checkpoint. Each checkpoint register corresponds to a respective architectural register. A first set of indicators corresponds to the first set of checkpoint registers and indicates whether the corresponding architectural register has been modified or is intended to be modified prior to enabling of the first checkpoint. A second set of indicators corresponds to the first set of checkpoint registers and indicates whether the corresponding architectural register has been modified or is intended to be modified after enabling of the first checkpoint.