摘要:
A microprocessor employs a branch prediction unit including a branch prediction storage which stores the index portion of branch target addresses and an instruction cache which is virtually indexed and physically tagged. The branch target index (if predicted-taken), or the sequential index (if predicted not-taken) is provided as the index to the instruction cache. The selected physical tag is provided to a reverse translation lookaside buffer (TLB) which translates the physical tag to a virtual page number. Concatenating the virtual page number to the virtual index from the instruction cache (and the offset portion, generated from the branch prediction) results in the branch target address being generated. In one embodiment, a current page register stores the most recently translated virtual page number and the corresponding real page number. The branch prediction unit predicts that each fetch address will continue to reside in the current page and uses the virtual page number from the current page to form the branch Target address. The physical tag from the fetched cache line is compared to the corresponding real page number to verify that the fetch address is actually still within the current page. When a mismatch is detected between the corresponding real page number and the physical tag from the fetched cache line, the branch target address is corrected with the linear page number provided by the reverse TLB and the current page register is updated.
摘要:
A microprocessor is provided which is configured to predict return addresses for return instructions according to a return stack storage included therein. The return stack storage is a stack structure configured to store return addresses associated with previously detected call instructions. Return addresses may be predicted for return instructions early in the instruction processing pipeline of the microprocessor. In one embodiment, the return stack storage additionally stores a call tag and a return tag with each return address. The call tag and return tag respectively identify call and return instructions associated with the return address. These tags may be compared to a branch tag conveyed to the return prediction unit upon detection of a branch misprediction. The results of the comparisons may be used to adjust the contents of the return stack storage with respect to the misprediction. The microprocessor may continue to predict return addresses correctly following a mispredicted branch instruction.
摘要:
A branch prediction unit includes a branch prediction entry corresponding to a group of contiguous instruction bytes. The branch prediction entry stores branch predictions corresponding to branch instructions within the group of contiguous instruction bytes. Additionally, the branch prediction entry stores a set of branch selectors corresponding to the group of contiguous instruction bytes. The branch selectors identify which branch prediction is to be selected if the corresponding byte (or bytes) is selected by the offset portion of the fetch address. Still further, a predicted branch selector is stored. The predicted branch selector is used to select a branch prediction for forming the fetch address. In parallel, a selected branch selector is selected from the set of branch selectors. The predicted branch selector is verified using the selected branch selector. If the selected branch selector and the predicted branch selector mismatch, the correct branch prediction is generated and the predicted branch selector is updated to indicate the selected branch selector.
摘要:
A microprocessor including a pair of caches is provided. One of the pair of caches is accessed by stack-relative memory accesses from the decode stage of the instruction processing pipeline. The second of the pair of caches is accessed by memory accesses from the execute stage of the instruction processing pipeline. When a miss is detected in the first of the pair of caches, the stack-relative memory access which misses is conveyed to the execute stage of the instruction processing pipeline. When the stack-relative memory access accesses the second of the pair of caches, the cache line containing the access is transmitted to the first of the pair of caches for storage. The first of the pair of caches selects a victim line for replacement when the data is transferred from the second of the pair of caches. If the victim line has been modified while stored in the first cache, then the victim line is stored in a copyback buffer. A signal is asserted by the first cache to inform the second cache of the need to perform a victim line copyback. Requests from the execute stage of the instruction processing pipeline are stalled to allow the copyback to occur.
摘要:
A superscalar microprocessor is provided having functional units which receive a pointer (a reorder buffer tag) which is compared to the reorder buffer tags of the instructions currently being executed. The pointer identifies the oldest outstanding branch instruction. If a functional unit's reorder buffer tag matches the pointer, then that functional unit conveys its corrected fetch address to the instruction fetching mechanism of the superscalar microprocessor (i.e. the branch prediction unit). The superscalar microprocessor also includes a load/store unit which receives a pair of pointers identifying the oldest outstanding instructions which are not in condition for retirement. The load/store unit compares these pointers with the reorder buffer tags of load instructions that miss the data cache and store instructions. A match must be found before the associated instruction is presented to the data cache and the main memory system. The pointer-compare mechanism provides an ordering mechanism for load instructions that miss the data cache and store instructions.
摘要:
An apparatus including a banked instruction cache and a branch prediction unit is provided. The banked instruction cache allows multiple instruction fetch addresses (comprising consecutive instruction blocks from the predicted instruction stream being executed by the microprocessor) to be fetched concurrently. The instruction cache provides an instruction block corresponding to one of the multiple fetch addresses to the instruction processing pipeline of the microprocessor during each consecutive clock cycle, while additional instruction fetch addresses from the predicted instruction stream are fetched. Preferably, the instruction cache includes at least a number of banks equal to the number of clock cycles consumed by an instruction cache access. In this manner, instructions may be provided during each consecutive clock cycle even though instruction cache access time is greater than the clock cycle time of the microprocessor. Because consecutive instruction blocks from the instruction stream are fetched concurrently, the branch prediction unit stores a prediction for a non-consecutive instruction block with each instruction block. For example, for an instruction cache having a cache access time which is twice the clock cycle time, a prediction for the second consecutive instruction block following a particular instruction block within the predicted instruction stream is stored. When a pair of consecutive instruction blocks are fetched, predictions for a second pair of consecutive instruction blocks within the instruction stream subsequent to the pair of consecutive instruction blocks are formed from the branch prediction information stored with respect to the pair of consecutive instruction blocks.
摘要:
A memory including first storage circuits for storing first values and second storages circuit for storing second values is provided. The first value may be retired branch prediction information, while the second value may be speculative branch prediction information. The speculative branch prediction information is updated when the corresponding instructions are fetched, and the retired branch prediction value is updated when the corresponding branch instruction is retired. The speculative branch prediction information is used to form branch predictions. Therefore, the speculatively fetched and executed branches influence subsequent branch predictions. Upon detection of a mispredicted branch or an instruction which causes an exception, the speculative branch prediction information is updated to the corresponding retired branch prediction information. An update circuit is coupled between the first and second storage circuits for transmitting the updated information upon assertion of a control signal. The control signal may be asserted to cause the update of each speculative branch prediction by the corresponding retired branch prediction. The updates occur substantially simultaneously, restoring any corruption to speculative branch predictions due to speculatively fetched branch instructions which were flushed from the instruction processing pipeline. Although discussed herein in terms of a branch prediction array, the memory may be adapted to many other applications.
摘要:
A RAM array circuit is provided which includes a memory array formed by several RAM cell columns. A particular cell within each column and row may be selected for access (either read or write) by an address decode circuit. The RAM array circuit employs a self-time column having a delay characteristic which is approximately equal to that of each of the RAM cell columns. The rising edge of a single-phase clock is used to precharge each RAM cell column as well as the self-time column. As the self-time column is precharged to a high level, the self-time control circuit disables the precharge and enables the array access for read or write. When a particular row is selected by the address decoding mechanism, the self-time column is discharged. Once the self-time column has discharged, a sense amplifier is enabled to read data from the array. Access is then disabled and precharge is again enabled upon the next rising edge of the clock.
摘要:
A processor configured to provide instructions of a first instruction type to a first execution unit, and a second execution queue configured to provide instructions of a second instruction type to a second execution unit. A first instruction of the second instruction type is received. The first instruction is decoded by the decode/issue unit to determine operands of the first instruction. The operands of the first instruction are determined to include a dependency on a second instruction of the first instruction type stored in a first entry of the first execution queue. The first instruction is stored in a first entry of the second execution queue. A synchronization indicator corresponding to the first instruction in a second entry of the first execution queue is set immediately adjacent the first entry of the first execution queue, which indicates that the first instruction is stored in another execution queue.
摘要:
In a processing system capable of single and multi-thread execution, a branch prediction unit can be configured to detect hard to predict branches and loop instructions. In a dual-threading (simultaneous multi-threading) configuration, one instruction queues (IQ) is used for each thread and instructions are alternately sent from each IQ to decode units. In single thread mode, the second IQ can be used to store the “not predicted path” of the hard-to-predict branch or the “fall-through” path of the loop. On mis-prediction, the mis-prediction penalty is reduced by getting the instructions from IQ instead of instruction cache.