Abstract:
Replaying speculatively dispatched load-dependent instructions in response to a cache miss for a producing load instruction in an out-of-order processor (OoP) is disclosed. To allow for a scheduler circuit to restore register dependencies in a register dependency tracking circuit for a replay operation in response to a cache miss for execution of a load instruction, the scheduler circuit includes a replay circuit. The replay circuit includes a load dependency tracking circuit. The replay circuit is configured to track dependencies of dispatched load instructions in the load dependency tracking circuit. The replay circuit uses these tracked dependencies to restore register dependencies for the dispatched load instructions in the register dependency tracking circuit in response to a replay operation. Thus, the load instruction does not have to be re-allocated to restore register dependencies in the register dependency tracking circuit used for re-dispatching load-dependent instructions.
Abstract:
Providing references to previously decoded instructions of recently-provided instructions to be executed by a processor is disclosed herein. In one aspect, a low resource micro-operation controller is provided. Responsive to an instruction pipeline receiving an instruction address, the low resource micro-operation controller is configured to determine if the received instruction address corresponds to an instruction address in short history table. Short history table includes instruction addresses of recently-provided instructions having micro-ops in a post-decode queue. If the received instruction address corresponds to an instruction address in short history table, the low resource micro-operation controller is configured to provide reference (e.g., pointer) to the fetch stage that corresponds to an entry in the post-decode queue in which the micro-ops corresponding to the instruction address are stored. Responsive to the decode stage receiving the reference, the low resource micro-operation controller is configured to provide the micro-ops from the post-decode queue for execution.
Abstract:
Selective storing of previously decoded instructions of frequently-called instruction sequences in an instruction sequence buffer to be executed by a processor is disclosed. In one aspect, a selective instruction sequence buffer controller is configured to selectively store previously decoded instructions for an instruction sequence by determining if a received instruction address corresponds to an instruction sequence captured in an instruction sequence buffer. If the received instruction address corresponds to a captured instruction sequence, the selective instruction sequence buffer controller provides corresponding micro-operations stored in the instruction sequence buffer for execution. If the received instruction address does not correspond to the captured instruction sequence, the selective instruction sequence buffer controller reduces a frequency indicator of the instruction sequence. The selective instruction sequence buffer controller may also increase the frequency indicator of the instruction sequence when the instruction sequence is accessed, capturing the instruction sequence once the frequency indicator meets a threshold.
Abstract:
Speculative history forwarding in overriding branch predictors, and related circuits, methods, and computer-readable media are disclosed. In one embodiment, a branch prediction circuit including a first branch predictor and a second branch predictor is provided. The first branch predictor generates a first branch prediction for a conditional branch instruction, and the first branch prediction is stored in a first branch prediction history. The first branch prediction is also speculatively forwarded to a second branch prediction history. The second branch predictor subsequently generates a second branch prediction based on the second branch prediction history, including the speculatively forwarded first branch prediction. By enabling the second branch predictor to base its branch prediction on the speculatively forwarded first branch prediction, an accuracy of the second branch predictor may be improved.
Abstract:
The disclosure generally relates to dynamic clock and voltage scaling (DCVS) based on program phase. For example, during each program phase, a first hardware counter may count each cycle where a dispatch stall occurs and an oldest instruction in a load queue is a last-level cache miss, a second hardware counter may count total cycles, and a third hardware counter may count committed instructions. Accordingly, a software/firmware mechanism may read the various hardware counters once the committed instruction counter reaches a threshold value and divide a value of the first hardware counter by a value of the second hardware counter to measure a stall fraction during a current program execution phase. The measured stall fraction can then be used to predict a stall fraction in a next program execution phase such that optimal voltage and frequency settings can be applied in the next phase based on the predicted stall fraction.
Abstract:
Systems and methods for operating a processor include determining confidence levels, such as high, low, and medium confidence levels, associated with in-flight branch instructions in an instruction pipeline of the processor, based on counters used for predicting directions of the in-flight branch instructions. Numbers of in-flight branch instructions associated with each of confidence levels are determined. A weighted sum of the numbers weighted with weights corresponding to the confidence levels is calculated and the weighted sum is compared with a threshold. A throttling signal may be asserted to indicate that instructions are to be throttled in a pipeline stage of the instruction pipeline based on the comparison.
Abstract:
A scheduler with a picker block capable of dispatching multiple instructions per cycle is disclosed. The picker block may comprise an inter-group picker and an intra-group picker. The inter-group picker may be configured to pick multiple ready groups when there are two or more ready groups among a plurality of groups of instructions, and pick a single ready group when the single ready group is the only ready group among the plurality of groups. The intra-group picker may be configured to pick one ready instruction from each of the multiple ready groups when the inter-group picker picks the multiple ready groups, and to pick multiple ready instructions from the single ready group when the inter-group picker picks the single ready group.
Abstract:
Selective storing of previously decoded instructions of frequently-called instruction sequences in an instruction sequence buffer to be executed by a processor is disclosed. In one aspect, a selective instruction sequence buffer controller is configured to selectively store previously decoded instructions for an instruction sequence by determining if a received instruction address corresponds to an instruction sequence captured in an instruction sequence buffer. If the received instruction address corresponds to a captured instruction sequence, the selective instruction sequence buffer controller provides corresponding micro-operations stored in the instruction sequence buffer for execution. If the received instruction address does not correspond to the captured instruction sequence, the selective instruction sequence buffer controller reduces a frequency indicator of the instruction sequence. The selective instruction sequence buffer controller may also increase the frequency indicator of the instruction sequence when the instruction sequence is accessed, capturing the instruction sequence once the frequency indicator meets a threshold.
Abstract:
Storing narrow produced values for instruction operands directly in a register map in an out-of-order processor (OoP) is provided. An OoP is provided that includes an instruction processing system. The instruction processing system includes a number of instruction processing stages configured to pipeline the processing and execution of instructions according to a dataflow execution. The instruction processing system also includes a register map table (RMT) configured to store address pointers mapping logical registers to physical registers in a physical register file (PRF) for storing produced data for use by consumer instructions without overwriting logical registers for later executed, out-of-order instructions. In certain aspects, the instruction processing system is configured to write back (i.e., store) narrow values produced by executed instructions directly into the RMT, as opposed to writing the narrow produced values into the PRF in a write back stage.
Abstract:
Aspects disclosed in the detailed description include providing load address predictions using address prediction tables based on load path history in processor-based systems. In one aspect, a load address prediction engine provides a load address prediction table containing multiple load address prediction table entries. Each load address prediction table entry includes a predictor tag field and a memory address field for a load instruction. The load address prediction engine generates a table index and a predictor tag based on an identifier and a load path history for a detected load instruction. The table index is used to look up a corresponding load address prediction table entry. If the predictor tag matches the predictor tag field of the load address prediction table entry corresponding to the table index, the memory address field of the load address prediction table entry is provided as a predicted memory address for the load instruction.