Abstract:
Providing late physical register allocation and early physical register release in out-of-order processor (OOP)-based devices implementing a checkpoint-based architecture is provided. In this regard, an OOP-based device provides a register management circuit that is configured to employ a combination of the checkpoint approach and the virtual register approach. The register management circuit includes a most recent table (MRT) for tracking mappings of logical register numbers (LRNs) to physical register numbers (PRNs), a physical register file (PRF) storing information for physical registers, a virtual register file (VRF) storing data for virtual registers, and a checkpoint queue for tracking active checkpoints (each of which is a snapshot of the MRT at a given time). The register management circuit applies checkpoint selection criteria for balancing the number of checkpoints, and implements late physical register allocation using virtual registers to provide an effectively larger physical register file and checkpoint-based early release of physical registers.
Abstract:
In certain aspects of the disclosure, an apparatus comprises a first scheduling pool associated with a first minimum scheduling latency and a second scheduling pool associated with a second minimum scheduling latency, the second minimum scheduling latency greater than the first minimum scheduling latency. A common instruction picker is coupled to both the first scheduling pool and the second scheduling pool. The common instruction picker may be configured to select a first instruction from the first scheduling pool and a second instruction from the second scheduling pool, and then choose either the first instruction or second instruction for dispatch according to a picking policy.
Abstract:
The disclosure generally relates to dynamic clock and voltage scaling (DCVS) based on program phase. For example, during each program phase, a first hardware counter may count each cycle where a dispatch stall occurs and an oldest instruction in a load queue is a last-level cache miss, a second hardware counter may count total cycles, and a third hardware counter may count committed instructions. Accordingly, a software/firmware mechanism may read the various hardware counters once the committed instruction counter reaches a threshold value and divide a value of first hardware counter by a value of second hardware counter to measure a stall fraction during a current program execution phase. The measured stall fraction can then be used to predict a stall fraction in a next program execution phase such that optimal voltage and frequency settings can be applied in the next phase based on the predicted stall fraction.
Abstract:
Whenever a link address is written to the link stack, the prior value of the link stack entry is saved, and is restored to the link stack after a link stack push operation is speculatively executed following a mispredicted branch. This condition is detected by maintaining a count of the total number of uncommitted link stack write instructions in the pipeline, and a count of the number of uncommitted link stack write instructions ahead of each branch instruction. When a branch is evaluated and determined to have been mispredicted, the count associated with it is compared to the total count. A discrepancy indicates a link stack write instruction was speculatively issued into the pipeline after the mispredicted branch instruction, and pushed a link address onto the link stack. The prior link address is restored to the link stack from the link stack restore buffer.
Abstract:
A processor pipeline is segmented into an upper portion - prior to instructions going out of program order - and one or more lower portions beyond the upper portion. The upper pipeline is flushed upon detecting that a branch instruction was mispredicted, minimizing the delay in fetching of instructions from the correct branch target address. The lower pipelines may continue execution until the mispredicted branch instruction confirms, at which time all uncommitted instructions are flushed from the lower pipelines. Existing exception pipeline flushing mechanisms may be utilized, by adding a mispredicted branch identifier, reducing the complexity and hardware cost of flushing the lower pipelines.
Abstract:
A pre-decoder in a variable instruction length processor indicates properties of instructions in pre-decode bits stored in an instruction cache with the instructions. When all the encodings of pre-decode bits associate with one length instruction are defined, a property of an instruction of that length may be indicated by altering the instruction to emulate an instruction of a different length, and encoding the property in the pre-decode bits associated with instructions of the different length. One example of a property that may be so indicated is an undefined instruction.
Abstract:
Systems and methods relate to a hierarchical register file system including a level 1 physical register file (LI PRF) and a backing physical register file (PRF). A subset of ouptuts of instructions executed in an instruction pipeline of a processor which are deemed to have a high likelihood of use for one or more future instructions are identified. The subset of instruction outputs are stored in the LI PRF, while all instructon outputs are stored in the backing PRF.
Abstract:
Physical register scrubbing in computer microprocessors. Most instructions in a computer program produce some output value that is destined for one or more architected registers. These architected destination registers are renamed, in the processor pipeline, to physical registers in order to improve performance by exposing more instruction level parallelism to the processor. In one aspect, a method comprises identifying, in a reorder buffer, a first instruction and a second instruction, without intervening potential pipeline flushers, that write to the same architected destination register, in order to free the physical register corresponding to the older of the two instructions.
Abstract:
In a processor executing instructions in at least a first instruction set execution mode having a first minimum instruction length and a second instruction set execution mode having a smaller, second minimum instruction length, line and counter index addresses are formed that access every counter in a branch history table (BHT), and reduce the number of index address bits that are multiplexed based on the current instruction set execution mode. In one embodiment, counters within a BHT line are arranged and indexed in such a manner that half of the BHT can be powered down for each access in one instruction set execution mode.
Abstract:
A method of resolving simultaneous branch predictions prior to validation of the predicted branch instruction is disclosed. The method includes processing two or more predicted branch instructions, with each predicted branch instruction having a predicted state and a corrected state. The method further includes selecting one of the corrected states. Should one of the predicted branch instructions be mispredicted, the selected corrected state is used to direct future instruction fetches.