Abstract:
Different software applications may use a set of instructions having critical timing paths less than a worst case critical timing path of a processor complex. For such applications, a supply voltage may be reduced while still maintaining the clock frequency necessary to meet the application's performance requirements. In order to reduce the supply voltage, an adaptive voltage scaling method is used. A critical path is selected from a plurality of critical paths for analysis on emulation logic to determine an attribute of the selected critical path during on chip functional operations. The selected critical path is representative of the worst case critical path to be in operation during a program execution. During on-chip functional operations, a voltage is controlled in response to the attribute, wherein the voltage supplies power to a power domain associated with the plurality of critical paths. The reduction in voltage reduces power drain based on instruction set usage allowing battery life to be extended.
Abstract:
A Branch Target Address Cache (BTAC) stores at least two branch target addresses in each cache line. The BTAC is indexed by a truncated branch instruction address. An offset obtained from a branch prediction offset table determines which of the branch target addresses is taken as the predicted branch target address. The offset table may be indexed in several ways, including by a branch history, by a hash of a branch history and part of the branch instruction address, by a gshare value, randomly, in a round-robin order, or other methods.
Abstract:
A processor includes a conditional branch instruction prediction mechanism that generates weighted branch prediction values. For weakly weighted predictions, which tend to be less accurate than strongly weighted predictions, the power associating with speculatively filling and subsequently flushing the cache is saved by halting instruction prefetching. Instruction fetching continues when the branch condition is evaluated in the pipeline and the actual next address is known. Alternatively, prefetching may continue out of a cache. To avoid displacing good cache data with instructions prefetched based on a mispredicted branch, prefetching may be halted in response to a weakly weighted prediction in the event of a cache miss.
Abstract:
A CAM bank is functionally divided into two or more sub-banks, without replicating CAM driver circuits, by disabling all match line discharge circuits in the bank, and selectively enabling the discharge circuits in entries comprising sub-banks. At least one selectively actuated switching circuit is interposed between the virtual ground node of each discharging comparator in the discharge circuit of a sub-bank and circuit ground. When the switching circuit is in a non-conductive state, the virtual ground node is maintained at a voltage level sufficiently above circuit ground to preclude discharging a connected match line within the CAM access time. When the switching circuit is placed in a conductive state, the virtual ground node is pulled to circuit ground and the connected match line may be discharged by a miscompare. Control signals, which may be decoded from address bits, are distributed to the switching circuits to define the CAM sub-banks.
Abstract:
A microprocessor includes two branch history tables, and is configured to use a first one of the branch history tables for predicting branch instructions that are hits in a branch target cache, and to use a second one of the branch history tables for predicting branch instructions that are misses in the branch target cache. As such, the first branch history table is configured to have an access speed matched to that of the branch target cache, so that its prediction information is timely available relative to branch target cache hit detection, which may happen early in the microprocessor's instruction pipeline. The second branch history table thus need only be as fast as is required for providing timely prediction information in association with recognizing branch target cache misses as branch instructions, such as at the instruction decode stage(s) of the instruction pipeline.
Abstract:
A conditional instruction architected to receive one or more operands as inputs, to output to a target the result of an operation performed on the operands if a condition is satisfied, and to not provide an output if the condition is not satisfied, is executed so that it unconditionally provides an output to the target. The conditional instruction obtains the prior value of the target (that is, the value produced by the most recent instruction preceding the conditional instruction that updated that target). The condition is evaluated. If the condition is satisfied, an operation is performed and the result of the operation output to the target. If the condition is not satisfied, the prior value is output to the target. Subsequent instructions may rely on the target as an operand source (whether written to a register or forwarded to the instruction), prior to the condition evaluation.
Abstract:
Delays due to waiting for operands that will not be used by a select operand instruction, are alleviated based on an early recognition that such operand data is not required in order to complete the processing of the select operand instruction. At appropriate points prior to execution, determinations are made regarding a selection criterion or criteria specified by the select operand instruction, conditions that affect the selection criteria, and the availability of operands. A hold circuit uses the determinations to control the activation and release of a hold signal that controls processor pipeline stalls. A stall required to wait for operand data is skipped or a stall is terminated early, if the selected operand is available even though the other operand, that will not be used, is not available. A stall due to waiting for operands is maintained until the selection criteria is met and the selected operand is fetched and made available.
Abstract:
A fixed number of variable-length instructions are stored in each line of an instruction cache. The variable-length instructions are aligned along predetermined boundaries. Since the length of each instruction in the line, and hence the span of memory the instructions occupy, is not known, the address of the next following instruction is calculated and stored with the cache line. Ascertaining the instruction boundaries, aligning the instructions, and calculating the next fetch address are performed in a predecoder prior to placing the instructions in the cache.
Abstract:
A pipelined processor comprises an instruction cache (iCache), a branch target address cache (BTAC), and processing stages, including a stage to fetch from the iCache and the BTAC. To compensate for the number of cycles needed to fetch a branch target address from the BTAC, the fetch from the BTAC leads the fetch of a branch instruction from the iCache by an amount related to the cycles needed to fetch from the BTAC. Disclosed examples either decrement a write address of the BTAC or increment a fetch address of the BTAC, by an amount essentially corresponding to one less than the cycles needed for a BTAC fetch.
Abstract:
One or more architected registers in a processor are fractional-word writable, and data from plural misaligned memory access operations are assembled directly in an architected register, without first assembling the data in a fractional-word writable, non-architected register and then transferring it to the architected register. In embodiments where a general-purpose register file utilizes register renaming or a reorder buffer, data from plural misaligned memory access operations are assembled directly in a fractional-word writable architected register, without the need to fully exception check both misaligned memory access operations before performing the first memory access operation.