摘要:
A microprocessor includes two branch history tables, and is configured to use a first one of the branch history tables for predicting branch instructions that are hits in a branch target cache, and to use a second one of the branch history tables for predicting branch instructions that are misses in the branch target cache. As such, the first branch history table is configured to have an access speed matched to that of the branch target cache, so that its prediction information is timely available relative to branch target cache hit detection, which may happen early in the microprocessor's instruction pipeline. The second branch history table thus need only be as fast as is required for providing timely prediction information in association with recognizing branch target cache misses as branch instructions, such as at the instruction decode stage(s) of the instruction pipeline.
摘要:
Automatic calibration circuits for operational calibration of critical-path time delays in adaptive clock distribution systems, and related methods and systems, are disclosed. The adaptive clock distribution system includes a tunable-length delay circuit to delay distribution of a clock signal provided to a clocked circuit, to prevent timing margin degradation of the clocked circuit after a voltage droop occurs in a power supply supplying power to the clocked circuit. The adaptive clock distribution system also includes a dynamic variation monitor to reduce frequency of the delayed clock signal provided to the clocked circuit in response to the voltage droop in the power supply, so that the clocked circuit is not clocked beyond its performance limits during a voltage droop. An automatic calibration circuit is provided in the adaptive clock distribution system to calibrate the dynamic variation monitor during operation based on operational conditions and environmental conditions of the clocked circuit.
摘要:
A processor includes a return stack circuit used for predicting procedure return addresses for instruction pre-fetching, wherein a return stack controller determines the number of return levels associated with a given return instruction, and pops that number of return addresses from the return stack. Popping multiple return addresses from the return stack permits the processor to pre-fetch the return address of the original calling procedure in a chain of successive procedure calls. In one embodiment, the return stack controller reads the number of return levels from a value embedded in the return instruction. A complementary compiler calculates the return level values for given return instructions and embeds those values in them at compile-time. In another embodiment, the return stack circuit dynamically tracks the number of return levels by counting the procedure calls (branches) in a chain of successive procedure calls.
摘要:
A register file is disclosed. The register file includes a plurality of registers and a decoder. The decoder may be configured to receive an address for any one of the registers, and disable a read operation to the addressed register if data in the addressed register is invalid.
摘要:
Data from a source domain (311 ) operating at a first data rate is transferred to a FIFO (319) in another domain (313) operating at a different data rate. The FIFO (319) buffers data before transfer to a sink for further processing or storage. A source side counter (325) tracks space available in the FIFO. In disclosed examples, the initial counter value corresponds to FIFO depth. The counter (325) decrements in response to a data ready signal from the source domain (311)1 without delay. The counter (325) increments in response to signaling from the sink domain (313) of a read of data off the FIFO (319). Hence, incrementing is subject to the signaling latency between domains. The source (315) may send one more beat of data when the counter (325) indicates the FIFO (319) is full. The last beat of data is continuously sent from the source until it is indicated that a FIFO position became available; effectively providing one o more FIFO positions.
摘要:
A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction are can be fetched from the cache and combined in the normal manner.
摘要:
Delays due to waiting for operands that will not be used by a select operand instruction, are alleviated based on an early recognition that such operand data is not required in order to complete the processing of the select operand instruction. At appropriate points prior to execution, determinations are made regarding a selection criterion or criteria specified by the select operand instruction, conditions that affect the selection criteria, and the availability of operands. A hold circuit uses the determinations to control the activation and release of a hold signal that controls processor pipeline stalls. A stall required to wait for operand data is skipped or a stall is terminated early, if the selected operand is available even though the other operand, that will not be used, is not available. A stall due to waiting for operands is maintained until the selection criteria is met and the selected operand is fetched and made available.
摘要:
A microprocessor (10) includes two branch history tables, and is configured to use a first one (48) of the branch history tables for predicting branch instructions that are hits in a branch target cache (46), and to use a second one (50) of the branch history tables for predicting branch instructions that are misses in the branch target cache (46). As such, the first branch history table (48) is configured to have an access speed matched to that of the branch target cache (46), so that its prediction information is timely available relative to branch target cache hit detection, which may happen early in the microprocessor's instruction pipeline. The second branch history table (50) thus need only be as fast as is required for providing timely prediction information in association with recognizing branch target cache misses as branch instructions, such as at the instruction decode stage(s) of the instruction pipeline.
摘要:
A Branch Target Address Cache (BTAC) stores at least two branch target addresses in each cache line. The BTAC is indexed by a truncated branch instruction address. An offset obtained from a branch prediction offset table determines which of the branch target addresses is taken as the predicted branch target address. The offset table may be indexed in several ways, including by a branch history, by a hash of a branch history and part of the branch instruction address, by a gshare value, randomly, in a round-robin order, or other methods.
摘要:
One or more architected registers in a processor are fractional-word writable, and data from plural misaligned memory access operations are assembled directly in an architected register, without first assembling the data in a fractional-word writable, non-architected register and then transferring it to the architected register. In embodiments where a general-purpose register file utilizes register renaming or a reorder buffer, data from plural misaligned memory access operations are assembled directly in a fractional-word writable architected register, without the need to fully exception check both misaligned memory access operations before performing the first memory access operation.