摘要:
When a branch misprediction in a pipelined processor is discovered, if the mispredicted branch instruction is not the last uncommitted instruction in the pipelines, older uncommitted instructions are checked for dependency on a long latency operation. If one is discovered, all uncommitted instructions are flushed from the pipelines without waiting for the dependency to be resolved. The branch prediction is corrected, and the branch instruction and all flushed instructions older than the branch instruction are re-fetched and executed.
摘要:
Fusing conditional write instructions having opposite conditions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first conditional write instruction writing a first value to a target register based on evaluating a first condition is detected by an instruction processing circuit. The circuit also detects a second conditional write instruction writing a second value to the target register based on evaluating a second condition that is a logical opposite of the first condition. Either the first condition or the second condition is selected as a fused instruction condition, and corresponding values are selected as if-true and if-false values. A fused instruction is generated for selectively writing the if-true value to the target register if the fused instruction condition evaluates to true, and selectively writing the if-false value to the target register if the fused instruction condition evaluates to false.
摘要:
Efficient techniques are described for enforcing order of memory accesses. A memory access request is received from a device which is not configured to generate memory barrier commands. A surrogate barrier is generated in response to the memory access request. A memory access request may be a read request. In the case of a memory write request, the surrogate barrier is generated before the write request is processed. The surrogate barrier may also be generated in response to a memory read request conditional on a preceding write request to the same address as the read request. Coherency is enforced within a hierarchical memory system as if a memory barrier command was received from the device which does not produce memory barrier commands.
摘要:
When misses occur in an instruction cache, prefetching techniques are used that minimize miss rates, memory access bandwidth, and power use. One of the prefetching techniques operates when a miss occurs. A notification that a fetch address missed in an instruction cache is received. The fetch address that caused the miss is analyzed to determine an attribute of the fetch address and based on the attribute a line of instructions is prefetched. The attribute may indicate that the fetch address is a target address of a non-sequential operation. Another attribute may indicate that the fetch address is a target address of a non-sequential operation and the target address is more than X% into a cache line. A further attribute may indicate that the fetch address is an even address in the instruction cache. Such attributes may be combined to determine whether to prefetch.
摘要:
A snoop request cache maintains records of previously issued snoop requests. Upon writing shared data, a snooping entity performs a lookup in the cache. If the lookup hits (and, in some embodiments, includes an identification of a target processor) the snooping entity suppresses the snoop request. If the lookup misses (or hits but the hitting entry lacks an identification of the target processor) the snooping entity allocates an entry in the cache (or sets an identification of the target processor) and directs a snoop request such to the target processor, to change the state of a corresponding line in the processor's L1 cache. When the processor reads shared data, it performs a snoop cache request lookup, and invalidates a hitting entry in the event of a hit (or clears it processor identification from the hitting entry), so that other snooping entities will not suppress snoop requests to it.
摘要:
A processor pipeline is segmented into an upper portion - prior to instructions going out of program order - and one or more lower portions beyond the upper portion. The upper pipeline is flushed upon detecting that a branch instruction was mispredicted, minimizing the delay in fetching of instructions from the correct branch target address. The lower pipelines may continue execution until the mispredicted branch instruction confirms, at which time all uncommitted instructions are flushed from the lower pipelines. Existing exception pipeline flushing mechanisms may be utilized, by adding a mispredicted branch identifier, reducing the complexity and hardware cost of flushing the lower pipelines.
摘要:
A method of resolving simultaneous branch predictions prior to validation of the predicted branch instruction is disclosed. The method includes processing two or more predicted branch instructions, with each predicted branch instruction having a predicted state and a corrected state. The method further includes selecting one of the corrected states. Should one of the predicted branch instructions be mispredicted, the selected corrected state is used to direct future instruction fetches.
摘要:
A processor capable of fetching and executing variable length instructions is described having instructions of at least two lengths. The processor operates in multiple modes. One of the modes restricts instructions that can be fetched and executed to the longer length instructions. An instruction cache is used for storing variable length instructions and their associated predecode bit fields in an instruction cache line and storing the instruction address and processor operating mode state information at the time of the fetch in a tag line. The processor operating mode state information indicates the program specified mode of operation of the processor. The processor fetches instructions from the instruction cache for execution. As a result of an instruction fetch operation, the instruction cache may selectively enable the writing of predecode bit fields in the instruction cache and may selectively enable the reading of predecode bit fields stored in the instruction cache based on the processor state at the time of the fetch.