Abstract:
A processor employs a plurality of fetch and decode pipelines by dividing an instruction stream into instruction blocks with identified boundaries. The processor includes a branch predictor that generates branch predictions. Each branch prediction corresponds to a branch instruction and indicates whether the corresponding branch is to be taken or not taken. In addition, each branch prediction identifies both the end of the current branch prediction window and the start of another branch prediction window. Using these known boundaries, the processor provides different sequential fetch streams to different ones of the plurality of fetch and decode pipelines, which concurrently process the instructions of the different fetch streams, thereby improving overall instruction throughput at the processor.
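A minimal C++ sketch of the distribution scheme described above, assuming a round-robin handoff between pipelines; the names (BranchPrediction, FetchStream, ParallelFetcher), the queue-based pipeline model, and the fixed 4-byte instruction size are illustrative assumptions, not details taken from the abstract.

#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical types for illustration only.
struct BranchPrediction {
    uint64_t window_end;   // address that ends the current prediction window
    uint64_t next_start;   // start of the next prediction window
    bool     taken;        // predicted taken or not taken
};

struct FetchStream {
    uint64_t start;  // first instruction address of the stream
    uint64_t end;    // last address before the predicted redirection
};

// Distributes sequential fetch streams across N fetch/decode pipelines,
// using the window boundaries that each branch prediction identifies.
class ParallelFetcher {
public:
    explicit ParallelFetcher(size_t num_pipes) : pipes_(num_pipes) {}

    void onPrediction(uint64_t stream_start, const BranchPrediction& p) {
        // The prediction closes the current window and names the start of
        // the next one, so the stream boundary is known before decode.
        FetchStream s{stream_start, p.window_end};
        pipes_[next_pipe_].push(s);  // hand this stream to one pipeline
        next_pipe_ = (next_pipe_ + 1) % pipes_.size();
        // The next stream begins where the prediction says it does
        // (assumed 4-byte instructions for the fall-through case).
        next_stream_start_ = p.taken ? p.next_start : p.window_end + 4;
    }

    uint64_t nextStreamStart() const { return next_stream_start_; }

private:
    std::vector<std::queue<FetchStream>> pipes_;
    size_t   next_pipe_ = 0;
    uint64_t next_stream_start_ = 0;
};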
Abstract:
A processor employs a plurality of op cache pipelines to concurrently provide previously decoded operations to a dispatch stage of an instruction pipeline. In response to receiving a first branch prediction, the processor selects a first op cache pipeline of the plurality of op cache pipelines based on the first branch prediction, and provides a first set of operations associated with the first branch prediction to the dispatch stage via the selected first op cache pipeline.
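A sketch of the pipeline selection step, assuming the op cache is banked and the selector hashes the prediction's target address; the types (DecodedOp, OpCachePipe, OpCacheFrontEnd) and the hashing choice are assumptions for illustration, not the patented mechanism.

#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

struct DecodedOp { uint32_t opcode; };

// One op cache pipeline: returns previously decoded operations
// cached for a given fetch address.
struct OpCachePipe {
    std::unordered_map<uint64_t, std::vector<DecodedOp>> lines;
    std::vector<DecodedOp> lookup(uint64_t fetch_addr) const {
        auto it = lines.find(fetch_addr);
        return it == lines.end() ? std::vector<DecodedOp>{} : it->second;
    }
};

class OpCacheFrontEnd {
public:
    OpCacheFrontEnd(std::vector<OpCachePipe>& pipes,
                    std::queue<DecodedOp>& dispatch)
        : pipes_(pipes), dispatch_(dispatch) {}

    // Select an op cache pipeline from the branch prediction (here, by
    // hashing the predicted target address to a bank index), then feed
    // the pipeline's cached operations to the dispatch stage.
    void onPrediction(uint64_t predicted_target) {
        size_t idx = (predicted_target >> 6) % pipes_.size();
        for (const DecodedOp& op : pipes_[idx].lookup(predicted_target))
            dispatch_.push(op);
    }

private:
    std::vector<OpCachePipe>& pipes_;
    std::queue<DecodedOp>& dispatch_;
};

Because distinct predictions can map to distinct banks, several such lookups can proceed concurrently, which is the point of having a plurality of op cache pipelines.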
Abstract:
Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) cache address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternate-level TLB cache address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.
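A sketch of the three kinds of prefetch requests, assuming a generic TLB prefetch interface; the Tlb type and prefetchPaths function are hypothetical, and real hardware would issue these requests from the branch prediction unit rather than from software.

#include <cstdint>
#include <vector>

// Hypothetical TLB interface: a prefetch request asks the TLB to
// cache the translation for a virtual page.
struct Tlb {
    virtual void prefetch(uint64_t virt_page) = 0;
    virtual ~Tlb() = default;
};

// Issue translation prefetches so that by the time fetch reaches an
// address, its page translation is already resident.
void prefetchPaths(Tlb& l0, Tlb& alt,
                   const std::vector<uint64_t>& primary_pages,
                   const std::vector<uint64_t>& alternate_pages,
                   const std::vector<uint64_t>& lookahead_pages) {
    // Primary branch prediction path: keep translations in the level 0
    // TLB, the smallest and fastest level.
    for (uint64_t p : primary_pages) l0.prefetch(p);
    // Alternate control flow path (the not-predicted direction) and
    // lookahead path: cache translations in an alternate-level TLB so
    // that a misprediction or an upcoming fetch still avoids a page walk.
    for (uint64_t p : alternate_pages) alt.prefetch(p);
    for (uint64_t p : lookahead_pages) alt.prefetch(p);
}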
Abstract:
The present invention provides a method and apparatus for caching pre-decode information. Some embodiments of the apparatus include a first pre-decode array configured to store pre-decode information for an instruction cache line that is resident in a first cache, in response to the instruction cache line being evicted from one or more second caches. Some embodiments of the apparatus also include a second array configured to store a plurality of bits associated with the first cache. Subsets of the bits are configured to store pointers to the pre-decode information associated with the instruction cache line.
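A behavioral sketch of the two arrays, reading the first cache as an L2 and the second cache as an L1 instruction cache; the array sizes, the 16-bit pointer encoding, and the rotating allocator are all assumptions, not parameters from the abstract.

#include <cstdint>
#include <vector>

constexpr size_t kL2Lines       = 1024;  // assumed lines tracked in L2
constexpr size_t kPredecEntries = 256;   // assumed pre-decode array slots

struct PredecodeInfo {
    uint8_t boundaries[64];  // e.g., instruction start/end markers per byte
};

struct PredecodeCache {
    // First array: pre-decode information kept for lines that were
    // evicted from the L1 instruction cache but remain resident in L2.
    std::vector<PredecodeInfo> predecode =
        std::vector<PredecodeInfo>(kPredecEntries);
    // Second array: per-L2-line bits whose subsets act as pointers into
    // the pre-decode array; 0xFFFF marks "no pre-decode info cached".
    std::vector<uint16_t> pointers = std::vector<uint16_t>(kL2Lines, 0xFFFF);
    size_t next_slot = 0;

    // On L1 eviction, save the line's pre-decode bits and record where.
    // (A real design would also invalidate stale pointers on slot reuse.)
    void onL1Eviction(size_t l2_line, const PredecodeInfo& info) {
        size_t slot = next_slot++ % kPredecEntries;  // simple rotation
        predecode[slot] = info;
        pointers[l2_line] = static_cast<uint16_t>(slot);
    }

    // On refill from L2, recover the pre-decode bits instead of
    // re-deriving them with the pre-decoder.
    const PredecodeInfo* onL1Fill(size_t l2_line) const {
        uint16_t p = pointers[l2_line];
        return p == 0xFFFF ? nullptr : &predecode[p];
    }
};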