Abstract:
A prefetch/predecode unit includes one or more prefetch buffers which are configured to store prefetched sets of instruction bytes and corresponding predecode data. Additionally, each prefetch buffer is configured to store a predecode byte pointer. The predecode byte pointer indicates the byte within the corresponding prefetched set of instruction bytes at which predecoding is to be initiated. Predecoding may be resumed within a given prefetch buffer (at the byte indicated by the predecode byte pointer) if predecoding thereof is interrupted to predecode a different set of instruction bytes (e.g. a set of instruction bytes fetched from the instruction cache).
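The resume behaviour described above can be illustrated with a minimal sketch. The names `PrefetchBuffer`, `predecode`, `budget`, and `classify` are hypothetical and not taken from the abstract; `classify` merely stands in for real predecode logic, and `budget` models an interruption after a limited number of bytes.

```python
from dataclasses import dataclass, field


@dataclass
class PrefetchBuffer:
    """Hypothetical model of one prefetch buffer."""
    instruction_bytes: bytes
    predecode_data: list = field(default_factory=list)  # one entry per predecoded byte
    predecode_ptr: int = 0  # byte at which predecoding starts or resumes


def predecode(buf, budget, classify=lambda b: b & 1):
    """Predecode at most `budget` bytes, starting at buf.predecode_ptr.

    Returns True once the whole buffer has been predecoded.
    """
    end = min(buf.predecode_ptr + budget, len(buf.instruction_bytes))
    for i in range(buf.predecode_ptr, end):
        buf.predecode_data.append(classify(buf.instruction_bytes[i]))
    buf.predecode_ptr = end  # remember where to resume after an interruption
    return end == len(buf.instruction_bytes)
```

Because the pointer lives in the buffer itself, a later call to `predecode` on the same buffer continues where the earlier call stopped, even if a different set of bytes was predecoded in between.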
Abstract:
A floating point unit capable of executing multiple instructions in a single clock cycle using a central window and a register map is disclosed. The floating point unit comprises: a plurality of translation units, a future file, a central window, a plurality of functional units, a result queue, and a plurality of physical registers. The floating point unit receives speculative instructions, decodes them, and then stores them in the central window. Speculative top of stack values are generated for each instruction during decoding. Top of stack relative operands are mapped to physical registers using a register map. Register stack exchange operations are performed during decoding. Instructions are then stored in the central window, which selects the oldest stored instructions to be issued to each functional pipeline and issues them. Conversion units convert the instruction's operands to an internal format, and normalization units detect and normalize any denormal operands. Finally, the functional pipelines execute the instructions.
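The register map idea can be sketched as follows. This is an illustration under stated assumptions, not the disclosed design: it assumes an x87-style stack of eight entries in which a push decrements the top of stack, and the class name `StackRegisterMap` and its methods are hypothetical.

```python
class StackRegisterMap:
    """Hypothetical map from stack slots to physical registers."""

    def __init__(self, depth=8):
        self.depth = depth
        self.top = 0                    # speculative top of stack
        self.map = list(range(depth))   # stack slot -> physical register

    def physical(self, i):
        """Physical register currently backing the operand ST(i)."""
        return self.map[(self.top + i) % self.depth]

    def exchange(self, i):
        """Exchange ST(0) and ST(i) by swapping map entries; no data moves."""
        a = self.top % self.depth
        b = (self.top + i) % self.depth
        self.map[a], self.map[b] = self.map[b], self.map[a]

    def push(self):
        self.top = (self.top - 1) % self.depth  # assumed x87 convention

    def pop(self):
        self.top = (self.top + 1) % self.depth
```

Handling the exchange as a swap of two map entries is what allows it to complete during decoding rather than occupying a functional pipeline.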
Abstract:
An instruction fetch unit that employs sequential way prediction. The instruction fetch unit comprises a control unit configured to convey a first index and a first way to an instruction cache in a first clock cycle. The first index and first way select a first group of contiguous instruction bytes within the instruction cache, as well as a corresponding branch prediction block. The branch prediction block is stored in a branch prediction storage, and includes a predicted sequential way value. The control unit is further configured to convey a second index and a second way to the instruction cache in a second clock cycle succeeding the first clock cycle. This second index and second way select a second group of contiguous instruction bytes from the instruction cache. The second way is selected to be the predicted sequential way value stored in the branch prediction block corresponding to the first group of contiguous instruction bytes in response to a branch prediction algorithm employed by the control unit predicting a sequential execution path. Advantageously, a set associative instruction cache utilizing this method of way prediction may operate at higher frequencies (i.e., shorter clock cycles) than if tag comparison were used to select the correct way.
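A minimal sketch of how the second index and way might be chosen is given below. The field names, the `predict_taken` flag (a stand-in for the branch prediction algorithm), the target fields, and the wrap-around of the sequential index are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BranchPredictionBlock:
    predicted_sequential_way: int
    predict_taken: bool = False  # hypothetical stand-in for the prediction algorithm
    target_index: int = 0        # hypothetical: used only when a branch is predicted taken
    target_way: int = 0


def next_fetch(index, way, prediction_storage, num_sets):
    """Return the (index, way) to convey in the following clock cycle."""
    block = prediction_storage[(index, way)]
    if block.predict_taken:
        return block.target_index, block.target_way
    # Sequential path: next index, with the way read directly from the
    # prediction block instead of being found by tag comparison.
    return (index + 1) % num_sets, block.predicted_sequential_way
```

The way for the sequential fetch is available as soon as the prediction block is read, which is the property the abstract credits for the shorter cycle time.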
Abstract:
A decoded instruction cache which stores both directly executable and microcode instructions for concurrent dispatch to a plurality of issue positions. An instruction address required by a superscalar microprocessor is first presented to the decoded instruction cache. If the address is not present in the decoded instruction cache, the instruction bytes are retrieved either from an instruction cache or main memory. In either case, a group of instruction bytes is conveyed to an early decode unit, which performs partial decoding on the instructions therein. These partially decoded instructions are conveyed to the decoded instruction cache for storage. If the first instruction conveyed from the group of instruction bytes is a directly executable instruction, the partially decoded information corresponding to the first instruction is stored in a cache line selected according to the opcode of the first instruction. Directly executable instructions subsequent to the first instruction in the group of instruction bytes may be stored in succeeding locations in the same cache line. If the first instruction is a microcode instruction, operand information provided by the early decode unit is stored to one or more cache lines including directly executable instructions which, when executed, effectuate the operation of that microcode instruction. When a read is performed on a valid line in the decoded instruction cache, partially decoded instructions already aligned for dispatch are conveyed to a plurality of issue positions.
Abstract:
An instruction scanning unit for a superscalar microprocessor is disclosed. The instruction scanning unit processes start, end, and functional byte information (or predecode data) associated with a plurality of contiguous instruction bytes. The processing of start byte information and end byte information is performed independently and in parallel, and the instruction scanning unit produces a plurality of scan values which identify valid instructions within the plurality of contiguous instruction bytes. Additionally, the instruction scanning unit is scalable. Multiple instruction scanning units may be operated in parallel to process a larger plurality of contiguous instruction bytes. Furthermore, the instruction scanning unit detects error conditions in the predecode data in parallel with scanning to locate instructions. Moreover, in parallel with the error checking and scanning to locate instructions, MROM instructions are located for dispatch to an MROM unit.
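The following sketch shows, in software terms, how start and end information can be processed independently and then combined, with error detection on the predecode data. It is a sequential illustration only: the function name `scan` is hypothetical, and it assumes the group of bytes begins at an instruction start and ends at an instruction end, which real hardware need not require.

```python
def scan(start_bits, end_bits):
    """Return (start, end) byte positions of each instruction in the group.

    Raises ValueError if the start and end information is inconsistent.
    """
    # The two lists are derived independently of each other.
    starts = [i for i, bit in enumerate(start_bits) if bit]
    ends = [i for i, bit in enumerate(end_bits) if bit]

    if len(starts) != len(ends):
        raise ValueError("mismatched number of start and end bytes")

    pairs = list(zip(starts, ends))
    for k, (s, e) in enumerate(pairs):
        if e < s:
            raise ValueError("end byte precedes start byte")
        if k > 0 and s != pairs[k - 1][1] + 1:
            raise ValueError("start byte does not follow previous end byte")
    return pairs
```

For example, `scan([1, 0, 0, 1, 1, 0], [0, 0, 1, 1, 0, 1])` identifies three instructions occupying bytes 0 to 2, byte 3, and bytes 4 to 5.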
Abstract:
A branch prediction apparatus is provided which stores multiple branch selectors corresponding to instruction bytes within a cache line of instructions or portion thereof. The branch selectors identify a branch prediction to be selected if the corresponding instruction byte is the byte indicated by the offset of the fetch address used to fetch the cache line. Instead of comparing pointers to the branch instructions with the offset of the fetch address, the branch prediction is selected simply by decoding the offset of the fetch address and choosing the corresponding branch selector. The branch prediction apparatus may operate at higher frequencies (i.e. shorter clock cycles) than if the pointers to the branch instruction and the fetch address were compared (a greater than or less than comparison). The branch selectors directly determine which branch prediction is appropriate according to the instructions being fetched, thereby decreasing the amount of logic employed to select the branch prediction.
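The selection step can be sketched as a plain table lookup. The encoding used here (selector 0 meaning the sequential path, selector n meaning the n-th predicted-taken branch in the line) and the function names are assumptions for illustration; the abstract does not specify an encoding.

```python
def build_selectors(line_size, branch_offsets):
    """Compute one selector per byte when the predictions are updated.

    branch_offsets: sorted byte offsets of predicted-taken branches.
    """
    selectors = []
    for byte in range(line_size):
        selector = 0  # assumed encoding: 0 selects the sequential path
        for n, offset in enumerate(branch_offsets, start=1):
            if byte <= offset:
                selector = n  # first branch at or after this byte
                break
        selectors.append(selector)
    return selectors


def select_prediction(selectors, predictions, fetch_offset):
    """At fetch time, index by the offset only; no magnitude comparison."""
    return predictions[selectors[fetch_offset]]
```

The comparisons are paid once in `build_selectors`, off the fetch path, so `select_prediction` reduces to two indexed reads.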
Abstract:
A microprocessor employing a reorder buffer is configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. In one embodiment, the reorder buffer allocates a line of storage sufficient to store instruction results corresponding to a maximum number of concurrently dispatchable instructions regardless of the number actually dispatched. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases.
Abstract:
A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte include an end bit and two ROP bits. The ROP bits indicate a number of microinstructions required to implement the instruction. The plurality of predecode bits are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to identify microinstructions. The instruction alignment unit dispatches the microinstructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. Because the instruction alignment unit identifies microinstructions, the multiplexing of instructions from the instruction alignment unit to the decoders is simplified. Accordingly, relatively fast multiplexing may be attained, and high performance may be accommodated.
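As a rough illustration of predecode tags, the sketch below attaches an end bit and a two-bit ROP count to every instruction byte and then recovers the microinstructions from the tags alone. How the ROP bits are actually encoded and on which bytes they are meaningful is not stated in the abstract; storing the count on every byte, and the function names, are assumptions.

```python
def predecode_tags(instructions):
    """instructions: list of (length_in_bytes, rop_count) pairs.

    Returns one (end_bit, rop_bits) tag per instruction byte.
    """
    tags = []
    for length, rops in instructions:
        if not 0 <= rops <= 3:
            raise ValueError("ROP count must fit in two bits")
        for i in range(length):
            end_bit = 1 if i == length - 1 else 0
            tags.append((end_bit, rops))
    return tags


def identify_rops(tags):
    """Return (instruction_start_byte, rop_number) for each microinstruction."""
    rops_found = []
    start = 0
    for i, (end_bit, rops) in enumerate(tags):
        if end_bit:
            rops_found.extend((start, n) for n in range(rops))
            start = i + 1  # the next instruction begins after this end byte
    return rops_found
```

With a two-byte instruction needing one ROP followed by a three-byte instruction needing two, `identify_rops` yields three entries, one per issue position to be filled.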
Abstract:
A superscalar microprocessor is provided having a load/store unit which receives a pair of pointers identifying the oldest outstanding instructions which are not in condition for retirement. The load/store unit compares these pointers with the reorder buffer tags of load instructions that miss the data cache and store instructions. A match must be found before the associated instruction accesses the data cache and the main memory system. The pointer-compare mechanism provides an ordering mechanism for load instructions that miss the data cache and store instructions.
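The pointer-compare check amounts to a membership test, sketched below. The function name and the representation of reorder buffer tags as integers are hypothetical.

```python
def ready_to_access(pending_tags, oldest_pointers):
    """Return the pending stores and cache-missing loads that may proceed.

    pending_tags: reorder buffer tags held by the load/store unit.
    oldest_pointers: the pair of tags received from the reorder buffer.
    """
    return [tag for tag in pending_tags if tag in oldest_pointers]
```

An instruction whose tag matches neither pointer simply waits, which is how the comparison enforces ordering.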
Abstract:
A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. The line of storage remains allocated until each instruction within the line is ready to retire, and then the line is deallocated as the one or more instructions are concurrently retired.
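Line-oriented allocation and retirement can be sketched as below. The class `LineReorderBuffer`, the default line width of three, and the use of instruction names as identifiers are assumptions made for the example.

```python
from collections import deque


class LineReorderBuffer:
    """Hypothetical reorder buffer that allocates and frees whole lines."""

    def __init__(self, line_width=3):
        self.line_width = line_width  # maximum concurrently dispatchable instructions
        self.lines = deque()          # oldest line at the left

    def dispatch(self, instructions):
        """Allocate one full line, however few instructions are dispatched."""
        if not 1 <= len(instructions) <= self.line_width:
            raise ValueError("invalid number of dispatched instructions")
        self.lines.append([{"instr": ins, "done": False} for ins in instructions])

    def complete(self, instr):
        """Mark an instruction as ready to retire."""
        for line in self.lines:
            for slot in line:
                if slot["instr"] == instr:
                    slot["done"] = True
                    return
        raise KeyError(instr)

    def retire(self):
        """Deallocate the oldest line once every instruction in it is done."""
        if self.lines and all(slot["done"] for slot in self.lines[0]):
            return [slot["instr"] for slot in self.lines.popleft()]
        return []

    def unused_slots(self):
        """Storage locations allocated but not occupied by an instruction."""
        return sum(self.line_width - len(line) for line in self.lines)
```

Allocating by line trades some unused storage, reported here by `unused_slots`, for simpler allocation and retirement logic.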