Abstract:
A microprocessor employing a reorder buffer is configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. In one embodiment, the reorder buffer allocates a line of storage sufficient to store instruction results corresponding to a maximum number of concurrently dispatchable instructions regardless of the number actually dispatched. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases.
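As an illustration of the allocation policy described above, the following Python sketch (not the patented hardware, and with an assumed dispatch width of four) allocates one full line per dispatch group and counts the slots that go unused when fewer than the maximum number of instructions are dispatched.

```python
# A minimal behavioral sketch of line-based reorder-buffer allocation,
# assuming a hypothetical maximum dispatch width of 4.
LINE_WIDTH = 4  # maximum number of concurrently dispatchable instructions

class ReorderBuffer:
    def __init__(self, num_lines=16):
        self.num_lines = num_lines
        self.lines = []          # each line holds exactly LINE_WIDTH result slots
        self.wasted_slots = 0    # slots allocated but never filled

    def dispatch(self, num_insns):
        """Allocate one full line for a group of 1..LINE_WIDTH instructions."""
        assert 1 <= num_insns <= LINE_WIDTH
        if len(self.lines) == self.num_lines:
            return False         # reorder buffer full: stall dispatch
        self.lines.append(num_insns)
        self.wasted_slots += LINE_WIDTH - num_insns
        return True

# The wider the average dispatch group, the fewer slots go unused per line.
rob = ReorderBuffer()
for group in (4, 3, 4, 2, 4):        # hypothetical per-cycle dispatch counts
    rob.dispatch(group)
print(rob.wasted_slots)              # -> 3 unused slots out of 20 allocated
```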
Abstract:
A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte include an end bit and two ROP bits. The ROP bits indicate the number of microinstructions required to implement the instruction. The plurality of predecode bits are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to identify microinstructions. The instruction alignment unit dispatches the microinstructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. Because the instruction alignment unit identifies microinstructions, the multiplexing of instructions from the instruction alignment unit to the decoders is simplified. Accordingly, relatively fast multiplexing may be attained, and high performance may be accommodated.
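The sketch below is a loose Python model of how end bits and ROP bits could be consumed by the alignment stage; the PredecodeTag fields follow the abstract, but the data layout and the align helper are assumptions made for illustration.

```python
# Illustrative sketch only: the field widths (one end bit, two ROP bits per
# instruction byte) follow the abstract, but the encoding below is assumed.
from dataclasses import dataclass

@dataclass
class PredecodeTag:
    end: bool   # set on the last byte of an instruction
    rops: int   # number of microinstructions (ROPs) the instruction maps to

def align(tags):
    """Walk the predecode tags and emit (start, end, rop_count) per instruction,
    so downstream multiplexers can route ROPs to fixed issue positions."""
    insns, start = [], 0
    for i, tag in enumerate(tags):
        if tag.end:
            insns.append((start, i, tag.rops))
            start = i + 1
    return insns

# Two instructions: a 3-byte instruction needing 2 ROPs, then a 1-byte, 1-ROP one.
tags = [PredecodeTag(False, 0), PredecodeTag(False, 0), PredecodeTag(True, 2),
        PredecodeTag(True, 1)]
print(align(tags))   # -> [(0, 2, 2), (3, 3, 1)]
```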
Abstract:
A superscalar microprocessor is provided having a load/store unit which receives a pair of pointers identifying the oldest outstanding instructions which are not in condition for retirement. The load/store unit compares these pointers with the reorder buffer tags of load instructions that miss the data cache and store instructions. A match must be found before the associated instruction accesses the data cache and the main memory system. The pointer-compare mechanism provides an ordering mechanism for load instructions that miss the data cache and store instructions.
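A simplified Python model of the pointer-compare gating is shown below; the entry fields and the tuple of two pointers are hypothetical names, and the sketch covers only the ordering check, not the rest of the load/store pipeline.

```python
# Simplified model with assumed field names; it shows only the tag-match
# gating described above, not the full load/store unit.
def may_access_memory(entry, oldest_ptrs):
    """A cache-missing load or a store proceeds to the data cache / main
    memory only when its reorder-buffer tag matches one of the two pointers
    identifying the oldest outstanding, not-yet-retirable instructions."""
    needs_ordering = entry["is_store"] or entry["load_missed_cache"]
    if not needs_ordering:
        return True
    return entry["rob_tag"] in oldest_ptrs

oldest_ptrs = (7, 8)      # pair of pointers supplied by the reorder buffer
queue = [
    {"rob_tag": 9, "is_store": True,  "load_missed_cache": False},
    {"rob_tag": 7, "is_store": False, "load_missed_cache": True},
]
print([may_access_memory(e, oldest_ptrs) for e in queue])  # -> [False, True]
```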
Abstract:
A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results for a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. The line of storage remains allocated until each instruction within the line is ready to retire, and then the line is deallocated as the one or more instructions are concurrently retired.
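The following Python sketch, with an assumed line size of four and invented class names, illustrates the allocate-on-dispatch, retire-as-a-line behavior described above.

```python
# Behavioral sketch under assumed names: a line retires (and its storage is
# freed) only when every instruction it holds is ready to retire.
LINE_SIZE = 4   # predefined maximum number of concurrently dispatchable instructions

class Line:
    def __init__(self, count):
        self.ready = [False] * count   # one flag per dispatched instruction

class ReorderBuffer:
    def __init__(self):
        self.lines = []                # oldest line first

    def dispatch(self, count):
        assert 1 <= count <= LINE_SIZE
        self.lines.append(Line(count))

    def mark_ready(self, line_idx, slot):
        self.lines[line_idx].ready[slot] = True

    def retire(self):
        """Deallocate the oldest line once all of its instructions are ready."""
        if self.lines and all(self.lines[0].ready):
            return self.lines.pop(0)
        return None

rob = ReorderBuffer()
rob.dispatch(2)
rob.mark_ready(0, 0)
print(rob.retire() is None)   # -> True: one instruction still outstanding
rob.mark_ready(0, 1)
print(rob.retire() is None)   # -> False: the entire line retired together
```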
Abstract:
An instruction alignment unit includes a byte queue configured to store instruction blocks. Each instruction block includes a fixed number of instruction bytes and identifies up to a maximum number of instructions within those bytes. Additionally, the instruction alignment unit is configured to form a pair of instruction lists: a dispatch list and a latch list. The dispatch list includes instruction locators corresponding to instructions within the instruction blocks stored in the byte queue. Additionally, the first three instructions from instruction blocks being received from the instruction cache during a particular clock cycle are appended to the dispatch list. The dispatch list is used to select instructions from the byte queue for dispatch to the decode units. The latch list receives instruction locators for the remaining instructions from the instruction blocks received from the instruction cache during the particular clock cycle, as well as instruction locators from the dispatch list which correspond to instructions not selected for dispatch to the decode units. The latch list is stored until the succeeding clock cycle, in which the stored program-ordered list is used as the basis for forming that cycle's dispatch list. The instruction identification information and instruction bytes corresponding to each instruction can be located by selecting the instruction locators at the front of the dispatch list.
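As a rough Python sketch of the two-list flow (instruction locators reduced to plain integers, dispatch width assumed to be four), the function below builds a cycle's dispatch list from the previous cycle's latch list plus the first three incoming locators, and latches whatever is not selected.

```python
# Sketch of the two-list flow with assumed data shapes: instruction locators
# are plain integers here, and the dispatch width is taken to be 4.
DISPATCH_WIDTH = 4

def form_lists(latched, incoming):
    """Build this cycle's dispatch list from last cycle's latch list plus the
    first three locators of newly received instruction blocks; everything not
    selected for dispatch is latched for the next cycle."""
    dispatch_list = latched + incoming[:3]
    selected = dispatch_list[:DISPATCH_WIDTH]         # front of the list dispatches
    latch_list = dispatch_list[DISPATCH_WIDTH:] + incoming[3:]
    return selected, latch_list

latched = [10, 11]            # locators left over from the previous cycle
incoming = [20, 21, 22, 23]   # locators identified in newly fetched blocks
selected, latch_list = form_lists(latched, incoming)
print(selected)      # -> [10, 11, 20, 21]
print(latch_list)    # -> [22, 23]
```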
Abstract:
An apparatus including address generation units, corresponding reservation stations, and a speculative register file is provided. Decode units provide memory operation information to the corresponding reservation stations while the associated instructions are being decoded. The speculative register file stores speculative register values corresponding to previously decoded instructions. The speculative register values are generated prior to execution of the previously decoded instructions. If the register operands included in the address operands of an instruction are stored in the speculative register file, then the memory operation may be passed through the corresponding reservation station to an address generation unit. The address generation unit generates the data address from the address operands and accesses a data cache while register operands corresponding to the instruction are requested from a register file and reorder buffer.
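The hedged Python sketch below illustrates the bypass decision: if every register operand needed for the address is present in the speculative register file, address generation (modeled here as a simple sum with a displacement) can proceed without waiting in the reservation station. All names and the address arithmetic are illustrative assumptions.

```python
# Hypothetical sketch: if every register operand used to form the data address
# is already present in the speculative register file, the memory operation is
# forwarded straight to an address generation unit instead of waiting in the
# reservation station.
def try_early_agen(address_regs, displacement, spec_regfile):
    values = []
    for reg in address_regs:
        if reg not in spec_regfile:
            return None                 # not all operands speculatively known: wait
        values.append(spec_regfile[reg])
    return sum(values) + displacement   # data address; cache access may start now

spec_regfile = {"ESP": 0x1000, "EBX": 0x40}     # speculative values from prior decodes
print(hex(try_early_agen(["ESP", "EBX"], 8, spec_regfile)))  # -> 0x1048
print(try_early_agen(["EAX"], 0, spec_regfile))              # -> None (fall back)
```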
Abstract:
A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. With the information conveyed by the functional bits, the decode units can detect the exact locations of the opcode, displacement, immediate, register, and scale-index bytes. Accordingly, no serial scan by the decode units through the instruction bytes is needed. In addition, the functional bits allow the decode units to calculate linear addresses (via adder circuits) expeditiously for use by other subunits within the superscalar microprocessor. Accordingly, relatively fast decoding may be attained, and high performance may be accommodated.
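To illustrate why no serial scan is needed, the Python sketch below assumes a per-byte classification derived from the predecode information (the labels are invented, not the patent's functional-bit encoding) and locates the instruction fields in a single parallel pass.

```python
# Illustrative only: the per-byte classification labels below are assumed,
# not the actual functional-bit encoding; the point is that a decode unit can
# pick out fields directly from per-byte predecode information rather than
# scanning the instruction bytes serially.
def decode_fields(insn_bytes, byte_kinds):
    """byte_kinds[i] labels insn_bytes[i] as 'prefix', 'opcode', 'modrm',
    'sib', 'disp', or 'imm'; fields are located in a single pass."""
    fields = {}
    for i, kind in enumerate(byte_kinds):
        fields.setdefault(kind, []).append(insn_bytes[i])
    return fields

insn = [0x8B, 0x44, 0x24, 0x08]            # e.g. mov eax, [esp+8]
kinds = ["opcode", "modrm", "sib", "disp"]
print(decode_fields(insn, kinds))
```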
Abstract:
A superscalar microprocessor is provided that includes a predecode unit adapted for predecoding variable byte-length instructions. The predecode unit predecodes the instructions prior to their storage within an instruction cache. In one system, the predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. By utilizing the predecode information from the predecode unit, the instruction alignment unit may be implemented with a relatively small number of cascaded levels of logic gates, thus accommodating very high frequencies of operation. Instruction alignment to the decode units may further be accomplished with relatively few pipeline stages. Finally, since the predecode unit is configured such that the meaning of the functional bit of a particular predecode tag is dependent upon the status of the start bit, a relatively large amount of predecode information may be conveyed with a relatively small number of predecode bits. This allows a reduction in the size of the instruction cache without compromising performance.
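The Python sketch below illustrates the context-dependent interpretation of the functional bit; the specific meanings assigned under each start-bit value are assumptions chosen only to show how one bit can carry two kinds of information.

```python
# Sketch of context-dependent decoding of a predecode tag, assuming a
# hypothetical three-bit tag (start, end, functional). The key idea from the
# abstract is that the functional bit is interpreted differently depending on
# whether the start bit is set, so one bit conveys two kinds of information.
def interpret_tag(start, end, functional):
    if start:
        # On an instruction's first byte, the functional bit might, for example,
        # distinguish directly decodable instructions from microcoded ones.
        meaning = "microcoded" if functional else "fast-path"
    else:
        # On other bytes it might instead mark, say, opcode versus operand bytes.
        meaning = "opcode byte" if functional else "operand byte"
    return {"start": start, "end": end, "meaning": meaning}

print(interpret_tag(start=True, end=False, functional=1))
print(interpret_tag(start=False, end=True, functional=0))
```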
Abstract:
A dependency checking structure is provided which compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (the addresses for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes. If a read memory access performed by the decode stage is dependent upon a write memory access performed by the execution stage, then the instruction associated with the read memory access and subsequent instructions are flushed. Data coherency is thereby maintained between the pair of caches while allowing stack-relative accesses to be performed from the decode stage. The comparator circuits used to perform the comparison are configured to compare a field of address bits instead of the entire address, reducing comparator size while still maintaining accurate dependency checking by qualifying the resulting comparison signals with an indication that both addresses hit in the same storage location within the stack cache.
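A minimal Python sketch of the qualified partial compare follows; the width of the compared address field and the use of a way index to stand in for "the same storage location within the stack cache" are assumptions.

```python
# Minimal sketch with assumed parameters: only a field of low-order address
# bits is compared, and the partial match is qualified by both accesses
# hitting the same stack-cache storage location, as described above.
FIELD_MASK = 0xFF            # compare only these address bits (width assumed)

def dependent(read_addr, read_loc, write_addr, write_loc):
    """True if the read may depend on the write: the compared address fields
    match and both accesses hit the same stack-cache storage location."""
    fields_match = (read_addr & FIELD_MASK) == (write_addr & FIELD_MASK)
    same_location = read_loc == write_loc
    return fields_match and same_location

# Partial fields match, but the accesses hit different storage locations,
# so no dependency is (correctly) reported.
print(dependent(0x1234, 0, 0x5634, 1))   # -> False
print(dependent(0x1234, 0, 0x1234, 0))   # -> True
```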
Abstract:
An apparatus for aligning variable byte-length instructions to a plurality of issue positions is provided. The apparatus includes a byte queue divided into several subqueues. Each subqueue is maintained such that the first instruction in program order within the subqueue is identified by information stored in a first position within the subqueue, the second instruction in program order is identified by information stored in a second position, and so on. When instructions from a subqueue are dispatched, the remaining instructions within the subqueue are shifted such that the first of the remaining instructions (in program order) occupies the first position, etc. Instructions are shifted from subqueue to subqueue when each of the instructions within a particular subqueue has been dispatched. The information stored in one subqueue is shifted as a unit to another subqueue, independent of the internal shifting of subqueue information. The subqueues are additionally configured to handle instructions which overflow from a first subqueue into a second subqueue. Information pertaining to an overflowing instruction is maintained in the last position within the first subqueue and is not shifted when other positions within the subqueue are shifted. In this manner, information regarding an overflowing instruction is thereby located in a limited number of positions.
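The Python sketch below models the subqueue shifting with assumed sizes (three instruction positions per subqueue); overflow handling is omitted for brevity.

```python
# Behavioral sketch of the subqueue shifting described above, with assumed
# sizes (three instruction positions per subqueue). Overflow handling is
# omitted to keep the example short.
POSITIONS = 3

class Subqueue:
    def __init__(self, locators):
        self.slots = list(locators)[:POSITIONS]   # program order: oldest first

    def dispatch(self, n):
        """Remove the first n instructions; remaining ones shift toward position 0."""
        self.slots = self.slots[n:]

class ByteQueue:
    def __init__(self, subqueues):
        self.subqueues = [Subqueue(s) for s in subqueues]

    def compact(self):
        """Once a subqueue has dispatched all of its instructions, the next
        subqueue's contents move forward as a unit, independent of the
        shifting inside each subqueue."""
        self.subqueues = [sq for sq in self.subqueues if sq.slots]

bq = ByteQueue([["A", "B", "C"], ["D", "E"]])
bq.subqueues[0].dispatch(3)   # all of the first subqueue dispatched
bq.compact()
print(bq.subqueues[0].slots)  # -> ['D', 'E'] now occupy the first subqueue
```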