Abstract:
A predecode unit within a microprocessor predecodes a cache line of instruction bytes for storage within the instruction cache of the microprocessor. The predecode unit produces multiple shift amounts, each of which identify the beginning of a particular instruction within the instruction cache line. The shift amounts are stored in the instruction cache with the instruction bytes, and are conveyed when the instruction bytes are fetched for execution by the microprocessor. An instruction alignment unit decodes the shift amounts to locate instructions within the fetched instruction bytes. Each shift amount directly identifies a corresponding instruction for dispatch, and therefore decoding the shift amount directly results in controls for shifting the instruction bytes such that the identified instruction is conveyed to a corresponding issue position. The number of shift amounts stored may be equal to the number of issue positions within the microprocessor. The instruction alignment unit scans the start and end byte predecode data (which is also provided by the predecode unit and stored in the instruction cache) to detect any additional instructions within the cache line (e.g. instructions not identified by the shift amounts). Additional shift amounts are generated and used by the instruction alignment unit to dispatch instructions during subsequent clock cycles.
Abstract:
An instruction alignment unit includes a byte queue configured to store instruction blocks. Each instruction block includes a fixed number of instruction bytes and identifies up to a maximum number of instructions within the fixed number of instruction bytes. Additionally, the instruction alignment unit is configured to form a pair of instruction lists: a dispatch list and a latch list. The dispatch list includes instruction locators corresponding to instructions within the instruction blocks stored in the byte queue. Additionally, the first three instructions from instructions blocks being received from the instruction cache during a particular clock cycle are appended to the dispatch list. The dispatch list is used to select instructions from the byte queue for dispatch to the decode units. The latch list is used for receiving instruction locators for the remaining instructions from the instruction blocks received from the instruction cache during the particular clock cycle. Furthermore, the latch list receives instruction locators from the dispatch list which correspond to instructions not selected for dispatch to the decode units. The latch list is stored until a succeeding clock cycle, in which the stored program-ordered list is used as a basis for forming the dispatch list during that succeeding clock cycle. The instruction identification information and instruction bytes corresponding to the instruction can be located by selecting the instructions corresponding to the instruction locators at the front of the dispatch list.
Abstract:
A way prediction unit for a superscalar microprocessor is provided which predicts the next fetch address as well as the way of the instruction cache that the current fetch address hits in while the instructions associated with the current fetch are being read from the instruction cache. The way prediction unit is intended for high frequency microprocessors in which associative caches tend to be clock cycle limiting, causing the instruction fetch mechanism to require more than one clock cycle between fetch requests. Therefore, an instruction fetch can be made every clock cycle using the predicted fetch address until an incorrect next fetch address or an incorrect way is predicted. The instructions from the predicted way are provided to the instruction processing pipelines of the superscalar microprocessor each clock cycle.
Abstract:
An apparatus including address generation units, corresponding reservation stations, and a speculative register file is provided. Decode units provide memory operation information to the corresponding reservation stations while the associated instructions are being decoded. The speculative register file stores speculative register values corresponding to previously decoded instructions. The speculative register values are generated prior to execution of the previously decoded instructions. If the register operands included in the address operands of an instruction are stored in the speculative register file, then the memory operation may be passed through the corresponding reservation station to an address generation unit. The address generation unit generates the data address from the address operands and accesses a data cache while register operands corresponding to the instruction are requested from a register file and reorder buffer.
Abstract:
A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. With the information conveyed by the functional bits, the decode units can detect the exact locations of the opcode, displacement, immediate, register, and scale-index bytes. Accordingly, no serial scan by the decode units through the instruction bytes is needed. In addition, the functional bits allow the decode units to calculate linear addresses (via adder circuits) expeditiously for use by other subunits within the superscalar microprocessor. Accordingly, relatively fast decoding may be attained, and high performance may be accommodated.
Abstract:
A superscalar microprocesor is provided that includes a predecode unit adapted for predecoding variable byte-length instructions. The predecode unit predecodes the instructions prior to their storage within an instruction cache. In one system, a predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. By utilizing the predecode information from the predecode unit, the instruction alignment unit may be implemented with a relatively small number of cascaded levels of logic gates, thus accommodating very high frequencies of operation. Instruction alignment to decode units may further be accomplished with relatively few pipeline stages. Finally, since the predecode unit is configured such that the meaning of the functional bit of a particular predecode tag is dependent upon the status of the start bit, a relatively large amount of predecode information may be conveyed with a relatively small number of predecode bits. This thereby allows a reduction in the size of the instruction cache without compromising performance.
Abstract:
A dependency checking structure is provided which compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (address for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes. If a read memory access performed by the decode stage is dependent upon a write memory access performed by the execution stage, then the instruction associated with the read memory access and subsequent instructions are flushed. Data coherency is maintained between the pair of caches while allowing stack-relative accesses to be performed from the decode stage. The comparator circuits used to perform the comparison are configured to compare a field of address bits instead of the entire address, reducing the size while still maintaining accurate dependency checking by qualifying the resulting comparison signals with an indication that both addresses hit in the same storage location within the stack cache.
Abstract:
An apparatus for aligning variable byte length instructions to a plurality of issue positions is provided. The apparatus includes a byte queue divided into several subqueues. Each subqueue is maintained such that a first instruction in program order within the subqueue is identified by information stored in a first position within the subqueue, a second instruction in program order within the subqueue is identified by information stored in a second position within the subqueue, etc. When instructions from a subqueue are dispatched, remaining instructions within the subqueue are shifted such that the first of the remaining instructions (in program order) occupies the first position, etc. Instructions are shifted from subqueue to subqueue when each of the instructions within a particular subqueue have been dispatched. The information stored in one subqueue is shifted as a unit to another subqueue independent of the internal shifting of subqueue information. The subqueues are additionally configured to handle instructions which overflow from a first subqueue into a second subqueue. Information pertaining to the overflowing instructions is maintained in the last position within the first subqueue. The information is not shifted when other positions within the subqueue are shifted. In this manner, information regarding an overflowing instruction is again located in a limited number of positions.
Abstract:
An apparatus and method for resolving data dependencies among a plurality of instructions within a storage device, such as a reorder buffer in a superscalar computing apparatus employing pipeline instruction processing. The storage device has a read pointer, indicating a most recently-stored instruction and has a write pointer, indicating a first-stored instruction of the plurality of instructions within the storage device.A compare-hit circuit generates a compare-hit signal upon each concurrence of the respective source indicator in a next-to-be-dispatched instruction with the destination indicator of an earlier-stored instruction within the storage device; a first enable circuit generates a first enable signal for a first packet of instructions defined by the read pointer and the write pointer; a first comparing circuit generates a hit-enable signal for each concurrence of the compare-hit signal and the first enable signal; a second enable circuit generates a second enable signal for a second packet of instructions defined by the read pointer and the hit-enable signal; and a second comparing circuit generates the output signal for each concurrence of the second enable signal and the hit-enable signal.
Abstract:
In a processor having an instruction unit, a decode/issue unit, and execution queues configured to provide instructions to correspondingly different types execution units, a method comprises maintaining a duplicate free list for the execution queues. The duplicate free list includes a plurality of duplicate dependent instruction indicators that indicate when a duplicate instruction for a dependent instruction is stored in at least one of the execution queues. One of the duplicate dependent instruction indicators is assigned to an execution queue for a dependent instruction. The dependent instruction is executed only when the one of the duplicate dependent instruction indicators is reset.