Abstract:
A superscalar microprocessor implements a reorder buffer to support out-of-order execution of instructions. The instruction tag, which identifies the instruction being executed, is available during execution of the instruction; this availability is used to reduce the delay in identifying mispredicted instructions, prioritizing mispredicted instructions, canceling instructions subsequent to the mispredicted instruction, and reading status information from the reorder buffer. The reorder buffer receives the tag of the instruction issued to the functional unit. In parallel with the execution of the instruction, the reorder buffer generates hit masks identifying instructions to be canceled in the event of a mispredicted branch. Also in parallel, status information for the instruction (or instructions) being executed is selected from the reorder buffer and prioritization masks are generated. Therefore, if a mispredicted branch is detected, the instructions that need to be canceled can be readily identified and the instruction status information is readily available.
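The hit-mask idea can be illustrated with a minimal software sketch. It assumes, beyond what the abstract states, that tags are entry indices in a circular reorder buffer and that `tail` names the next entry to be allocated; the function names are hypothetical. The mask is computed before the branch outcome is known and is applied only if the branch turns out to be mispredicted.

```python
def cancel_mask(size, tail, branch_tag):
    """Mark reorder buffer entries younger than the branch at branch_tag.

    Assumptions (not from the abstract): tags are entry indices in a
    circular buffer, and `tail` is the next entry to be allocated.
    """
    mask = [False] * size
    i = (branch_tag + 1) % size
    while i != tail:          # walk from just after the branch to the tail
        mask[i] = True
        i = (i + 1) % size
    return mask


def apply_branch_result(valid, mask, mispredicted):
    """Clear the valid bit of every masked entry, but only on a mispredict."""
    if not mispredicted:
        return list(valid)
    return [v and not m for v, m in zip(valid, mask)]
```

In hardware the mask would be produced by parallel logic rather than a loop; the loop here only shows which entries the mask selects.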
Abstract:
A first hit scanning circuit scans hit signals, which identify the entries within a rotating buffer that are storing a search value, for the first hit signal nearest a start pointer that identifies one of the entries. The first hit scanning circuit divides the hit signals into multiple subsets, and independently scans each subset for a first hit within the subset. In parallel, the first hit scanning circuit generates a set of lookahead signals by scanning each subset for at least one hit. The lookahead signals are then scanned for a first lookahead signal, and the scanned subset signals are qualified with the scanned lookahead signals.
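The two-level scan can be sketched in software as follows. This is an illustration only: the abstract does not say how the start pointer is handled, so the sketch assumes the hit signals are first rotated so that the start pointer becomes position 0, and the names `first_one_hot` and `first_hit` are hypothetical.

```python
def first_one_hot(bits):
    """Return a list with only the first True of `bits` kept."""
    seen = False
    out = []
    for b in bits:
        out.append(b and not seen)
        seen = seen or b
    return out


def first_hit(hits, start, subset_size=4):
    """Index of the first hit at or after `start`, wrapping around; None if no hit."""
    n = len(hits)
    # Assumption: rotate so the start pointer becomes position 0.
    rotated = [hits[(start + i) % n] for i in range(n)]
    subsets = [rotated[i:i + subset_size] for i in range(0, n, subset_size)]

    scanned = [first_one_hot(s) for s in subsets]   # independent per-subset scans
    lookahead = [any(s) for s in subsets]           # "at least one hit" per subset
    first_lookahead = first_one_hot(lookahead)      # first subset that has a hit

    # Qualify each subset's scan result with its scanned lookahead signal.
    for si, (one_hot, qualifies) in enumerate(zip(scanned, first_lookahead)):
        for bi, bit in enumerate(one_hot):
            if bit and qualifies:
                return (start + si * subset_size + bi) % n
    return None
```

The per-subset scans and the lookahead generation do not depend on each other, which is what lets a circuit evaluate them in parallel; the Python code runs them sequentially.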
Abstract:
The pipeline of a microprocessor is partitioned near its midpoint such that a first portion of the functionality of the microprocessor is implemented on a first integrated circuit chip and a second portion of the microprocessor functionality is implemented on a second integrated circuit chip. In one implementation, the first integrated circuit chip includes an instruction cache, an instruction alignment unit, and a plurality of decode units for implementing fetch, alignment and decode stages, respectively, of the processor pipeline. Instructions are selected from the instruction cache by the instruction alignment unit and are provided to a respective decode unit. A compression unit may compress the information output by the decode units to prepare the information for conveyance from the first integrated circuit chip to the second integrated circuit chip. The second integrated circuit chip contains circuitry to implement execute and write-back stages of the processor pipeline. This circuitry may include a plurality of execution units coupled to receive output signals from the decode units of the first integrated circuit chip, corresponding reservation stations, a load/store unit and a data cache. A decompression unit may be coupled to receive the compressed information from the compression unit of the first integrated circuit chip to decompress the information prior to providing it to the reservation stations and/or execution units.
Abstract:
A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction. Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register.
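A future file of this kind might be modeled as below. This is a minimal sketch, not the patented design: the class and method names are hypothetical, registers are assumed to start with the value 0, and the tag check in `complete` is one reading of "the last instruction in program order to update the register".

```python
class FutureFile:
    """One slot per register, holding either a reorder buffer tag or a result."""

    def __init__(self, registers):
        # Assumption: every register starts out holding the value 0.
        self.slots = {r: ("value", 0) for r in registers}

    def dispatch(self, dest, tag):
        """A newly dispatched instruction becomes the latest writer of `dest`."""
        self.slots[dest] = ("tag", tag)

    def complete(self, dest, tag, result):
        """Store the result only if `tag` is still the latest writer of `dest`."""
        if self.slots[dest] == ("tag", tag):
            self.slots[dest] = ("value", result)

    def read(self, src):
        """Source operand lookup: returns ("tag", t) or ("value", v)."""
        return self.slots[src]
```

A consumer that receives a tag would wait for the matching result to be forwarded, while a consumer that receives a value can use it immediately.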
Abstract:
A storage device having varying access times is provided. The storage device incorporates a direct-mapped cache and a set-associative cache, which are accessed in parallel. If a hit occurs in the direct-mapped cache, then the data is forwarded in the same clock cycle as the requested address is conveyed to the storage device. If a hit occurs in the set-associative cache, then the data is forwarded in a subsequent clock cycle and the associated cache line is moved into the direct-mapped cache. The cache line stored in the direct-mapped cache in the storage location that is to be used for the cache line being moved is stored into the set-associative cache in the location vacated by the moved line. In this manner, the most recently accessed cache line is stored in the direct-mapped cache and other recently accessed cache lines are stored in the set-associative cache.
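The swap between the two caches can be sketched as follows. Several details are assumptions made for illustration: both caches are indexed by the same low-order bits of a line address, the set-associative cache has as many sets as the direct-mapped cache has locations, and `fill_assoc` is a hypothetical helper, since the abstract does not describe how lines are installed on a miss.

```python
class TwoLevelCache:
    """Direct-mapped cache backed by a set-associative cache; hits swap lines."""

    def __init__(self, slots=4, ways=2):
        self.direct = [None] * slots                      # entries: (line_addr, data)
        self.assoc = [[None] * ways for _ in range(slots)]

    def fill_assoc(self, line_addr, data):
        """Hypothetical fill: use a free way if any, otherwise overwrite way 0."""
        ways = self.assoc[line_addr % len(self.direct)]
        free = ways.index(None) if None in ways else 0
        ways[free] = (line_addr, data)

    def access(self, line_addr):
        """Return (data, extra_cycles); (None, None) on a miss in both caches."""
        idx = line_addr % len(self.direct)
        resident = self.direct[idx]
        if resident is not None and resident[0] == line_addr:
            return resident[1], 0                         # same-cycle hit
        ways = self.assoc[idx]
        for w, entry in enumerate(ways):
            if entry is not None and entry[0] == line_addr:
                ways[w] = resident                        # displaced line fills the vacated way
                self.direct[idx] = entry                  # hit line moves to the direct-mapped cache
                return entry[1], 1                        # forwarded one cycle later
        return None, None
```

After a set-associative hit, a repeated access to the same line hits in the direct-mapped cache with no extra cycle, which matches the behavior the abstract describes for the most recently accessed line.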
Abstract:
A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file.
Abstract:
A microprocessor having a microcode unit is provided. Routines comprising DSP functions and instruction emulation routines are stored within a read-only memory within the microcode unit. The routines may be fetched by the microprocessor upon occurrence of a corresponding instruction. For example, DSP functions may be fetched upon occurrence of an instruction defined by the microprocessor to be indicative of a DSP function. The microcode unit provides a library of useful functions. Effectively, the instruction set executed by the microprocessor is increased. A number of methods for defining instructions indicative of a DSP function are contemplated. For example, a subroutine call instruction having a target address within a predefined range of addresses may be defined as indicative of a DSP function. Alternatively, a special subroutine call instruction may be added to the instruction set. Detection of the special subroutine call instruction encoding causes the microprocessor to fetch instructions from the microcode unit. A third alternative is to detect data patterns in data movement instructions and cause instructions to be fetched from the microcode unit upon occurrence of particular data patterns.
Abstract:
A superscalar microprocessor is provided which maintains coherency between a pair of caches accessed from different stages of an instruction processing pipeline. A dependency checking structure is provided within the microprocessor. The dependency checking structure compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (addresses for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes. If a read memory access performed by the decode stage is dependent upon a write memory access performed by the execution stage, then the instruction associated with the read memory access and subsequent instructions are flushed. Data coherency is maintained between the pair of caches while allowing stack-relative accesses to be performed from the decode stage. The comparator circuits used to perform the comparison are configured to compare a field of address bits instead of the entire address. This reduces the size of the comparators while still maintaining accurate dependency checking, because the resulting comparison signals are qualified with an indication that both addresses hit in the same storage location within the stack cache.
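The qualified partial comparison can be illustrated with a short sketch. The abstract does not say which address bits form the compared field, so the low-order 8 bits used here are an assumption, as are the function name and the convention that a stack cache miss is reported as `None`.

```python
def accesses_conflict(addr_a, loc_a, addr_b, loc_b, field_mask=0xFF):
    """Partial-address dependency check qualified by stack cache location.

    loc_a / loc_b: storage location in the stack cache where each address
    hits, or None on a miss. field_mask selects the compared address bits
    (assumed here to be the low-order byte; not specified in the abstract).
    """
    field_match = (addr_a & field_mask) == (addr_b & field_mask)
    same_location = loc_a is not None and loc_a == loc_b
    return field_match and same_location
```

Two addresses that agree in the compared field but hit in different storage locations are not reported as dependent, which is how the qualification keeps the narrow comparison accurate.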
Abstract:
A microprocessor is configured to speculatively fetch cache lines of instruction bytes prior to actually detecting a cache miss for the cache lines of instruction bytes. The bytes transferred from an external main memory subsystem are stored into one of several prefetch buffers. Subsequently, instruction fetches may be detected which hit the prefetch buffers. Furthermore, predecode data may be generated for the instruction bytes stored in the prefetch buffers. When a fetch hit in the prefetch buffers is detected, predecode data may be available for the instructions being fetched. The prefetch buffers may each comprise an address prefetch buffer included within an external interface unit and an instruction data prefetch buffer included within a prefetch/predecode unit. The external interface unit maintains the addresses of cache lines assigned to the prefetch buffers in the address prefetch buffers. Both the linear address and the physical address of each cache line are maintained. The prefetch/predecode unit receives instruction bytes directly from the external interface unit and stores the instruction bytes in the corresponding instruction data prefetch buffer.
Abstract:
A load/store buffer is provided which allows both load memory operations and store memory operations to be stored within it. Because each storage location may contain either a load or a store memory operation, the number of available storage locations for load memory operations is maximally the number of storage locations in the entire buffer. Similarly, the number of available storage locations for store memory operations is maximally the number of storage locations in the entire buffer. This invention improves use of silicon area for load and store buffers by implementing, in a smaller area, a performance-equivalent alternative to the separate load and store buffer approach previously used in many superscalar microprocessors.
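The capacity argument can be seen in a small model of a unified buffer. The class and method names are hypothetical and the slot contents are reduced to an operation kind and an address; the point is only that any free slot can accept either kind of operation.

```python
class LoadStoreBuffer:
    """Unified buffer: every slot may hold either a load or a store."""

    def __init__(self, size):
        self.slots = [None] * size

    def allocate(self, kind, address):
        """Place a memory operation in the first free slot; None if the buffer is full."""
        if kind not in ("load", "store"):
            raise ValueError("kind must be 'load' or 'store'")
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = (kind, address)
                return i
        return None

    def release(self, index):
        """Free a slot once its memory operation has completed."""
        self.slots[index] = None
```

With four slots, four loads (or four stores) can be held at once, whereas a split design with two load slots and two store slots would stall on the third operation of either kind.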