Abstract:
A circuit comprising a number of range registers and complementary decoding/matching circuits is provided to a processor for determining the memory type of a physical address, thereby allowing memory operating characteristics to be determined as soon as the physical address is available in an execution stage preceding cache access. Additionally, a memory type field is provided to each address translation lookaside buffer entry of the data and instruction memory subsystem for storing the determined memory type, and the memory type determination circuit is disposed in the page miss handler, thereby allowing the memory type to be determined at the same time as the physical address is determined.
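
As an informal illustration, the following C sketch models how such a bank of range registers and matching logic might resolve the memory type of a physical address; the type names, base/limit encoding, and default type are assumptions for the example, not details taken from the abstract.

#include <stdint.h>

/* Illustrative model of a range-register file: each register pairs a
 * base/limit with a memory type code. Names are hypothetical. */
typedef enum { MT_UNCACHEABLE, MT_WRITE_THROUGH, MT_WRITE_BACK } mem_type_t;

typedef struct {
    uint64_t   base;   /* start of the physical address range          */
    uint64_t   limit;  /* end of the range (inclusive)                 */
    mem_type_t type;   /* memory type reported when an address matches */
    int        valid;  /* range register is programmed                 */
} range_reg_t;

/* The decoding/matching step: compare the physical address against every
 * programmed range (done in parallel in hardware, modeled here as a loop)
 * and return the memory type of the first hit. */
mem_type_t lookup_memory_type(const range_reg_t *regs, int n, uint64_t paddr)
{
    for (int i = 0; i < n; i++) {
        if (regs[i].valid && paddr >= regs[i].base && paddr <= regs[i].limit)
            return regs[i].type;
    }
    return MT_UNCACHEABLE;  /* conservative default for unmapped ranges */
}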
Abstract:
A method and apparatus for dynamically allocating entries of microprocessor resources to particular instructions so that buffer space and resources are used efficiently. The pipelined and superscalar microprocessor is capable of speculative execution of instructions and also out-of-order processing. Resources within the microprocessor include a store buffer, a load buffer, a reorder buffer and a reservation station. The reorder buffer contains a larger set of physical registers and also contains information related to speculative instructions, while the reservation station contains information related to instructions pending execution. The load buffer is allocated only to load instructions and is valid for an instruction from the allocation pipestage to instruction retirement. The store buffer is allocated only to store instructions and is valid for an instruction from allocation until the store is performed. The reservation station is allocated to most instructions and is valid for an instruction from allocation to instruction dispatch. The reorder buffer is allocated to all instructions and is valid for a given instruction from allocation to retirement. The load buffer, store buffer, and reorder buffer are sequentially allocated, while the reservation station is not. Resource allocation is performed dynamically (as needed by the operation) rather than as a full set of resources attached to each operation. Using this allocation scheme, efficient usage of the microprocessor resources is accomplished.
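
A minimal C sketch of the per-operation resource request described above, assuming a simple four-way classification of operations; the type and field names are illustrative only.

#include <stdbool.h>

/* Which resources a given operation requests at the allocation pipestage;
 * the classification and names are assumptions for this sketch. */
typedef enum { OP_LOAD, OP_STORE, OP_ALU, OP_BRANCH } op_class_t;

typedef struct {
    bool needs_load_buffer;   /* loads only: allocation -> retirement       */
    bool needs_store_buffer;  /* stores only: allocation -> store performed */
    bool needs_reservation;   /* most ops: allocation -> dispatch           */
    bool needs_reorder_entry; /* all ops: allocation -> retirement          */
} alloc_request_t;

alloc_request_t classify(op_class_t op)
{
    alloc_request_t req = { false, false, true, true };  /* RS + ROB for most ops */
    if (op == OP_LOAD)  req.needs_load_buffer  = true;
    if (op == OP_STORE) req.needs_store_buffer = true;
    return req;  /* resources are requested per-op, not as a fixed full set */
}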
Abstract:
An allocator assigns entries for a circular buffer. The allocator receives requests for storing data in entries of the circular buffer, and generates a head pointer to identify a starting entry in the circular buffer for which circular buffer entries are not allocated. In addition to pointing to an entry in the circular buffer, the head pointer includes a wrap bit. The allocator toggles the wrap bit each time it traverses the linear queue of the circular buffer. A tail pointer is generated, including the wrap bit, to identify an ending entry in the circular buffer for which circular buffer entries are allocated. In response to the requests for entries, the allocator sequentially assigns entries for the requests located between the head pointer and the tail pointer. The allocator has application for use in a microprocessor performing out-of-order dispatch and speculative execution. The allocator is coupled to a reorder buffer, configured as a circular buffer, to permit allocation of entries. The allocator utilizes an all-or-nothing allocation policy, such that either all or none of the incoming instructions are allocated during an allocation period.
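
The pointer arithmetic can be illustrated with a small C sketch; the queue size, the field names, and the convention that the head pointer marks the next free entry are assumptions made for the example.

#include <stdint.h>

#define QSIZE 16  /* illustrative number of circular-buffer entries */

/* Each pointer carries an extra wrap bit, toggled on every traversal of
 * the linear queue, so that a full buffer can be told apart from an empty
 * one when the head and tail index the same entry. */
typedef struct {
    uint8_t head, head_wrap;   /* first entry not yet allocated        */
    uint8_t tail, tail_wrap;   /* oldest allocated entry (freed later) */
} cbuf_alloc_t;

static int entries_free(const cbuf_alloc_t *a)
{
    if (a->head == a->tail)
        return (a->head_wrap == a->tail_wrap) ? QSIZE : 0;
    return (a->tail - a->head + QSIZE) % QSIZE;
}

/* All-or-nothing policy: either every request in this allocation period
 * gets a sequential entry, or none do. Returns first index or -1 on stall. */
int cbuf_allocate(cbuf_alloc_t *a, int nrequests)
{
    if (entries_free(a) < nrequests)
        return -1;                   /* stall: nothing is allocated          */
    int first = a->head;
    for (int i = 0; i < nrequests; i++) {
        a->head = (a->head + 1) % QSIZE;
        if (a->head == 0)
            a->head_wrap ^= 1;       /* traversed the queue: toggle wrap bit */
    }
    return first;                    /* entries assigned sequentially        */
}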
Abstract:
A circuit comprising a number of address range registers and complementary decoding/matching circuits is provided to a processor for determining the memory type of a physical address, thereby allowing the memory type to be determined as soon as the physical address is available in an execution stage preceding cache access. Additionally, a memory type field is provided to each address translation lookaside buffer entry of the data and instruction memory subsystem for storing the determined memory type. The memory type determination circuit is disposed in the page miss handler, thereby allowing the memory type to be determined at the same time as the physical address is determined.
Abstract:
In a microprocessor, an apparatus and method for performing memory functions and issuing bus cycles. Special microinstructions are stored in microcode ROM. These microinstructions are used to perform the memory functions and to generate the special bus cycles. Initially, an address corresponding to a requested operation to be performed is generated for one of these special microinstructions. That special microinstruction, along with its address, is then transmitted over the bus to the various units of the microprocessor. When each of the units receives the microinstruction, it determines whether that microinstruction is to be ignored based on the address. If a particular unit ignores the microinstruction, the microinstruction is forwarded to subsequent units in the pipeline for processing. Otherwise, that particular unit performs the requested operation as specified by the microinstruction's address.
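
A hedged C sketch of the addressing scheme: each unit compares its own address against the one carried by the special microinstruction and either performs the operation or forwards it down the pipe; the unit names and encodings are hypothetical.

#include <stdio.h>

/* Hypothetical unit addresses carried in the special microinstruction;
 * real encodings are not given in the abstract. */
enum unit_addr { UNIT_DCACHE = 1, UNIT_BUS = 2, UNIT_WRITEBACK = 3 };

typedef struct {
    int opcode;     /* which memory function / special bus cycle */
    int unit_addr;  /* unit expected to perform the operation    */
} special_uop_t;

/* Each unit in the pipeline runs the same check: if the microinstruction
 * is not addressed to it, it is ignored and forwarded to the next unit;
 * otherwise this unit performs the requested operation. */
void unit_handle(int my_addr, const special_uop_t *uop)
{
    if (uop->unit_addr != my_addr)
        return;  /* ignore; the microinstruction continues down the pipeline */
    printf("unit %d performs special op %d\n", my_addr, uop->opcode);
}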
Abstract:
A branch prediction mechanism that maintains both speculative history and actual history for each branch instruction in a branch target buffer. The actual branch history contains the branch history for fully resolved occurrences of the branch instruction. The speculative branch history contains the actual history plus the "history" of recent branch predictions for the branch. If the speculative branch history contains any recent predictions, a speculation bit is set to indicate that speculative history exists for the branch; while the speculation bit is set, the speculative history is used to make branch predictions. If a misprediction is made for the branch, the speculation bit is cleared, since the speculative history then contains inaccurate branch history.
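
The interplay of the two histories and the speculation bit might look roughly like the following C sketch; the history width and the toy direction function are placeholders, not the predictor described in the abstract.

#include <stdint.h>
#include <stdbool.h>

/* One branch target buffer entry, modeled with a small shift-register
 * history; the widths are illustrative. */
typedef struct {
    uint8_t actual_hist;   /* outcomes of fully resolved occurrences      */
    uint8_t spec_hist;     /* actual history plus recent predictions      */
    bool    spec_bit;      /* set when spec_hist holds unresolved guesses */
} btb_entry_t;

/* Toy direction function: predict the most recent outcome or prediction. */
static bool hist_predict(uint8_t hist) { return hist & 1; }

/* Predict: use the speculative history when the speculation bit is set,
 * otherwise the actual history, then shift the new prediction in. */
bool predict(btb_entry_t *e)
{
    uint8_t hist = e->spec_bit ? e->spec_hist : e->actual_hist;
    bool taken = hist_predict(hist);
    e->spec_hist = (uint8_t)((hist << 1) | taken);
    e->spec_bit  = true;
    return taken;
}

/* Resolve: update the actual history; on a misprediction the speculative
 * history is inaccurate, so clear the speculation bit and resynchronize. */
void resolve(btb_entry_t *e, bool taken, bool mispredicted)
{
    e->actual_hist = (uint8_t)((e->actual_hist << 1) | taken);
    if (mispredicted) {
        e->spec_hist = e->actual_hist;
        e->spec_bit  = false;
    }
}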
Abstract:
A method and apparatus for executing a misaligned load. The method begins with receiving a load request to load data from a first memory location. An entry in a store buffer is tested to determine whether the entry corresponds to the first memory location. The entry is also tested to determine whether the entry corresponds to a second memory location subsequent to the first memory location. The load request is blocked if the entry corresponds to the first memory location or the second memory location. After a store operation for the store buffer entry is executed, the load request may be unblocked. The apparatus is a processor or a computer system comprising a load buffer capable of storing a load request address in response to a load request. The processor includes an incrementing circuit that generates an incremented load request address. The processor also includes a store buffer containing a portion of a store request address. The store buffer includes comparison circuitry that compares the portion of the store request address to the load request address and the incremented load request address, and generates a blocking signal if either the load request address or the incremented load request address corresponds to the store request address.
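
A simplified C sketch of the blocking check, assuming whole-location comparison at an illustrative granularity rather than the partial-address compare the hardware performs; names and sizes are assumptions.

#include <stdint.h>
#include <stdbool.h>

#define LINE_BYTES 8   /* illustrative access granularity */

/* A store-buffer entry keeps a store address; the load side compares both
 * the load address and the incremented load address (the second location
 * touched by a misaligned load) against it. */
typedef struct {
    uint64_t addr;
    bool     valid;
} store_entry_t;

bool load_must_block(const store_entry_t *sb, int n, uint64_t load_addr)
{
    uint64_t lo = load_addr / LINE_BYTES;   /* first memory location        */
    uint64_t hi = lo + 1;                   /* incremented (second) location */
    for (int i = 0; i < n; i++) {
        if (!sb[i].valid)
            continue;
        uint64_t st = sb[i].addr / LINE_BYTES;
        if (st == lo || st == hi)
            return true;   /* block until the overlapping store performs */
    }
    return false;
}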
Abstract:
A mechanism and method for renaming flags within a register alias table ("RAT") to increase processor parallelism, and for providing and using flag masks associated with individual instructions. To reduce data dependencies between instructions that are concurrently processed, the flags used by these instructions are renamed. In general, a RAT unit performs register renaming to supply a larger physical register set than would ordinarily be available within a given macroarchitecture's logical register set (such as the Intel architecture, PowerPC, or Alpha designs, for instance), eliminating false data dependencies between instructions that would otherwise reduce overall superscalar processing performance for the microprocessor. The renamed flag registers contain several flag bits, and different flag bits may be updated or read by different instructions. Static and dynamic flag masks are associated with particular instructions: a static mask indicates which flags an instruction is capable of updating, while a dynamic mask indicates which flags the instruction actually updated. Static flag masks are used in flag renaming and dynamic flag masks are used at retirement. The mechanism also detects cases in which a flag register is required that is a superset of the previously renamed flag register portion.
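
The role of the static mask at rename time and the superset check can be sketched in C as follows; the flag encodings and the single-producer model are simplifications made for the example, not the mechanism as claimed.

#include <stdint.h>

/* Illustrative flag-bit positions; real encodings are not in the abstract. */
enum { FLAG_CF = 1 << 0, FLAG_ZF = 1 << 1, FLAG_SF = 1 << 2, FLAG_OF = 1 << 3 };

/* The RAT remembers which physical register last renamed the flags and
 * which flag bits that producer can write (its static mask). */
typedef struct {
    int     producer_preg;   /* physical register holding renamed flags */
    uint8_t producer_mask;   /* static mask of that producer            */
} flag_rat_t;

/* Rename a flag-writing instruction: it becomes the new producer for the
 * bits named in its static mask. Returns the previous producer so readers
 * of bits outside the new mask can still find correct data. */
int rename_flags(flag_rat_t *rat, int new_preg, uint8_t static_mask)
{
    int prev = rat->producer_preg;
    rat->producer_preg = new_preg;
    rat->producer_mask = static_mask;
    return prev;
}

/* A reader needing bits outside the current producer's static mask hits the
 * "superset" case mentioned above and must source its flags elsewhere. */
int read_needs_superset(const flag_rat_t *rat, uint8_t read_mask)
{
    return (read_mask & ~rat->producer_mask) != 0;
}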
Abstract:
A translation lookaside buffer is described for use with a microprocessor capable of speculative and out-of-order processing of memory instructions. The translation lookaside buffer is non-blocking in response to translation lookaside buffer misses requiring page table walks. Once a translation lookaside buffer miss is detected, a page table walk is initiated to satisfy the miss. During the page table walk, additional memory instructions are processed by the translation lookaside buffer. Any additional instructions which cause translation lookaside buffer hits are processed normally by the translation lookaside buffer. However, instructions causing translation lookaside buffer misses while the page table walk is being performed are blocked pending completion of the page table walk. Once the page table walk is completed, the blocked instructions are reawakened and are again processed by the translation lookaside buffer. Global and selective wakeup mechanisms are described. An implementation wherein the non-blocking translation lookaside buffer is provided within a microprocessor capable of speculative and out-of-order processing is also described.
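
A rough C sketch of the non-blocking behavior, assuming a single outstanding page table walk and a global wakeup; the TLB lookup itself and the wakeup hook are elided, and all names are illustrative.

#include <stdint.h>
#include <stdbool.h>

/* Simplified outcome of presenting a memory operation to the TLB while a
 * page table walk may already be in flight. */
typedef enum { TLB_HIT, TLB_MISS_START_WALK, TLB_MISS_BLOCKED } tlb_result_t;

typedef struct {
    bool walk_in_progress;   /* one outstanding page table walk in this sketch */
} tlb_t;

tlb_result_t tlb_access(tlb_t *tlb, uint64_t vaddr, bool hit_in_tlb)
{
    (void)vaddr;                        /* the lookup itself is elided here */
    if (hit_in_tlb)
        return TLB_HIT;                 /* hits proceed even during a walk  */
    if (!tlb->walk_in_progress) {
        tlb->walk_in_progress = true;   /* start the page table walk        */
        return TLB_MISS_START_WALK;
    }
    return TLB_MISS_BLOCKED;            /* block until the walk completes   */
}

/* When the walk finishes, blocked operations are reawakened (globally in
 * this sketch) and presented to the TLB again. */
void tlb_walk_complete(tlb_t *tlb)
{
    tlb->walk_in_progress = false;
    /* wake_blocked_ops();  -- hypothetical hook for the wakeup mechanism */
}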
Abstract:
A memory-type value identifying the type of memory contained within a range of memory locations is explicitly stored within a microprocessor. Prior to processing a memory micro-instruction such as a load or store, the memory-type is determined for the memory location identified by the memory micro-instruction. Once the memory-type is known, the memory micro-instruction is processed in accordance with any one of a number of processing protocols including write-through processing, write-back processing, write-protect processing, restricted-cacheability processing, uncacheable speculatable write-combining processing, or uncacheable processing. By providing memory-type information explicitly within the microprocessor, the type of memory identified by a micro-instruction is known before the micro-instruction is processed. Accordingly, the protocol by which the micro-instruction is processed may be efficiently tailored to the memory-type. For example, if the memory location identified by the micro-instruction is known to be uncacheable, a data cache unit is bypassed and external memory is accessed directly. In an exemplary embodiment, the microprocessor is an out-of-order microprocessor capable of generating speculative memory micro-instructions. Also, the microprocessor may be one of a number of microprocessors within a multiprocessor system.
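
As an illustration of protocol selection once the memory type is known, the following C sketch picks a path through the memory subsystem; the type names, path names, and the write-combining treatment are assumptions for the example.

/* Illustrative memory types and the processing path chosen for a memory
 * micro-instruction once its type is known. */
typedef enum {
    MT_WRITE_BACK, MT_WRITE_THROUGH, MT_WRITE_PROTECT,
    MT_RESTRICTED_CACHEABLE, MT_WRITE_COMBINING, MT_UNCACHEABLE
} mtype_t;

typedef enum { VIA_DCACHE, VIA_WRITE_COMBINE_BUF, BYPASS_TO_BUS } mpath_t;

/* Because the memory type is known before the micro-instruction is
 * processed, the path through the memory subsystem can be picked up front;
 * for example, an uncacheable access skips the data cache unit entirely. */
mpath_t choose_path(mtype_t mt, int is_store)
{
    switch (mt) {
    case MT_UNCACHEABLE:
        return BYPASS_TO_BUS;               /* access external memory directly */
    case MT_WRITE_COMBINING:
        return is_store ? VIA_WRITE_COMBINE_BUF : BYPASS_TO_BUS;
    default:
        return VIA_DCACHE;                  /* cacheable protocols use the DCU */
    }
}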