摘要:
A processor provides a register for storing an address space number (ASN). Operating system software may assign different ASNs to different processes. The processor may include a TLB to cache translations, and the TLB may record the ASN from the ASN register in a TLB entry being loaded. Thus, translations may be associated with processes through the ASNs. Generally, a TLB hit will be detected in an entry if the virtual address to be translated matches the virtual address tag and the ASN matches the ASN stored in the register. Additionally, the processor may use an indication from the translation table entries to indicate whether or not a translation is global. If a translation is global, then the ASN comparison is not included in detecting a hit in the TLB. Thus, translations which are used by more than one process may not occupy multiple TLB entries. Instead, a hit may be detected on the TLB entry storing the global translation even though the recorded ASN may not match the current ASN. In one embodiment, if ASNs are disabled, the TLB may be flushed on context switches. However, the indication from the translation table entries used to indicate that the translation is global may be used (when ASNs are disabled) by the TLB to selectively invalidate non-global translations on a context switch while not invalidating global translations.
摘要:
A processor includes a store queue configured to detect a hit on a store queue entry for a load being executed by the processor, and to forward data from the store queue entry to provide a result for the load. The store queue data is provided to the data cache, along with an indication of how much data is being provided (e.g. byte enables). The data cache may then fill in any additional data accessed by the load from cache data, and provide a load result. Additionally, the store queue is configured to detect if more than one store queue entry is hit (i.e. that more than one store within the store queue updates at least one byte accessed by the load), referred to as a multimatch. If a multimatch is detected, the store queue retries the load. Subsequently, the load may be reexecuted and may not multimatch (as entries are deleted upon completion of the corresponding stores). The load may complete when it does not multimatch. In one embodiment, the store queue independently detects hits on the upper and lower portions of each store queue entry (e.g. doubleword portions) and forwards from the upper and lower portions independently. Thus, a load may hit one store queue entry for the lower portion of the data accessed by the load and a different store queue entry for the upper portion of the data accessed by the load without multimatch detection.
摘要:
A processor includes a store queue and a store queue number assignment circuit. The store queue number assignment circuit assigns store queue numbers to stores, and operates upon instruction operations prior to the instruction operations reaching a point in the pipeline of the processor at which out of order instruction processing begins. Thus, store queue entries may be reserved for stores according to the program order of the stores. Additionally, in one embodiment, the store queue number identifying the youngest store represented in the store queue may be assigned to loads. In this manner, loads may determine which stores in the store queue are older or younger than the load based on relative position within the store queue. Checking for store queue hits may be qualified with the entries between the head of the store queue and the entry indicated by the load's store queue number. In one particular embodiment, the store queue number may include an additional “toggle” bit which is toggled each time the assignment of store queue numbers reaches the maximum store queue entry and wraps to zero. If the toggle bit of the store in the store queue entry identified by the load's store queue number differs from the toggle bit of the load's store queue number, than the store queue entry has been reassigned to a store younger than the load.
摘要:
A microprocessor with a floating point unit configured to rapidly execute floating point load control word (FLDCW) type instructions in an out of program order context is disclosed. The floating point unit is configured to schedule instructions older than the FLDCW-type instruction before the FLDCW-type instruction is scheduled. The FLDCW-type instruction acts as a barrier to prevent instructions occurring after the FLDCW-type instruction in program order from executing before the FLDCW-type instruction. Indicator bits may be used to simplify instruction scheduling, and copies of the floating point control word may be stored for instruction that have long execution cycles. A method and computer configured to rapidly execute FLDCW-type instructions in an out of program order context are also disclosed.
摘要:
A dynamic memory allocation routine maintains an allocation size cache which records the address of a most recently allocated memory block for each different size of memory block that has been allocated. Upon receiving a dynamic memory allocation request, the dynamic memory allocation routine determines if the requested size is equal to one of the sizes recorded in the allocation size cache. If a matching size is found, the dynamic memory allocation routine attempts to allocate a memory block contiguous to the most recently allocated memory block of that matching size. If the contiguous memory block has been allocated to another memory block, the dynamic memory allocation routine attempts to reserve a reserved memory block having a size which is a predetermined multiple of the requested size. The requested memory block is then allocated at the beginning of the reserved memory block. By reserving the reserved memory block, the dynamic memory allocation routine may increase the likelihood that subsequent requests for memory blocks having the requested size can be allocated in contiguous memory locations.
摘要:
An apparatus and method for superforwarding load operands in a microprocessor are provided. An execution unit in a microprocessor is configured to receive a load instruction and a subsequent instruction. If the load instruction corresponds to a simple load instruction, a destination operand of the load instruction can be superforwarded to a subsequent instruction if the subsequent instruction specifies a source operand that depends on the destination operand of the load instruction. The subsequent instruction is not required to wait until a load instruction executes or completes and can be scheduled and/or executed prior to or at the same time as the load instruction. Consequently, latencies associated with operand dependencies may be reduced.
摘要:
A microprocessor configured to rapidly execute floating point store status word (FSTSW) type instructions that are immediately preceded by floating point compare (FCOM) type instructions is disclosed. FCOM-type instructions are modified to store their results to an architectural floating point status word and a temporary destination register. If an FSTSW-type instruction is detected immediately following an FCOM-type instruction, then the FSTSW-type instruction is transformed into a special fast floating point store status word (FSTSWEF) instruction. Unlike the FSTSW-type instruction, which is serializing and negatively impacts performance, the FSTSWEF instruction is not serializing and allows execution to continue without undue serialization. A computer system and method for rapidly executing FSTSW instructions immediately preceded by FCOM-type instructions are also disclosed.
摘要:
A microprocessor assigns a data transaction type to each instruction. The data transaction type is based upon the encoding of the instruction, and indicates an access mode for memory operations corresponding to the instruction. The access mode may, for example, specify caching and prefetching characteristics for the memory operation. The access mode for each data transaction type is selected to enhance the speed of access by the microprocessor to the data, or to enhance the overall cache and prefetching efficiency of the microprocessor by inhibiting caching and/or prefetching for those memory operations. Instead of relying on data memory access patterns and overall program behavior to determine caching and prefetching operations, these operations are determined on an instruction-by-instruction basis. Additionally, the data transaction types assigned to different instruction encodings may be revealed to program developers. Program developers may use the instruction encodings (and instruction encodings which are assigned to a nil data transaction type causing a default access mode) to optimize use of processor resources during program execution.
摘要:
Methods and processors for managing load-store dependencies in an out-of-order instruction pipeline. A load store dependency predictor includes a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes hashed values to identify load and store operations. When a load or store operation is detected, the PC and an architectural register number are used to create a hashed value that can be used to uniquely identify the operation. Then, the load store dependency predictor table is searched for any matching entries with the same hashed value.
摘要:
Methods and apparatuses for managing load-store dependencies in an out-of-order processor. A load store dependency predictor may include a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes a counter to indicate a strength of the dependency prediction. If the counter is above a threshold, a dependency is enforced for the load-store pair. If the counter is below the threshold, the dependency is not enforced for the load-store pair. When a store is dispatched, the table is searched, and any matching entries in the table are armed. If a load is dispatched, matches on an armed entry, and the counter is above the threshold, then the load will wait to issue until the corresponding store issues.