Abstract:
A microprocessor assigns a data transaction type to each instruction. The data transaction type is based upon the encoding of the instruction and indicates an access mode for memory operations corresponding to the instruction. The access mode may, for example, specify caching and prefetching characteristics for the memory operation. The access mode for each data transaction type is selected either to enhance the speed at which the microprocessor accesses the data, or to enhance the overall cache and prefetching efficiency of the microprocessor by inhibiting caching and/or prefetching for those memory operations. Instead of relying on data memory access patterns and overall program behavior to determine caching and prefetching operations, these operations are determined on an instruction-by-instruction basis. Additionally, the data transaction types assigned to different instruction encodings may be revealed to program developers. Program developers may use these instruction encodings (including encodings assigned a nil data transaction type, which causes a default access mode to be used) to optimize the use of processor resources during program execution.
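
The following C++ sketch is a minimal illustration of the scheme described in the abstract above, assuming a hypothetical transaction-type bit field in the instruction encoding. The bit positions, type names (TransactionType, AccessMode), and attribute choices are assumptions made for illustration, not details taken from the abstract.

#include <cstdint>
#include <iostream>

// Hypothetical access-mode attributes selected by the data transaction type.
struct AccessMode {
    bool cacheable;   // allocate the data in the cache?
    bool prefetch;    // allow hardware prefetching for this access?
};

enum class TransactionType : std::uint8_t {
    Nil,        // default access mode
    Streaming,  // data touched once: inhibit caching and prefetching
    Reused      // data reused often: cache and prefetch
};

// Decode assigns a transaction type from the instruction encoding; here we
// pretend bits [7:6] of the encoding carry the type (purely an assumption).
TransactionType decode_transaction_type(std::uint32_t encoding) {
    switch ((encoding >> 6) & 0x3u) {
        case 1:  return TransactionType::Streaming;
        case 2:  return TransactionType::Reused;
        default: return TransactionType::Nil;
    }
}

AccessMode access_mode_for(TransactionType t) {
    switch (t) {
        case TransactionType::Streaming: return {false, false};
        case TransactionType::Reused:    return {true,  true};
        default:                         return {true,  false};  // default mode
    }
}

int main() {
    std::uint32_t encoding = 0x80;  // type bits = 2 -> Reused
    AccessMode m = access_mode_for(decode_transaction_type(encoding));
    std::cout << "cacheable=" << m.cacheable << " prefetch=" << m.prefetch << '\n';
}
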
Abstract:
Methods and processors for managing load-store dependencies in an out-of-order instruction pipeline. A load-store dependency predictor includes a table for storing entries for load-store pairs that have been found to be dependent and to execute out of order. Each entry in the table includes hashed values to identify the load and store operations. When a load or store operation is detected, its PC and an architectural register number are used to create a hashed value that uniquely identifies the operation. The load-store dependency predictor table is then searched for any matching entries with the same hashed value.
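
As a rough sketch of the identification scheme above, the C++ below hashes a PC together with an architectural register number and looks the result up in a table of load-store pairs. The hash function, table organization, and names are assumptions for illustration only.

#include <cstdint>
#include <vector>

// Hypothetical hash: fold the program counter and an architectural register
// number into a single identifying value.
std::uint32_t hash_op(std::uint64_t pc, std::uint8_t arch_reg) {
    return static_cast<std::uint32_t>(
        (pc >> 2) ^ (pc >> 13) ^ (static_cast<std::uint64_t>(arch_reg) * 0x9E3779B1ull));
}

// One predictor entry identifies a dependent load-store pair by its hashes.
struct Entry {
    std::uint32_t load_hash;
    std::uint32_t store_hash;
};

class LoadStoreDependencyTable {
    std::vector<Entry> entries_;
public:
    // Allocate an entry when a load is found to depend on an older store
    // and to have executed out of order with it.
    void allocate(std::uint64_t load_pc, std::uint8_t load_reg,
                  std::uint64_t store_pc, std::uint8_t store_reg) {
        entries_.push_back({hash_op(load_pc, load_reg), hash_op(store_pc, store_reg)});
    }

    // On dispatch of a load or store, search for entries with the same hash.
    bool matches_load(std::uint64_t pc, std::uint8_t reg) const {
        const std::uint32_t h = hash_op(pc, reg);
        for (const Entry& e : entries_) if (e.load_hash == h) return true;
        return false;
    }
    bool matches_store(std::uint64_t pc, std::uint8_t reg) const {
        const std::uint32_t h = hash_op(pc, reg);
        for (const Entry& e : entries_) if (e.store_hash == h) return true;
        return false;
    }
};
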
Abstract:
Methods and apparatuses for managing load-store dependencies in an out-of-order processor. A load-store dependency predictor may include a table for storing entries for load-store pairs that have been found to be dependent and to execute out of order. Each entry in the table includes a counter to indicate the strength of the dependency prediction. If the counter is above a threshold, the dependency is enforced for the load-store pair; if the counter is below the threshold, the dependency is not enforced. When a store is dispatched, the table is searched, and any matching entries in the table are armed. If a load is dispatched, matches on an armed entry, and the counter is above the threshold, then the load waits to issue until the corresponding store issues.
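
The C++ sketch below models the arm/threshold behavior described above, assuming hashed load/store identifiers as in the previous abstract. The table size, threshold value, and training policy are invented for the example.

#include <cstdint>
#include <vector>

// One predictor entry: a dependent load-store pair (identified by hashed
// values) plus a saturating counter giving the strength of the prediction.
struct Entry {
    std::uint32_t load_hash  = 0;
    std::uint32_t store_hash = 0;
    std::uint8_t  counter    = 0;
    bool          armed      = false;  // set when the matching store dispatches
    bool          valid      = false;
};

class LoadStoreDependencyPredictor {
    static constexpr std::uint8_t kThreshold = 2;  // assumed threshold
    std::vector<Entry> table_{64};                 // assumed table size

public:
    // A dispatching store arms every matching entry.
    void on_store_dispatch(std::uint32_t store_hash) {
        for (Entry& e : table_)
            if (e.valid && e.store_hash == store_hash)
                e.armed = true;
    }

    // A dispatching load waits for its store only if it matches an armed
    // entry whose counter is at or above the assumed threshold.
    bool load_must_wait(std::uint32_t load_hash) const {
        for (const Entry& e : table_)
            if (e.valid && e.armed && e.load_hash == load_hash && e.counter >= kThreshold)
                return true;
        return false;
    }

    // Training: strengthen the counter when the dependency was real,
    // weaken it when enforcing the dependency was unnecessary.
    void train(std::uint32_t load_hash, std::uint32_t store_hash, bool dependency_was_real) {
        for (Entry& e : table_) {
            if (e.valid && e.load_hash == load_hash && e.store_hash == store_hash) {
                if (dependency_was_real) { if (e.counter < 255) ++e.counter; }
                else if (e.counter > 0)  { --e.counter; }
                e.armed = false;
                return;
            }
        }
        // First-seen ordering violation: allocate a new entry.
        for (Entry& e : table_)
            if (!e.valid) { e = {load_hash, store_hash, kThreshold, false, true}; return; }
    }
};
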
Abstract:
A processor includes execution resources for handling a first memory operation and a concurrent second memory operation. If one of the memory operations is misaligned, the processor may allocate the execution resources for the other memory operation to the misaligned operation. In one embodiment, the older memory operation proceeds if misalignment is detected, while the younger memory operation is retried and may be reexecuted at a later time. If the older memory operation is misaligned, the execution resources provided for the younger operation may be allocated to the older memory operation. If only the younger memory operation is misaligned, it may become the older memory operation during a subsequent reexecution and may thus be allocated the execution resources it needs to complete.
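
A minimal C++ sketch of the allocation policy follows, under the assumption that "misaligned" means crossing a cache-line boundary; the structure names and the 64-byte line size are illustrative rather than taken from the abstract.

#include <cstdint>

// A pending memory operation; only the fields relevant to the policy.
struct MemOp {
    std::uint64_t addr;
    std::uint32_t size;     // bytes
};

// Assumed definition of misalignment: the access crosses a cache-line boundary.
bool misaligned(const MemOp& op, std::uint32_t line_bytes = 64) {
    return (op.addr % line_bytes) + op.size > line_bytes;
}

struct Decision {
    bool older_uses_both_resource_sets;  // older op borrows the younger op's resources
    bool younger_retried;                // younger op reexecutes in a later cycle
};

// Two memory ops presented concurrently; `older` is the older in program order.
Decision schedule(const MemOp& older, const MemOp& younger) {
    if (misaligned(older) || misaligned(younger)) {
        // Misalignment detected: the older op proceeds (taking the second set
        // of resources if it is the misaligned one) and the younger op retries.
        // If only the younger op was misaligned, it becomes the older op on a
        // later pass and is then given both resource sets itself.
        return {misaligned(older), true};
    }
    return {false, false};  // both ops execute concurrently as issued
}
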
Abstract:
A processor provides a register for storing an address space number (ASN). Operating system software may assign different ASNs to different processes. The processor may include a TLB to cache translations, and the TLB may record the ASN from the ASN register in a TLB entry being loaded. Thus, translations may be associated with processes through the ASNs. Generally, a TLB hit is detected in an entry if the virtual address to be translated matches the virtual address tag and the ASN stored in the entry matches the ASN in the ASN register. Additionally, the processor may use an indication from the translation table entries to indicate whether or not a translation is global. If a translation is global, the ASN comparison is not included in detecting a hit in the TLB. Thus, translations which are used by more than one process need not occupy multiple TLB entries. Instead, a hit may be detected on the TLB entry storing the global translation even though the recorded ASN may not match the current ASN. In one embodiment, if ASNs are disabled, the TLB may be flushed on context switches. However, the global indication from the translation table entries may still be used (when ASNs are disabled) by the TLB to selectively invalidate non-global translations on a context switch while leaving global translations intact.
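
The C++ sketch below shows the hit check and selective invalidation described above. The entry layout, field names, and fully associative search are assumptions for illustration; a real TLB would be set-associative CAM hardware.

#include <cstdint>
#include <optional>
#include <vector>

struct TlbEntry {
    std::uint64_t vpn;    // virtual page number tag
    std::uint64_t pfn;    // physical frame number
    std::uint16_t asn;    // ASN recorded when the entry was loaded
    bool          global; // global indication from the translation table entry
    bool          valid;
};

class Tlb {
    std::vector<TlbEntry> entries_;
    std::uint16_t current_asn_ = 0;   // the ASN register
public:
    void set_asn(std::uint16_t asn) { current_asn_ = asn; }

    // Hit: tags match and either the entry is global or its ASN matches the
    // ASN register (the ASN comparison is skipped for global translations).
    std::optional<std::uint64_t> lookup(std::uint64_t vpn) const {
        for (const TlbEntry& e : entries_)
            if (e.valid && e.vpn == vpn && (e.global || e.asn == current_asn_))
                return e.pfn;
        return std::nullopt;
    }

    // With ASNs disabled, a context switch selectively invalidates only the
    // non-global translations, leaving global translations intact.
    void flush_non_global() {
        for (TlbEntry& e : entries_)
            if (!e.global) e.valid = false;
    }

    // A loaded entry records the current ASN along with the translation.
    void insert(std::uint64_t vpn, std::uint64_t pfn, bool global) {
        entries_.push_back({vpn, pfn, current_asn_, global, true});
    }
};
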
Abstract:
A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. For example, junk ops are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename) but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline and instead may pass through any of the floating point unit's execution pipelines. Multi-pipeline executable instructions are allocated on a per-clock-cycle basis using a number of different criteria. For example, the allocation may vary depending upon the number of multi-pipeline executable instructions received by the floating point unit in a single clock cycle.
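
A rough C++ sketch of one possible per-cycle allocation follows. The pipe names, the two-pass policy, and the first-free-pipe choice are assumptions; the point is only that pipe-specific ops claim their pipes first and multi-pipeline executable ops fill whatever remains, so the outcome varies with how many such ops arrive in a cycle.

#include <array>
#include <cstddef>
#include <optional>
#include <vector>

// Assumed execution pipes in the floating point unit.
enum class Pipe : std::size_t { Add = 0, Mul, Store, Count };

struct Op {
    std::optional<Pipe> required_pipe;  // empty => multi-pipeline executable (e.g., a junk op)
};

// Choose a pipe for each op received this clock cycle; std::nullopt means the
// op must wait for a later cycle.
std::vector<std::optional<Pipe>> allocate(const std::vector<Op>& ops) {
    std::array<bool, static_cast<std::size_t>(Pipe::Count)> busy{};
    std::vector<std::optional<Pipe>> chosen(ops.size());

    // First pass: ops tied to a particular pipe claim that pipe.
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (ops[i].required_pipe) {
            std::size_t p = static_cast<std::size_t>(*ops[i].required_pipe);
            if (!busy[p]) { busy[p] = true; chosen[i] = ops[i].required_pipe; }
        }
    }
    // Second pass: multi-pipeline executable ops take any remaining pipe, so
    // the allocation naturally varies with how many such ops arrived together.
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (!ops[i].required_pipe) {
            for (std::size_t p = 0; p < busy.size(); ++p)
                if (!busy[p]) { busy[p] = true; chosen[i] = static_cast<Pipe>(p); break; }
        }
    }
    return chosen;
}
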
Abstract:
A method is provided for architectural integrated circuit power estimation. The method may include receiving a plurality of respective energy events, receiving a plurality of base-level energy models, and generating a plurality of power models. Each power model may hierarchically instantiate one or more of the base-level energy models. The method may further include mapping each respective energy event to one or more of the plurality of power models. The method may further include hierarchically evaluating a particular base-level energy model corresponding to a given respective energy event, estimating an energy associated with evaluation of the particular base-level energy model, and accumulating the energy in a power estimate corresponding to the given respective energy event.
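
The C++ sketch below walks through the described flow with invented model names: base-level energy models report energy per evaluation, power models instantiate them (one level of hierarchy is shown for brevity), each energy event maps to one or more power models, and the evaluated energy is accumulated into a per-event estimate. All numbers are placeholders.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Base-level energy model: returns energy (in joules) per evaluation.
struct EnergyModel {
    std::string name;
    double energy_per_eval;
    double evaluate() const { return energy_per_eval; }
};

// Power model hierarchically instantiates one or more base-level models.
struct PowerModel {
    std::string name;
    std::vector<EnergyModel> base_models;
    double evaluate() const {
        double e = 0.0;
        for (const EnergyModel& m : base_models) e += m.evaluate();
        return e;
    }
};

int main() {
    // Hypothetical base-level models and power models built from them.
    PowerModel cache_read{"cache_read", {{"decoder", 2e-12}, {"sram_array", 8e-12}}};
    PowerModel alu_op{"alu_op", {{"adder", 3e-12}}};

    // Map each energy event to the power model(s) it triggers.
    std::map<std::string, std::vector<const PowerModel*>> event_map = {
        {"load_hit", {&cache_read}},
        {"add",      {&alu_op}},
    };

    // Evaluate the models for each observed event and accumulate the energy
    // into a power estimate corresponding to that event.
    std::vector<std::string> trace = {"load_hit", "add", "load_hit"};
    std::map<std::string, double> estimate;
    for (const std::string& event : trace)
        for (const PowerModel* pm : event_map[event])
            estimate[event] += pm->evaluate();

    for (const auto& [event, joules] : estimate)
        std::cout << event << ": " << joules << " J\n";
}
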
Abstract:
A microprocessor configured to dynamically switch its floating point load pipeline from one stage in length to more than one stage in length is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. The microprocessor temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor is configured to add one or more stages to the floating point load pipeline to allow the denormal value to complete conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until there is an idle clock cycle or an abort occurs, at which point the pipeline reverts to its original shorter length. In addition, the microprocessor may be configured to cancel instructions that were scheduled assuming the denormal load would take only one clock cycle to complete. A canceled instruction is then “replayed” during a later clock cycle from the reissue buffer. A method for performing denormal loads and a computer system are also disclosed.
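
A behavioral C++ sketch of the pipeline-stretching and replay idea follows. The class structure, the single extra stage, and the latency accounting are all assumptions made to keep the example small.

#include <cstdint>
#include <deque>

struct FpLoad { std::uint64_t id; bool denormal; };

class FpLoadPipeline {
    int extra_stages_ = 0;             // 0 => single-stage load pipeline
    std::deque<FpLoad> reissue_buffer_;

public:
    // Returns the latency (in cycles) charged to this load.
    int issue(const FpLoad& load) {
        reissue_buffer_.push_back(load);   // hold for possible replay
        if (load.denormal)
            extra_stages_ = 1;             // add a conversion stage for the denormal
        return 1 + extra_stages_;          // later loads keep the longer pipeline
    }

    // On an idle cycle or an abort, the pipeline reverts to its shorter form.
    void on_idle_or_abort() { extra_stages_ = 0; }

    // Instructions scheduled assuming a one-cycle load are canceled and
    // replayed from the reissue buffer in a later cycle.
    bool replay(FpLoad& out) {
        if (reissue_buffer_.empty()) return false;
        out = reissue_buffer_.front();
        reissue_buffer_.pop_front();
        return true;
    }
};
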
Abstract:
A microprocessor includes one or more registers which are architecturally defined to be used for at least two data formats. In one embodiment, the registers are the floating point registers defined in the x86 architecture, and the data formats are the floating point data format and the multimedia data format. The registers actually implemented by the microprocessor for the floating point registers use an internal format for floating point data. Part of the internal format is a classification field which classifies the floating point data in the extended precision format defined by the x86 microprocessor architecture. Additionally, a classification field encoding is reserved for multimedia data. As the microprocessor begins execution of each multimedia instruction, the classification information of the source operands is examined to determine if the data is either in the multimedia class or in a floating point class in which the significand portion of the register is the same as the corresponding significand in extended precision. If so, the multimedia instruction executes normally; if not, the multimedia instruction is faulted. Similarly, as the microprocessor begins execution of each floating point instruction, the classification information of the source operands is examined. If the data is classified as multimedia, the floating point instruction is faulted. A microcode routine is used to reformat the data stored in at least the source registers of the faulting instruction into a format usable by the faulting instruction. Subsequently, the faulting instruction is re-executed.
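
The C++ sketch below illustrates the classification checks described above. The class names, the set of floating point classes whose significand is assumed to match extended precision, and the use of exceptions to stand in for the fault-and-microcode-reformat path are all illustrative assumptions.

#include <cstdint>
#include <stdexcept>

// Assumed classification encodings; MMX is the reserved multimedia encoding.
enum class RegClass { Zero, Normal, Denormal, Infinity, NaN, MMX };

struct FpRegister {
    RegClass      cls;
    std::uint64_t significand;   // mirrors the extended-precision significand
    // (sign/exponent fields omitted for brevity)
};

// Assumed to be true for classes whose significand portion matches the
// corresponding extended-precision significand, and for multimedia data.
bool significand_usable_as_mmx(const FpRegister& r) {
    return r.cls == RegClass::MMX || r.cls == RegClass::Normal || r.cls == RegClass::NaN;
}

// Multimedia instruction: fault unless every source is MMX data or floating
// point data whose significand already matches extended precision.
void check_mmx_sources(const FpRegister& a, const FpRegister& b) {
    if (!significand_usable_as_mmx(a) || !significand_usable_as_mmx(b))
        throw std::runtime_error("MMX fault: microcode reformats sources, then re-executes");
}

// Floating point instruction: fault if any source is classified as multimedia.
void check_fp_sources(const FpRegister& a, const FpRegister& b) {
    if (a.cls == RegClass::MMX || b.cls == RegClass::MMX)
        throw std::runtime_error("FP fault: microcode reformats sources, then re-executes");
}
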