摘要:
A micro-operation (uop) fusion technique. More particularly, embodiments of the invention relate to a technique to fuse two or more uops originating from two or more instructions.
摘要:
A processor includes an instruction decoder to decode instructions into micro-operations for execution. The instruction decoder may include a programmable logic array to store templates to be addressed by instructions during decoding of the instructions. A collapsed template is addressed by one or more instructions during decoding into fused micro-operations and by one or more instructions during decoding into simple micro-operations. The instruction decoder may also include a multiplexer to select values of a field of the micro-operation based at least on an indication that the instruction being decoded is not being decoded into a simple micro-operation. The instruction decoder may also include a multiplexer to select values of a field of the micro-operation based at least on bits of a template field, where the number of bits of the template field is less than the number of bits of the field of the micro-operation.
摘要:
A micro-operation (uop) fusion technique. More particularly, embodiments of the invention relate to a technique to fuse two or more uops originating from two or more instructions.
摘要:
Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.
摘要:
Methods and apparatus for instruction restarts and inclusion in processor micro-op caches are disclosed. Embodiments of micro-op caches have way storage fields to record the instruction-cache ways storing corresponding macroinstructions. Instruction-cache in-use indications associated with the instruction-cache lines storing the instructions are updated upon micro-op cache hits. In-use indications can be located using the recorded instruction-cache ways in micro-op cache lines. Victim-cache deallocation micro-ops are enqueued in a micro-op queue after micro-op cache miss synchronizations, responsive to evictions from the instruction-cache into a victim-cache. Inclusion logic also locates and evicts micro-op cache lines corresponding to the recorded instruction-cache ways, responsive to evictions from the instruction-cache.
摘要:
Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.
摘要:
A method for renaming a source for use with a processor, the method including providing an instruction, building instruction dependency information based on the instruction, caching the instruction based on the instruction dependency information to provide a cached instruction, renaming a register based on the cached instruction to provide a renamed register, and multiplexing the instruction dependency information and the renamed register to rename the source.
摘要:
Methods and apparatuses for distributing architectural state information in a processor across multiple pipeline stages are described. An architectural value of a register is represented by a historical value added to an update value which is maintained in a non-final pipeline stage. When an instruction requires the architectural value, a calculation is made and that value is inserted into the pipeline for processing. Recovery of both pre- and post-execution architectural state information is made possible by storing both the update value and the operation to take place on that value for each decoded instruction.
摘要:
Methods and apparatus for inclusion of TLB (translation look-aside buffer) in processor micro-op caches are disclosed. Some embodiments for inclusion of TLB entries have micro-op cache inclusion fields, which are set responsive to accessing the TLB entry. Inclusion logic may the flush the micro-op cache or portions of the micro-op cache and clear corresponding inclusion fields responsive to a replacement or invalidation of a TLB entry whenever its associated inclusion field had been set. Front-end processor state may also be cleared and instructions refetched when replacement resulted from a TLB miss.
摘要:
Methods and apparatus for inclusion of TLB (translation look-aside buffer) in processor micro-op caches are disclosed. Some embodiments for inclusion of TLB entries have micro-op cache inclusion fields, which are set responsive to accessing the TLB entry. Inclusion logic may the flush the micro-op cache or portions of the micro-op cache and clear corresponding inclusion fields responsive to a replacement or invalidation of a TLB entry whenever its associated inclusion field had been set. Front-end processor state may also be cleared and instructions refetched when replacement resulted from a TLB miss.