摘要:
In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (μop) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized μop flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.
摘要:
According to one embodiment, a method is disclosed. The method includes receiving a value at a vector length (VL) tracker and establishing a VL for subsequent micro-operations (μops) that are to be executed corresponding to the value.
摘要:
In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
摘要:
In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
摘要:
In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (μop) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized μop flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.
摘要:
According to an embodiment of the invention, a method and apparatus for flag value renaming. An embodiment of a method comprises setting a flag for a processor via a first instruction, the first instruction being either a direct update instruction or an indirect update instruction; if the setting of the flag is by a direct update instruction, executing a succeeding second instruction that reads the flag prior to completion of the first instruction; and if the setting of the flag is by an indirect update instruction, delaying the second instruction until after completion of the first instruction.
摘要:
Microarchitecture policies and structures to predict execution clusters and facilitate inter-cluster communication are disclosed. In disclosed embodiments, sequentially ordered instructions are decoded into micro-operations. Execution of one set of micro-operations is predicted to involve execution resources to perform memory access operations and inter-cluster communication, but not to perform branching operations. Execution of a second set of micro-operations is predicted to involve execution resources to perform branching operations but not to perform memory access operations. The micro-operations are partitioned for execution in accordance with these predictions, the first set of micro-operations to a first cluster of execution resources and the second set of micro-operations to a second cluster of execution resources. The first and second sets of micro-operations are executed out of sequential order and are retired to represent their sequential instruction ordering.
摘要:
Embodiments of the present invention provide a method, apparatus and system for memory renaming. In one embodiment, a decode unit may decode a load instruction. If the load instruction is predicted to be memory renamed, the load instruction may have a predicted store identifier associated with the load instruction. The decode unit may transform the load instruction that is predicted to be memory renamed into a data move instruction and a load check instruction. The data move instruction may read data from the cache based on the predicted store identifier and load check instruction may compare an identifier associated with an identified source store with the predicted store identifier. A retirement unit may retire the load instruction if the predicted store identifier matches an identifier associated with the identified source store. In another embodiment of the present invention, the processor may re-execute the load instruction without memory renaming if the predicted store identifier does not match the identifier associated with the identified source store.
摘要:
A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.
摘要:
Microarchitecture policies and structures partition execution resource clusters. In disclosed microarchitecture embodiments, micro-operations representing a sequential instruction ordering are partitioned into a two sets. To one set of micro-operations execution resources are allocated from a cluster of execution resources that can perform memory access operations but not branching operations. To the other set of micro-operations execution resources are allocated from a cluster of execution resources that can perform branching operations but not memory access operations. The first and second sets of micro-operations may be executed out of sequential order but are retired to represent their sequential instruction ordering.