摘要:
Embodiments of the present invention relate to a system and method for associating a register file cache with a register file in a computer processor.
摘要:
Embodiments of the present invention relate to a system and method for implementing functions of a register translation table of a computer processor, with reduced area requirements as compared to known arrangements.
摘要:
A system may include M cache entries, each of the M cache entries to transmit a signal indicating a read from or a write to the cache entry and comprising a data register and a memory address register, and K layers of decision cells, where K=log2M. The K layers M/2 decision cells of a first layer to indicate the other one of the respective two of the M cache entries and to transmit a hit signal in response to the signal, a second layer of M/4 decision cells to enable the other one of the respective two of the M/2 decision cells of the first layer and transmit a second hit signal in response to the signal, a (K-1)th layer of two decision cells to enable the other one of the respective two decision cells of the (K-2)th layer and transmit a third hit signal in response to the second hit signal, and a Kth layer of a root decision cell to enable the other one of the respective two decision cells of the (K-1)th layer in response to the third hit signal.
摘要:
Microarchitecture policies and structures to predict execution clusters and facilitate inter-cluster communication are disclosed. In disclosed embodiments, sequentially ordered instructions are decoded into micro-operations. Execution of one set of micro-operations is predicted to involve execution resources to perform memory access operations and inter-cluster communication, but not to perform branching operations. Execution of a second set of micro-operations is predicted to involve execution resources to perform branching operations but not to perform memory access operations. The micro-operations are partitioned for execution in accordance with these predictions, the first set of micro-operations to a first cluster of execution resources and the second set of micro-operations to a second cluster of execution resources. The first and second sets of micro-operations are executed out of sequential order and are retired to represent their sequential instruction ordering.
摘要:
Embodiments of the present invention provide a method, apparatus and system for memory renaming. In one embodiment, a decode unit may decode a load instruction. If the load instruction is predicted to be memory renamed, the load instruction may have a predicted store identifier associated with the load instruction. The decode unit may transform the load instruction that is predicted to be memory renamed into a data move instruction and a load check instruction. The data move instruction may read data from the cache based on the predicted store identifier and load check instruction may compare an identifier associated with an identified source store with the predicted store identifier. A retirement unit may retire the load instruction if the predicted store identifier matches an identifier associated with the identified source store. In another embodiment of the present invention, the processor may re-execute the load instruction without memory renaming if the predicted store identifier does not match the identifier associated with the identified source store.
摘要:
A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.
摘要:
According to one embodiment, a method is disclosed. The method includes receiving a value at a vector length (VL) tracker and establishing a VL for subsequent micro-operations (μops) that are to be executed corresponding to the value.
摘要:
In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (μop) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized μop flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.
摘要:
In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
摘要:
Microarchitecture policies and structures partition execution resource clusters. In disclosed microarchitecture embodiments, micro-operations representing a sequential instruction ordering are partitioned into a two sets. To one set of micro-operations execution resources are allocated from a cluster of execution resources that can perform memory access operations but not branching operations. To the other set of micro-operations execution resources are allocated from a cluster of execution resources that can perform branching operations but not memory access operations. The first and second sets of micro-operations may be executed out of sequential order but are retired to represent their sequential instruction ordering.