Abstract:
In one embodiment, the present invention includes a method for determining whether an instruction of a first thread, dispatched from a first queue associated with the first thread, is stalled in a pipestage of a pipeline and, if so, dispatching to the pipeline an instruction of a second thread from a second queue associated with the second thread, provided the second thread is not stalled. Other embodiments are described and claimed.
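The stall-driven thread switch described above can be illustrated behaviorally. The following is a minimal sketch, assuming a simple two-thread model with per-thread queues; the ThreadQueue class, the pipeline list, and the stalled flag are hypothetical illustrations, not the patented implementation.

```python
# Behavioral sketch: if the preferred thread's queue is stalled,
# dispatch from another thread's queue instead.
from collections import deque

class ThreadQueue:
    def __init__(self, name, instructions):
        self.name = name
        self.instructions = deque(instructions)
        self.stalled = False  # e.g., waiting on a long-latency event

def dispatch(pipeline, queues):
    """Dispatch from the first thread whose queue is not stalled."""
    for q in queues:
        if not q.stalled and q.instructions:
            insn = q.instructions.popleft()
            pipeline.append((q.name, insn))
            return insn
    return None  # all threads stalled or empty

pipeline = []
t0 = ThreadQueue("T0", ["load r1", "add r2"])
t1 = ThreadQueue("T1", ["mul r3", "sub r4"])
t0.stalled = True             # T0's instruction is stalled in a pipestage
dispatch(pipeline, [t0, t1])  # so T1's "mul r3" dispatches instead
print(pipeline)               # [('T1', 'mul r3')]
```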
Abstract:
In one embodiment, a processor includes a front-end unit having an instruction decoder to receive and decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having retirement logic to receive the instructions from the execution unit and to retire the instructions of any thread that has an instruction or an event pending retirement. The instruction retirement unit also includes thread arbitration logic to select one thread at a time and to dispatch the selected thread to the retirement logic for retirement processing.
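The thread arbitration step lends itself to a short behavioral sketch: pick one thread at a time from among the threads with retirement work pending, then hand it to the retirement logic. Round-robin is an assumed selection policy (the abstract does not specify one), and the RetirementArbiter interface is hypothetical.

```python
# Sketch of selecting one thread at a time for retirement processing.
class RetirementArbiter:
    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.last = -1  # last thread granted

    def select(self, pending):
        """pending[t] is True if thread t has an instruction or event
        pending retirement; returns the next such thread, round-robin."""
        for i in range(1, self.num_threads + 1):
            t = (self.last + i) % self.num_threads
            if pending[t]:
                self.last = t
                return t
        return None  # nothing to retire this cycle

arb = RetirementArbiter(4)
print(arb.select([False, True, True, False]))  # -> 1
print(arb.select([False, True, True, False]))  # -> 2 (round-robin)
```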
Abstract:
A technique for performing three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions, each identifying no more than two source values.
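A familiar instance of such a conversion is splitting a multiply-add (dst = a*b + c, three sources) into a multiply followed by an add, each naming at most two sources. The madd opcode and the temporary register are assumptions for illustration; the abstract does not name a specific instruction.

```python
# Illustrative expansion of a three-source instruction into two
# instructions that each name at most two source values.
def expand_three_source(op, dst, a, b, c, tmp="tmp0"):
    if op == "madd":                   # dst = a*b + c (three sources)
        return [("mul", tmp, a, b),    # two sources: a, b
                ("add", dst, tmp, c)]  # two sources: tmp, c
    raise NotImplementedError(op)

for insn in expand_three_source("madd", "r5", "r1", "r2", "r3"):
    print(insn)
# ('mul', 'tmp0', 'r1', 'r2')
# ('add', 'r5', 'tmp0', 'r3')
```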
Abstract:
In one embodiment, the present invention includes a method for providing a cache block in an exclusive state to a first cache and providing the same cache block in the exclusive state to a second cache when cores accessing the two caches are executing redundant threads. Other embodiments are described and claimed.
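The coherence decision can be sketched as follows: a second request for the exclusive state would ordinarily force a downgrade or invalidation of the first owner, but when the requesting cores run redundant threads, both are allowed to hold the block exclusively. The grant_exclusive function and the redundant_pair check are assumptions for illustration.

```python
# Sketch: allow two caches to hold the same block in the exclusive
# state only when their cores execute redundant threads.
def grant_exclusive(requester, owners, redundant_pair):
    if not owners:
        owners.add(requester)                         # ordinary E grant
    elif all(redundant_pair(requester, o) for o in owners):
        owners.add(requester)                         # redundant cores: both get E
    else:
        raise RuntimeError("would require invalidation/downgrade")
    return owners

pairs = {("core0", "core1"), ("core1", "core0")}
is_redundant = lambda a, b: (a, b) in pairs
owners = grant_exclusive("core0", set(), is_redundant)
owners = grant_exclusive("core1", owners, is_redundant)
print(owners)  # {'core0', 'core1'}: both exclusive for redundant execution
```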
Abstract:
According to one embodiment, a method is disclosed. The method includes receiving a value at a vector length (VL) tracker and, based on the value, establishing a VL for subsequent micro-operations (μops) that are to be executed.
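A minimal sketch of such a VL tracker follows, assuming the received value simply establishes the VL that subsequent μops observe until it is updated; the interface is hypothetical.

```python
# Sketch: a VL tracker holds the vector length that later μops use.
class VLTracker:
    def __init__(self, default_vl=0):
        self.vl = default_vl

    def receive(self, value):
        self.vl = value  # establish VL for subsequent μops

    def vl_for_uop(self):
        return self.vl

tracker = VLTracker()
tracker.receive(8)
for uop in ["vadd", "vmul"]:
    print(uop, "executes with VL =", tracker.vl_for_uop())
```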
Abstract:
In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (μop) flow for an instruction that operates on a vector and accesses elements in memory, if the instruction is predicted to be unmasked and unit-stride, and accessing two or more of the elements at the same time via the optimized μop flow without determining masks for those elements. Other embodiments are also described.
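The prediction-driven choice of μop flow can be sketched as below: a predicted unmasked, unit-stride access gets a flow that touches several elements per access with no per-element mask checks, while the fallback flow checks the mask of each element. The element counts and μop names are assumptions for illustration.

```python
# Sketch: emit a wide, mask-free μop flow when the access is predicted
# unmasked and unit-stride; otherwise fall back to per-element μops.
def gen_uop_flow(n_elems, predicted_unmasked_unit_stride, elems_per_access=2):
    flow = []
    if predicted_unmasked_unit_stride:
        for base in range(0, n_elems, elems_per_access):
            flow.append(("wide_load", base, elems_per_access))  # no mask check
    else:
        for i in range(n_elems):
            flow.append(("check_mask", i))
            flow.append(("masked_load", i, 1))
    return flow

print(gen_uop_flow(4, True))   # 2 wide accesses, no mask μops
print(gen_uop_flow(4, False))  # 8 μops: mask check + load per element
```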
Abstract:
In one embodiment, the present invention includes a method for executing an operation on low-order portions of first and second source operands using a first execution stack of a processor and executing the operation on high-order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
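A cycle-by-cycle sketch of the staggered scheme follows, assuming the operands split into two halves and a one-cycle stagger between the two stacks; both values are illustrative rather than taken from the abstract.

```python
# Sketch: stack 0 operates on the low-order halves of the sources;
# stack 1 runs the same operation on the high-order halves, one or
# more cycles later.
def staggered_execute(op, src1, src2, stagger=1):
    half = len(src1) // 2
    schedule = {0:       ("stack0", src1[:half], src2[:half]),
                stagger: ("stack1", src1[half:], src2[half:])}
    results = {}
    for cycle in sorted(schedule):
        stack, a, b = schedule[cycle]
        results[stack] = [op(x, y) for x, y in zip(a, b)]
        print(f"cycle {cycle}: {stack} -> {results[stack]}")
    return results["stack0"] + results["stack1"]

print(staggered_execute(lambda x, y: x + y,
                        [1, 2, 3, 4], [10, 20, 30, 40]))
# cycle 0: stack0 -> [11, 22]
# cycle 1: stack1 -> [33, 44]
```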
Abstract:
Methods and apparatus to provide wakeup mechanisms for schedulers are described. In one embodiment, a scheduler broadcasts the scheduler identifier of a scheduled uop (micro-operation) to one or more uops awaiting scheduling. The scheduler may further update one or more corresponding entries in a uop dependency matrix or in a uop source identifiers and data buffer.
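Matrix-based wakeup of this kind can be sketched as follows: each waiting uop holds a row of producer identifiers, a broadcast clears the scheduled uop's column, and a uop whose row empties becomes ready. The DependencyMatrix class is a hypothetical illustration; the abstract's source identifiers and data buffer is not modeled here.

```python
# Sketch of dependency-matrix wakeup driven by a broadcast identifier.
class DependencyMatrix:
    def __init__(self, num_uops):
        self.deps = [set() for _ in range(num_uops)]  # producer ids per row

    def add_dependency(self, consumer, producer):
        self.deps[consumer].add(producer)

    def broadcast(self, scheduled_uop):
        """Clear the scheduled uop's column; return newly ready uops."""
        ready = []
        for uop, producers in enumerate(self.deps):
            if scheduled_uop in producers:
                producers.discard(scheduled_uop)
                if not producers:
                    ready.append(uop)
        return ready

m = DependencyMatrix(3)
m.add_dependency(2, 0)  # uop2 waits on uop0
m.add_dependency(2, 1)  # ...and on uop1
print(m.broadcast(0))   # [] -- uop2 still waits on uop1
print(m.broadcast(1))   # [2] -- uop2 wakes up
```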
Abstract:
Embodiments of the present invention relate to a system and method for implementing functions of a register translation table of a computer processor, with reduced area requirements as compared to known arrangements.
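For context, the basic function a register translation (alias) table implements is mapping architectural registers to physical registers. The sketch below shows only that generic function; the abstract's area-reduction technique itself is not detailed there and is not modeled here.

```python
# Generic sketch of register translation: writers allocate a fresh
# physical register, later readers look up the current mapping.
class RegisterTranslationTable:
    def __init__(self, num_arch, free_phys):
        self.map = {r: r for r in range(num_arch)}  # identity at reset
        self.free = list(free_phys)

    def rename_dest(self, arch_reg):
        phys = self.free.pop(0)  # allocate a new physical register
        self.map[arch_reg] = phys
        return phys

    def lookup_src(self, arch_reg):
        return self.map[arch_reg]

rat = RegisterTranslationTable(num_arch=4, free_phys=range(4, 8))
print(rat.lookup_src(1))   # 1 (initial mapping)
print(rat.rename_dest(1))  # 4 (new physical register for the writer)
print(rat.lookup_src(1))   # 4 (later readers see the rename)
```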
Abstract:
Microarchitecture policies and structures to predict execution clusters and facilitate inter-cluster communication are disclosed. In disclosed embodiments, sequentially ordered instructions are decoded into micro-operations. Execution of one set of micro-operations is predicted to involve execution resources to perform memory access operations and inter-cluster communication, but not to perform branching operations. Execution of a second set of micro-operations is predicted to involve execution resources to perform branching operations but not to perform memory access operations. The micro-operations are partitioned for execution in accordance with these predictions, the first set of micro-operations to a first cluster of execution resources and the second set of micro-operations to a second cluster of execution resources. The first and second sets of micro-operations are executed out of sequential order and are retired in accordance with their sequential instruction ordering.
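The prediction-and-steering step can be sketched as follows: each decoded micro-operation is classified as needing memory-access resources or branching resources and is partitioned to the corresponding cluster. The two-way classification rule and the default placement of plain ALU μops are assumptions drawn loosely from the abstract.

```python
# Sketch: partition decoded μops between a memory-access cluster and a
# branching cluster according to their predicted resource needs.
def steer(uops):
    mem_cluster, branch_cluster = [], []
    for uop in uops:
        if uop["kind"] in ("load", "store"):
            mem_cluster.append(uop)     # memory access + inter-cluster comms
        elif uop["kind"] in ("branch", "jump"):
            branch_cluster.append(uop)  # branching, no memory access
        else:
            mem_cluster.append(uop)     # assumed default placement
    return mem_cluster, branch_cluster

uops = [{"kind": "load"}, {"kind": "add"}, {"kind": "branch"}]
mem, br = steer(uops)
print(len(mem), "μops to memory cluster;", len(br), "to branch cluster")
```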