Abstract:
A method for stopping replay tornadoes in a processor. The method of one embodiment comprises scheduling an instruction for execution speculatively. A determination is made whether the instruction executed correctly. The instruction is routed to a replay mechanism if the instruction did not execute correctly. A determination is made whether a replay tornado exists. The instruction is routed for re-execution if the instruction executed incorrectly and no replay tornado exists. Breaking the replay tornado if the replay tornado exists. Replay safe instructions in the pipeline are retired. Non-replay safe instructions in the pipeline are marked for re-execution. The non-replay safe instructions are rescheduled for re-execution.
Abstract:
A system and method for optimizing a series of traces to be executed by a processing core is disclosed. The lines of a trace are sent to an optimizer each time they are sent to a processing core to be executed. Runtime information may be collected on a line of a trace each time that trace is executed by a processing core. The runtime information may be used by the optimizer to better optimize the micro-operations of the lines of the trace. The optimizer optimizes a trace each time the trace is executed to improve the efficiency of future iterations of the trace. Most of the optimizations result in a reduction of the number of μops within the trace. The optimizer may optimize two or more lines at a time in order to find more opportunities to remove μops and shorten the trace. The two lines may be alternately offset so that each line has the maximum allowed number of micro-operations.
Abstract:
A method and apparatus for enabling an adaptive replay loop in a processor. More particularly, the present invention relates to allowing instructions in the replay loop to change its relative position, thereby decreasing the latency for execution of instructions, resolving dynamic resource conflicts, and also increasing the overall efficiency of the processor.
Abstract:
Embodiments of the present invention relate to a memory management scheme and apparatus that enables efficient memory renaming. The method includes computing a store address, writing the store address in a first storage, writing data associated with the store address to a memory, and de-allocating the store address from the first storage, allocating the store address in a second storage, predicting a load instruction to be memory renamed, computing a load store source index, computing a load address, disambiguating the memory renamed load instruction, and retiring the memory renamed load instruction, if the store instruction is still allocated in at least one of the first storage and the second storage and should have effectively provided to the load the full data. The method may also include re-executing the load instruction without memory renaming, if the store instruction is not in at the first storage or in the second storage.
Abstract:
Embodiments of the present invention relate to a method and system for providing virtual identifiers corresponding to physical registers in a computer processor. According to the embodiments, the virtual identifiers may be used to represent the physical registers during operations in a pipeline of the processor.
Abstract:
Embodiments of the present invention relate to a system and method for associating a register file cache with a register file in a computer processor.
Abstract:
In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
Abstract:
Methods and apparatuses for reducing power consumption of processor switch operations are disclosed. One or more embodiments may comprise specifying a subset of registers or state storage elements to be involved in a register or state storage operation, performing the register or state storage operation, and performing a switch operation. The embodiments may minimize the number of registers or state storage elements involved with the standby operation by specifying only the subset of registers or state storage elements, which may involve considerably fewer than the total number of registers or state storage or elements of the processor. The switch operation may be switch from one mode to another, such as a transition to or from a sleep mode, a context switch, or the execution of various types of instructions.
Abstract:
A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values.
Abstract:
A technique for using memory attributes to relay information to a program or other agent. More particularly, embodiments of the invention relate to using memory attribute bits to check various memory properties in an efficient manner.