Abstract:
A computer-readable storage medium, method and system for optimization-level aware branch prediction is described. A gear level is assigned to a set of application instructions that have been optimized. The gear level is also stored in a register of a branch prediction unit of a processor. Branch prediction is then performed by the processor based upon the gear level.
Abstract:
A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.
Abstract:
Systems, methods, and apparatuses for decomposing a sequential program into multiple threads, executing these threads, and reconstructing the sequential execution of the threads are described. A plurality of data cache units (DCUs) store locally retired instructions of speculatively executed threads. A merging level cache (MLC) merges data from the lines of the DCUs. An inter-core memory coherency module (ICMC) globally retire instructions of the speculatively executed threads in the MLC.
Abstract:
A processing device implementing silent memory instructions and miss-rate tracking to optimize switching policy on threads is disclosed. A processing device of the disclosure includes a branch prediction unit (BPU) to predict that an instruction of a first thread in a current execution context of the processing device is a delinquent instruction, indicate that the first thread including the delinquent instruction is in a silent execution mode, indicate that the delinquent instruction is to be executed as a silent instruction, switch an execution context of the processing device to a second thread, and when the execution context returns to the first thread, cause the delinquent instruction to be re-executed as a regular instruction.
Abstract:
Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
Abstract:
In one embodiment, the present invention includes a method for accessing registers associated with a first thread while executing a second thread. In one such embodiment a method may include preventing an instruction of a first thread that is to access a source operand from a register file of a second thread from executing if a synchronization indicator associated with the source operand indicates incompletion of a producer operation of the second thread, and executing the instruction if the synchronization indicator indicates completion of the producer operation of the second thread. Other embodiments are described and claimed.
Abstract:
Techniques for achieving coherence between dynamically optimized code and original code are disclosed. In an embodiment, a search is performed for a first entry for a first page containing a first code region in a first data structure. The first code is used to determine whether a first indicator in the first entry is set to a first value. The first entry is added to the first data structure, in response to failing to find the first entry in the first data structure. A second search may be performed for a second entry for the first code region in a second data structure, in response to determining that the first indicator is set to the first value. Other embodiments are also disclosed and claimed.
Abstract:
Methods and apparatus are disclosed for using a register checkpointing mechanism to resolve multithreading mis-speculations. Valid architectural state is recovered and execution is rolled back. Some embodiments include memory to store checkpoint data. Multiple thread units concurrently execute threads. They execute a checkpoint mask instruction to initialize memory to store active checkpoint data including register contents and a checkpoint mask indicating the validity of stored register contents. As register contents change, threads execute checkpoint write instructions to store register contents and update the checkpoint mask. Threads also execute a recovery function instruction to store a pointer to a checkpoint recovery function, and in response to mis-speculation among the threads, branch to the checkpoint recovery function. Threads then execute one or more checkpoint read instructions to copy data from a valid checkpoint storage area into the registers necessary to recover a valid architectural state, from which execution may resume.
Abstract:
An apparatus comprising a first search logic to search for a first entry for a first page containing a first code region in a first data structure to determine whether a first indicator in the first entry is set to a first value; an adder logic to add the first entry to the first data structure, in response to failing to find the first entry in the first data structure; a second search logic to search for a second entry for the first code region in a second data structure, in response to determining that the first indicator is set to the first value, wherein one or more optimized code regions corresponding to the first page from a code cache are to be removed in response to determining that the first page may have been modified, and wherein the first indicator is to be set to a second value.
Abstract:
A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.