Abstract:
An apparatus and method for profiling program code. In particular, an apparatus according to one embodiment comprises a filtering component identifying a first set of instructions for which profiling is desired, wherein, in response to detecting that an instruction has been retired, the filtering component determines whether the instruction is within the first set of instructions for which profiling is desired; an event selection component detecting an event in response to the instruction retiring, the event selection component generating event signals in response to a designated event; and a profiling component recording the occurrence or non-occurrence of the event within a first storage device responsive to signals from the filtering component and/or the event selection component.
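The interplay of the three components can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract describes an apparatus, and the class name `RetirementProfiler`, the use of an address range as the filter, and the per-address pair of counters are all hypothetical choices made for illustration.

```python
class RetirementProfiler:
    """Toy software model of the filter / event-selection / profiling split.

    Assumptions (not from the abstract): the "first set of instructions" is
    an address range, and the "first storage device" is a dict of counters.
    """

    def __init__(self, start, end, designated_event):
        self.profiled_range = range(start, end)   # filtering component
        self.designated_event = designated_event  # event selection component
        self.records = {}  # address -> [occurred, did_not_occur]

    def on_retire(self, address, events):
        """Called once per retired instruction with the events it raised."""
        if address not in self.profiled_range:
            return  # instruction is outside the set being profiled
        occurred = self.designated_event in events
        counts = self.records.setdefault(address, [0, 0])
        counts[0 if occurred else 1] += 1  # record occurrence or non-occurrence


profiler = RetirementProfiler(0x100, 0x200, "cache_miss")
profiler.on_retire(0x104, {"cache_miss"})  # in range, event occurred
profiler.on_retire(0x104, set())           # in range, event did not occur
profiler.on_retire(0x300, {"cache_miss"})  # out of range, ignored
```

Recording both outcomes per address lets a consumer compute an event rate for each profiled instruction rather than only a raw count.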
Abstract:
Techniques for implementing identification and management of unsafe optimizations are disclosed. A method of the disclosure includes: receiving, by a managed runtime environment (MRE) executed by a processing device, a notice of misprediction of optimized code, the misprediction occurring during a runtime of the optimized code; determining, by the MRE, whether a local misprediction counter (LMC) associated with a code region of the optimized code causing the misprediction exceeds a local misprediction threshold (LMT) value; and, when the LMC exceeds the LMT value, compiling, by the MRE, native code of the optimized code to generate a new version of the optimized code, wherein the code region in the new version of the optimized code is not optimized.
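The threshold check described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class `ToyRuntime`, its method names, and the choice to increment the counter on each notice are assumptions, since the abstract only states that the LMC is compared against the LMT.

```python
from collections import defaultdict


class ToyRuntime:
    """Hypothetical stand-in for an MRE; every name here is illustrative."""

    def __init__(self, local_threshold):
        self.local_threshold = local_threshold  # LMT value
        self.local_counters = defaultdict(int)  # LMC per code region
        self.unoptimized_regions = set()        # regions compiled without the optimization
        self.code_version = 0

    def on_misprediction(self, region):
        """Handle a misprediction notice; return True if recompilation occurred."""
        # Assumption: each notice bumps the region's counter by one.
        self.local_counters[region] += 1
        if self.local_counters[region] > self.local_threshold:
            self._recompile_without(region)
            return True
        return False

    def _recompile_without(self, region):
        # Stand-in for compiling a new version in which `region` is not optimized.
        self.unoptimized_regions.add(region)
        self.code_version += 1


runtime = ToyRuntime(local_threshold=2)
results = [runtime.on_misprediction("loop_a") for _ in range(3)]
```

With a threshold of 2, the first two notices for `loop_a` are tolerated and the third triggers a new code version in which that region is left unoptimized.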
Abstract:
A method for analyzing a set of spawning pairs, where each spawning pair identifies at least one speculative thread. The analysis may be practiced via software in a compiler, binary optimizer, standalone modeler, or the like. The analysis may include determining a predicted execution time for a sequence of program instructions, given the set of spawning pairs, for a target processor that has a known number of thread units and supports speculative multithreading. The method further selects a spawning pair, according to a greedy approach, if spawning the corresponding speculative thread during execution of a code sequence improves performance, that is, decreases execution time through increased parallelism. Other embodiments are also described and claimed.
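A greedy selection of this kind might look like the sketch below. The abstract does not say how execution time is predicted, so `predict_time` is passed in as a black box; `make_toy_model` and its per-pair savings are purely hypothetical and exist only to make the example runnable.

```python
def select_spawning_pairs(candidates, predict_time):
    """Greedily add the pair that most reduces predicted execution time.

    `predict_time(pairs)` is an assumed callable that returns the predicted
    execution time when the given list of spawning pairs is enabled.
    """
    selected = []
    remaining = list(candidates)
    best_time = predict_time(selected)
    while remaining:
        trial_time, trial_pair = min(
            ((predict_time(selected + [pair]), pair) for pair in remaining),
            key=lambda item: item[0],
        )
        if trial_time >= best_time:
            break  # no remaining pair provides a performance enhancement
        selected.append(trial_pair)
        remaining.remove(trial_pair)
        best_time = trial_time
    return selected, best_time


def make_toy_model(base_time, savings, thread_units):
    """Hypothetical model: each pair saves a fixed amount of time, but only
    thread_units - 1 speculative threads can contribute at once."""
    def predict_time(pairs):
        usable = sorted((savings[p] for p in pairs), reverse=True)[: thread_units - 1]
        return base_time - sum(usable)
    return predict_time


model = make_toy_model(base_time=100, savings={"A": 30, "B": 10, "C": -5}, thread_units=3)
chosen, predicted = select_spawning_pairs(["A", "B", "C"], model)
```

In this toy run, pair `C` is never chosen because it does not reduce the predicted time, and selection stops once the thread units are saturated.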
Abstract:
According to one example embodiment of the inventive subject matter, the method and apparatus described herein are used to generate an optimized speculative version of a static piece of code. The portion of code is optimized in the sense that fewer instructions are executed. However, because the applied optimization is speculative, the optimized version can be incorrect, and some mechanism to recover from that situation is required. Thus, the quality of the produced code is measured by taking into account both the final length of the code and the frequency of misspeculation.
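One way to combine code length and misspeculation frequency into a single figure is an expected instruction count. The formula below is an assumption made for illustration, not the metric of the embodiment: it charges every execution the optimized length and, on a misspeculation, adds a hypothetical recovery overhead plus a re-run of the original code.

```python
def expected_instructions(optimized_len, original_len, misspec_rate, recovery_overhead):
    """Expected instructions per execution of the speculative version.

    Assumed cost model: the optimized code always runs; with probability
    `misspec_rate` the result is discarded, recovery costs
    `recovery_overhead` instructions, and the original code is re-executed.
    """
    if not 0.0 <= misspec_rate <= 1.0:
        raise ValueError("misspec_rate must be between 0 and 1")
    return optimized_len + misspec_rate * (recovery_overhead + original_len)


# Hypothetical numbers: 100 instructions originally, 60 after optimization.
cost = expected_instructions(optimized_len=60, original_len=100,
                             misspec_rate=0.25, recovery_overhead=20)
worth_it = cost < 100  # speculative version pays off only below the original length
```

Under this model a shorter optimized version can still lose to the original if it misspeculates often enough, which is why both quantities have to be weighed together.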
Abstract:
Disclosed are selected embodiments of a processor that may include a plurality of thread units and a register file architecture to support speculative multithreading. For at least one embodiment, live-in values for a speculative thread are computed via execution of a precomputation slice and are stored in a validation buffer for later validation. A global register file holds the committed architecture state generated by a non-speculative thread. Each thread unit includes a local register file. A directory indicates, for each architectural register, which speculative thread(s) have generated a value for the architectural register. Other embodiments are also described and claimed.
Abstract:
A method for analyzing a set of spawning pairs, where each spawning pair identifies at least one speculative thread. The method, which may be practiced via software in a compiler or standalone modeler, determines execution time for a sequence of program instructions, given the set of spawning pairs, for a target processor having a known number of thread units, where the target processor supports speculative multithreading. Other embodiments are also described and claimed.