摘要:
Disclosed are selected embodiments of a processor that may include a plurality of thread units and a register file architecture to support speculative multithreading. For at least one embodiment, live-in values for a speculative thread are computed via execution of a precomputation slice and are stored in a validation buffer for later validation. A global register file holds the committed architecture state generated by a non-speculative thread. Each thread unit includes a local register file. A directory indicates, for each architectural register, which speculative thread(s) have generated a value for the architectural register. Other embodiments are also described and claimed.
摘要:
A method for analyzing a set of spawning pairs, where each spawning pair identifies at least one speculative thread. The analysis may be practiced via software in a compiler, binary optimizer, standalone modeler, or the like. The analysis may include determining a predicted execution time for a sequence of program instructions, given the set of spawning pairs, for a target processor having a known number of thread units, where the target processor supports speculative multithreading. The method is further to select a spawning pair, according to a greedy approach, if the spawning pair provides a performance enhancement, in terms of decreased execution time due to increased parallelism, when the speculative thread is spawned during execution of a code sequence. Other embodiments are also described and claimed.
摘要:
Disclosed are selected embodiments of a processor that may include a plurality of thread units and a register file architecture to support speculative multithreading. For at least one embodiment, live-in values for a speculative thread are computed via execution of a precomputation slice and are stored in a validation buffer for later validation. A global register file holds the committed architecture state generated by a non-speculative thread. Each thread unit includes a local register file. A directory indicates, for each architectural register, which speculative thread(s) have generated a value for the architectural register. Other embodiments are also described and claimed.
摘要:
A method for analyzing a set of spawning pairs, where each spawning pair identifies at least one speculative thread. The method, which may be practiced via software in a compiler or standalone modeler, determines execution time for a sequence of program instructions, given the set of spawning pairs, for a target processor having a known number of thread units, where the target processor supports speculative multithreading. Other embodiments are also described and claimed.
摘要:
The invention relates to a multiversion storage configuration which can store multiple values per speculative set of instructions for one storage position in order to enable the real-time precalculation and execution of the body of the set of instructions from a speculative instruction set. The invention also relates to the validation of the input values which can be calculated and used in the execution of the speculative instruction set. The invention further relates to a method of performing said validation step.
摘要:
A method for analyzing a set of spawning pairs, where each spawning pair identifies at least one speculative thread. The analysis may be practiced via software in a compiler, binary optimizer, standalone modeler, or the like. The analysis may include determining a predicted execution time for a sequence of program instructions, given the set of spawning pairs, for a target processor having a known number of thread units, where the target processor supports speculative multithreading. The method is further to select a spawning pair, according to a greedy approach, if the spawning pair provides a performance enhancement, in terms of decreased execution time due to increased parallelism, when the speculative thread is spawned during execution of a code sequence. Other embodiments are also described and claimed.
摘要:
According to one example embodiment of the inventive subject matter, the method and apparatus described herein is used to generate an optimized speculative version of a static piece of code. The portion of code is optimized in the sense that the number of instructions executed will be smaller. However, since the applied optimization is speculative, the optimized version can be incorrect and some mechanism to recover from that situation is required. Thus, the quality of the produced code will be measured by taking into account both the final length of the code as well as the frequency of misspeculation.
摘要:
In one embodiment, the present invention includes a method for receiving a bus message in a first cache corresponding to a speculative access to a portion of a second cache by a second thread, and dynamically determining in the first cache if an inter-thread dependency exists between the second thread and a first thread associated with the first cache with respect to the portion. Other embodiments are described and claimed.
摘要:
Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
摘要:
According to one example embodiment of the inventive subject matter, the method and apparatus described herein is used to generate an optimized speculative version of a static piece of code. The portion of code is optimized in the sense that the number of instructions executed will be smaller. However, since the applied optimization is speculative, the optimized version can be incorrect and some mechanism to recover from that situation is required. Thus, the quality of the produced code will be measured by taking into account both the final length of the code as well as the frequency of misspeculation.