摘要:
Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
摘要:
Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
摘要:
Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
摘要:
In one embodiment, the invention provides a method for examining information about branch instructions. A method, comprising: examining information about branch instructions that reach a write-back stage of processing within a processor, defining a plurality of streams based on the examining, wherein each stream comprises a sequence of basic blocks in which only a last block in the sequence ends in a branch instruction, the execution of which causes program flow to branch, the remaining basic blocks in the stream each ending in a branch instruction, the execution of which does not cause program flow to branch.
摘要:
Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
摘要:
Embodiments of an apparatus, system and method enhance the efficiency of processor resource utilization during instruction prefetching via one or more speculative threads. Renamer logic and a map table are utilized to perform filtering of instructions in a speculative thread instruction stream. The map table includes a yes-a-thing bit to indicate whether the associated physical register's content reflects the value that would be computed by the main thread. A thread progress beacon table is utilized to track relative progress of a main thread and a speculative helper thread. Based upon information in the thread progress beacon table, the main thread may effect termination of a helper thread that is not likely to provide a performance benefit for the main thread.
摘要:
Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
摘要:
Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
摘要:
Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
摘要:
The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.