摘要:
Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
摘要:
Method, apparatus and system embodiments to schedule user-level OS- independent 'shreds' without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. The scheduler routine may receive compiler-generated hints from a compiler. The compiler hints may be generated by the compiler without user-provided pragmas, and may be passed to the scheduler routine via an API-like interface. The interface may include a scheduling hint data structure that is maintained by the compiler. Other embodiments are also described and claimed.
摘要:
Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.