摘要:
The present disclosure relates to acquiring and releasing a shared resource via a lock semaphore and, more particularly, to acquiring and releasing a shared resource via a lock semaphore utilizing a state machine.
摘要:
Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
摘要:
A loop can be executed on a parallel processor by partitioning the loop iterations into chunks of decreasing size. An increase in speed can be realized by reducing the time taken by a thread when determining the next set of iterations to be assigned to a thread. The next set of iterations can be determined from a chunk index stored in a shared variable. Using a shared variable enables threads to perform operations concurrently to reduce the wait time to the period while another thread increments the shared variable.