摘要:
In a method for execution control acquisition of a program, during the execution of the program, it is determined when a hardware performance counter has reached a threshold. When the threshold is reached, execution control is switched to a dynamic optimizer. Thereafter, an optimized version of the program is executed. In a method for executing an optimized version of a program, during execution of the optimized version, an interrupt is received and execution control is returned to an operating system. An original version of the program is then executed. During the execution of the original version, a hardware performance counter is monitored. When the hardware performance counter reaches a threshold during the execution of the original version, execution control is switched to a dynamic optimizer. Thereafter, the execution of the optimized version of the program is continued as directed by the dynamic optimizer.
摘要:
A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of the parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. A profiling tool is used to collect, analyze, and visualize the performance data of an application in connection with its execution on a parallel-processing computer system through the runtime system. This profiling tool greatly enhances an application developer's ability to understand how an application is executed on the parallel-processing computer system and fine-tune the application to achieve high performance.
摘要:
One embodiment of the present invention provides a system for controlling cache line eviction. The system operates by first receiving a sequence of instructions at a processor during execution of a program, wherein the sequence of instructions causes a cache line to be loaded into the cache. Next, the system examines the sequence of instructions to determine if an associated cache line includes only scratch data that will not be reused. If so, upon loading the cache line into the cache, the system marks the cache line as containing only scratch data, which allows the cache line to be evicted next from the cache.
摘要:
In a method for dynamic allocation of memory address space, an original version of a program is executed. This execution includes the execution of a request to use memory address space occupied by an optimized version of the program that is protected from modification. When this request is detected, execution control is passed to an optimization code that was used to define the optimized program. The optimization code copies a portion of the optimized program residing in the memory address space requested by the original program, writes the copied portion to unallocated memory address space, and adjusts the code of the optimized program. The protection of the copied portion of the optimized program is released, and execution control is returned to the original program. The request to use the memory address space occupied by the portion of the optimized for which the protection has been released is then re-executed.
摘要:
A CPU includes a register file including a plurality of architectural registers for storing data loaded from a primary memory for execution by the CPU. A stack cache memory coupled to the register file includes a plurality of cache lines, each of which corresponds to one of the architectural registers and implements a first-in, last-out queue for data spilled from the corresponding architectural register. Data spilled from the register file into the stack cache memory is maintained in the stack cache until subsequently restored to the register file without accessing primary memory. The stack cache memory does not participate in cache writeback operations to primary memory.