Abstract:
A technique for statically linking an application process to a wrapper library employed in intercepting one or more calls invoked by the application process. The intercepted calls may comprise system calls or library calls. In a first link step, the application process is statically linked with at least the intercept library, and in one embodiment, all libraries associated with the application process except for the wrapper library. This first static link step creates a first module. Thereafter, at least one call invoked by the application process, and to be intercepted by the intercept library, is renamed. The intercepted call is renamed from its original name to a temporary name in the standard program library, the intercept library, and the application program. This renaming step creates a second module that no longer contains the original name of the at least one intercepted call. A second link step then statically links the second module with the wrapper library, thereby creating an executable module wherein the application process is statically linked to all libraries while still providing for interception of the at least one system or library call within the executable.
Abstract:
A method for effecting a direct jump in an executable program module to a target address displaced from a source address by a specified distance that is greater than a maximum permitted range. During program linkage the direct jump is split into at least two component direct jumps each no greater than the maximum permitted range, thus allowing the direct jump to be achieved by jumping sequentially from the source address to the target address via each of the component direct jumps. A storage medium for storing data representative of the executable program module contains at least one trampoline for performing the component direct jumps.
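The splitting of one long jump into component jumps can be illustrated with a minimal sketch. It assumes, purely for illustration, that a trampoline can be placed at every intermediate landing address; a real linker would have to find free space for each one. The function name `plan_jumps` and its parameters are hypothetical and do not come from the abstract.

```python
def plan_jumps(source, target, max_range):
    """Split a jump from source to target into hops of at most max_range.

    Returns a list of (from_address, to_address) pairs. Every landing
    address except the final target is where a trampoline would sit.
    Assumption: trampolines can be placed at any address.
    """
    if max_range <= 0:
        raise ValueError("max_range must be positive")
    hops = []
    current = source
    while current != target:
        distance = target - current
        # Clamp the step to the permitted range, forward or backward.
        step = max(-max_range, min(max_range, distance))
        hops.append((current, current + step))
        current += step
    return hops


# A jump of 2500 bytes with a 1000-byte limit needs three component jumps.
hops = plan_jumps(source=0x1000, target=0x1000 + 2500, max_range=1000)
trampolines = [to for _, to in hops[:-1]]
```

Here `trampolines` holds the two intermediate addresses through which control passes before it reaches the target.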
Abstract:
In a method for dynamic allocation of memory address space, an original version of a program is executed. This execution includes the execution of a request to use memory address space occupied by an optimized version of the program that is protected from modification. When this request is detected, execution control is passed to an optimization code that was used to define the optimized program. The optimization code copies a portion of the optimized program residing in the memory address space requested by the original program, writes the copied portion to unallocated memory address space, and adjusts the code of the optimized program. The protection of the copied portion of the optimized program is released, and execution control is returned to the original program. The request to use the memory address space occupied by the portion of the optimized program for which the protection has been released is then re-executed.
Abstract:
One embodiment of the present invention provides a system that facilitates prefetching memory pages for a computer program. The system operates by analyzing the computer program within a compiler to identify memory pages accessed within a portion of the computer program. Next, the system creates a map of these memory pages accessed by the computer program, wherein the map is indexed by a program counter for the computer program. A given program counter value indexes memory pages within this map that are likely to be accessed during subsequent execution of the computer program. The system examines the map during execution of the computer program, and if the current program counter for the computer program indexes memory pages in the map, the system touches the memory pages, thereby causing the system to prefetch the memory pages.
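A program-counter-indexed page map of this kind might look like the following sketch. The 4096-byte page size, the `(pc, address)` input format, and the names `build_prefetch_map` and `run` are assumptions made for illustration. "Touching" a page is modelled as adding it to a set of resident pages.

```python
PAGE_SIZE = 4096  # assumed page size, not stated in the abstract


def build_prefetch_map(access_sites):
    """Build a map from program counter to the pages accessed near it.

    access_sites is an iterable of (pc, address) pairs, standing in for
    whatever the compiler analysis would produce.
    """
    page_map = {}
    for pc, address in access_sites:
        page_map.setdefault(pc, set()).add(address // PAGE_SIZE)
    return page_map


def run(pcs, page_map, resident):
    """Walk a sequence of program counter values and touch mapped pages.

    Returns the pages that were newly touched, in order.
    """
    touched = []
    for pc in pcs:
        for page in sorted(page_map.get(pc, ())):
            if page not in resident:
                resident.add(page)  # stands in for touching the page
                touched.append(page)
    return touched


page_map = build_prefetch_map([
    (0x400, 0x10000),
    (0x400, 0x10008),  # same page as the previous access
    (0x400, 0x12000),
    (0x480, 0x20000),
])
resident = set()
touched = run([0x3F0, 0x400, 0x480, 0x400], page_map, resident)
```

Program counter values that do not appear in the map (such as `0x3F0` here) trigger no prefetch, and a page that is already resident is not touched again.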
Abstract:
One embodiment of the present invention provides a system that facilitates speculative execution of instructions within a computer system. Upon encountering a stall during execution of an instruction stream, the system synchronizes a cache containing data that is being operated on by the instruction stream. Next, the system configures the cache so that the cache operates as before except that changes to cache lines are not propagated to lower levels of the memory system. The system then speculatively executes a subsequent portion of the instruction stream without waiting for the event that caused the stall to be resolved. In this way, the speculative execution can only change data within the cache, and these changes are not propagated to lower levels of the memory system unless a subsequent commit operation takes place.
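The cache behaviour described here can be modelled with a small write-back cache, as a sketch under stated assumptions. The class `SpeculativeCache`, its method names, and the `discard` operation (dropping speculative changes when no commit happens) are hypothetical; the abstract only describes synchronization, blocked propagation, and a later commit.

```python
class SpeculativeCache:
    """Toy write-back cache in front of a dict that models lower memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}        # address -> cached value
        self.dirty = set()     # addresses changed but not yet propagated
        self.speculating = False

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        """Propagate dirty lines downward, unless speculation is active."""
        if self.speculating:
            return
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def begin_speculation(self):
        self.flush()             # synchronize the cache first
        self.speculating = True  # from here on, changes stay in the cache

    def commit(self):
        self.speculating = False
        self.flush()

    def discard(self):
        """Assumed recovery path: drop speculative changes entirely."""
        for addr in self.dirty:
            self.lines.pop(addr, None)
        self.dirty.clear()
        self.speculating = False
```

For example, a write made after `begin_speculation()` is visible through `read()` but leaves `memory` untouched until `commit()` is called.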
Abstract:
A method is provided for safely editing a binary code to be executed on a computer system. The method allows the binary code to be directly edited without compromising its integrity. More specifically, a larger binary code is transformed into a number of smaller binary code segments having sizes within a reference range of a control transfer function such as a branch instruction. A branch slamming operation can then be used to displace a binary instruction contained within a smaller binary code segment with a branch instruction referring to a binary patch that is appended to the smaller binary code segment. The binary instruction displaced by the branch instruction is preserved in the binary patch. Upon completion of the binary patch execution, the smaller binary code segment continues executing with a binary instruction immediately following the branch instruction. The method for safely editing the binary code is particularly useful with large binary codes having sizes greater than the reference range of the control transfer function.
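Branch slamming on a single segment can be sketched with a toy instruction list. Instructions are modelled as tuples, and the names `slam_branch` and `run` are hypothetical. The sketch assumes the displaced instruction runs after the patch body; the abstract only says that it is preserved in the patch.

```python
def slam_branch(code, index, patch_body):
    """Replace code[index] with a branch to a patch appended to the segment.

    The patch holds the new instructions, then the displaced instruction,
    then a branch back to the instruction following the slammed branch.
    """
    patched = list(code)
    patch_start = len(patched)
    displaced = patched[index]
    patched[index] = ("branch", patch_start)
    patched.extend(patch_body)
    patched.append(displaced)              # displaced instruction is preserved
    patched.append(("branch", index + 1))  # resume after the slammed branch
    return patched


def run(code):
    """Interpret the toy code and return the non-branch instructions executed."""
    trace, pc = [], 0
    while code[pc] != ("halt",):
        op = code[pc]
        if op[0] == "branch":
            pc = op[1]
        else:
            trace.append(op)
            pc += 1
    return trace


segment = [("add",), ("mul",), ("store",), ("halt",)]
patched = slam_branch(segment, index=1, patch_body=[("count",)])
trace = run(patched)
```

In this toy model the patched segment executes `add`, `count`, `mul`, `store`, so the original instructions still run in their original order with the patch inserted before the displaced one.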
Abstract:
A processor includes a tagging buffer for storing information that advises the processor of potential memory collisions caused by program instruction pairs that refer to the same memory address. In one method for avoiding memory collisions, a program having tagging code identifying program instruction pairs of the program that refer to a same memory address is compiled. The program instruction pairs in the compiled program code are processed while verifying an order in which the program instruction pairs are to be executed using the compiled tagging code, which is loaded into a tagging buffer. In another method, a program that does not include tagging code is compiled. When a trap occurs in the processing of a program instruction pair, program counters that cause the instructions to be executed in a desired order are added to a tagging buffer. A computer system including the processor also is described.
Abstract:
In a method for execution control acquisition of a program, during the execution of the program, it is determined when a hardware performance counter has reached a threshold. When the threshold is reached, execution control is switched to a dynamic optimizer. Thereafter, an optimized version of the program is executed. In a method for executing an optimized version of a program, during execution of the optimized version, an interrupt is received and execution control is returned to an operating system. An original version of the program is then executed. During the execution of the original version, a hardware performance counter is monitored. When the hardware performance counter reaches a threshold during the execution of the original version, execution control is switched to a dynamic optimizer. Thereafter, the execution of the optimized version of the program is continued as directed by the dynamic optimizer.
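The control flow between the original version, the dynamic optimizer, and the optimized version can be summarized as a small state machine. This is a rough sketch: the class `Runner`, the software counter standing in for a hardware performance counter, and the reset of the counter on each switch are assumptions for illustration.

```python
class Runner:
    """Toy model of switching between original and optimized execution."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counter = 0        # stands in for a hardware performance counter
        self.mode = "original"

    def step(self, events=1):
        """Execute one step and return the version that is now running."""
        if self.mode == "original":
            self.counter += events
            if self.counter >= self.threshold:
                # Threshold reached: the dynamic optimizer takes control
                # and directs execution to the optimized version.
                self.counter = 0
                self.mode = "optimized"
        return self.mode

    def interrupt(self):
        """An interrupt returns control to the OS, which runs the original."""
        self.mode = "original"


runner = Runner(threshold=3)
modes = [runner.step() for _ in range(4)]
runner.interrupt()
after_interrupt = [runner.step() for _ in range(3)]
```

After the interrupt, the original version runs again until the counter reaches the threshold a second time, at which point optimized execution continues.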
Abstract:
In a method for dynamic recompilation of a program, binary code for a program is identified, a portion of the binary code is obtained, and the obtained portion of the binary code is executed while being optimized for, e.g., use with a new hardware architecture. During execution, dynamic changes in flow are identified to enable additional portions of the binary code to be obtained and executed. The executed and optimized portion of the binary code and any additional portions of the binary code are saved to an optimized binary code file for the program. The obtaining and executing of portions of the binary code is continued until all portions of the binary code have been saved to the optimized binary code file for the program. Thereafter, when the program is called, the optimized binary code file for the program can be executed.
Abstract:
A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of the parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. A profiling tool is used to collect, analyze, and visualize the performance data of an application in connection with its execution on a parallel-processing computer system through the runtime system. This profiling tool greatly enhances an application developer's ability to understand how an application is executed on the parallel-processing computer system and fine-tune the application to achieve high performance.