Abstract:
A method for using profiling to obtain application-specific, preferred parameter values for an application is disclosed. First, a parameter for which to obtain an application-specific value is identified. Code is then augmented for application-specific profiling of the parameter. The parameter is profiled and profile data is collected. The profile data is then analyzed to determine the application's preferred parameter value for the profile parameter.
Abstract:
A method for using profiling to obtain application-specific, preferred parameter values for an application is disclosed. First, a parameter for which to obtain an application-specific value is identified. Code is then augmented for application-specific profiling of the parameter. The parameter is profiled and profile data is collected. The profile data is then analyzed to determine the application's preferred parameter value for the profile parameter.
Abstract:
A method for using profiling to obtain application-specific, preferred parameter values for an application is disclosed. First, a parameter for which to obtain an application-specific value is identified. Code is then augmented for application-specific profiling of the parameter. The parameter is profiled and profile data is collected. The profile data is then analyzed to determine the application's preferred parameter value for the profile parameter.
Abstract:
A system and method for improving the performance of all applications are disclosed. Production profile data may be collected about each application while the application is executing. The production profile data may be converted into symbolized profiles and stored in a database. The symbolized profiles may be aggregated into a single aggregated profile. This aggregated profile may be used as a compilation input when compiling new versions of an application's binary to improve the application's performance for observed application behavior.
Abstract:
Provided are methods and systems for inter-procedural optimization (IPO). A new IPO architecture (referred to as “ThinLTO”) is designed to address the weaknesses and limitations of existing IPO approaches, such as traditional Link Time Optimization (LTO) and Lightweight Inter-Procedural Optimization (LIPO), and become a new link-time-optimization standard. With ThinLTO, demand-driven and summary-based fine grain importing maximizes the potential of Cross-Module Optimization (CMO), which enables as much useful CMO as possible ThinLTO also provides for global indexing, which enables fast function importing; parallelizes some performance-critical but expensive inter-procedural analyses and transformations; utilizes demand-driven, lazy importing of debug information that minimizes memory consumption for the debug build; and allows easy integration of third-party distributed build systems. In addition, ThinLTO may also be implemented using an IPO server, thereby removing the need for the serial step.
Abstract:
A method includes generating a first executable program module based on source code modules and collecting profile information for the source code modules by executing the first executable program module. The profile information includes information pertaining to invocation of procedures in the first executable program module. The method further includes determining module grouping information for the source code modules based on procedure invocation patterns in the profile information and according to one or more inter-procedural optimization (IPO) heuristics. The method includes performing IPO based on the module grouping information to generate object code modules and generating a second executable program module based on the plurality of object code modules.
Abstract:
Methods for reducing memory loads for accessing global variables (globals) when creating executables for position independent (PI) code are disclosed. A first method includes compiling PI code, identifying globals, and determining whether globals are defined in the executable. If a global is not defined in the executable, a definition is created in the executable. A second method includes receiving a list of defined globals from instrumented PI code binary and comparing the list with globals in the PI code. Memory loads are created for globals that are unlisted. A third method includes compiling PI code with special relocations for globals and determining whether globals are defined in the executable. If the global is defined in the executable, the special relocation is replaced with a direct load of the global. If not, the special relocation is replaced with a two-instruction sequence that loads the global's address and then the global's value.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving source code that contains a hot function that calls a multiversioned function, where a function definition of the multiversioned function specifies a first version and an alternative second version, and generating compiled code that includes a first and a second clone of the hot function, and a first and a second version of the multiversioned function. In the compiled code, the first clone of the hot function includes a direct call to the first version of the multiversioned function, and the second clone of the hot function includes a direct call to the second version of the multiversioned function.
Abstract:
A system and method for using inline stacks to improve the performance of application binaries is included. While executing a first application binary, profile data may be collected about the application that includes which callee functions are called from the application's callsites and the number of times each inline stack is executed. A context summary map may be created from the collected profile data which shows a summary of the total execution count of all instructions in the callee function for each callsite inlined in the application's normal binary. Using the context summary map, each function callsite's execution count may be compared with a predetermined threshold to determine if the function should be inlined. Then the application's profile may be annotated and a second application binary, an optimized binary, may be generated using the annotated profile.