Abstract:
The present disclosure is directed to a method for providing an OpenMP reduction implementation. The method may comprise creating an aggregate of at least one reduction variable in a parallel region or a work-sharing construct; defining a pointer variable, the pointer variable pointing to a dynamic array of the aggregate; creating an initialization routine, an outlined routine and a reduction accumulation routine; replacing the parallel region or the work-sharing construct with a runtime routine, the runtime routine taking a plurality of arguments including an address of the initialization routine, an address of the outlined routine, an address of the reduction accumulation routine, an address of the pointer variable, and a size of the aggregate; and executing the runtime routine when the at least one reduction variable is in the parallel region or the work-sharing construct.
Abstract:
A computer implemented method, system and computer program product for accessing threadprivate memory for threadprivate variables in a parallel program during program compilation. A computer implemented method for accessing threadprivate variables in a parallel program during program compilation includes aggregating threadprivate variables in the program, replacing references of the threadprivate variables by indirect references, moving address load operations of the threadprivate variables, and replacing the address load operations of the threadprivate variables by calls to runtime routines to access the threadprivate memory. The invention enables a compiler to minimize the runtime routines call times to access the threadprivate variables, thus improving program performance.
Abstract:
A method of compiling source code. The method includes converting pointer-based access in the source code to array-based access in the source code in a first pass compilation of the source code. Information is collected for objects in the source code during the first pass compilation. Candidate objects in the source code are selected based on the collected information to form selected candidate objects. Global stride variables are created for the selected candidate objects. Memory allocation operations are updated for the selected candidate objects in a second pass compilation of the source code. Multiple-level pointer indirect references are replaced in the source code with multi-dimensional array indexed references for the selected candidate objects in the second pass compilation of the source code.
Abstract:
Inter-procedural strength reduction is provided by a mechanism of the present invention to improve data cache performance. During a forward pass, the present invention collects information of global variables and analyzes the usage pattern of global objects to select candidate computations for optimization. During a backward pass, the present invention remaps global objects into smaller size new global objects and generates more cache efficient code by replacing candidate computations with indirect or indexed reference of smaller global objects and inserting store operations to the new global objects for each computation that references the candidate global objects.
Abstract:
An illustrative embodiment of a computer-implemented process for managing aliasing constraints, identifies an object to form an identified object, identifies a scope of the identified object to form an identified scope, and assigns a unique value to the identified object within the identified scope. The computer-implemented process further demarcates an entrance to the identified scope, demarcates an exit to the identified scope, optimizes the identified object using a property of the identified scope and associated aliasing information, tracks the identified object state to form tracked state information; and uses the tracked state information to update the identified object.
Abstract:
An illustrative embodiment of a computer-implemented process for managing aliasing constraints, identifies an object to form an identified object, identifies a scope of the identified object to form an identified scope, and assigns a unique value to the identified object within the identified scope. The computer-implemented process further demarcates an entrance to the identified scope, demarcates an exit to the identified scope, optimizes the identified object using a property of the identified scope and associated aliasing information, tracks the identified object state to form tracked state information; and uses the tracked state information to update the identified object.
Abstract:
Computer implemented method, system and computer usable program code for cache management. A cache is provided, wherein the cache is viewed as a sorted array of data elements, wherein a top position of the array is a most recently used position of the array and a bottom position of the array is a least recently used position of the array. A memory access sequence is provided, and a training operation is performed with respect to a memory access of the memory access sequence to determine a type of memory access operation to be performed with respect to the memory access. Responsive to a result of the training operation, a cache replacement operation is performed using the determined memory access operation with respect to the memory access.
Abstract:
A computer implemented method for facilitating debugging of source code. The source code is scanned to identify a candidate region. A procedure control descriptor is generated, wherein the procedure control descriptor corresponds to the candidate region. The procedure control descriptor identifies, for the candidate region, a condition which, if true at runtime means that the candidate region can be specialized. Responsive to a determination during compile time that satisfaction of at least one condition will be known only at runtime, the procedure control descriptor is used to specialize the candidate region at compile time to create a first version of the candidate region for execution in a case where the condition is true and a second version of the candidate region for execution in a case where the condition is false, and further generate code to correctly select one of the first region and the second region at runtime.
Abstract:
Optimizing program code in a static compiler by determining the live ranges of variables and determining which live ranges are candidates for moving code from the use site to the definition site of source code. Live ranges for variables in a flow graph are determined. Selected live ranges are determined as candidates in which code will be moved from a use site within the source code to a definition site within the source code. Optimization opportunities within the source code are identified based on the code motion.
Abstract:
Computer implemented method, system and computer usable program code for cache management. A cache is provided, wherein the cache is viewed as a sorted array of data elements, wherein a top position of the array is a most recently used position of the array and a bottom position of the array is a least recently used position of the array. A memory access sequence is provided, and a training operation is performed with respect to a memory access of the memory access sequence to determine a type of memory access operation to be performed with respect to the memory access. Responsive to a result of the training operation, a cache replacement operation is performed using the determined memory access operation with respect to the memory access.