Abstract:
Mechanisms are provided for optimizing irregular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which irregular memory references access a storage of a software cache of a data processing system through a transactional cache mechanism of the software cache.
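The classification step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a reference counts as "irregular" when its subscript itself contains another array reference (an indirect access such as a[idx[i]]) and as "regular" otherwise. That criterion, the function name classify_references, and the use of Python source as the input language are assumptions, not details taken from the abstract. The software-cache transformation itself is not shown.

```python
import ast


def classify_references(source):
    """Parse source text and label each subscripted reference.

    Assumed criterion (hypothetical): a reference is 'irregular' if its
    subscript contains another subscripted reference, else 'regular'.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript):
            indirect = any(
                isinstance(inner, ast.Subscript) for inner in ast.walk(node.slice)
            )
            result[ast.unparse(node)] = "irregular" if indirect else "regular"
    return result


refs = classify_references(
    "for i in range(n):\n    total += a[idx[i]] * b[2 * i + 1]\n"
)
```

Here a[idx[i]] is labeled irregular because its address depends on data loaded from idx, while idx[i] and b[2 * i + 1] are labeled regular.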
Abstract:
A computer implemented method is provided for using SLP in processing a plurality of statements, wherein the statements are associated with an array having a number of array positions, and each statement includes one or more expressions. The method includes the step of gathering expressions for each of the statements into a structure comprising a single merge stream, the merge stream being furnished with a location for each expression, wherein the location for a given expression is associated with one of the array positions. The method further comprises selectively identifying a plurality of expressions, and applying SLP packing operations to the identified expressions, in order to merge respective identified expressions into one or more isomorphic sub-streams. The method further comprises selectively combining the expressions of the isomorphic sub-streams, and other expressions of the single merge stream, into a number of input vectors that are substantially equal in length to one another. A location vector is generated that contains the respective locations for all of the expressions in the single merge stream. The method further comprises generating an output stream that comprises the expressions of the input vectors, wherein the expressions are arranged in the output stream in an order determined by the respective locations contained in the location vector.
Abstract:
A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.
Abstract:
A method to efficiently pre-fetch and batch compiler-assisted software cache accesses is provided. The method reduces the overhead associated with software cache directory accesses. With the method, the local memory address of the cache line that stores the pre-fetched data is itself cached, such as in a register or well-known location in local memory, so that a later data access does not need to perform address translation and software cache operations and can instead access the data directly from the software cache using the cached local memory address. This saves processor cycles that would otherwise be required to perform the address translation a second time when the data is to be used. Moreover, the system and method directly enable software cache accesses to be effectively decoupled from address translation in order to increase the overlap between computation and communication.
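The idea of caching the translated local address can be sketched with a toy direct-mapped software cache. Everything named here (SoftwareCache, translate, prefetch, load_direct, the line sizes) is hypothetical and chosen only to make the idea concrete. The sketch also assumes the line is not evicted between the pre-fetch and the later use.

```python
class SoftwareCache:
    """Toy direct-mapped software cache; all names are hypothetical."""

    def __init__(self, backing, num_lines=4, line_size=4):
        self.backing = backing            # stands in for global memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines    # directory: global line held by each slot
        self.local = [0] * (num_lines * line_size)  # local memory for cached data
        self.lookups = 0                  # number of directory accesses performed

    def translate(self, global_addr):
        """Directory lookup: map a global address to a local one, filling on a miss."""
        self.lookups += 1
        line = global_addr // self.line_size
        slot = line % self.num_lines
        if self.tags[slot] != line:
            base = line * self.line_size
            start = slot * self.line_size
            self.local[start:start + self.line_size] = \
                self.backing[base:base + self.line_size]
            self.tags[slot] = line
        return slot * self.line_size + global_addr % self.line_size

    def prefetch(self, global_addr):
        """Pre-fetch the line and return the local address so the caller can keep it."""
        return self.translate(global_addr)

    def load_direct(self, local_addr):
        """Read through a previously cached local address; no directory lookup."""
        return self.local[local_addr]


memory = list(range(100, 132))
cache = SoftwareCache(memory)
cached_addr = cache.prefetch(9)            # translation happens once, ahead of use
value = cache.load_direct(cached_addr)     # the later access skips translation
```

After the pre-fetch, the later load touches only local memory, so the directory is consulted once rather than twice.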
Abstract:
A computer implemented method, apparatus, and computer usable program code are provided for compiling source code for performing a complex operation followed by a complex reduction operation. A method is determined for generating executable code for performing the complex operation and the complex reduction operation. In response to a determination that a reduced single instruction multiple data (SIMD) method is appropriate, executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to generate a final result.
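One way to read the three stages (sub-products, intermediate results, final sum) is shown below for a complex dot product. This is an illustrative interpretation, not the patented code generation scheme: the choice of a dot product as the "complex operation followed by a complex reduction", the function name, and the four accumulators are assumptions. Scalar Python stands in for the SIMD lanes.

```python
def complex_dot_reduced(a, b):
    """Sum of a[k] * b[k] over complex sequences, staged as in the sketch's assumptions.

    Stage 1: form the four real sub-products of each complex multiply.
    Stage 2: reduce each kind of sub-product into its own intermediate sum.
    Stage 3: combine the intermediate sums once, at the end.
    """
    s_rr = s_ii = s_ri = s_ir = 0.0
    for x, y in zip(a, b):
        s_rr += x.real * y.real   # real * real sub-products
        s_ii += x.imag * y.imag   # imag * imag sub-products
        s_ri += x.real * y.imag   # real * imag sub-products
        s_ir += x.imag * y.real   # imag * real sub-products
    # Final combination: real part is rr - ii, imaginary part is ri + ir.
    return complex(s_rr - s_ii, s_ri + s_ir)


a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1 + 4j]
result = complex_dot_reduced(a, b)
```

Deferring the subtraction and addition until after the reduction means the loop body contains only multiply-accumulate steps, which is the shape that maps naturally onto vector units.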
Abstract:
Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.
Abstract:
Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.
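The group-execute, check, and commit cycle can be simulated in software as a rough sketch. The abstract describes processor hardware (store caches and dependency checking logic); the Python below only mimics the control flow. The function names, the dictionary-based memory, and the policy of committing the iterations before the first detected dependence and restarting the next group at the dependent iteration are all assumptions made for illustration.

```python
def run_group(memory, body, iterations):
    """Speculatively run one parallel execution group.

    body(i, read) returns {address: value} stores for iteration i.
    Returns the first iteration with a detected dependence, or None.
    """
    store_caches = {}   # per-iteration buffered stores (stand-in for store caches)
    read_sets = {}      # addresses each iteration loaded
    for i in iterations:                 # conceptually parallel: same memory snapshot
        reads = set()

        def read(addr, reads=reads):
            reads.add(addr)
            return memory[addr]

        store_caches[i] = body(i, read)
        read_sets[i] = reads

    written = set()
    for i in iterations:                 # dependence check in program order
        if read_sets[i] & written:
            return i                     # read a location an earlier iteration stored to
        memory.update(store_caches[i])   # no dependence found: commit buffered stores
        written.update(store_caches[i])
    return None


def run_loop(memory, body, n, group_size):
    """Execute n iterations in groups, restarting at any dependent iteration."""
    start = 0
    while start < n:
        group = range(start, min(start + group_size, n))
        dependent = run_group(memory, body, group)
        start = group.stop if dependent is None else dependent


src = [0, 1, 0]
dst = [1, 2, 3]
memory = {0: 1, 1: 0, 2: 0, 3: 0}
run_loop(memory, lambda i, read: {dst[i]: read(src[i]) + 1}, n=3, group_size=3)
```

In this run, iteration 1 reads address 1, which iteration 0 stores to, so only iteration 0 commits from the first group and iterations 1 and 2 are re-executed in a second group. The final memory matches sequential execution.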
Abstract:
Mechanisms for building approximate data dependences using a moving look-back window are provided. The mechanisms track dependence information for memory accesses over iterations of execution of a portion of code. The mechanisms receive a memory access of an iteration of the portion of code, the memory access having an address for accessing the memory and an access type indicating at least one of a read or a write access type. An entry in a moving look-back window data structure is generated corresponding to a memory location accessed by the memory access. The entry comprises at least an identification of the address, the access type, and an iteration number corresponding to the iteration of the memory access. The moving look-back window data structure is utilized to determine dependence information for memory accesses over a plurality of iterations of the portion of code.
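A minimal sketch of such a window follows. It assumes a fixed-capacity table that keeps only the most recent access per address and evicts the oldest entry when full; the class name, the eviction policy, and the tuple format of the reported dependences are hypothetical choices, not details from the abstract.

```python
from collections import OrderedDict


class LookBackWindow:
    """Bounded table of recent accesses; all names and policies are assumptions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # address -> (access_type, iteration)
        self.dependences = []          # (earlier_iter, later_iter, address, kind)

    def record(self, address, access_type, iteration):
        previous = self.entries.get(address)
        if previous is not None:
            prev_type, prev_iter = previous
            # A cross-iteration dependence needs at least one write.
            if prev_iter != iteration and "write" in (prev_type, access_type):
                kind = prev_type + "->" + access_type
                self.dependences.append((prev_iter, iteration, address, kind))
        self.entries[address] = (access_type, iteration)
        self.entries.move_to_end(address)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # window moves: forget the oldest entry


window = LookBackWindow(capacity=2)
window.record(100, "write", 0)
window.record(200, "read", 0)
window.record(100, "read", 1)    # found: write in iteration 0, read in iteration 1
window.record(300, "write", 1)   # evicts the entry for address 200
window.record(200, "write", 2)   # the earlier read of 200 is no longer in the window
```

The last call shows why the result is approximate: the read of address 200 has already left the window, so the read-then-write dependence on it goes unreported, in exchange for bounded memory use.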
Abstract:
An example embodiment disclosed is a method for reducing parallelism of computer source code. The method includes receiving multi-threaded program source code and representing the multi-threaded program source code as a polyhedral framework stored in computer readable memory. The polyhedral framework is then used to convert the multi-threaded program source code representation into a single-threaded program source code representation.
Abstract:
Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
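The per-instruction expansion decision can be sketched as a small cost comparison. The cost figures, the dictionary layout, and the function names are invented for illustration. A real cost model would also weigh the hardware and software constraints the abstract mentions, and the "combination" option is left out here.

```python
import math


def expansion_costs(op_costs, trip_count, vector_width):
    """Estimate the cost of each available expansion of one virtual vector instruction.

    op_costs (hypothetical layout):
      "scalar":  cost per element
      "vector":  cost per native vector instruction, or None if unsupported
      "library": (call overhead, cost per element), or None if no routine exists
    """
    estimates = {"scalar": op_costs["scalar"] * trip_count}
    if op_costs.get("vector") is not None:
        estimates["vector"] = op_costs["vector"] * math.ceil(trip_count / vector_width)
    if op_costs.get("library") is not None:
        overhead, per_element = op_costs["library"]
        estimates["library"] = overhead + per_element * trip_count
    return estimates


def choose_expansion(op_costs, trip_count, vector_width=4):
    """Pick the cheapest expansion under the assumed cost figures."""
    estimates = expansion_costs(op_costs, trip_count, vector_width)
    return min(estimates, key=estimates.get)


# Made-up costs: 'add' has a native vector form, 'sqrt' only scalar or library.
add_costs = {"scalar": 1, "vector": 1, "library": None}
sqrt_costs = {"scalar": 20, "vector": None, "library": (50, 2)}

add_choice = choose_expansion(add_costs, trip_count=16)
short_sqrt_choice = choose_expansion(sqrt_costs, trip_count=2)
long_sqrt_choice = choose_expansion(sqrt_costs, trip_count=100)
```

With these figures the addition expands to native vector code, while the square root expands to scalar code for a short loop and to a library call once the trip count is large enough to amortize the call overhead.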