Abstract:
We present a technique for performing dependence analysis on array subscripts that are more complex than linear functions of the enclosing loop indices. For such subscripts, we decouple the original iteration space from the dependence test iteration space and link the two through index-association functions. Dependence analysis is performed in the dependence test iteration space to determine whether a dependence exists in the original iteration space. The dependence distance in the original iteration space is then derived from the distance in the dependence test iteration space and the properties of the index-association functions. For certain non-linear expressions, we show how to transform them into an equivalent set of linear expressions, which can then be handled by traditional dependence tests. We also show how our advanced dependence analysis technique can help parallelize loops that would otherwise be hard to parallelize.
Abstract:
A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether the probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access that does not depend, for its memory address computation, on another memory access within the given basic block. Chains may be joined across basic blocks depending on whether the relative execution probabilities of the blocks exceed a threshold value.
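A minimal sketch of chain building and joining, assuming a toy representation where each memory access records which other access (if any) produced the value its address is computed from. `Access`, `build_chains`, `hot_block_chains`, and `join_chains` are hypothetical names, and execution probabilities are supplied by the caller rather than derived from profile data or a control-flow graph.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Access:
    name: str
    # Access whose loaded value feeds this access's address, or None if the
    # address is not computed from another memory access.
    depends_on: Optional[str]


def build_chains(block: List[Access]) -> List[List[str]]:
    """Return every head-to-leaf chain of indirect accesses in one basic block."""
    names = {a.name for a in block}
    children: Dict[str, List[str]] = {}
    for a in block:
        if a.depends_on in names:
            children.setdefault(a.depends_on, []).append(a.name)
    # A head does not depend on another access within this block.
    heads = [a.name for a in block if a.depends_on not in names]

    chains: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        kids = children.get(node, [])
        if not kids:
            chains.append(path)
        for kid in kids:
            walk(kid, path)

    for head in heads:
        walk(head, [])
    return chains


def hot_block_chains(blocks: Dict[str, List[Access]], prob: Dict[str, float],
                     threshold: float) -> Dict[str, List[List[str]]]:
    """Build chains only for blocks whose execution probability, relative to
    the controlling statement, reaches the threshold."""
    return {b: build_chains(accs) for b, accs in blocks.items() if prob[b] >= threshold}


def join_chains(first: List[str], second: List[str], second_head: Access,
                prob_first: float, prob_second: float,
                threshold: float) -> Optional[List[str]]:
    """Join chains from two blocks if the second head depends on the first
    chain's last access and both blocks are executed often enough."""
    if second_head.depends_on == first[-1] and min(prob_first, prob_second) >= threshold:
        return first + second
    return None


b1 = [Access("idx", None), Access("ptr", "idx"), Access("val", "ptr"), Access("cnt", None)]
b2 = [Access("next", "val")]
chains = hot_block_chains({"B1": b1, "B2": b2}, {"B1": 0.9, "B2": 0.7}, threshold=0.5)
print(chains)  # {'B1': [['idx', 'ptr', 'val'], ['cnt']], 'B2': [['next']]}
print(join_chains(chains["B1"][0], chains["B2"][0], b2[0], 0.9, 0.7, 0.5))
# ['idx', 'ptr', 'val', 'next']
```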
Abstract:
Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access that does not depend for its address on another prefetch candidate memory access within the procedure and an indirect access that depends for its address on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation that is dependent for its address computation on results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.
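Python exposes no prefetch instruction, so the following sketch only models which operations the generated code would perform for a chain such as `data[idx[i]]`: a real load of the head access shifted by the prefetch-ahead distance, real loads for any intermediate indirect accesses, and a prefetch (represented here by returning its index) for the terminal access. The latency-divided-by-iteration-cost formula for the prefetch-ahead value is a common heuristic assumed for illustration, not necessarily the one in the source, and the function names are hypothetical. Out-of-range look-ahead is simply skipped.

```python
import math
from typing import List, Optional, Sequence


def prefetch_ahead(latency_cycles: int, cycles_per_iteration: int) -> int:
    """Iterations of look-ahead needed to hide memory latency (assumed heuristic)."""
    return max(1, math.ceil(latency_cycles / cycles_per_iteration))


def prefetch_index(chain: List[Sequence[int]], i: int, ahead: int) -> Optional[int]:
    """Model the operations generated for iteration i of a loop over a chain.

    chain[0] is the head array, indexed directly by the loop variable; each
    later array is indexed by the value loaded from the previous one. Returns
    the index of the terminal access to prefetch, or None if the look-ahead
    runs past the end of the head array.
    """
    assert len(chain) >= 2, "a chain needs a head and at least one indirect access"
    j = i + ahead
    if j >= len(chain[0]):
        return None
    value = chain[0][j]          # real load for the head, shifted by `ahead` iterations
    for array in chain[1:-1]:    # intermediate indirect accesses also need real loads
        value = array[value]
    return value                 # terminal access: only its address is prefetched


idx = [3, 1, 0, 2]
data = [10, 20, 30, 40]
print(prefetch_ahead(300, 40))                     # 8
print(prefetch_index([idx, data], i=0, ahead=2))   # 0, i.e. prefetch data[idx[2]]
print(prefetch_index([idx, data], i=3, ahead=2))   # None
```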
Abstract:
One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in the main thread. Upon encountering a loop that has associated helper-thread code, the system begins executing that code on the helper-thread, separately from and in parallel with the main thread. While the helper-thread executes the code, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code it is executing is no longer performing useful work. Hence, the helper-thread executes ahead of the main thread to prefetch data items for it without unnecessarily consuming processor resources or hampering the execution of the main thread.
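The periodic progress check can be sketched with Python's `threading` module. This models only the control logic: the `prefetch` callback stands in for real prefetch instructions, which Python does not expose. `Progress`, `helper_loop`, and `check_interval` are illustrative names, and "no longer useful" is assumed to mean that the main thread has already completed the iteration the helper is about to prefetch for.

```python
import threading
from typing import Callable


class Progress:
    """Iteration count published by the main thread and polled by the helper."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def set(self, value: int) -> None:
        with self._lock:
            self._value = value

    def get(self) -> int:
        with self._lock:
            return self._value


def helper_loop(n: int, progress: Progress, prefetch: Callable[[int], None],
                check_interval: int = 4) -> int:
    """Run the helper's copy of the loop; return the iteration where it stopped."""
    for i in range(n):
        # Every check_interval iterations, see whether the main thread has
        # already completed iteration i; if so, deactivate the helper.
        if i % check_interval == 0 and progress.get() > i:
            return i
        prefetch(i)
    return n


# Usage: the main thread publishes how many iterations it has completed.
progress = Progress()
touched = []
helper = threading.Thread(target=helper_loop, args=(1000, progress, touched.append))
helper.start()
for i in range(1000):
    progress.set(i + 1)  # main-thread loop body would run here
helper.join()
```

Checking only every `check_interval` iterations keeps the synchronization cost low, at the price of the helper occasionally doing a few iterations of useless work before it notices.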
Abstract:
One embodiment of the present invention provides a system that generates code for software scouting of program regions. During operation, the system receives source code for a program and compiles it. In the first step of compilation, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set, the system produces executable code for a helper-thread that contains a prefetch instruction for each effective prefetch candidate. At runtime, the helper-thread executes in parallel with the main thread, ahead of the main thread's current position, to prefetch data items for the main thread.
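The two-stage loop selection can be sketched over a toy loop tree. The profitability test used here (estimated miss cycles must exceed a fixed helper-thread overhead) is an assumed stand-in for whatever cost model the compiler actually applies, the "helper code" is just a list of strings, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Loop:
    name: str
    candidates: List[Tuple[str, bool]]   # (address expression, is_effective)
    est_miss_cycles: int                 # estimated stall cycles a helper could hide
    children: List["Loop"] = field(default_factory=list)


def walk(loop: Loop) -> Iterator[Loop]:
    """Pre-order traversal of the loop hierarchy."""
    yield loop
    for child in loop.children:
        yield from walk(child)


def first_set(root: Loop) -> List[Loop]:
    """Loops containing at least one effective prefetch candidate."""
    return [lp for lp in walk(root) if any(eff for _, eff in lp.candidates)]


def second_set(loops: List[Loop], helper_overhead: int) -> List[Loop]:
    """Loops where scout-mode prefetching is judged profitable (toy cost model)."""
    return [lp for lp in loops if lp.est_miss_cycles > helper_overhead]


def helper_code(loop: Loop) -> List[str]:
    """One prefetch per effective candidate in the helper-thread version of the loop."""
    return [f"prefetch {addr}" for addr, eff in loop.candidates if eff]


inner = Loop("inner", [("a[b[i]]", True), ("c[i]", False)], est_miss_cycles=500)
cold = Loop("cold", [("d[e[i]]", True)], est_miss_cycles=50)
root = Loop("outer", [], est_miss_cycles=0, children=[inner, cold])
selected = second_set(first_set(root), helper_overhead=100)
print([lp.name for lp in selected])  # ['inner']
print(helper_code(inner))            # ['prefetch a[b[i]]']
```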
Abstract:
One embodiment of the present invention provides a system that supports parallelized generic reduction operations in a parallel programming language, wherein a reduction operation is an associative operation that can be divided into a group of sub-operations that can execute in parallel. During operation, the system detects generic reduction operations in source code. In doing so, the system identifies a set of reduction variables upon which the generic reduction operation will operate, along with a set of initial values for the variables. The system additionally identifies a merge operation that merges partial results from the parallel generic reduction operations into a final result. The system then compiles the program's source code into a form that facilitates executing the generic reduction operations in parallel. By supporting the parallel execution of such generic reduction operations in this way, the present invention extends parallel execution for reduction operations beyond basic commutative and associative operations such as addition and multiplication.
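A minimal sketch of a generic reduction, assuming the user supplies an initial-value function, a per-element step, and a merge operation. Partial results are merged in chunk order, so the merge only needs to be associative (string concatenation works even though it is not commutative), and `init()` must return an identity element for `merge`. `parallel_reduce` is a hypothetical helper built on `concurrent.futures.ThreadPoolExecutor`; it illustrates the runtime structure, not the compiler transformation itself.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
A = TypeVar("A")


def parallel_reduce(data: Sequence[T], init: Callable[[], A],
                    step: Callable[[A, T], A], merge: Callable[[A, A], A],
                    workers: int = 4) -> A:
    """Reduce chunks in parallel, each from a fresh initial value, then merge
    the partial results in chunk order."""
    size = max(1, math.ceil(len(data) / workers))
    chunks = [data[k:k + size] for k in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of `chunks`.
        partials: List[A] = list(pool.map(lambda c: reduce(step, c, init()), chunks))
    return reduce(merge, partials, init())


# A reduction beyond + and *: minimum value together with its first index.
values = [5, 3, 8, 3, 1, 9, 1]
best = parallel_reduce(
    list(enumerate(values)),
    init=lambda: (math.inf, -1),
    step=lambda acc, iv: (iv[1], iv[0]) if iv[1] < acc[0] else acc,
    merge=lambda a, b: b if b[0] < a[0] else a,  # ties keep the earlier chunk
)
print(best)  # (1, 4)
```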
Abstract:
One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. The remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.
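Steered execution can be modeled with two functionally equivalent variants of the same code region plus a selection rule. Python does not compile individual regions with different optimization sets, so the hand-unrolled and plain loops below merely stand in for the two compiled portions; the rule (`len(xs) >= 64`), `make_steered`, and the `last_choice` attribute are hypothetical.

```python
from typing import Callable, List


def sum_squares_unrolled(xs: List[int]) -> int:
    """Stand-in for the first compiled portion: loop unrolled by four."""
    total = 0
    n = len(xs) - len(xs) % 4
    for k in range(0, n, 4):
        total += xs[k] * xs[k] + xs[k + 1] * xs[k + 1] + xs[k + 2] * xs[k + 2] + xs[k + 3] * xs[k + 3]
    for k in range(n, len(xs)):  # remainder iterations
        total += xs[k] * xs[k]
    return total


def sum_squares_simple(xs: List[int]) -> int:
    """Stand-in for the second compiled portion: same semantics, no unrolling."""
    total = 0
    for x in xs:
        total += x * x
    return total


def make_steered(first: Callable, second: Callable, rule: Callable[..., bool]) -> Callable:
    """Combine two variants of one region with a rule that picks one per call."""
    def steered(*args):
        chosen = first if rule(*args) else second
        steered.last_choice = chosen.__name__  # recorded only for inspection
        return chosen(*args)
    steered.last_choice = None
    return steered


sum_squares = make_steered(sum_squares_unrolled, sum_squares_simple,
                           rule=lambda xs: len(xs) >= 64)
print(sum_squares(list(range(10))), sum_squares.last_choice)   # 285 sum_squares_simple
print(sum_squares(list(range(100))), sum_squares.last_choice)  # 328350 sum_squares_unrolled
```

Because both variants compute the same result, the rule affects only performance, which is what allows it to be chosen freely at compile time.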
Abstract:
A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. In response to detecting a launch point for a function, the first thread is configured to provide an indication to the second thread that the second thread may begin executing the function. The launch point of the function precedes the actual call point of the function in the execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function includes one or more inputs, and the second thread uses anticipated values for each of those inputs. When the first thread reaches the call point for the function, the first thread is configured to use the results of the second thread's execution in response to determining that the anticipated values used by the second thread were correct.
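The launch-point and call-point protocol can be sketched with `concurrent.futures`: at the launch point the function is submitted to a worker thread with anticipated inputs, and at the call point the speculative result is reused only if the actual inputs match. `SpeculativeCall` is a hypothetical name, and inputs are compared with `==`, which assumes they are plain values and that the function has no side effects.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional, Tuple


class SpeculativeCall:
    """Run a function early with anticipated inputs; reuse the result at the
    real call point only if the anticipated inputs were correct."""

    def __init__(self, executor: ThreadPoolExecutor, func: Callable[..., Any]) -> None:
        self.executor = executor
        self.func = func
        self.anticipated: Optional[Tuple[Any, ...]] = None
        self.future: Optional[Future] = None

    def launch(self, *anticipated: Any) -> None:
        """Launch point: signal the second thread to start with predicted inputs."""
        self.anticipated = anticipated
        self.future = self.executor.submit(self.func, *anticipated)

    def call(self, *actual: Any) -> Tuple[Any, bool]:
        """Call point: return (result, whether the speculative result was reused)."""
        future, self.future = self.future, None
        if future is not None and actual == self.anticipated:
            return future.result(), True
        if future is not None:
            future.cancel()  # no effect if already running; its result is ignored
        return self.func(*actual), False


with ThreadPoolExecutor(max_workers=1) as pool:
    spec = SpeculativeCall(pool, pow)
    spec.launch(2, 10)        # anticipated inputs
    print(spec.call(2, 10))   # (1024, True): prediction matched
    spec.launch(2, 10)
    print(spec.call(3, 2))    # (9, False): mismatch, recomputed on the first thread
```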
Abstract:
Hemoglobin is site-specifically crosslinked into its tetrameric form by reaction with a trifunctional reagent which combines electrostatic effects, steric effects and the presence of functional groups so that two of the functional groups react with specific sites on the hemoglobin whilst the third site is left free for reaction with endogenous nucleophilic compounds. A specific example of such a crosslinking reagent is trimesoyl tris(3,5-dibromosalicylate), TTDS, which effects specific crosslinking between the amino groups of lysine-82 on each respective β subunit. While the crosslinking reagent TTDS has three available carboxyl groups for the crosslinking reaction, only two react, leaving one free carboxyl for reaction with exogenous nucleophiles, e.g. to render the hemoglobin product useful as a carrier for nucleophilic compounds through the body's circulatory system.
Abstract:
A system and method for minimizing register spills during compilation. A compiler reallocates spilled variables from stack memory to other available registers. Even when the corresponding register file has no available registers, the compiler may identify available registers elsewhere. In particular, the compiler identifies available registers in an alternate register file; for example, the alternate register file may be a floating-point register file that is then used for spilled integer variables. Other combinations of spilled-variable instruction types and alternate register files are possible. When an available register within the alternate register file is identified, the compiler modifies the program instructions to allocate the corresponding spilled variable to that register.
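A minimal sketch of the reallocation step on a toy instruction list. It assumes the set of free floating-point registers has already been determined (liveness and interference are not modeled), and the opcode names (such as `spill_store` and `move_int_to_fp`) and register names (`f30`, `f31`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Instr = Tuple[str, ...]


def assign_alternate_registers(spilled: Sequence[str],
                               free_fp_regs: Sequence[str]) -> Dict[str, str]:
    """Map spilled integer variables, in order, to free registers in the
    floating-point register file; extra variables stay on the stack."""
    return dict(zip(spilled, free_fp_regs))


def rewrite_spills(instrs: List[Instr], mapping: Dict[str, str]) -> List[Instr]:
    """Replace stack spill stores/loads with cross-register-file moves."""
    out: List[Instr] = []
    for instr in instrs:
        op, var = instr[0], instr[1]
        if var in mapping and op == "spill_store":
            out.append(("move_int_to_fp", var, mapping[var]))
        elif var in mapping and op == "spill_load":
            out.append(("move_fp_to_int", var, mapping[var]))
        else:
            out.append(instr)  # unrelated instruction or variable left on the stack
    return out


mapping = assign_alternate_registers(["x", "y", "z"], ["f30", "f31"])
print(mapping)  # {'x': 'f30', 'y': 'f31'}
code = [("spill_store", "x"), ("add", "y"), ("spill_load", "z"), ("spill_load", "x")]
print(rewrite_spills(code, mapping))
# [('move_int_to_fp', 'x', 'f30'), ('add', 'y'), ('spill_load', 'z'), ('move_fp_to_int', 'x', 'f30')]
```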