Abstract:
A method and system are provided for thread management in parallel processes in a multi-core or multi-node system. The method includes receiving monitored hardware metrics information from the multiple cores or multiple nodes on which processes are executed, receiving monitored process and thread information, and globally monitoring the processing across the multiple cores or multiple nodes. The method further includes analyzing the monitored information to minimize imbalances between the multiple cores and/or to improve core or node utilization, and dynamically adjusting the number of threads per process based on the analysis.
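A minimal C sketch of one way such dynamic adjustment could look at runtime, assuming OpenMP and a hypothetical per-core load metric (read_core_load and the back-off policy are illustrative, not the patented method):

    #include <omp.h>
    #include <stdio.h>

    /* Hypothetical per-core load reading; a real monitor would sample
     * hardware counters or operating-system statistics. */
    static double read_core_load(int core) {
        return 0.5 + 0.1 * core;   /* placeholder metric */
    }

    /* Choose a thread count that backs off when cores are already busy. */
    static int choose_thread_count(int ncores) {
        double busy = 0.0;
        for (int c = 0; c < ncores; ++c)
            busy += read_core_load(c);
        double idle_fraction = 1.0 - busy / ncores;
        int threads = (int)(ncores * idle_fraction);
        return threads < 1 ? 1 : threads;
    }

    int main(void) {
        int ncores = omp_get_num_procs();
        int threads = choose_thread_count(ncores);
        omp_set_num_threads(threads);      /* dynamic per-process adjustment */

        #pragma omp parallel
        {
            #pragma omp single
            printf("running with %d of %d cores\n",
                   omp_get_num_threads(), ncores);
        }
        return 0;
    }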
Abstract:
A method and system of modifying instructions forming a loop are provided. The method includes: determining static and dynamic characteristics for the instructions; selecting a modification factor for the instructions based on a number of separate equivalent sections forming a cache in a processor which is processing the instructions; and modifying the instructions to interleave the instructions in the loop according to the modification factor and the static and dynamic characteristics when the instructions satisfy a modification criterion based on the static and dynamic characteristics.
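An illustrative sketch in C (not the claimed transformation) of interleaving a copy loop by an assumed modification factor of 4, so each iteration touches four widely separated regions and spreads pressure across distinct cache sections:

    #include <stddef.h>

    /* Original loop: a single sequential access stream. */
    void copy_plain(double *dst, const double *src, size_t n) {
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    }

    /* Interleaved by a factor of 4: the body is split into four
     * independent streams, with a remainder loop for leftover elements. */
    void copy_interleaved(double *dst, const double *src, size_t n) {
        size_t quarter = n / 4;
        size_t i;
        for (i = 0; i < quarter; ++i) {
            dst[i]               = src[i];
            dst[i + quarter]     = src[i + quarter];
            dst[i + 2 * quarter] = src[i + 2 * quarter];
            dst[i + 3 * quarter] = src[i + 3 * quarter];
        }
        for (i = 4 * quarter; i < n; ++i)   /* remainder */
            dst[i] = src[i];
    }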
Abstract:
Control flow information and data flow information associated with a program containing a upc_forall loop are built. A shared reference map data structure is created using the control flow information and the data flow information. All local shared accesses are hashed to facilitate a constant access stride after being rewritten. All local shared references in the hash entry having the longest list are privatized. The upc_forall loop is rewritten into a for loop. Responsive to a determination that no unprocessed upc_forall loop exists, dead store elimination is run. The control flow information and the data flow information associated with the program containing the for loop are rebuilt.
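A minimal sketch of this kind of rewrite, assuming a cyclic distribution of the shared array; the privatized function and its names are illustrative, not taken from the abstract:

    #include <stdio.h>

    /* Original UPC loop (shown for contrast; UPC source, not compiled here):
     *
     *   shared double a[N];
     *   upc_forall (int i = 0; i < N; i++; &a[i])
     *       a[i] = 2.0 * a[i];
     */

    /* Rewritten form after privatization: the affinity test of the
     * upc_forall is folded away and each thread walks a private pointer
     * over its locally owned elements with a constant stride. */
    static void scale_local(double *a_local, int local_n) {
        for (int j = 0; j < local_n; ++j)
            a_local[j] = 2.0 * a_local[j];
    }

    int main(void) {
        double mine[4] = {1, 2, 3, 4};   /* this thread's local slice */
        scale_local(mine, 4);
        printf("%g %g %g %g\n", mine[0], mine[1], mine[2], mine[3]);
        return 0;
    }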
Abstract:
An illustrative embodiment provides a computer-implemented process for managing multiple speculative assist threads for data pre-fetching. The process sends a command from an assist thread of a first processor to a second processor and a memory, wherein parameters of the command specify a processor identifier of the second processor; responsive to receiving the command, the second processor replies indicating an ability to receive a cache line that is a target of a pre-fetch; responsive to receiving the command, the memory replies indicating a capability to provide the cache line; responsive to receiving the replies from the second processor and the memory, the first processor sends a combined response to the second processor and the memory, wherein the combined response indicates an action; and responsive to the action indicating that the transaction can continue, the memory sends the requested cache line to the second processor into a target cache level on the second processor.
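The cross-processor command/response exchange above is a hardware protocol; the sketch below only illustrates the software side of an assist thread issuing prefetches ahead of a compute thread, assuming pthreads and the GCC-style __builtin_prefetch hint, neither of which is drawn from the abstract:

    #include <pthread.h>
    #include <stdio.h>

    #define N (1 << 20)
    #define AHEAD 256                    /* how far the assist thread runs ahead */

    static double data[N];
    static volatile long main_pos = 0;   /* progress of the compute thread */
    static volatile int done = 0;

    /* Assist thread: speculatively pull cache lines the compute thread
     * will need soon.  Pushing lines into another core's cache, as in the
     * abstract, needs hardware support; this sketch prefetches locally. */
    static void *assist_thread(void *arg) {
        (void)arg;
        while (!done) {
            long p = main_pos;
            for (long i = p; i < p + AHEAD && i < N; i += 8)
                __builtin_prefetch(&data[i], 0, 1);
        }
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < N; ++i) data[i] = (double)i;

        pthread_t helper;
        pthread_create(&helper, NULL, assist_thread, NULL);

        double sum = 0.0;
        for (long i = 0; i < N; ++i) {   /* compute thread */
            sum += data[i];
            main_pos = i;
        }
        done = 1;
        pthread_join(helper, NULL);
        printf("sum = %f\n", sum);
        return 0;
    }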
Abstract:
The present invention provides a computer-implemented method, apparatus, and computer-usable program code for compiling instructions to manage a cache system. Loop constructs are analyzed to identify data usage characteristics for cache and prefetching conditions in instructions, forming identified prefetch conditions. A set of control instructions is inserted into the instructions based on the data usage characteristics and the identified prefetch conditions to form modified instructions. The modified instructions are compiled to generate code for execution, forming compiled instructions. The set of control instructions in the compiled instructions forms a cache management policy to control movement of data in a memory system during execution of the compiled instructions.
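As a rough illustration, the C fragment below shows the kind of control instructions a compiler might insert after analyzing a streaming loop; the prefetch distance and the use of the GCC-style __builtin_prefetch hint are assumptions for the example:

    #include <stddef.h>

    /* Original source loop: streaming reads of b, streaming writes of a. */
    void scale(double *a, const double *b, size_t n, double k) {
        for (size_t i = 0; i < n; ++i)
            a[i] = k * b[i];
    }

    /* Same loop with inserted control instructions: prefetches run ahead
     * of the read stream, hinted as non-temporal since each element is
     * used only once. */
    void scale_with_hints(double *a, const double *b, size_t n, double k) {
        const size_t dist = 64;                      /* prefetch distance, assumed */
        for (size_t i = 0; i < n; ++i) {
            if (i + dist < n)
                __builtin_prefetch(&b[i + dist], 0, 0);
            a[i] = k * b[i];
        }
    }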
Abstract:
May-constant propagation is a technique used to propagate a constant through the call graph and control flow graph by ignoring possible kills and re-definitions that have low probability. Variables associated with constants in program code are determined. Execution flow probabilities are calculated for code segments of the program code that comprise the variables; these probabilities are calculated based on flow data for the program code. At least a first of the code segments is determined to have a high execution flow probability. The first of the constants, associated with the first of the variables, is propagated through the flow data to generate modified flow data.
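A hand-written C illustration of the idea (not the patented algorithm): 'limit' is a may-constant whose only redefinition sits on a rarely taken branch, so the hot path can be specialized with the propagated constant and guarded by a cheap fallback test:

    #include <stdio.h>

    /* 'limit' is 100 on almost every path; the error branch is the
     * low-probability kill of the constant. */
    int sum_upto(int limit_override, int rare_error) {
        int limit = 100;
        if (rare_error)
            limit = limit_override;

        int sum = 0;
        for (int i = 0; i < limit; ++i)
            sum += i;
        return sum;
    }

    /* What a may-constant propagation could produce: the expected path is
     * specialized with the constant 100; the general code remains as a
     * fallback for the rare redefinition. */
    int sum_upto_speculated(int limit_override, int rare_error) {
        if (!rare_error) {                    /* expected path: limit == 100 */
            int sum = 0;
            for (int i = 0; i < 100; ++i)     /* constant propagated */
                sum += i;
            return sum;
        }
        int sum = 0;                          /* fallback: original code */
        for (int i = 0; i < limit_override; ++i)
            sum += i;
        return sum;
    }

    int main(void) {
        printf("%d %d\n", sum_upto(5, 0), sum_upto_speculated(5, 0));
        return 0;
    }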
Abstract:
Systems, methods, and computer products for compiler support for aggressive safe load speculation. Exemplary embodiments include a method for aggressive safe load speculation for a compiler in a computer system, the method including: building a control flow graph; identifying both countable and non-countable loops; gathering a set of candidate loops for load speculation; and, for each candidate loop in the set, computing an estimate of the iteration count, delay cycles, and code size, performing a profitability analysis and determining an unroll factor based on the delay cycles and the code size, transforming the loop by generating a prologue loop to achieve data alignment and an unrolled main loop with loop directives indicating which loads can safely be executed speculatively, and performing low-level instruction scheduling on the generated unrolled main loop.
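A C sketch of the transformed loop shape under assumed parameters (32-byte alignment target, unroll factor 4); the directives marking loads as safe to speculate have no portable C spelling and are only noted in the comments:

    #include <stddef.h>
    #include <stdint.h>

    /* Original loop. */
    double dot(const double *a, const double *b, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            s += a[i] * b[i];
        return s;
    }

    /* Transformed shape: a prologue loop peels iterations until 'a'
     * reaches a 32-byte boundary, then a main loop unrolled by 4 runs
     * over the aligned region.  In the compiler, the unrolled loop would
     * also carry directives marking its loads as speculation-safe. */
    double dot_unrolled(const double *a, const double *b, size_t n) {
        double s = 0.0;
        size_t i = 0;

        /* prologue: achieve data alignment */
        while (i < n && ((uintptr_t)&a[i] & 31u) != 0) {
            s += a[i] * b[i];
            ++i;
        }

        /* main loop, unroll factor 4 (chosen by profitability analysis) */
        for (; i + 4 <= n; i += 4) {
            s += a[i]     * b[i];
            s += a[i + 1] * b[i + 1];
            s += a[i + 2] * b[i + 2];
            s += a[i + 3] * b[i + 3];
        }

        for (; i < n; ++i)           /* epilogue */
            s += a[i] * b[i];
        return s;
    }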
Abstract:
A method is provided for implementing shadow versioning to improve data dependence analysis for instruction scheduling when compiling code. The method includes identifying loops within the code to be compiled; for each loop, initializing a dependence matrix; for each loop shadow, identifying symbols that are accessed by the loop; examining dependencies; storing, comparing, and classifying the dependence vectors; generating new shadow symbols; replacing the old shadow symbols with the new shadow symbols; generating alias relationships between the newly created shadow symbols; scheduling instructions; and compiling the code.
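A small, hypothetical C sketch of the kind of compiler-internal bookkeeping the abstract describes; the structure names, the per-loop dependence matrix layout, and the versioning rule are all assumptions for illustration:

    #include <stdio.h>
    #include <string.h>

    #define MAX_SYMS 8

    /* One shadow symbol per memory region accessed in a loop, plus a
     * per-loop dependence matrix whose entry [i][j] records whether
     * symbol i may depend on symbol j. */
    struct shadow_symbol {
        char name[32];
        int  version;                  /* bumped when a new shadow is generated */
    };

    struct loop_dep_info {
        struct shadow_symbol syms[MAX_SYMS];
        int nsyms;
        int dep[MAX_SYMS][MAX_SYMS];   /* dependence matrix, 0/1 */
    };

    /* Initialize an empty dependence matrix for a loop. */
    static void init_loop(struct loop_dep_info *li) {
        memset(li, 0, sizeof *li);
    }

    /* Generate a new shadow symbol for this loop, replacing any older
     * version of the same base name. */
    static int add_shadow(struct loop_dep_info *li, const char *base) {
        for (int i = 0; i < li->nsyms; ++i) {
            if (strcmp(li->syms[i].name, base) == 0) {
                li->syms[i].version++;   /* new shadow replaces the old one */
                return i;
            }
        }
        int i = li->nsyms++;
        snprintf(li->syms[i].name, sizeof li->syms[i].name, "%s", base);
        li->syms[i].version = 0;
        return i;
    }

    int main(void) {
        struct loop_dep_info li;
        init_loop(&li);
        int a = add_shadow(&li, "shadow_a");
        int b = add_shadow(&li, "shadow_b");
        li.dep[b][a] = 1;      /* b may depend on a within this loop */
        printf("%s.v%d depends on %s.v%d\n",
               li.syms[b].name, li.syms[b].version,
               li.syms[a].name, li.syms[a].version);
        return 0;
    }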