摘要:
A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
摘要:
A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
摘要:
A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.
摘要:
A technique used during interprocedural compilation in which program objects are grouped together based on the weights of the connections between the objects and their costs. System-imposed constraints on memory size can be taken into account to avoid creating groupings that overload the system's capacity. The groupings can be distributed over memories located on different processors.
摘要:
An interprocedural compilation method for aggregating global data variables in external storage to maximize data locality. Using the information displayed in a weighted interference graph in which node weights represent the size of data stored in each global variable and edges between variables represent access relationships between the globals, the global variables can be mapped into aggregates based on this frequency of access, while preventing the cumulative data size in any aggregate from exceeding a memory size restriction.
摘要:
A method for partitioning programs into multi-procedure modules for efficient compilation. During interprocedural analysis, a weighted callgraph of the program is constructed in which weights on nodes represent code size of each procedure and weights on edges between the nodes represent execution counts between procedures. A coloured interference graph is built from the analysis information, and is used to induce weighted sub-graphs of the callgraph containing no interferences between procedures in each sub-graph. The procedures from a single sub-graph are combined into one or more modules; procedures with the highest weighted edges between them are combined in a module first until the cumulative node weight of the module reaches a preset limit on memory size.
摘要:
An improved scheduling technique for software pipelining is disclosed which is designed to find schedules requiring fewer processor clock cycles and reduce register pressure hot spots when scheduling multiple groups of instructions (e.g. as represented by multiple sub-graphs of a DDG) which are independent, and substantially identical. The improvement in instruction scheduling and reduction of hot spots is achieved by evenly distributing such groups of instructions around the schedule for a given loop.
摘要:
An improved scheduling technique for software pipelining is disclosed which is designed to find schedules requiring fewer processor clock cycles and reduce register pressure hot spots when scheduling multiple groups of instructions (e.g. as represented by multiple sub-graphs of a DDG) which are independent, and substantially identical. The improvement in instruction scheduling and reduction of hot spots is achieved by evenly distributing such groups of instructions around the schedule for a given loop.
摘要:
An improved scheduling technique for software pipelining is disclosed which is designed to find schedules requiring fewer processor clock cycles and reduce register pressure hot spots when scheduling multiple groups of instructions (e.g. as represented by multiple sub-graphs of a DDG) which are independent, and substantially identical. The improvement in instruction scheduling and reduction of hot spots is achieved by evenly distributing such groups of instructions around the schedule for a given loop.
摘要:
A technique of ordering machine instructions to reduce spill code. For each machine instruction that is ready for scheduling, an amount is determined by which the size of a committed set of machine instructions would increase upon the scheduling of the machine instruction. The machine instruction for which the determined amount is smallest is then scheduled. The currently committed instructions may be determined to be the machine instructions that are already scheduled as well as the machine instructions that are descendent from already scheduled machine instructions. The result is that new computations upon which a target processor will embark tend to be deferred. Bit vectors may be employed for efficiency during the assessment of candidate instructions that are ready for scheduling. The technique may be triggered when the risk of registers becoming overcommitted becomes high, as may occur when the number of available processor registers drops below a certain threshold.