摘要:
Disclosed are embodiments of a method and system for calculating an unrolling factor for software loops. The unrolling factor may be calculated by applying a formula that takes into account issue constraints of a processor. The issue constraints may include the total issue width of the processor, and may also include individual issue constraints for each instruction type. The software loop may be unrolled by the calculated unrolling factor and may be software pipelined. Other embodiments are also described and claimed.
摘要:
Disclosed are embodiments of a compiler, methods, and system for resource-aware scheduling of instructions. A list scheduling approach is augmented to take into account resource constraints when determining priority for scheduling of instructions. Other embodiments are also described and claimed.
摘要:
Disclosed are embodiments of a compiler, methods, and system for resource-aware scheduling of instructions. A list scheduling approach is augmented to take into account resource constraints when determining priority for scheduling of instructions. Other embodiments are also described and claimed.
摘要:
In an embodiment, a processor includes a plurality of cores to independently execute instructions, a shared cache coupled to the cores and including a plurality of lines to store data, and a power controller including a low power control logic to calculate a flush latency to flush the shared cache based on a state of the plurality of lines. Other embodiments are described and claimed.
摘要:
An efficient method for software-pipelining (SWP) of loops to translate programs, from higher level languages into equivalent object or machine language code for execution on a computer. In one example embodiment, this is accomplished by spilling and filling multiple computed values, in a register, that are live across multiple stages in a software-pipelined loop, using multiple rotating stack memory locations to reduce compiler-time of SWP, and complexity of the implemented SWP.
摘要:
A computing platform may include components to determine performance loss values and energy savings values for each of the plurality of regions and/or the memory boundedness value of each of a plurality of regions within an application. The computing platform may provide a user interface for a user to provide a user input, which provides an indication of an acceptable performance loss. For the provided performance loss value, the frequency values may be determined and the processing element may be operated at the frequency values while processing each of the plurality of regions.
摘要:
A method of parallel execution of a first and a second instruction in an in-order processor. Embodiments of the invention enable parallel execution of memory instructions that are stalled by cache memory misses. The in-order processor processes cache memory misses of instructions in parallel by overlapping the first cache memory miss with cache memory misses that occur after the first cache memory miss. Memory-level parallelism in the in-order processor can be increased when more parallel and outstanding cache memory misses are generated.
摘要:
A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.
摘要:
Code restructuring or reordering based on profiling information and memory hierarchy is provided by constructing a Program Execution Graph (PEG) corresponding to a level of the memory hierarchy, partitioning this PEG to reduce estimated memory overhead costs below an upper bound, and constructing a PEG for a next level of the memory hierarchy from the partitioned PEG. The PEG is constructed from control flow and frequency information from a profile of the program to be restructured. The PEG is a weighted undirected graph comprising nodes representing basic blocks and edges representing transfer of control between pairs of basic blocks. The weight of a node is the size of the basic block it represents and the weight of an edge is the frequency of transition between the pair of basic blocs it connects.
摘要:
The invention is directed to the transformation of software loops having early exit conditions, thereby allowing the loops to be more effectively converted to a single basic block for software pipelining. The invention assigns a predicate register for each early exit condition of the software loop. The predicate registers are set when the corresponding early exit condition is satisfied. In this manner, when the loop terminates the predicate registers can be examined to indicate which early exit conditions were satisfied. The invention produces loops having a lower recurrence II and resource II than conventional techniques.