摘要:
Embodiments relate to branch prediction preloading. An aspect includes a system for branch prediction preloading. The system includes an instruction cache and branch target buffer (BTB) coupled to a processing circuit, the processing circuit configured to perform a method. The method includes fetching a plurality of instructions in an instruction stream from the instruction cache, and decoding a branch prediction preload instruction in the instruction stream. An address of a predicted branch instruction is determined based on the branch prediction preload instruction. A predicted target address is determined based on the branch prediction preload instruction. A mask field is identified in the branch prediction preload instruction, and a branch instruction length is determined based on the mask field. Based on executing the branch prediction preload instruction, the BTB is preloaded with the address of the predicted branch instruction, the branch instruction length, the branch type, and the predicted target address.
摘要:
Embodiments relate to branch prediction preloading. An aspect includes a system for branch prediction preloading. The system includes an instruction cache and branch target buffer (BTB) coupled to a processing circuit, the processing circuit configured to perform a method. The method includes fetching a plurality of instructions in an instruction stream from the instruction cache, and decoding a branch prediction preload instruction in the instruction stream. An address of a predicted branch instruction is determined based on the branch prediction preload instruction. A predicted target address is determined based on the branch prediction preload instruction. A mask field is identified in the branch prediction preload instruction, and a branch instruction length is determined based on the mask field. Based on executing the branch prediction preload instruction, the BTB is preloaded with the address of the predicted branch instruction, the branch instruction length, the branch type, and the predicted target address.
摘要:
A computer implemented method, system, and computer usable program code for selective instruction scheduling. A determination is made whether a region of code exceeds a modification threshold after performing register allocation on the region of code. The region of code is marked as a modified region of code in response to the determination that the region of code exceeds the modification threshold. A determination is made whether the region of code exceeds an execution threshold in response to the determination that the region of code is marked as a modified region of code. Post-register allocation instruction scheduling is performed on the region of code in response to the determination that the region of code is marked as a modified region of code and the determination that the region of code exceeds the execution threshold.
摘要:
Systems, methods, and computer products for evaluating robustness of a list scheduling framework. Exemplary embodiments include a method for evaluating the robustness of a list scheduling framework, the method including identifying a set of compiler benchmarks known to be sensitive to an instruction scheduler, running the set of benchmarks against a heuristic under test, H and collect an execution time Exec(H[G]), where G is a directed a-cyclical graph, running the set of benchmarks against a plurality of random heuristics Hrand[G]i, and collect a plurality of respective execution times Exec(Hrand[G])i, computing a robustness of the list scheduling framework, and checking robustness check it against a pre-determined threshold.
摘要:
There is disclosed a method and system for configuring a data dependency graph (DDG) to handle instruction scheduling in computer architectures permitting dynamic by-pass execution, and for performing dynamic by-pass scheduling utilizing such a configured DDG. In accordance with an embodiment of the invention, a heuristic function is used to obtain a ranking of nodes in the DDG after setting delays at all identified by-pass pairs of nodes in the DDG to 0. From among a list of identified by-pass pairs of nodes, a node that is identified as being the least important to schedule early is marked as “bonded” to its successor, and the corresponding delay for that identified node is set to 0. Node rankings are re-computed and the bonded by-pass pair of nodes are scheduled in consecutive execution cycles with a delay of 0 to increase the likelihood that a by-pass can be successfully taken during run-time execution.
摘要:
Systems, methods, and computer products for evaluating robustness of a list scheduling framework. Exemplary embodiments include a method for evaluating the robustness of a list scheduling framework, the method including identifying a set of compiler benchmarks known to be sensitive to an instruction scheduler, running the set of benchmarks against a heuristic under test, H and collect an execution time Exec(H[G]), where G is a directed a-cyclical graph, running the set of benchmarks against a plurality of random heuristics Hrand[G]i, and collect a plurality of respective execution times Exec(Hrand[G])i, computing a robustness of the list scheduling framework, and checking robustness check it against a pre-determined threshold.
摘要:
Assigning each of a plurality of memory fetch units to any of a plurality of candidate variables to reduce load-hit-store delays, wherein a total number of required memory fetch units is minimized. A plurality of store/load pairs are identified. A dependency graph is generated by creating a node Nx for each store to variable X and a node Ny for each load of variable Y and, unless X=Y, for each store/load pair, creating an edge between a respective node Nx and a corresponding node Ny; for each created edge, labeling the edge with a heuristic weight; labeling each node Nx with a node weight Wx that combines a plurality of respective edge weights of a plurality of corresponding nodes Nx such that Wx=Σωxj; and determining a color for each of the graph nodes using k distinct colors wherein k is minimized such that no adjacent nodes joined by an edge between a respective node Nx and a corresponding node Ny have an identical color; and assigning a memory fetch unit to each of the k distinct colors.
摘要:
There is disclosed a method and system for configuring a data dependency graph (DDG) to handle instruction scheduling in computer architectures permitting dynamic by-pass execution, and for performing dynamic by-pass scheduling utilizing such a configured DDG. In accordance with an embodiment of the invention, a heuristic function is used to obtain a ranking of nodes in the DDG after setting delays at all identified by-pass pairs of nodes in the DDG to 0. From among a list of identified by-pass pairs of nodes, a node that is identified as being the least important to schedule early is marked as “bonded” to its successor, and the corresponding delay for that identified node is set to 0. Node rankings are re-computed and the bonded by-pass pair of nodes are scheduled in consecutive execution cycles with a delay of 0 to increase the likelihood that a by-pass can be successfully taken during run-time execution.
摘要:
There is disclosed a method and system for configuring a data dependency graph (DDG) to handle instruction scheduling in computer architectures permitting dynamic by-pass execution, and for performing dynamic by-pass scheduling utilizing such a configured DDG. In accordance with an embodiment of the invention, a heuristic function is used to obtain a ranking of nodes in the DDG after setting delays at all identified by-pass pairs of nodes in the DDG to 0. From among a list of identified by-pass pairs of nodes, a node that is identified as being the least important to schedule early is marked as “bonded” to its successor, and the corresponding delay for that identified node is set to 0. Node rankings are re-computed and the bonded by-pass pair of nodes are scheduled in consecutive execution cycles with a delay of 0 to increase the likelihood that a by-pass can be successfully taken during run-time execution.
摘要:
Automatic inspection of compiled code. In response to revising a compiler, the functionality of that compiler is verified. Specific code is compiled using a first version of the compiler, as well as a second version of the compiler. Each compiled code is then applied to machine state to obtain multiple machine states. The machine states are then compared to determine if they are equal.