Abstract:
Methods, systems, and products for determining performance of a software entity running on a data processing system. The method comprises allowing extended execution of the software entity without monitoring code. The method also comprises intermittently sampling behavior data for the software entity. Intermittently sampling behavior data may be carried out by injecting monitoring code into the software entity to instrument the software entity, collecting behavior data by utilizing the monitoring code, and removing the monitoring code. The method also comprises repeatedly performing iterations of the allowing and sampling steps until collected behavior data is sufficient for diagnosing performance of the software entity. The method may further comprise analyzing the collected behavior data to diagnose performance of the software entity.
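A minimal sketch of the allow/sample loop, assuming a toy `SoftwareEntity` whose `probe` slot stands in for injected monitoring code. `SoftwareEntity`, `probe`, `sample_until_sufficient`, and the "enough samples" threshold are hypothetical; real instrumentation would patch machine code or bytecode.

```python
from typing import Callable, List, Optional, Tuple


class SoftwareEntity:
    """Hypothetical program whose hook slot can hold monitoring code."""

    def __init__(self) -> None:
        self.probe: Optional[Callable[[int], None]] = None  # monitoring code, if injected
        self.step = 0

    def run(self, steps: int) -> None:
        for _ in range(steps):
            self.step += 1
            if self.probe is not None:       # overhead exists only while instrumented
                self.probe(self.step)


def sample_until_sufficient(entity: SoftwareEntity, free_steps: int,
                            sample_steps: int, needed: int) -> Tuple[List[int], int]:
    data: List[int] = []
    iterations = 0
    while len(data) < needed:                # "sufficient" modeled as a simple count
        entity.run(free_steps)               # extended execution without monitoring code
        entity.probe = data.append           # inject monitoring code
        entity.run(sample_steps)             # collect behavior data
        entity.probe = None                  # remove monitoring code
        iterations += 1
    return data, iterations
```

For example, `sample_until_sufficient(SoftwareEntity(), 1000, 10, 25)` needs three iterations, and the entity runs uninstrumented for most of its execution.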
Abstract:
A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time.
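The two search orders can be sketched as follows. This assumes a hypothetical `measure(p, d)` hook that times the instruction sequence of interest with prefetch instruction `p` disabled (`None` meaning none disabled) at hardware prefetch depth `d`. The function names and the one-prefetch-at-a-time model are illustrative assumptions, not the tool's actual interface.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical hook: time the sequence with prefetch `p` disabled at HW depth `d`.
Measure = Callable[[Optional[int], int], float]
Results = Dict[Tuple[Optional[int], int], float]


def fixed_depth_sweep(measure: Measure, prefetches: List[int], depth: int) -> Results:
    """Hold hardware prefetch depth constant; disable one prefetch at a time."""
    return {(p, depth): measure(p, depth) for p in prefetches}


def per_prefetch_depth_sweep(measure: Measure, prefetches: List[int],
                             depths: List[int]) -> Results:
    """For each disabled prefetch, cycle through every hardware prefetch depth."""
    return {(p, d): measure(p, d) for p in prefetches for d in depths}


def select_best(results: Results, baseline: float) -> Optional[Tuple[Optional[int], int]]:
    """Return the fastest (prefetch, depth) combination if it beats the baseline."""
    best = min(results, key=results.__getitem__)
    return best if results[best] < baseline else None
```

The baseline would typically be `measure(None, default_depth)`. The second sweep costs `len(prefetches) * len(depths)` measurements instead of `len(prefetches)`, in exchange for exploring the interaction between the two knobs.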
Abstract:
A method, apparatus, system, and signal-bearing medium that, in an embodiment, retrieve event data from a processor for sampling intervals, where the sampling intervals are evenly distributed but the control points at which the event data is retrieved are unevenly distributed. The processor executes instructions for logical partitions, and the event data is associated with events that are detected by the processor during the sampling intervals. In response to an interrupt received from the processor at the control point, a determination is made whether the sample point has been reached. If it has, the event data is retrieved from the processor and an event counter is reset to a value calculated to cause the processor to include an identical number of events in each sampling interval. The value is calculated based on the event counter at the time of the control point, the event counter at the time of the sample point, and the number of events in the sampling interval. In this way, an even distribution of event data may be collected when the processor is allocated to multiple partitions in a logically partitioned system.
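One way the reset value could be computed is sketched below. It assumes an upward-counting event counter whose sample point is the moment it reaches a threshold. Events that arrive between the sample point and the late control point are credited to the next interval. The counting direction, `reset_value`, and `simulate` are assumptions for illustration, not the patented formula.

```python
from typing import List


def reset_value(count_at_control: int, count_at_sample: int, interval_events: int) -> int:
    """Counter value to load so the next interval holds exactly interval_events events.

    Assumed model: the counter counts upward and the sample point is the moment
    it reaches count_at_sample. The interrupt (control point) arrives later, so
    (count_at_control - count_at_sample) events already belong to the next
    interval and only the remainder still has to be counted.
    """
    overshoot = count_at_control - count_at_sample
    remaining = interval_events - overshoot
    return count_at_sample - remaining


def simulate(bursts: List[int], threshold: int, interval_events: int) -> List[int]:
    """Return the absolute event index of each sample point.

    Each burst is the number of events counted before the next unevenly spaced
    control point; bursts are assumed smaller than interval_events so at most
    one sample point is crossed per burst.
    """
    counter = threshold - interval_events
    total = 0
    sample_points: List[int] = []
    for burst in bursts:
        counter += burst
        total += burst
        if counter >= threshold:               # sample point has been reached
            # Retrieving the event data for the interval would happen here.
            sample_points.append(total - (counter - threshold))
            counter = reset_value(counter, threshold, interval_events)
    return sample_points
```

With bursts of 30, 45, 40, 10, 50, and 35 events and a 100-event interval, the control points land at events 115 and 210. The recovered sample points still fall exactly at events 100 and 200.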
Abstract:
A system, method, and computer program product are disclosed for reducing overhead associated with software lock monitoring in a multiple-processor data processing system having a memory that is shared among the multiple processors. Multiple memory locations in the shared memory are associated with one of multiple locks. Overhead is reduced by generating a trace hook only in response to activity associated with lock misses.
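A sketch of miss-only tracing, using Python's `threading.Lock` as a stand-in for a shared-memory lock. A "miss" is modeled as a failed non-blocking acquire, meaning the lock is already held elsewhere. The `MonitoredLock` class and the string trace record are hypothetical; uncontended acquisitions pay no tracing cost at all.

```python
import threading
from typing import List


class MonitoredLock:
    """Lock wrapper that emits a trace hook only when acquisition misses."""

    def __init__(self, name: str, trace: List[str]) -> None:
        self._lock = threading.Lock()
        self.name = name
        self._trace = trace

    def acquire(self) -> None:
        if self._lock.acquire(blocking=False):
            return                                      # hit: no trace hook generated
        self._trace.append(f"lock-miss {self.name}")    # hypothetical trace hook
        self._lock.acquire()                            # then wait for the lock as usual

    def release(self) -> None:
        self._lock.release()
```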
Abstract:
A method and system for optimizing branch prediction in an executable computer program compiled for execution on a pipelined processor that employs branch prediction. The source program is compiled and, in one embodiment, instrumented to collect branch selection statistics. The compiled program is run, and statistics are collected using the instrumentation or a standard trace program. The branch statistics are used to modify the executable program so that branch prediction is correct a majority of the time for the workload against which the program was run. In a computer system having a branch prediction bit, that bit is set or cleared to cause correct branch prediction a majority of the time.
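A sketch of turning collected branch statistics into per-branch prediction bits. It assumes a trace of `(branch_address, taken)` pairs and treats a set bit as "predict taken". That bit meaning is an assumption: on some architectures the hint bit instead reverses a default static prediction. Ties are resolved as not taken here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def collect_branch_stats(trace: Iterable[Tuple[int, bool]]) -> Dict[int, Counter]:
    """Count taken/not-taken outcomes per branch address."""
    stats: Dict[int, Counter] = {}
    for addr, taken in trace:
        stats.setdefault(addr, Counter())[taken] += 1
    return stats


def choose_prediction_bits(stats: Dict[int, Counter]) -> Dict[int, bool]:
    """True = set the bit (predict taken); False = clear it (predict not taken)."""
    return {addr: c[True] > c[False] for addr, c in stats.items()}
```

Each bit then matches the majority outcome observed for that branch under the profiled workload.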
Abstract:
A microprocessor performance monitor and instruction address breakpoint facility are interconnected to provide finer-granularity performance monitoring. The microprocessor is initialized to collect processor statistics preselected prior to performance monitoring. Application start and stop instruction breakpoint addresses are preselected from a software program, bounding the instructions for which such statistics are desired. An exception handler is installed for instruction address breakpoints (IAB) that enables the performance monitor at the start address and disables it at the stop address. The IAB register is then initialized to the start address, and the statistics counters are cleared. Upon starting the application, when the application start address instruction is executed, the breakpoint handler obtains control and enables the performance monitor counters, which count the desired statistics after returning from the breakpoint handler. Before returning, the handler sets the IAB register to the stop address. When the application stop address is encountered, the breakpoint handler disables the performance monitor counters and rearms the start address in the IAB register. The performance monitor counters are then read to determine the desired statistics for the specific sequence of code within the boundaries of the start and stop addresses in the application.
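The start/stop handshake can be simulated over an instruction address trace. In this sketch a single IAB value alternates between the start and stop addresses, and "instructions executed" stands in for whatever statistic the counters were configured to collect. `BoundedMonitor` is a hypothetical model, not a hardware interface. Because the handler runs before the instruction at the breakpoint, the start instruction is counted and the stop instruction is not.

```python
from typing import Iterable


class BoundedMonitor:
    """Simulated performance monitor gated by an instruction address breakpoint (IAB)."""

    def __init__(self, start_addr: int, stop_addr: int) -> None:
        self.start, self.stop = start_addr, stop_addr
        self.iab = start_addr      # IAB register initialized to the start address
        self.enabled = False       # performance monitor counters start disabled
        self.count = 0             # statistics counter cleared

    def breakpoint_handler(self) -> None:
        if self.iab == self.start:
            self.enabled = True    # start address: enable the counters...
            self.iab = self.stop   # ...and arm the stop address before returning
        else:
            self.enabled = False   # stop address: disable the counters...
            self.iab = self.start  # ...and re-arm the start address

    def run(self, addresses: Iterable[int]) -> int:
        for addr in addresses:
            if addr == self.iab:
                self.breakpoint_handler()   # handler runs before the instruction
            if self.enabled:
                self.count += 1             # stand-in statistic: instructions executed
        return self.count
```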
Abstract:
A method and system for reducing or avoiding store misses with a data cache block zero (DCBZ) instruction, in cooperation with the underlying hardware load stream prefetching support, to help increase effective aggregate bandwidth. The method identifies and classifies unique streams in a loop based on dependency and reuse analysis, and performs loop transformations, such as node splitting, loop distribution, or stream unrolling, to get the proper number of streams. Static prediction and run-time profile information are used to guide loop and stream selection. Compile-time loop cost analysis and run-time check code and versioning are used to determine the number of cache lines ahead of each reference for data cache line zeroing and to tolerate required data alignment relative to data cache lines.
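A sketch of the alignment and distance checks involved. DCBZ zeroes a whole cache line, so it is only safe for lines that the store stream will overwrite completely. Partial lines at an unaligned head or tail must be left to ordinary stores. The 128-byte default line size and the `lines_ahead` latency model are assumptions for illustration; the source does not specify either.

```python
from typing import List


def dcbz_lines(start: int, nbytes: int, line_size: int = 128) -> List[int]:
    """Addresses of cache lines fully covered by the store range [start, start + nbytes)."""
    first = -(-start // line_size) * line_size        # round start up to a line boundary
    end = (start + nbytes) // line_size * line_size   # round end down to a line boundary
    return list(range(first, end, line_size))


def lines_ahead(miss_latency_cycles: int, cycles_per_line: int) -> int:
    """Hypothetical cost model: zero far enough ahead to cover the miss latency."""
    return -(-miss_latency_cycles // cycles_per_line)  # ceiling division
```

For example, a 400-byte store stream starting at address 100 fully covers only the lines at 128 and 256. The head (100 to 127) and tail (384 to 499) still need normal stores.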
Abstract:
A system and method are provided that allow the results of an instruction trace mechanism to be used to globally restructure the instructions. The process reorders the instructions in an executable program, using an actual execution profile (or instruction address trace) for a selected workload, to improve utilization of the existing hardware architecture. The reordering is implemented at a global level (i.e., independent of procedure or other structural boundaries, which maximizes speedup), runs on various hardware platforms, and preserves correctness and debuggability for reordered executables. An unconditional branch instruction is added at each memory location where a reordered instruction was previously stored. When a dynamic branch occurs, the program attempts to access the instruction at its original address; the unconditional branch there directs the program to the reordered location of the instruction, so program integrity is maintained.
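A simplified sketch of profile-guided reordering with branch stubs. It assumes one instruction per 4-byte slot, places the hottest instructions contiguously at a hypothetical `new_base`, and writes an unconditional branch at every original address. A real tool must also fix up relative branch targets inside the moved code, which this sketch omits.

```python
from typing import Dict, Tuple


def reorder(code: Dict[int, str], profile: Dict[int, int],
            new_base: int) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Lay out instructions hottest-first at new_base; leave branch stubs behind."""
    order = sorted(code, key=lambda a: (-profile.get(a, 0), a))   # hottest first
    new_text: Dict[int, str] = {}
    old_text: Dict[int, str] = {}
    for i, addr in enumerate(order):
        new_addr = new_base + 4 * i
        new_text[new_addr] = code[addr]
        old_text[addr] = f"b 0x{new_addr:x}"    # unconditional branch to new location
    return new_text, old_text


def resolve(addr: int, old_text: Dict[int, str],
            new_text: Dict[int, str]) -> Tuple[int, str]:
    """A dynamic branch to an original address hits its stub and is redirected."""
    target = int(old_text[addr].split()[1], 16)
    return target, new_text[target]
```

A computed branch that still targets an old address, for example one taken through a function pointer, lands on the stub and reaches the moved instruction.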