摘要:
A method and data processing system for managing running of instructions in a program. A processor of the data processing system receives a monitoring instruction of a monitoring unit. The processor determines if at least one secondary thread of a set of secondary threads is available for use as an assist thread. The processor selects the at least one secondary thread from the set of secondary threads to become the assist thread in response to a determination that the at least one secondary thread of the set of secondary threads is available for use as an assist thread. The processor changes profiling of running of instructions in the program from the main thread to the assist thread.
摘要:
A processor performance profiler is enabled to for identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, by identifying and tracking the worst offenders within a random sample of events without having to hash all events results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
摘要:
A system and associated method for distributing signals with efficiency over a microprocessor. A performance monitoring unit (PMU) sends configuration signals to a unit to monitor an event occurring on the unit. The unit is attached to a configuration bus and an event bus that are daisy-chained from PMU to other units in the microprocessor. The configuration bus transmits configuration signals from the PMU to the unit to set the unit to report the event. The unit sends event signals to the PMU through the event bus. The unit is configured upon receiving configuration signals comprising a base address of a bus ramp of the unit. A number of units and a number of events for monitoring is flexibly selected by adjusting a length of bit fields within configuration signals.
摘要:
An apparatus and computer program product are disclosed for, in a processor, concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers. A hardware trace facility captures hardware trace data in a processor. The hardware trace facility is included within the processor. The hardware trace data is transmitted to a system memory utilizing a system bus. The system memory is included within the system. The system bus is capable of being utilized by processing units included in the processing node while the hardware trace data is being transmitted to the system bus. Part of system memory is utilized to store the trace data. The system memory is capable of being accessed by processing units in the processing node other than the hardware trace facility while part of the system memory is being utilized to store the trace data.
摘要:
A method and system for analyzing cycles per instruction (CPI) performance in a processor. A completion table corresponds to the instructions in a group to be processed by the processor. An empty completion table indicates that there has been some type of catastrophe that caused a table flush. While the table is empty, a performance monitoring counter (PMC), located in a performance monitoring unit (PMU) in the processor, counts the number of clock cycles that the table is empty. Preferably, a separate PMC is utilized depending on the reason that the completion table is empty. A second PMC likewise counts the number of clock cycles spent re-filling the empty completion table. A third PMC counts the number of clock cycles spent actually executing the instructions in the completion table. The information in the PMC's can be used to evaluate the true cause for degradation of CPI performance.
摘要:
An information handling system includes a processor that executes multiple instructions or instruction threads within a software application program. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In one embodiment, the operating system manages instruction completion stall analysis software to determine the cause or causes of instruction stalls. In another embodiment, the stall analysis software cooperates with the operating system software to store instruction completion stall event data on a per instruction basis while the application program executes. The operating system software may cooperate with the stall analysis software to store instruction completion stall data in memory for later manipulation by system users or other software.
摘要:
A processor performance profiler is enabled to for identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, by identifying and tracking the worst offenders within a random sample of events without having to hash all events results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
摘要:
An information handling system includes a memory, a processor, and an instruction tracking unit. The processor executes program code and, while the program code executes, the instruction tracking unit decodes a multi-purpose no-op instruction within the program code. In turn, the instruction tracking unit sends an interrupt to the processor, which invokes a profiling module to collect and store profiling data in a profiling buffer.
摘要:
A system and associated method for distributing signals with efficiency over a microprocessor. A performance monitoring unit (PMU) sends configuration signals to a unit to monitor an event occurring on the unit. The unit is attached to a configuration bus and an event bus that are daisy-chained from PMU to other units in the microprocessor. The configuration bus transmits configuration signals from the PMU to the unit to set the unit to report the event. The unit sends event signals to the PMU through the event bus. The unit is configured upon receiving configuration signals comprising a base address of a bus ramp of the unit. A number of units and a number of events for monitoring is flexibly selected by adjusting a length of bit fields within configuration signals.
摘要:
A method, computer program product, and data processing system for collecting metrics regarding completion stalls in an out-of-order superscalar processor with branch prediction is disclosed. A preferred embodiment of the present invention selectively samples particular instructions (or classes of instructions). Each selected instruction, as it passes through the processor datapath, is marked (tagged) for monitoring by a performance monitoring unit. The progress of marked instructions is monitored by the performance monitoring unit, and various stall counters are triggered by the progress of the marked instructions and the instruction groups they form a part of. The stall counters count cycles to give an indication of when certain delays associated with particular instructions occur and how serious the delays are.