Hardware assist thread for dynamic performance profiling
    11.
    Granted patent
    Status: lapsed

    Publication number: US08612730B2

    Publication date: 2013-12-17

    Application number: US12796124

    Filing date: 2010-06-08

    IPC classification: G06F9/00

    Abstract: A method and data processing system for managing running of instructions in a program. A processor of the data processing system receives a monitoring instruction of a monitoring unit. The processor determines if at least one secondary thread of a set of secondary threads is available for use as an assist thread. The processor selects the at least one secondary thread from the set of secondary threads to become the assist thread in response to a determination that the at least one secondary thread of the set of secondary threads is available for use as an assist thread. The processor changes profiling of running of instructions in the program from the main thread to the assist thread.
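
The selection step in this abstract can be sketched in software as follows. This is only a minimal model under stated assumptions: `HwThread`, `select_assist_thread`, and `handle_monitoring_instruction` are hypothetical names, "available" is modeled as a simple `busy` flag, and the fallback of leaving profiling on the main thread when no secondary thread is free is an assumption the abstract does not state.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HwThread:
    """Hypothetical model of a hardware thread context."""
    thread_id: int
    busy: bool = False
    profiling: bool = False


def select_assist_thread(secondary: Sequence[HwThread]) -> Optional[HwThread]:
    """Return the first idle secondary thread, or None if all are busy."""
    for thread in secondary:
        if not thread.busy:
            return thread
    return None


def handle_monitoring_instruction(main: HwThread,
                                  secondary: Sequence[HwThread]) -> HwThread:
    """Move profiling to an assist thread when one is available.

    Returns the thread that ends up doing the profiling.
    """
    assist = select_assist_thread(secondary)
    if assist is None:
        # Assumed fallback: keep profiling on the main thread.
        main.profiling = True
        return main
    assist.busy = True        # the thread is now reserved as the assist thread
    assist.profiling = True   # profiling work moves to the assist thread...
    main.profiling = False    # ...and off the main thread
    return assist
```

For instance, with one busy and one idle secondary thread, the idle one is chosen and the main thread stops profiling.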

    Autonomic Hotspot Profiling Using Paired Performance Sampling
    12.
    Patent application
    Status: lapsed

    Publication number: US20120124560A1

    Publication date: 2012-05-17

    Application number: US12946959

    Filing date: 2010-11-16

    IPC classification: G06F9/44

    Abstract: A processor performance profiler is enabled to identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, identifying and tracking the worst offenders within a random sample of events, without having to hash all events, results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.
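
The trade-off described here, tallying only a random sample of events instead of every event, can be illustrated with a short sketch. It is a simplified stand-in and not the paired sampling mechanism of the patent itself: the function name, the `sample_rate`, `top_k`, and `seed` parameters, and the use of independent per-event sampling are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Iterable, List, Tuple


def sample_worst_offenders(event_addresses: Iterable[int],
                           sample_rate: float,
                           top_k: int,
                           seed: int = 0) -> List[Tuple[int, int]]:
    """Tally only a random sample of events and return the top offenders.

    event_addresses: instruction addresses that caused the event of interest
                     (for example, a cache miss), in program order.
    sample_rate:     probability that any single event is recorded.
    Returns (address, sampled_count) pairs, most frequent first.
    """
    rng = random.Random(seed)      # seeded so runs are repeatable
    tally = Counter()
    for addr in event_addresses:
        if rng.random() < sample_rate:
            tally[addr] += 1       # only sampled events cost memory
    return tally.most_common(top_k)
```

An instruction that causes most of the events dominates the sample as well, so it still ranks first even though only a fraction of the events is recorded.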

    System and method for distributing signal with efficiency over microprocessor
    13.
    Granted patent
    Status: lapsed

    Publication number: US08055809B2

    Publication date: 2011-11-08

    Application number: US12343594

    Filing date: 2008-12-24

    IPC classification: G06F3/00

    Abstract: A system and associated method for distributing signals with efficiency over a microprocessor. A performance monitoring unit (PMU) sends configuration signals to a unit to monitor an event occurring on the unit. The unit is attached to a configuration bus and an event bus that are daisy-chained from the PMU to other units in the microprocessor. The configuration bus transmits configuration signals from the PMU to the unit to set the unit to report the event. The unit sends event signals to the PMU through the event bus. The unit is configured upon receiving configuration signals comprising a base address of a bus ramp of the unit. The number of units and the number of events for monitoring are flexibly selected by adjusting a length of bit fields within configuration signals.
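
The last sentence of this abstract, trading addressable units against selectable events by resizing bit fields, can be made concrete with a small sketch. The word layout below (unit address in the high bits, event selector in the low bits) and the names `pack_config` and `unpack_config` are hypothetical; the abstract does not specify an encoding.

```python
from typing import Tuple


def pack_config(unit_addr: int, event_sel: int,
                unit_bits: int, event_bits: int) -> int:
    """Pack a unit address and an event selector into one configuration word.

    Assumed layout: [unit_addr : unit_bits][event_sel : event_bits].
    """
    if not 0 <= unit_addr < (1 << unit_bits):
        raise ValueError("unit address does not fit in unit_bits")
    if not 0 <= event_sel < (1 << event_bits):
        raise ValueError("event selector does not fit in event_bits")
    return (unit_addr << event_bits) | event_sel


def unpack_config(word: int, unit_bits: int,
                  event_bits: int) -> Tuple[int, int]:
    """Recover (unit_addr, event_sel) from a configuration word."""
    event_sel = word & ((1 << event_bits) - 1)
    unit_addr = (word >> event_bits) & ((1 << unit_bits) - 1)
    return unit_addr, event_sel
```

With an 8-bit word, a 4/4 split addresses 16 units with 16 events each, while a 2/6 split addresses 4 units with 64 events each; only the field lengths change.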

    Concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers
    14.
    Granted patent
    Status: lapsed

    Publication number: US07913123B2

    Publication date: 2011-03-22

    Application number: US12210005

    Filing date: 2008-09-12

    IPC classification: G06F11/00

    CPC classification: G06F11/2268 G06F11/348

    Abstract: An apparatus and computer program product are disclosed for, in a processor, concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers. A hardware trace facility captures hardware trace data in a processor. The hardware trace facility is included within the processor. The hardware trace data is transmitted to a system memory utilizing a system bus. The system memory is included within the system. The system bus is capable of being utilized by processing units included in the processing node while the hardware trace data is being transmitted to the system bus. Part of system memory is utilized to store the trace data. The system memory is capable of being accessed by processing units in the processing node other than the hardware trace facility while part of the system memory is being utilized to store the trace data.

    Counting latencies of an instruction table flush, refill and instruction execution using a plurality of assigned counters
    15.
    Granted patent
    Status: lapsed

    Publication number: US06970999B2

    Publication date: 2005-11-29

    Application number: US10210415

    Filing date: 2002-07-31

    IPC classification: G06F9/38 G06F9/44 G06F15/00

    Abstract: A method and system for analyzing cycles per instruction (CPI) performance in a processor. A completion table corresponds to the instructions in a group to be processed by the processor. An empty completion table indicates that there has been some type of catastrophe that caused a table flush. While the table is empty, a performance monitoring counter (PMC), located in a performance monitoring unit (PMU) in the processor, counts the number of clock cycles that the table is empty. Preferably, a separate PMC is utilized depending on the reason that the completion table is empty. A second PMC likewise counts the number of clock cycles spent re-filling the empty completion table. A third PMC counts the number of clock cycles spent actually executing the instructions in the completion table. The information in the PMCs can be used to evaluate the true cause for degradation of CPI performance.
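
One way the three counter values might be turned into a CPI breakdown is sketched below. The function name and its parameters are hypothetical, and the sketch assumes that the three counters together account for every cycle in the measured interval, which the abstract does not claim.

```python
from typing import Dict


def cpi_breakdown(empty_cycles: int, refill_cycles: int,
                  exec_cycles: int,
                  instructions_completed: int) -> Dict[str, float]:
    """Split cycles per instruction (CPI) into per-counter components.

    empty_cycles:  cycles counted while the completion table was empty
    refill_cycles: cycles counted while the table was being refilled
    exec_cycles:   cycles counted while instructions were executing
    """
    if instructions_completed <= 0:
        raise ValueError("instructions_completed must be positive")
    total_cycles = empty_cycles + refill_cycles + exec_cycles
    return {
        "empty": empty_cycles / instructions_completed,
        "refill": refill_cycles / instructions_completed,
        "execute": exec_cycles / instructions_completed,
        "total": total_cycles / instructions_completed,
    }
```

A large "empty" or "refill" share of the total points at flushes rather than execution latency as the source of a CPI regression.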

    Method and apparatus for instruction completion stall identification in an information handling system
    16.
    Granted patent
    Status: in force

    Publication number: US08832416B2

    Publication date: 2014-09-09

    Application number: US11753005

    Filing date: 2007-05-24

    IPC classification: G06F11/34

    Abstract: An information handling system includes a processor that executes multiple instructions or instruction threads within a software application program. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In one embodiment, the operating system manages instruction completion stall analysis software to determine the cause or causes of instruction stalls. In another embodiment, the stall analysis software cooperates with the operating system software to store instruction completion stall event data on a per-instruction basis while the application program executes. The operating system software may cooperate with the stall analysis software to store instruction completion stall data in memory for later manipulation by system users or other software.

    Autonomic hotspot profiling using paired performance sampling
    17.
    Granted patent
    Status: lapsed

    Publication number: US08615742B2

    Publication date: 2013-12-24

    Application number: US12946959

    Filing date: 2010-11-16

    IPC classification: G06F9/44 G06F9/45

    Abstract: A processor performance profiler is enabled to identify specific instructions causing performance issues within a program being executed by a microprocessor through random sampling to find the worst-case offenders of a particular event type such as a cache miss or a branch mis-prediction. Tracking all instructions causing a particular event generates large data logs, creates performance penalties, and makes code analysis more difficult. However, identifying and tracking the worst offenders within a random sample of events, without having to hash all events, results in smaller memory requirements for the performance profiler, lower performance impact while profiling, and decreased complexity to analyze the program to identify major performance issues, which, in turn, enables better optimization of the program in shorter developer time.

    SYSTEM AND METHOD FOR DISTRIBUTING SIGNAL WITH EFFICIENCY OVER MICROPROCESSOR
    19.
    Patent application
    Status: lapsed

    Publication number: US20100161867A1

    Publication date: 2010-06-24

    Application number: US12343594

    Filing date: 2008-12-24

    IPC classification: G06F13/36

    Abstract: A system and associated method for distributing signals with efficiency over a microprocessor. A performance monitoring unit (PMU) sends configuration signals to a unit to monitor an event occurring on the unit. The unit is attached to a configuration bus and an event bus that are daisy-chained from the PMU to other units in the microprocessor. The configuration bus transmits configuration signals from the PMU to the unit to set the unit to report the event. The unit sends event signals to the PMU through the event bus. The unit is configured upon receiving configuration signals comprising a base address of a bus ramp of the unit. The number of units and the number of events for monitoring are flexibly selected by adjusting a length of bit fields within configuration signals.

    Quantifying Completion Stalls Using Instruction Sampling
    20.
    Patent application
    Status: lapsed

    Publication number: US20090259830A1

    Publication date: 2009-10-15

    Application number: US12099944

    Filing date: 2008-04-09

    IPC classification: G06F9/30

    Abstract: A method, computer program product, and data processing system for collecting metrics regarding completion stalls in an out-of-order superscalar processor with branch prediction are disclosed. A preferred embodiment of the present invention selectively samples particular instructions (or classes of instructions). Each selected instruction, as it passes through the processor datapath, is marked (tagged) for monitoring by a performance monitoring unit. The progress of marked instructions is monitored by the performance monitoring unit, and various stall counters are triggered by the progress of the marked instructions and the instruction groups they form a part of. The stall counters count cycles to give an indication of when certain delays associated with particular instructions occur and how serious the delays are.
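
The idea of marking one class of instructions and letting only those drive the stall counters can be sketched over a recorded trace. The trace format, the stall-reason labels, and the function name `sample_stalls` are hypothetical and exist only for this example; real hardware would mark instructions in the datapath rather than filter a list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sample_stalls(trace: Iterable[Tuple[str, str, int]],
                  sampled_class: str) -> Dict[str, int]:
    """Accumulate stall cycles per reason for one marked instruction class.

    trace items are (instruction_class, stall_reason, stall_cycles).
    Instructions outside sampled_class are ignored, which models the
    selective marking described in the abstract.
    """
    counters: Dict[str, int] = defaultdict(int)
    for instr_class, reason, cycles in trace:
        if instr_class == sampled_class:   # only marked instructions count
            counters[reason] += cycles
    return dict(counters)
```

Sampling loads in a small trace, for example, yields one cycle total per stall reason seen on loads and nothing for other instruction classes.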
