Quantifying completion stalls using instruction sampling
    11.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US08234484B2

    Publication date: 2012-07-31

    Application No.: US12099944

    Filing date: 2008-04-09

    IPC classification: G06F9/30

    Abstract: A method, computer program product, and data processing system for collecting metrics regarding completion stalls in an out-of-order superscalar processor with branch prediction are disclosed. A preferred embodiment of the present invention selectively samples particular instructions (or classes of instructions). Each selected instruction, as it passes through the processor datapath, is marked (tagged) for monitoring by a performance monitoring unit. The progress of marked instructions is monitored by the performance monitoring unit, and various stall counters are triggered by the progress of the marked instructions and the instruction groups they form a part of. The stall counters count cycles to give an indication of when certain delays associated with particular instructions occur and how serious the delays are.
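
The stall-counter idea in this abstract can be illustrated with a small software model. The sketch below is not the patented hardware: the `CycleRecord` trace format, the `count_stall_cycles` function, and the stall-reason labels are all hypothetical names chosen for illustration. It assumes one marked instruction is followed cycle by cycle until its group completes, and that each stalled cycle is attributed to exactly one reason.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CycleRecord:
    """State of one marked instruction during one cycle (hypothetical format)."""
    completed: bool              # the marked instruction's group completed this cycle
    stall_reason: Optional[str]  # hypothetical label such as "dcache_miss"; None if not stalled


def count_stall_cycles(trace: Iterable[CycleRecord]) -> Counter:
    """Count stalled cycles per reason until the marked instruction completes."""
    counters: Counter = Counter()
    for record in trace:
        if record.completed:
            # Completion ends the sample; later records belong to another sample.
            break
        if record.stall_reason is not None:
            counters[record.stall_reason] += 1
    return counters
```

Feeding the function a trace with two `"dcache_miss"` cycles and one `"fxu_busy"` cycle before completion yields one counter per reason, which mirrors how the abstract describes counters indicating both when a delay occurs and how many cycles it costs.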

    Method and apparatus for measuring pipeline stalls in a microprocessor
    12.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07617385B2

    Publication date: 2009-11-10

    Application No.: US11675112

    Filing date: 2007-02-15

    IPC classification: G06F11/28

    Abstract: A computer-implemented method, apparatus, and computer program product for monitoring execution of instructions in an instruction pipeline. The process identifies a number of stall cycles for a group of instructions to complete execution. The process retrieves a deterministic latency pattern corresponding to the group of instructions. The process compares the number of stall cycles to the deterministic execution latency pattern. The process identifies an instruction in the group as a dependent instruction in response to a determination that the instruction completed a deterministic number of cycles after an antecedent instruction completed.
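
The comparison step described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed process: the `find_dependents` function, the tuple layout of `group`, and the per-opcode `latency_pattern` table are hypothetical. It assumes each opcode has a single fixed latency and that an exact match between the completion gap and that latency is the signal of dependence.

```python
from typing import Dict, List, Tuple

# (name, opcode, completion_cycle) in program order; a hypothetical layout.
Instruction = Tuple[str, str, int]


def find_dependents(
    group: List[Instruction], latency_pattern: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Return (dependent, antecedent) pairs whose completion gap matches
    the deterministic latency of the dependent's opcode."""
    pairs = []
    for i, (name, opcode, done) in enumerate(group):
        expected = latency_pattern.get(opcode)
        if expected is None:
            continue  # no deterministic latency known for this opcode
        for ante_name, _, ante_done in group[:i]:
            if done - ante_done == expected:
                pairs.append((name, ante_name))
                break  # report only the earliest matching antecedent
    return pairs
```

For a group where an `add` with an assumed latency of 2 completes exactly two cycles after a preceding load, the function reports the `add` as dependent on the load; instructions whose gaps match no earlier completion are left out.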

    METHOD AND APPARATUS FOR MEASURING PIPELINE STALLS IN A MICROPROCESSOR
    13.
    Type: Invention patent application
    Legal status: In force

    Publication No.: US20080201566A1

    Publication date: 2008-08-21

    Application No.: US11675112

    Filing date: 2007-02-15

    IPC classification: G06F9/38

    Abstract: A computer-implemented method, apparatus, and computer program product for monitoring execution of instructions in an instruction pipeline. The process identifies a number of stall cycles for a group of instructions to complete execution. The process retrieves a deterministic latency pattern corresponding to the group of instructions. The process compares the number of stall cycles to the deterministic execution latency pattern. The process identifies an instruction in the group as a dependent instruction in response to a determination that the instruction completed a deterministic number of cycles after an antecedent instruction completed.

    Method system and apparatus for instruction tracing with out of order processors
    14.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US06694427B1

    Publication date: 2004-02-17

    Application No.: US09552859

    Filing date: 2000-04-20

    IPC classification: G06F9/00

    Abstract: A method, system and apparatus for instruction tracing with out of order speculative processors. With the present invention, information corresponding to the state of an instruction cache and a data cache is stored in a trace storage device along with information corresponding to instructions fetched by the processor. When a cache load is necessary, updated cache information is stored in the trace storage device. Thereby, the state of the cache at all times during fetching of instructions may be known from the information stored in the trace storage device. Additionally, the particular instructions fetched are known from the fetched-instruction information stored in the trace storage device. Hence, the instruction stream may be reconstructed from the information stored in the trace storage device.

    System and method for distributing signal with efficiency over microprocessor
    15.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US08055809B2

    Publication date: 2011-11-08

    Application No.: US12343594

    Filing date: 2008-12-24

    IPC classification: G06F3/00

    Abstract: A system and associated method for distributing signals with efficiency over a microprocessor. A performance monitoring unit (PMU) sends configuration signals to a unit to monitor an event occurring on the unit. The unit is attached to a configuration bus and an event bus that are daisy-chained from the PMU to other units in the microprocessor. The configuration bus transmits configuration signals from the PMU to the unit to set the unit to report the event. The unit sends event signals to the PMU through the event bus. The unit is configured upon receiving configuration signals comprising a base address of a bus ramp of the unit. A number of units and a number of events for monitoring are flexibly selected by adjusting a length of bit fields within configuration signals.
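
The trade-off between the number of addressable units and the number of selectable events can be sketched with a packed configuration word. Everything here is an assumption for illustration: the abstract does not give the field layout, so the two-field word (unit address above event id), the `pack_config` and `broadcast` functions, and the `Unit` class are hypothetical. The sketch models the daisy chain as every unit seeing the same word and only the unit whose base address matches latching the event selection.

```python
from typing import List, Optional


def pack_config(unit_addr: int, event_id: int, addr_bits: int, event_bits: int) -> int:
    """Pack a unit address and an event id into one configuration word."""
    if not 0 <= unit_addr < (1 << addr_bits):
        raise ValueError("unit address does not fit in the address field")
    if not 0 <= event_id < (1 << event_bits):
        raise ValueError("event id does not fit in the event field")
    return (unit_addr << event_bits) | event_id


class Unit:
    """A monitored unit on the chain, identified by an assumed base address."""

    def __init__(self, base_addr: int) -> None:
        self.base_addr = base_addr
        self.selected_event: Optional[int] = None

    def on_config(self, word: int, addr_bits: int, event_bits: int) -> None:
        addr = (word >> event_bits) & ((1 << addr_bits) - 1)
        if addr == self.base_addr:
            # Only the addressed unit latches the event selection.
            self.selected_event = word & ((1 << event_bits) - 1)


def broadcast(chain: List[Unit], word: int, addr_bits: int, event_bits: int) -> None:
    """Pass the same configuration word along every unit in the chain."""
    for unit in chain:
        unit.on_config(word, addr_bits, event_bits)
```

Widening `addr_bits` allows more units on the chain, while widening `event_bits` allows more events per unit; under this layout the two widths are the only parameters that change.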

    Concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers
    16.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US07913123B2

    Publication date: 2011-03-22

    Application No.: US12210005

    Filing date: 2008-09-12

    IPC classification: G06F11/00

    CPC classification: G06F11/2268 G06F11/348

    Abstract: An apparatus and computer program product are disclosed for, in a processor, concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers. A hardware trace facility captures hardware trace data in a processor. The hardware trace facility is included within the processor. The hardware trace data is transmitted to a system memory utilizing a system bus. The system memory is included within the system. The system bus is capable of being utilized by processing units included in the processing node while the hardware trace data is being transmitted to the system bus. Part of system memory is utilized to store the trace data. The system memory is capable of being accessed by processing units in the processing node other than the hardware trace facility while part of the system memory is being utilized to store the trace data.

    Counting latencies of an instruction table flush, refill and instruction execution using a plurality of assigned counters
    17.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US06970999B2

    Publication date: 2005-11-29

    Application No.: US10210415

    Filing date: 2002-07-31

    IPC classification: G06F9/38 G06F9/44 G06F15/00

    Abstract: A method and system for analyzing cycles per instruction (CPI) performance in a processor. A completion table corresponds to the instructions in a group to be processed by the processor. An empty completion table indicates that there has been some type of catastrophe that caused a table flush. While the table is empty, a performance monitoring counter (PMC), located in a performance monitoring unit (PMU) in the processor, counts the number of clock cycles that the table is empty. Preferably, a separate PMC is utilized depending on the reason that the completion table is empty. A second PMC likewise counts the number of clock cycles spent re-filling the empty completion table. A third PMC counts the number of clock cycles spent actually executing the instructions in the completion table. The information in the PMCs can be used to evaluate the true cause for degradation of CPI performance.
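
One way to read the three counter groups is as a CPI breakdown. The sketch below assumes, beyond what the abstract states, that the empty, refill, and execute cycle counts together cover every cycle in the measurement window, so that dividing each by the number of completed instructions splits CPI into additive components. The `cpi_breakdown` function and the reason labels are hypothetical.

```python
from typing import Dict


def cpi_breakdown(
    empty_cycles_by_reason: Dict[str, int],
    refill_cycles: int,
    execute_cycles: int,
    instructions_completed: int,
) -> Dict[str, float]:
    """Split CPI into per-cause components from the PMC readings."""
    if instructions_completed <= 0:
        raise ValueError("instructions_completed must be positive")
    n = instructions_completed
    # One component per flush reason, matching the separate PMC per reason.
    breakdown = {
        f"empty:{reason}": cycles / n
        for reason, cycles in empty_cycles_by_reason.items()
    }
    breakdown["refill"] = refill_cycles / n
    breakdown["execute"] = execute_cycles / n
    total = sum(empty_cycles_by_reason.values()) + refill_cycles + execute_cycles
    breakdown["total"] = total / n
    return breakdown
```

With 300 empty cycles, 100 refill cycles, and 600 execute cycles over 500 completed instructions, the total CPI is 2.0, of which 1.2 is spent executing; the remainder shows how much each flush cause and the refill phase contribute.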

    Method and apparatus for identifying instructions for performance monitoring in a microprocessor
    18.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US06539502B1

    Publication date: 2003-03-25

    Application No.: US09436109

    Filing date: 1999-11-08

    IPC classification: G06F11/30

    Abstract: A method and apparatus for selecting an instruction to be monitored within a pipelined processor is presented. One or more pairs of match values stored in control registers are allocated for use in instruction sampling or instruction matching. These pairs, referred to as V0 and V1, are used together to filter instructions for sampling or for instruction matching. During the fetch or decode stage, the instruction word is compared bit by bit to the V0 and V1 pair(s). For each bit in the instruction word, the corresponding bits in V0 and V1 are used to determine if a match exists. If every bit position in the instruction word results in a match, the instruction is eligible for sampling. If any bit position does not match, the instruction is not eligible. In response to a determination that the instruction is eligible for sampling, the execution of the instruction may be monitored.
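
The bit-by-bit comparison can be sketched as a ternary match. The abstract does not say how a V0/V1 bit pair is interpreted, so the encoding below is an assumption: a set bit in V0 means "a 0 in this position matches", a set bit in V1 means "a 1 in this position matches", and both set means "don't care". The `is_eligible` and `pair_from_pattern` functions are hypothetical helpers, not names from the patent.

```python
from typing import Tuple


def pair_from_pattern(pattern: str) -> Tuple[int, int]:
    """Build (V0, V1) from a string of '0', '1', and 'x' (don't care),
    most significant bit first, under the assumed encoding."""
    v0 = v1 = 0
    for ch in pattern:
        if ch not in "01x":
            raise ValueError(f"unexpected pattern character: {ch!r}")
        v0 <<= 1
        v1 <<= 1
        if ch in "0x":
            v0 |= 1  # an instruction bit of 0 is acceptable here
        if ch in "1x":
            v1 |= 1  # an instruction bit of 1 is acceptable here
    return v0, v1


def is_eligible(instr: int, v0: int, v1: int, width: int = 32) -> bool:
    """True only if every bit position of the instruction word matches."""
    mask = (1 << width) - 1
    instr &= mask
    # A 1 bit matches where V1 allows it; a 0 bit matches where V0 allows it.
    matched = (instr & v1) | (~instr & v0)
    return (matched & mask) == mask
```

For example, the 8-bit pattern `"0111xxxx"` accepts any word whose top four bits are `0111` and rejects a word with any other top nibble, which corresponds to filtering on an opcode field while ignoring operand bits.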

    Method and apparatus for monitoring the performance of internal queues in a microprocessor
    19.
    Type: Granted invention patent
    Legal status: Lapsed

    Publication No.: US06530042B1

    Publication date: 2003-03-04

    Application No.: US09436108

    Filing date: 1999-11-08

    IPC classification: G06F11/30

    Abstract: A method and apparatus for monitoring an internal queue within a processor, such as an instruction completion table or instruction re-order buffer, is presented. The performance monitoring unit of the processor contains multiple counters, and each counter counts occurrences of specified events. An internal queue of the processor may be specified to be monitored. A count of event signals indicating a successful allocation request for an entry in the internal queue is divided by a count of event signals indicating a passage of units of time to obtain the average rate for allocation requests for queue entries in the specified internal queue. A count of event signals indicating an occupation of a specific entry in the internal queue during a unit of time is divided by a count of event signals indicating an allocation of a specific entry in the internal queue to obtain the average time spent in the internal queue. An average number of entries in the internal queue is computed as a product of the average rate for allocation requests for queue entries and the average time spent in the internal queue. An event signal that indicates failure of an allocation request for an entry in the internal queue may be monitored.
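
The three quotients in this abstract amount to Little's law (average occupancy equals arrival rate times average residence time). The arithmetic can be sketched as below; the `queue_metrics` function and its parameter names are hypothetical, and the sketch simplifies the abstract by using a single allocation count for both divisions, although the abstract describes the second divisor as allocations of a specific entry.

```python
from typing import Tuple


def queue_metrics(
    alloc_success_count: int,
    time_unit_count: int,
    occupancy_count: int,
) -> Tuple[float, float, float]:
    """Return (allocation rate, average time in queue, average entries)
    computed from three event counts."""
    if alloc_success_count <= 0 or time_unit_count <= 0:
        raise ValueError("counts used as divisors must be positive")
    rate = alloc_success_count / time_unit_count      # allocations per time unit
    avg_time = occupancy_count / alloc_success_count  # time units per allocation
    avg_entries = rate * avg_time                     # Little's law product
    return rate, avg_time, avg_entries
```

With 250 successful allocations over 1000 time units and 2000 occupied entry-cycles, the rate is 0.25 allocations per unit, each entry stays 8 units on average, and the queue holds 2 entries on average.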

    SYSTEM AND METHOD FOR DISTRIBUTING SIGNAL WITH EFFICIENCY OVER MICROPROCESSOR
    20.
    Type: Invention patent application
    Legal status: Lapsed

    Publication No.: US20100161867A1

    Publication date: 2010-06-24

    Application No.: US12343594

    Filing date: 2008-12-24

    IPC classification: G06F13/36

    Abstract: A system and associated method for distributing signals with efficiency over a microprocessor. A performance monitoring unit (PMU) sends configuration signals to a unit to monitor an event occurring on the unit. The unit is attached to a configuration bus and an event bus that are daisy-chained from the PMU to other units in the microprocessor. The configuration bus transmits configuration signals from the PMU to the unit to set the unit to report the event. The unit sends event signals to the PMU through the event bus. The unit is configured upon receiving configuration signals comprising a base address of a bus ramp of the unit. A number of units and a number of events for monitoring are flexibly selected by adjusting a length of bit fields within configuration signals.
