Method and apparatus for precisely identifying effective addresses associated with hardware events
    1.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07779238B2

    Publication date: 2010-08-17

    Application No.: US11589492

    Filing date: 2006-10-30

    IPC classification: G06F11/30

    Abstract: A system and method for precisely identifying an instruction causing a performance-related event is disclosed. The instruction may be detected while in a pipeline stage of a microprocessor preceding a writeback stage and the microprocessor's architectural state may not be updated until after information identifying the instruction is captured. The instruction may be flushed from the pipeline, along with other instructions from the same thread. A hardware trap may be taken when the instruction is detected and/or when an event counter overflows or is within a given range of overflowing. A software trap handler may capture and/or log information identifying the instruction, such as one or more extended address elements, before returning control and initiating a retry of the instruction. The captured and/or logged information may be stored in an event space database usable by a data space profiler to identify performance bottlenecks in the application containing the instruction.


    Method and apparatus for identifying instructions associated with execution events in a data space profiler
    2.
    Type: Published patent application
    Legal status: In force

    Publication No.: US20080127120A1

    Publication date: 2008-05-29

    Application No.: US11590288

    Filing date: 2006-10-31

    IPC classification: G06F9/44

    Abstract: A system and method for profiling a software application may include means for capturing profiling information corresponding to an instruction identified as having executed coincident with the occurrence of a runtime event, and for associating the profiling information with the event in an event set. In some embodiments, the identified instruction, which may have triggered the event, may be located in the program code sequence at a predetermined position relative to the current program counter value at the time the event was detected. The predetermined relative position may be fixed dependent on the processor architecture and may also be dependent on the event type. The predetermined relative position may be zero, indicating that when the event was detected, the program counter value corresponded to an instruction associated with the event. If the identified instruction is an ambiguity-creating instruction, an indication of ambiguity may be associated with the event.
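
    The lookup this abstract describes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the table contents, the fixed 4-byte instruction width, the choice to look backward from the reported program counter, and the names SKID_INSTRUCTIONS, identify_trigger and record_event are all hypothetical.

```python
# Hypothetical table: (architecture, event type) -> number of instructions
# between the reported program counter and the candidate trigger.
# The entries are illustrative assumptions, not values from the patent.
SKID_INSTRUCTIONS = {
    ("arch_a", "dcache_miss"): 1,
    ("arch_a", "tlb_miss"): 0,  # zero: the reported PC already names the trigger
}

INSTRUCTION_BYTES = 4  # assumes a fixed-width instruction encoding


def identify_trigger(pc, arch, event_type, ambiguous_pcs):
    """Return (candidate_pc, is_ambiguous) for one detected event.

    ambiguous_pcs is a hypothetical set of addresses that hold
    ambiguity-creating instructions (for example, branch targets).
    This sketch assumes the trigger lies at or before the reported PC.
    """
    offset = SKID_INSTRUCTIONS[(arch, event_type)]
    candidate = pc - offset * INSTRUCTION_BYTES
    return candidate, candidate in ambiguous_pcs


def record_event(event_set, pc, arch, event_type, ambiguous_pcs):
    """Associate the identified instruction with the event in the event set."""
    candidate, ambiguous = identify_trigger(pc, arch, event_type, ambiguous_pcs)
    event_set.append({"event": event_type, "pc": candidate, "ambiguous": ambiguous})
    return event_set
```

    Keeping the offset in a table keyed by architecture and event type mirrors the abstract's statement that the relative position may depend on both.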


    System and method for insertion of prefetch instructions by a compiler
    3.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06651245B1

    Publication date: 2003-11-18

    Application No.: US09679433

    Filing date: 2000-10-03

    IPC classification: G06F9/45

    Abstract: The present invention discloses a method and device for placing prefetch instructions in a low-level or assembly code instruction stream. It involves the use of a new concept called a martyr memory operation. When prefetch instructions are inserted in a code stream, some instructions will still miss the cache because in some circumstances a prefetch cannot be added at all, or cannot be added early enough to allow the needed reference to be in cache before being referenced by an executing instruction. A subset of these instructions is identified using a new method and designated as martyr memory operations. Once a martyr memory operation is identified, other memory operations that would also have been cache misses can “hide” behind it and complete their prefetches while the processor, of necessity, waits for the martyr memory operation to complete. This increases the number of cache hits.


    Heuristic for identifying loads guaranteed to hit in processor cache
    4.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06574713B1

    Publication date: 2003-06-03

    Application No.: US09685431

    Filing date: 2000-10-10

    IPC classification: G06F12/00

    Abstract: A heuristic algorithm is disclosed that identifies loads guaranteed to hit the processor cache and thereby provides a “minimal” set of prefetches to be scheduled/inserted during compilation of a program. The heuristic algorithm of the present invention uses the concept of a “cache line” (i.e., the data chunks received during memory operations) in conjunction with the concept of “related” memory operations to determine which prefetches are unnecessary for related memory operations, thus generating a minimal number of prefetches for related memory operations.
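
    A rough Python sketch of the idea follows. It is not the patented heuristic: it assumes that "related" memory operations are those sharing a base address, that every base is cache-line aligned, that the line size is 64 bytes, and that a prefetched line is not evicted before its later uses. The name minimal_prefetches is hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed line size; real targets vary


def minimal_prefetches(references):
    """Pick one prefetch per cache line among related references.

    references: list of (base_name, byte_offset) pairs in program order.
    Assumes each base address is cache-line aligned, so two offsets from
    the same base share a line exactly when offset // line size matches.
    Returns (prefetches, guaranteed_hits).
    """
    seen_lines = set()
    prefetches = []
    guaranteed_hits = []
    for base, offset in references:
        line = (base, offset // CACHE_LINE_BYTES)
        if line in seen_lines:
            # An earlier related reference already brings this line in,
            # so a further prefetch would be redundant.
            guaranteed_hits.append((base, offset))
        else:
            seen_lines.add(line)
            prefetches.append((base, offset))
    return prefetches, guaranteed_hits
```

    For example, references at offsets 0 and 8 from the same base fall in one line under these assumptions, so only the first needs a prefetch.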


    Compiler-based cache line optimization
    5.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06564297B1

    Publication date: 2003-05-13

    Application No.: US09594430

    Filing date: 2000-06-15

    Applicant: Nicolai Kosche

    Inventor: Nicolai Kosche

    IPC classification: G06F12/02

    Abstract: Cache line optimization involves computing where cache misses are in a control flow and assigning probabilities to cache misses. Cache lines may be scheduled based on the assigned probabilities and where the cache misses are in the control flow. Cache line probabilities may be calculated based on the relationship of the cache line and where the cache misses are in the control flow. A control flow may be pruned before calculating cache line probabilities. Function call sites may be used to prune the control flow. Address generation of a cache miss may be duplicated to speculatively hoist address generation and the associated prefetch. Optimization may include selecting references, identifying cache lines, and mapping the selected references. Dependencies within the cache lines may be determined and the cache lines may be scheduled based on the determined dependencies and probabilities of usefulness. Instructions may be scheduled based on the scheduled cache lines and the target machine model to maximize outstanding memory transactions. Cache lines may be scheduled across call sites.


    Method and apparatus for performing prefetching at the function level
    6.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US06421826B1

    Publication date: 2002-07-16

    Application No.: US09434715

    Filing date: 1999-11-05

    IPC classification: G06F9/44

    CPC classification: G06F9/383

    Abstract: One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within regions of code that tend to generate cache misses. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system runs the executable code module in a training mode on a representative workload and keeps statistics on cache miss rates for functions within the executable code module. These statistics are used to identify a set of “hot” functions that generate a large number of cache misses. Next, explicit prefetch instructions are scheduled in advance of memory operations within the set of hot functions. In one embodiment, explicit prefetch operations are scheduled into the executable code module by activating prefetch generation at a start of an identified function, and by deactivating prefetch generation at a return from the identified function. In one embodiment, the system further schedules prefetch operations for the memory operations by identifying a subset of memory operations of a particular type within the set of hot functions, and scheduling explicit prefetch operations for memory operations belonging to the subset.
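
    The selection step can be illustrated with a small Python sketch. It is written under assumptions rather than taken from the patent: training-run statistics are modeled as per-function (misses, accesses) pairs, "hot" means a miss count at or above a threshold, and the names find_hot_functions and schedule_prefetches, the threshold value, and the operation tuples are hypothetical.

```python
def find_hot_functions(stats, min_misses=1000):
    """Return the names of functions with at least min_misses cache misses.

    stats: {function_name: (cache_misses, memory_accesses)} gathered from a
    training run on a representative workload (format assumed here).
    """
    return {name for name, (misses, _accesses) in stats.items()
            if misses >= min_misses}


def schedule_prefetches(function_ops, hot_functions, op_types=("load",)):
    """Choose the memory operations that should receive explicit prefetches.

    function_ops: {function_name: [(op_type, operand), ...]}
    Prefetch generation is active only inside hot functions, and only for
    the chosen subset of operation types.
    """
    plan = {}
    for name, ops in function_ops.items():
        if name not in hot_functions:
            continue  # prefetch generation stays off outside hot functions
        plan[name] = [op for op in ops if op[0] in op_types]
    return plan
```

    Restricting the plan to hot functions models the abstract's activation at function entry and deactivation at return; the op_types filter models the subset of memory operations of a particular type.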


    Method and apparatus for profiling data addresses
    7.
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US07827543B1

    Publication date: 2010-11-02

    Application No.: US10840164

    Filing date: 2004-05-06

    IPC classification: G06F9/44, G06F9/45, G06F9/26

    CPC classification: G06F11/3612

    Abstract: Data address profiling allows determination of sources of code execution hindrance from different perspectives of memory references and allows correlation of sampled runtime events and memory reference objects, such as cache lines. Associating sampled runtime events with data addresses provides for efficient and targeted optimization of code with respect to data addresses and physical and/or logical memory reference objects (e.g., memory segments, heap variables, variable instances, stack variables, etc.). An instruction instance is identified in relation to a sampled runtime event. A data address is determined from the instruction instance. From the determined address, a memory reference object is ascertained.
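
    The last three steps of this abstract (instruction instance, data address, memory reference object) can be sketched in Python. This is an illustration only: it assumes a base-register-plus-offset addressing form, a 64-byte cache line, and a sorted table of non-overlapping objects. All names and the record layouts are hypothetical.

```python
import bisect

CACHE_LINE_BYTES = 64  # assumed line size


def data_address(instruction, registers):
    """Effective address for an assumed base-register-plus-offset access."""
    return registers[instruction["base_reg"]] + instruction["offset"]


def cache_line(address):
    """Index of the cache line (a physical reference object) for an address."""
    return address // CACHE_LINE_BYTES


def memory_object(address, objects):
    """Find the logical object containing address, or None.

    objects: list of (start, size, name) tuples sorted by start address,
    with no overlaps (for example heap or stack variables).
    """
    starts = [start for start, _size, _name in objects]
    index = bisect.bisect_right(starts, address) - 1
    if index >= 0:
        start, size, name = objects[index]
        if address < start + size:
            return name
    return None
```

    The same address maps to both a physical object (its cache line) and a logical one (the enclosing variable), which is what lets a profiler present costs from either perspective.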


    Method and Apparatus for Synthesizing Hardware Counters from Performance Sampling
    8.
    Type: Published patent application
    Legal status: In force

    Publication No.: US20080177756A1

    Publication date: 2008-07-24

    Application No.: US11624526

    Filing date: 2007-01-18

    IPC classification: G06F17/00, G06F15/00

    Abstract: A system and method for performance monitoring may use data collected from a hardware event agent comprising a hardware sampling mechanism and/or one or more hardware counters to increment one or more synthesized performance counters by an amount dependent on an expression involving the collected data. Each synthesized performance counter may be configured to count events of a different type and may comprise a machine addressable storage location. The event types may include various memory references or misses, branches, branch mispredictions, or any other event of interest in performance monitoring. The hardware event agent may comprise one or more instruction counters, cycle counters, timers, or other hardware performance counters. One hardware performance counter may be used in a time-multiplexed or data-multiplexed manner to monitor events of multiple event types. The hardware sampling mechanism may return a statistical packet for sampled instructions, which may be examined to determine the event type.
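
    One way to picture a synthesized counter is the Python sketch below. It rests on assumptions not stated in the abstract: each statistical packet is a dictionary with hypothetical boolean fields, and the increment expression is simply the sampling interval, so that one sampled instruction stands for that many executed instructions.

```python
from collections import Counter


def classify(packet):
    """Map one statistical packet to the event types it represents.

    The packet field names used here are hypothetical.
    """
    events = ["instructions"]
    if packet.get("is_load"):
        events.append("loads")
        if packet.get("dcache_miss"):
            events.append("dcache_misses")
    if packet.get("is_branch"):
        events.append("branches")
        if packet.get("mispredicted"):
            events.append("branch_mispredictions")
    return events


def update_synthesized_counters(counters, packets, sampling_interval):
    """Add sampling_interval to the counter of every event type in each packet."""
    for packet in packets:
        for event_type in classify(packet):
            counters[event_type] += sampling_interval
    return counters
```

    A single stream of samples thus feeds several counters at once, which is the data-multiplexed use of one hardware mechanism that the abstract mentions.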


    Method and apparatus for specification and application of a user-specified filter in a data space profiler
    9.
    Type: Published patent application
    Legal status: In force

    Publication No.: US20080127107A1

    Publication date: 2008-05-29

    Application No.: US11517085

    Filing date: 2006-09-07

    IPC classification: G06F9/44

    Abstract: A data space profiler may include an analysis engine that associates runtime events of profiled software applications with execution costs and extended address elements. Relational agents in the analysis engine may apply functions to profile data collected for each event to determine the extended address element values to be associated with the event. Each extended address element may correspond to a data profiling object (e.g., hardware component, software construct, data allocation construct, abstract view) involved in each event. The extended address element values may be used to index into an event set for the profiled software application to present costs from the perspective of these profiling objects. A filtering mechanism may also be used to extract profile data from the event set corresponding to events that satisfy the filter criteria. By alternating between presentation of profiling object views and filtered event data, performance bottlenecks and their causes may be identified.
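
    The alternation between object views and filtered event data can be sketched in Python. The event-set layout (a list of dictionaries with a cost and a mapping of extended address element values) and the function names are assumptions made for illustration, not the patent's data structures.

```python
from collections import defaultdict


def costs_by_element(event_set, element):
    """Total cost per value of one extended address element (an object view)."""
    totals = defaultdict(int)
    for event in event_set:
        totals[event["elements"].get(element)] += event["cost"]
    return dict(totals)


def apply_filter(event_set, predicate):
    """Keep only the events that satisfy a user-specified filter."""
    return [event for event in event_set if predicate(event)]
```

    A typical session under these assumptions views costs by cache line, filters down to the most expensive line, and then views the remaining events by function to see which code touches it.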


    Method and Apparatus for Data Space Profiling of Applications Across a Network
    10.
    Type: Published patent application
    Legal status: In force

    Publication No.: US20080114806A1

    Publication date: 2008-05-15

    Application No.: US11559275

    Filing date: 2006-11-13

    Applicant: Nicolai Kosche

    Inventor: Nicolai Kosche

    IPC classification: G06F17/30, G06F15/16

    Abstract: A system and method for profiling a network application may include means for operating on context-specific data and costs. The system may include an apparatus for associating local extended address elements with a message sent from a first computing system to a second computing system across a network. The second computing system may store the received information as remote extended address information and may store its own local extended address information. An event agent may capture values of local and/or remote extended address elements in response to detecting the message or another system event and may associate the extended address elements with the message or system event in an event set accessible by a data space profiler. The extended address information may include time stamps. An event agent may determine network latency dependent on time stamps of messages and may generate an event if the latency exceeds a predetermined threshold.
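
    The latency check in the final sentence can be sketched in Python. The sketch assumes that sender and receiver clocks are synchronized, that time stamps are integer microseconds, and that messages are dictionaries with hypothetical keys. The function name and the event format are likewise illustrative.

```python
def latency_events(messages, threshold_us):
    """Generate an event for each message whose latency exceeds the threshold.

    messages: list of {"id", "sent_us", "received_us"} dictionaries, where
    sent_us is the sender's time stamp carried with the message and
    received_us is the receiver's local time stamp.
    """
    events = []
    for message in messages:
        latency = message["received_us"] - message["sent_us"]
        if latency > threshold_us:
            events.append({
                "type": "high_latency",
                "message_id": message["id"],
                "latency_us": latency,
            })
    return events
```

    In practice the two time stamps come from different machines, so clock skew would have to be handled before such a difference is meaningful.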
