System and method for improving branch prediction in compiled program code
    1.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US5659752A

    Publication date: 1997-08-19

    Application No.: US497303

    Filing date: 1995-06-30

    IPC classification: G06F9/38 G06F9/45 G06F11/34

    Abstract: A method and system for optimizing branch prediction in an executable computer program compiled for execution on a pipelined processor that employs branch prediction. The source program is compiled and, in one embodiment, instrumented to collect branch selection statistics. The compiled program is run, and statistics are collected using the instrumentation or a standard trace program. The branch statistics are used to modify the executable program so that branch prediction is correct a majority of the time for the workload against which the program was run. In a computer system having a branch prediction bit, that bit is set or cleared to cause correct branch prediction a majority of the time.
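
The profile-driven step in this abstract can be illustrated with a minimal sketch. It assumes the collected statistics are simple taken / not-taken counts per branch address; the names `BranchStats` and `choose_prediction_bits` are hypothetical, and breaking ties toward "not taken" is an arbitrary choice made here, not something the abstract specifies.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    taken: int      # times the branch was taken during the profiled run
    not_taken: int  # times the branch fell through


def choose_prediction_bits(stats: dict[int, BranchStats]) -> dict[int, bool]:
    """Map each branch address to a prediction bit (True = predict taken)."""
    bits = {}
    for address, s in stats.items():
        if s.taken + s.not_taken == 0:
            continue  # never executed in this workload: leave the default alone
        # Assumption: ties are resolved as "predict not taken".
        bits[address] = s.taken > s.not_taken
    return bits


profile = {
    0x1000: BranchStats(taken=90, not_taken=10),
    0x1040: BranchStats(taken=3, not_taken=97),
    0x1080: BranchStats(taken=0, not_taken=0),
}
bits = choose_prediction_bits(profile)
```

A real tool would then patch the corresponding bit in each branch instruction of the executable, which depends on the instruction encoding and is not shown.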

    Prefetch optimizer measuring execution time of instruction sequence cycling through each selectable hardware prefetch depth and cycling through disabling each software prefetch instruction of an instruction sequence of interest
    2.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09043579B2

    Publication date: 2015-05-26

    Application No.: US13347672

    Filing date: 2012-01-10

    IPC classification: G06F9/30 G06F12/08

    Abstract: A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time.
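
The search described in this abstract amounts to timing every combination of hardware prefetch depth and disabled software prefetch, then keeping the fastest one if it beats a baseline. The sketch below shows that loop only. The `measure` callback, the integer prefetch identifiers, and the function name are assumptions for illustration; how a real tool sets the depth or disables an instruction is hardware-specific and not modeled.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def pick_prefetch_config(
    measure: Callable[[int, Optional[int]], float],
    hw_depths: Iterable[int],
    sw_prefetch_ids: Iterable[int],
    baseline_depth: int,
) -> Tuple[int, Optional[int], float]:
    """Return (depth, disabled_prefetch_id, time) for the fastest run found.

    disabled_prefetch_id is None when the baseline itself was never beaten.
    """
    baseline = measure(baseline_depth, None)  # nothing disabled
    best = (baseline_depth, None, baseline)
    for depth, disabled in product(hw_depths, sw_prefetch_ids):
        elapsed = measure(depth, disabled)
        if elapsed < best[2]:
            best = (depth, disabled, elapsed)
    return best


def fake_measure(depth: int, disabled: Optional[int]) -> float:
    """Made-up cost model standing in for a real timing run."""
    cost = 100.0 - 5.0 * depth
    if disabled == 2:  # pretend prefetch #2 was hurting performance
        cost -= 10.0
    return cost


best = pick_prefetch_config(fake_measure, [1, 2, 3], [0, 1, 2], baseline_depth=1)
```

With real measurements, each configuration would normally be timed several times to reduce noise before comparing against the baseline.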

    Retrieving event data for logical partitions
    3.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07478219B2

    Publication date: 2009-01-13

    Application No.: US11106007

    Filing date: 2005-04-14

    IPC classification: G06F12/00

    Abstract: A method, apparatus, system, and signal-bearing medium that, in an embodiment, retrieve event data from a processor for sampling intervals, where the sampling intervals are evenly distributed, but the control points at which the event data is retrieved are unevenly distributed. The processor executes instructions for logical partitions, and the event data is associated with events that are detected by the processor during the sampling intervals. In response to an interrupt received from the processor at the control point, a determination is made whether the sample point has been reached. If the sample point has been reached, the event data is retrieved from the processor and an event counter is reset to a value that is calculated to cause the processor to include an identical number of the events in the sampling intervals. The value is calculated based on the event counter at the time of the control point, the event counter at the time of the sample point, and the number of events in the sampling interval. In this way, an even distribution of event data may be collected when the processor is allocated to multiple partitions in a logically-partitioned system.
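
The abstract names the three inputs to the reset value but not the formula. The sketch below shows one plausible reading, stated as an assumption: the counter counts up, the sample point is reached when it equals the interval length, and any overshoot observed at the control point is carried into the next interval so that sample boundaries stay evenly spaced. Function names and the simulated event counts are hypothetical.

```python
def next_counter_value(counter_at_control: int,
                       counter_at_sample: int,
                       events_per_interval: int) -> int:
    """Carry the overshoot past the sample point into the next interval."""
    overshoot = counter_at_control - counter_at_sample
    if overshoot < 0:
        raise ValueError("sample point has not been reached yet")
    return overshoot % events_per_interval


def sample_boundaries(control_points, events_per_interval):
    """Simulate unevenly spaced control points (given as total event counts)."""
    counter, previous, boundaries = 0, 0, []
    for total in control_points:
        counter += total - previous  # events seen since the last control point
        previous = total
        if counter >= events_per_interval:
            # Event count at which the sample point was actually crossed.
            boundaries.append(total - (counter - events_per_interval))
            counter = next_counter_value(
                counter, events_per_interval, events_per_interval)
    return boundaries


boundaries = sample_boundaries([400, 1150, 1900, 2300, 3050], 1000)
```

In this simulation the control points fall at irregular event counts, yet the recorded sample boundaries stay a fixed number of events apart.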

    System, method, and computer program product for reducing overhead associated with software lock monitoring
    4.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US06820176B2

    Publication date: 2004-11-16

    Application No.: US10138900

    Filing date: 2002-05-02

    IPC classification: G06F12/14

    Abstract: A system, method, and computer program product are disclosed for reducing overhead associated with software lock monitoring in a multiple-processor data processing system having a memory that is shared among the multiple processors. Multiple memory locations in the shared memory are associated with one of multiple locks. Overhead is reduced by generating a trace hook only in response to activity associated with lock misses.
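
The idea of tracing only lock misses can be sketched at user level with a thin wrapper around `threading.Lock`. This is an analogy under stated assumptions, not the patented mechanism: the class name `MissTracedLock` and the in-memory `trace` list (a stand-in for a trace buffer) are hypothetical.

```python
import threading


class MissTracedLock:
    """Lock wrapper that records a trace event only when an acquire misses."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.trace = []  # stand-in for a real trace buffer

    def acquire(self, timeout=-1):
        # Fast path: an uncontended acquire produces no trace event.
        if self._lock.acquire(blocking=False):
            return True
        # Miss: record one event, then wait (forever when timeout is -1).
        self.trace.append(("lock_miss", self.name, threading.get_ident()))
        return self._lock.acquire(timeout=timeout)

    def release(self):
        self._lock.release()


lock = MissTracedLock("queue_lock")
lock.acquire()                        # hit: nothing is traced
got_it = lock.acquire(timeout=0.01)   # lock already held: miss, then times out
lock.release()
```

Because the common uncontended path records nothing, the monitoring cost is paid only where contention actually occurs.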

    INFORMATION HANDLING SYSTEM INCLUDING HARDWARE AND SOFTWARE PREFETCH
    5.
    Type: Patent application
    Legal status: In force

    Publication No.: US20130179663A1

    Publication date: 2013-07-11

    Application No.: US13347672

    Filing date: 2012-01-10

    IPC classification: G06F9/30 G06F9/312

    Abstract: A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time.

    System and method for acquiring high granularity performance data in a computer system
    6.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US5774724A

    Publication date: 1998-06-30

    Application No.: US560878

    Filing date: 1995-11-20

    IPC classification: G06F11/34 G06F11/30

    Abstract: A microprocessor performance monitor and instruction address breakpoint facility are interconnected to provide finer-granularity performance monitoring. The microprocessor is initialized to collect processor statistics preselected prior to performance monitoring. Application start and stop instruction breakpoint addresses are preselected from a software program, bounding the instructions for which such statistics are desired. An exception handler is installed for instruction address breakpoints (IAB), enabling and disabling the performance monitor at the start and stop addresses, respectively. The IAB register is then initialized to the start address, and the statistics counters are cleared. Upon starting the application, when the application start address instruction is executed, the breakpoint handler obtains control and enables the performance monitor counters, which count the desired statistics after returning from the breakpoint handler. Before returning, the handler sets the IAB register to the stop address. When the application stop address is encountered, the breakpoint handler disables the performance monitor counters and re-arms the start address in the IAB register. The performance monitor counters are then read to determine the desired statistics for the specific sequence of code within the boundaries of the start and stop addresses in the application.
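
The start/stop handshake described in this abstract behaves like a two-state machine. The sketch below simulates it over a list of executed instruction addresses; it is a software model only, with a single counter standing in for the performance monitor and a variable `iab` standing in for the IAB register. The function name and the sample trace are hypothetical.

```python
def count_between(trace, start_addr, stop_addr):
    """Count instructions executed from start_addr up to, not including, stop_addr."""
    iab = start_addr   # address currently armed in the breakpoint register
    counting = False   # whether the performance counters are enabled
    counted = 0
    for addr in trace:
        if addr == iab:
            # Simulated breakpoint handler: toggle the counters and
            # arm the opposite address before returning.
            if counting:
                counting, iab = False, start_addr
            else:
                counting, iab = True, stop_addr
        if counting:
            counted += 1  # stand-in for the hardware event counters
    return counted


trace = [0x10, 0x14, 0x20, 0x24, 0x28, 0x30, 0x34, 0x20, 0x24, 0x28, 0x30]
total = count_between(trace, start_addr=0x20, stop_addr=0x30)
```

Re-arming the start address after each stop is what lets the same region be measured again on every later pass through the code.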

    Determining performance of a software entity
    7.
    Type: Granted patent
    Legal status: In force

    Publication No.: US08850402B2

    Publication date: 2014-09-30

    Application No.: US12470705

    Filing date: 2009-05-22

    IPC classification: G06F9/44 G06F11/34

    Abstract: Methods, systems, and products for determining performance of a software entity running on a data processing system. The method comprises allowing extended execution of the software entity without monitoring code. The method also comprises intermittently sampling behavior data for the software entity. Intermittently sampling behavior data may be carried out by injecting monitoring code into the software entity to instrument the software entity, collecting behavior data by utilizing the monitoring code, and removing the monitoring code. The method also comprises repeatedly performing iterations of the allowing and sampling steps until collected behavior data is sufficient for diagnosing performance of the software entity. The method may further comprise analyzing the collected behavior data to diagnose performance of the software entity.
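
The inject / collect / remove cycle can be sketched in a dynamic language by temporarily shadowing a method with a timing wrapper. Everything here is an illustrative assumption: the `Service` class, the helper names, "enough data" meaning a fixed number of samples, and the `max_rounds` guard that stops the loop if the workload never calls the monitored method.

```python
import time


class Service:
    """Hypothetical software entity whose performance is being diagnosed."""

    def handle(self, n):
        return sum(range(n))


def sample_once(obj, method_name, workload, samples):
    """Inject timing code around one method, run the workload, then remove it."""
    original = getattr(obj, method_name)

    def monitored(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            samples.append(time.perf_counter() - start)

    setattr(obj, method_name, monitored)  # inject the monitoring code
    try:
        workload()                        # collect behavior data
    finally:
        delattr(obj, method_name)         # remove it; the class method shows again


def diagnose(obj, method_name, workload, needed, quiet_runs=3, max_rounds=100):
    """Alternate unmonitored execution with short sampling windows."""
    samples = []
    for _ in range(max_rounds):
        if len(samples) >= needed:
            break
        for _ in range(quiet_runs):
            workload()                    # extended execution, no monitoring
        sample_once(obj, method_name, workload, samples)
    return samples


service = Service()
samples = diagnose(service, "handle", lambda: service.handle(1000), needed=4)
```

Between sampling windows the entity runs its original code, so the monitoring overhead is confined to the short windows in which data is collected.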

    Aggregate bandwidth through management using insertion of reset instructions for cache-to-cache data transfer
    8.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US07168070B2

    Publication date: 2007-01-23

    Application No.: US10853304

    Filing date: 2004-05-25

    IPC classification: G06F9/45 G06F13/00

    Abstract: A method and system for reducing or avoiding store misses with a data cache block zero (DCBZ) instruction, in cooperation with the underlying hardware load stream prefetching support, to help increase effective aggregate bandwidth. The method identifies and classifies unique streams in a loop based on dependency and reuse analysis, and performs loop transformations, such as node splitting, loop distribution, or stream unrolling, to get the proper number of streams. Static prediction and run-time profile information are used to guide loop and stream selection. Compile-time loop cost analysis and run-time check code and versioning are used to determine the number of cache lines ahead of each reference for data cache line zeroing and to tolerate required data alignment relative to data cache lines.

    Method and system for reordering the instructions of a computer program to optimize its execution
    9.
    Type: Granted patent
    Legal status: Lapsed

    Publication No.: US6006033A

    Publication date: 1999-12-21

    Application No.: US291370

    Filing date: 1994-08-15

    IPC classification: G06F9/45 G06F9/445

    CPC classification: G06F8/445

    Abstract: A system and method are provided that allow the results of an instruction trace mechanism to globally restructure the instructions. The process reorders the instructions in an executable program, using an actual execution profile (or instruction address trace) for a selected workload, to improve utilization of the existing hardware architecture. The reordering of instructions is implemented at a global level (i.e., independent of procedure or other structural boundaries, which maximizes speedup) running on various hardware platforms and adds the ability to preserve correctness and debuggability for reordered executables. An unconditional branch instruction is added at the memory locations where reordered instructions were previously stored. When a dynamic branch occurs, the program will attempt to access the instruction at the original address, and the unconditional branch directs the program to the reordered location of the instruction, so program integrity is maintained.
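
The forwarding-branch idea in this abstract can be shown with a toy memory model. The sketch assumes memory is a dictionary from address to an instruction label, that hot instructions are moved to a fresh contiguous region in profiled order, and that a `("jmp", target)` tuple stands for an unconditional branch. The function names are hypothetical, and fixing up relative branches inside the moved code is not modeled.

```python
def reorder(memory, hot_order, new_base):
    """Move profiled-hot instructions to new_base and leave jump stubs behind.

    memory:    {address: instruction label}
    hot_order: original addresses in the order the profile executed them
    """
    relocated = dict(memory)
    for offset, old in enumerate(hot_order):
        new = new_base + offset
        relocated[new] = memory[old]
        relocated[old] = ("jmp", new)  # stub at the original location
    return relocated


def fetch(memory, address):
    """Follow unconditional-branch stubs until a real instruction is reached."""
    instr = memory[address]
    while isinstance(instr, tuple) and instr[0] == "jmp":
        address = instr[1]
        instr = memory[address]
    return address, instr


memory = {0: "a", 1: "b", 2: "c", 3: "d"}
relocated = reorder(memory, hot_order=[2, 0], new_base=100)
moved = fetch(relocated, 2)      # a dynamic branch to an old address
untouched = fetch(relocated, 1)  # an instruction that was not relocated
```

A computed branch that still targets an old address lands on the stub and is forwarded, which is why the reordered program keeps behaving like the original.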

    Determining Performance of a Software Entity
    10.
    Type: Patent application
    Legal status: In force

    Publication No.: US20100299655A1

    Publication date: 2010-11-25

    Application No.: US12470705

    Filing date: 2009-05-22

    IPC classification: G06F9/44

    Abstract: Methods, systems, and products for determining performance of a software entity running on a data processing system. The method comprises allowing extended execution of the software entity without monitoring code. The method also comprises intermittently sampling behavior data for the software entity. Intermittently sampling behavior data may be carried out by injecting monitoring code into the software entity to instrument the software entity, collecting behavior data by utilizing the monitoring code, and removing the monitoring code. The method also comprises repeatedly performing iterations of the allowing and sampling steps until collected behavior data is sufficient for diagnosing performance of the software entity. The method may further comprise analyzing the collected behavior data to diagnose performance of the software entity.