Automated detection of application performance bottlenecks
    1.
    Type: Granted patent
    Status: Active

    Publication No.: US08225291B2

    Publication date: 2012-07-17

    Application No.: US11969331

    Filing date: 2008-01-04

    IPC classification: G06F9/44 G06F11/00

    Abstract: Detecting performance bottlenecks in a target application is provided. In response to receiving hotspot selections from a user interface, bottleneck rules are extracted from a database. A hotspot is a region of source code that exceeds a time threshold to execute in the target application. Metrics needed to evaluate the bottleneck rules extracted from the database are identified. The identified metrics are computed. It is determined whether each bottleneck rule extracted from the database is evaluated to true using the computed metrics for hotspots in the target application. In response to determining that a bottleneck rule is evaluated to true using an appropriate computed metric corresponding to the bottleneck rule, a bottleneck description is created for the bottleneck rule. Then, the bottleneck description is sent to the user interface.
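
The flow in this abstract (select hotspots, load rules, compute only the metrics those rules need, report each rule that evaluates to true) can be illustrated with a minimal sketch. This is not the patented implementation: the `BottleneckRule` type, the metric names, the thresholds, and the sample data are all hypothetical, and an in-memory list stands in for the rule database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class BottleneckRule:
    """Hypothetical rule record: a predicate over named metrics."""
    name: str
    required_metrics: List[str]
    predicate: Callable[[Metrics], bool]
    description: str


def detect_bottlenecks(
    hotspots: List[str],
    rules: List[BottleneckRule],
    compute_metric: Callable[[str, str], float],
) -> List[str]:
    # Identify the metrics needed to evaluate the extracted rules.
    needed = sorted({m for rule in rules for m in rule.required_metrics})
    reports = []
    for hotspot in hotspots:
        # Compute only the identified metrics for this hotspot.
        metrics = {m: compute_metric(hotspot, m) for m in needed}
        for rule in rules:
            # A rule that evaluates to true yields a bottleneck description.
            if rule.predicate(metrics):
                reports.append(f"{hotspot}: {rule.description}")
    return reports


# Hypothetical measurements standing in for a real profiler.
RAW = {
    "solve_loop": {"l1_miss_rate": 0.31, "flops_per_byte": 0.2},
    "io_write": {"l1_miss_rate": 0.02, "flops_per_byte": 4.0},
}

RULES = [
    BottleneckRule("cache", ["l1_miss_rate"],
                   lambda m: m["l1_miss_rate"] > 0.10,
                   "high L1 cache miss rate"),
    BottleneckRule("bandwidth", ["flops_per_byte"],
                   lambda m: m["flops_per_byte"] < 0.5,
                   "low arithmetic intensity (likely memory bound)"),
]

reports = detect_bottlenecks(["solve_loop", "io_write"], RULES,
                             lambda hotspot, metric: RAW[hotspot][metric])
for line in reports:
    print(line)
```

With this sample data only `solve_loop` triggers rules; `io_write` produces no description because neither predicate holds for it.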

    PROGRAMMABLE FRAMEWORK FOR AUTOMATIC TUNING OF SOFTWARE APPLICATIONS
    2.
    Type: Patent application
    Status: Active

    Publication No.: US20100180255A1

    Publication date: 2010-07-15

    Application No.: US12353433

    Filing date: 2009-01-14

    IPC classification: G06F9/44

    CPC classification: G06F8/443

    Abstract: A target application is automatically tuned. A list of solutions for identified performance bottlenecks in a target application is retrieved from a storage device. A plurality of modules is executed to compute specific parameters for solutions contained in the list of solutions. A list of modification commands associated with specific parameters computed by the plurality of modules is generated. The list of modification commands associated with the specific parameters is appended to a command sequence list. The list of modification commands is implemented in the target application. Specific source code regions corresponding to the identified performance bottlenecks in the target application are automatically tuned using the implemented list of modification commands. Then, the tuned target application is stored in the storage device.
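
The pipeline in this abstract (solutions, then per-solution parameter modules, then modification commands appended to a command sequence) might look roughly like the sketch below. It is an illustration under assumptions, not the patented framework: the module names, the region labels, and the textual command format are hypothetical, and applying the commands to real source code is left out.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


# Each hypothetical "module" computes the parameters for one kind of solution.
def unroll_module(region: str) -> Params:
    return {"factor": 4}


def tile_module(region: str) -> Params:
    return {"tile_size": 64}


MODULES: Dict[str, Callable[[str], Params]] = {
    "unroll": unroll_module,
    "tile": tile_module,
}


def build_command_sequence(solutions: List[Tuple[str, str]]) -> List[str]:
    """Turn (region, solution) pairs into an ordered list of commands."""
    command_sequence: List[str] = []
    for region, solution in solutions:
        # Run the module that computes parameters for this solution.
        params = MODULES[solution](region)
        # Generate the modification commands tied to those parameters.
        commands = [f"{solution} {region} {key}={value}"
                    for key, value in sorted(params.items())]
        # Append them to the overall command sequence list.
        command_sequence.extend(commands)
    return command_sequence


# Hypothetical solution list, as if retrieved from storage.
solutions = [("loop@solver.c:120", "unroll"), ("loop@solver.c:200", "tile")]
commands = build_command_sequence(solutions)
for command in commands:
    print(command)
```

Keeping the commands as data, rather than editing the source directly, lets the whole sequence be inspected or replayed before anything is modified.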

    Increasing parallel program performance for irregular memory access problems with virtual data partitioning and hierarchical collectives
    3.
    Type: Granted patent
    Status: Active

    Publication No.: US08869155B2

    Publication date: 2014-10-21

    Application No.: US12945488

    Filing date: 2010-11-12

    IPC classification: G06F9/46 G06F9/52

    CPC classification: G06F9/522

    Abstract: A method for increasing performance of an operation on a distributed memory machine is provided. Asynchronous parallel steps in the operation are transformed into synchronous parallel steps. The synchronous parallel steps of the operation are rearranged to generate an altered operation that schedules memory accesses for increasing locality of reference. The altered operation that schedules memory accesses for increasing locality of reference is mapped onto the distributed memory machine. Then, the altered operation is executed on the distributed memory machine to simulate local memory accesses with virtual threads to check cache performance within each node of the distributed memory machine.
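
As a loose illustration of "scheduling memory accesses for increasing locality of reference", the sketch below groups an irregular list of array indices by the partition that owns them, so that each partition's accesses can be served together. This is a generic bucketing technique chosen for illustration, not the method claimed in the patent; the function name and the fixed `block_size` partitioning are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def schedule_by_partition(indices: List[int], block_size: int) -> Dict[int, List[int]]:
    """Group irregular accesses by owning partition (index // block_size)."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for index in indices:
        buckets[index // block_size].append(index)
    # Visit partitions in order, and addresses in order within a partition.
    return {part: sorted(bucket) for part, bucket in sorted(buckets.items())}


accesses = [17, 2, 9, 18, 3, 8]          # irregular access order
plan = schedule_by_partition(accesses, block_size=8)
print(plan)  # {0: [2, 3], 1: [8, 9], 2: [17, 18]}
```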

    Profiling application performance according to data structure
    4.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08490061B2

    Publication date: 2013-07-16

    Application No.: US12436894

    Filing date: 2009-05-07

    Abstract: During runtime of a binary program file, streams of instructions are executed and memory references, generated by instrumentation applied to given ones of the instructions that refer to memory locations, are collected. A transformation is performed, based on the executed streams of instructions and the collected memory references, to obtain a table. The table lists memory events of interest for active data structures for each function in the program file. The transformation is performed to translate memory addresses for given ones of the instructions and given ones of the data structures into locations and variable names in a source file corresponding to the binary file. At least the memory events of interest are displayed, and the display is organized so as to correlate the memory events of interest with corresponding ones of the data structures.
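
The core transformation described here, turning raw memory references into a per-function, per-data-structure table of events, can be sketched as follows. The symbol ranges, the trace records, and the event names are hypothetical stand-ins for what instrumentation and debug information would supply; the sketch does not reproduce the patented tool.

```python
import bisect
from collections import Counter
from typing import List, Tuple

# Hypothetical symbol information: (start_address, size_in_bytes, variable_name).
SYMBOLS = sorted([(0x1000, 0x400, "grid"), (0x2000, 0x100, "weights")])
STARTS = [start for start, _, _ in SYMBOLS]


def resolve(address: int) -> str:
    """Translate a memory address into the name of the variable that owns it."""
    pos = bisect.bisect_right(STARTS, address) - 1
    if pos >= 0:
        start, size, name = SYMBOLS[pos]
        if address < start + size:
            return name
    return "<unknown>"


def build_table(trace: List[Tuple[str, int, str]]) -> Counter:
    """Count memory events keyed by (function, data structure, event)."""
    table: Counter = Counter()
    for function, address, event in trace:
        table[(function, resolve(address), event)] += 1
    return table


# Hypothetical collected references: (function, address, event of interest).
trace = [
    ("update", 0x1008, "L1_miss"),
    ("update", 0x1010, "L1_miss"),
    ("update", 0x2004, "L1_miss"),
    ("reduce", 0x3000, "TLB_miss"),
]
table = build_table(trace)
for (function, variable, event), count in sorted(table.items()):
    print(f"{function:8s} {variable:10s} {event:9s} {count}")
```

Keying the table by variable name rather than by address is what lets the display correlate events with data structures.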

    Increasing Parallel Program Performance for Irregular Memory Access Problems with Virtual Data Partitioning and Hierarchical Collectives
    5.
    Type: Patent application
    Status: Active

    Publication No.: US20120124585A1

    Publication date: 2012-05-17

    Application No.: US12945488

    Filing date: 2010-11-12

    IPC classification: G06F9/46

    CPC classification: G06F9/522

    Abstract: A method for increasing performance of an operation on a distributed memory machine is provided. Asynchronous parallel steps in the operation are transformed into synchronous parallel steps. The synchronous parallel steps of the operation are rearranged to generate an altered operation that schedules memory accesses for increasing locality of reference. The altered operation that schedules memory accesses for increasing locality of reference is mapped onto the distributed memory machine. Then, the altered operation is executed on the distributed memory machine to simulate local memory accesses with virtual threads to check cache performance within each node of the distributed memory machine.

    METHOD AND SYSTEM FOR AUTOMATED DETECTION OF APPLICATION PERFORMANCE BOTTLENECKS
    6.
    Type: Patent application
    Status: Active

    Publication No.: US20090177642A1

    Publication date: 2009-07-09

    Application No.: US11969331

    Filing date: 2008-01-04

    IPC classification: G06F17/30

    Abstract: A system for detecting performance bottlenecks in a target application. In response to receiving hotspot selections from a user interface, bottleneck rules are extracted from a database. A hotspot is a region of source code that exceeds a time threshold to execute in the target application. Metrics needed to evaluate the bottleneck rules extracted from the database are identified. The identified metrics are computed. It is determined whether each bottleneck rule extracted from the database is evaluated to true using the computed metrics for hotspots in the target application. In response to determining that a bottleneck rule is evaluated to true using an appropriate computed metric corresponding to the bottleneck rule, a bottleneck description is created for the bottleneck rule. Then, the bottleneck description is sent to the user interface.

    Iterative, Non-Uniform Profiling Method for Automatically Refining Performance Bottleneck Regions in Scientific Code
    7.
    Type: Patent application
    Status: Active

    Publication No.: US20080282232A1

    Publication date: 2008-11-13

    Application No.: US11746171

    Filing date: 2007-05-09

    IPC classification: G06F9/44

    Abstract: A method for profiling performance of a system includes steps of: monitoring execution of the system at multiple points during the system's operation; analyzing results derived from the monitoring in order to provide analyzed results; reconfiguring the monitoring non-uniformly according to the analyzed results; and repeatedly performing iterations of the above steps until a particular event occurs. The iterations may be terminated upon: reaching a specified level of analysis precision, determining a source of one or more performance bottlenecks, determining a source of unexpectedly high output or low completion time, completing a predefined number of iterations, reaching an endpoint of an application, or having performed iterations for a specified period of time.
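
The monitor, analyze, reconfigure loop in this abstract can be illustrated with a small simulation. In the sketch below, a list of per-statement costs stands in for a running program, each iteration "monitors" a few coarse regions, and the next iteration re-instruments only the hottest one (the non-uniform step). The function name, the `parts` and `max_iterations` parameters, and the cost data are hypothetical; only two of the listed termination conditions (precision reached, iteration budget spent) are modeled.

```python
from typing import List, Tuple


def refine_hotspot(costs: List[float], parts: int = 4,
                   max_iterations: int = 10) -> Tuple[int, int]:
    """Narrow a [lo, hi) range of statements down to the most expensive one."""
    lo, hi = 0, len(costs)
    for _ in range(max_iterations):          # stop after a fixed iteration budget
        if hi - lo <= 1:
            break                            # stop at single-statement precision
        step = max(1, (hi - lo + parts - 1) // parts)
        regions = [(start, min(start + step, hi)) for start in range(lo, hi, step)]
        # Monitor: measure the cost of each region in the current range.
        measured = [(sum(costs[a:b]), a, b) for a, b in regions]
        # Analyze and reconfigure: focus the next iteration on the hottest region.
        _, lo, hi = max(measured)
    return lo, hi


# Hypothetical per-statement costs; statement 5 dominates.
costs = [1, 1, 2, 1, 1, 40, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1]
hot = refine_hotspot(costs)
print(hot)  # (5, 6)
```

Here two iterations of four measurements each locate the hot statement, compared with sixteen measurements for uniform instrumentation of every statement.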

    Fast prediction of shared memory access pattern
    8.
    Type: Granted patent
    Status: Active

    Publication No.: US08819346B2

    Publication date: 2014-08-26

    Application No.: US13416331

    Filing date: 2012-03-09

    IPC classification: G06F12/00

    Abstract: A computer implemented method analyzes shared memory accesses during execution of an application program. The method includes instrumenting events of shared memory accesses in the application program, where the application program is to be executed on a target configuration having p nodes; executing the application program using p1 processing nodes, where p1 is less than p and satisfies a constraint. For accesses made by the executing application program, the method determines a target thread and maps determined target threads to either a remote node or a local node corresponding to a remote memory access and to a local memory access, respectively. Also disclosed is a computer-readable storage medium that stores a program of executable instructions that implements the method, and a data processing system. The invention can be implemented using a language such as Unified Parallel C (UPC) directed to a partitioned global address space (PGAS) paradigm.
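
The mapping step, deciding whether an access from one thread to data owned by another would be local or remote on the target configuration, can be sketched as below. The sketch assumes a block distribution of threads over nodes and a thread count divisible by the node count; both are assumptions for illustration, as is the recorded access list. The abstract does not state the constraint on p1, so none is modeled here.

```python
from typing import List, Tuple


def node_of(thread: int, threads: int, nodes: int) -> int:
    """Node hosting a thread, assuming equal-sized blocks of threads per node."""
    return thread // (threads // nodes)


def classify(accesses: List[Tuple[int, int]], threads: int,
             nodes: int) -> Tuple[int, int]:
    """Count (local, remote) accesses for a given number of nodes."""
    local = remote = 0
    for source_thread, target_thread in accesses:
        if node_of(source_thread, threads, nodes) == node_of(target_thread, threads, nodes):
            local += 1
        else:
            remote += 1
    return local, remote


# Hypothetical (source thread, target thread) pairs recorded on a small run.
accesses = [(0, 1), (0, 2), (3, 7), (4, 5), (6, 1)]

on_two_nodes = classify(accesses, threads=8, nodes=2)    # run configuration
on_four_nodes = classify(accesses, threads=8, nodes=4)   # target configuration
print(on_two_nodes, on_four_nodes)  # (3, 2) (2, 3)
```

Because the classification depends only on the recorded thread pairs and the thread-to-node mapping, the same trace can be re-evaluated for a larger node count without rerunning the program.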

    FAST PREDICTION OF SHARED MEMORY ACCESS PATTERN
    9.
    Type: Patent application
    Status: Active

    Publication No.: US20130238862A1

    Publication date: 2013-09-12

    Application No.: US13416331

    Filing date: 2012-03-09

    IPC classification: G06F12/00

    Abstract: A computer implemented method analyzes shared memory accesses during execution of an application program. The method includes instrumenting events of shared memory accesses in the application program, where the application program is to be executed on a target configuration having p nodes; executing the application program using p1 processing nodes, where p1 is less than p and satisfies a constraint. For accesses made by the executing application program, the method determines a target thread and maps determined target threads to either a remote node or a local node corresponding to a remote memory access and to a local memory access, respectively. Also disclosed is a computer-readable storage medium that stores a program of executable instructions that implements the method, and a data processing system. The invention can be implemented using a language such as Unified Parallel C (UPC) directed to a partitioned global address space (PGAS) paradigm.

    Programmable framework for automatic tuning of software applications
    10.
    Type: Granted patent
    Status: Active

    Publication No.: US08327325B2

    Publication date: 2012-12-04

    Application No.: US12353433

    Filing date: 2009-01-14

    IPC classification: G06F9/45

    CPC classification: G06F8/443

    Abstract: A target application is automatically tuned. A list of solutions for identified performance bottlenecks in a target application is retrieved from a storage device. A plurality of modules is executed to compute specific parameters for solutions contained in the list of solutions. A list of modification commands associated with specific parameters computed by the plurality of modules is generated. The list of modification commands associated with the specific parameters is appended to a command sequence list. The list of modification commands is implemented in the target application. Specific source code regions corresponding to the identified performance bottlenecks in the target application are automatically tuned using the implemented list of modification commands. Then, the tuned target application is stored in the storage device.
