Workload performance projection for future information handling systems using microarchitecture dependent data
    1.
    Granted patent
    Legal status: In force

    Publication No.: US09135142B2

    Publication date: 2015-09-15

    Application No.: US12343482

    Filing date: 2008-12-24

    Abstract: A performance projection system includes a test IHS and a currently existing IHS. The performance projection system includes surrogate programs and user application software. The test IHS employs a memory that includes a virtual future IHS, currently existing IHS, surrogate programs, and user application software for determination of runtime and HW counter performance data. The user application software and surrogate programs execute on the currently existing IHS to provide designers with runtime data and HW counter or microarchitecture dependent data. Designers execute surrogate programs on the future IHS to provide runtime and HW counter data. Designers normalize and weight the runtime and HW counter data to provide a representative surrogate program for comparison to user application software performance on the future IHS. Using a scaling factor, designers may generate a projection of runtime performance for the user application software executing on the future IHS.
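    The normalize, weight, and scale steps in this abstract can be illustrated with a minimal sketch. It is not the patented method: the metric names, the weights, the distance measure, and the definition of the scaling factor (the surrogate's future runtime divided by its runtime on the existing system) are all assumptions chosen for illustration.

```python
from math import sqrt

def normalize(metrics, baseline):
    """Express each metric relative to the user application's value (assumed scheme)."""
    return {name: metrics[name] / baseline[name] for name in baseline}

def weighted_distance(normalized, weights):
    """Weighted distance from the application, which sits at 1.0 on every metric."""
    return sqrt(sum(weights[n] * (normalized[n] - 1.0) ** 2 for n in weights))

def pick_surrogate(app, surrogates, weights):
    """Return the name of the surrogate whose profile is closest to the application's."""
    return min(
        surrogates,
        key=lambda name: weighted_distance(normalize(surrogates[name], app), weights),
    )

def project_runtime(app_runtime_existing, surrogate_existing, surrogate_future):
    """Scale the application's measured runtime by the surrogate's speedup (assumed)."""
    scaling_factor = surrogate_future / surrogate_existing
    return app_runtime_existing * scaling_factor
```

    Given hypothetical measurements on the existing system, pick_surrogate selects the best-matching surrogate, and project_runtime(100.0, 90.0, 45.0) applies a scaling factor of 0.5 to the application's runtime.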


    Ineffective prefetch determination and latency optimization
    2.
    Granted patent
    Legal status: In force

    Publication No.: US08949579B2

    Publication date: 2015-02-03

    Application No.: US12897008

    Filing date: 2010-10-04

    IPC classification: G06F9/38 G06F12/08

    Abstract: A processor of an information handling system (IHS) initiates an L3 cache prefetch operation in response to a demand load during instruction processing. The processor selects an L3 cache prefetch at random for tracking as a target prefetched instruction. The processor initiates an L1 cache target prefetch operation and stores the resultant target prefetched instruction in the L1 cache. If a demand load arrives, the processor analyzes the target prefetched instruction for effectiveness and determines the source of the prefetch data. If a demand does not arrive, the processor tests to determine if the particular prefetched instruction timed out in the cache and identifies the ineffectiveness of the prefetch operation. The processor samples multiple prefetch operations at random and generates a history of prefetch effectiveness and other useful prefetch information. The processor stores the prefetch effectiveness information to enable reduction or removal of ineffective prefetch operations.
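    The random-sampling idea in this abstract can be modeled in software as a rough sketch. The class name, the sampling rate, and the timeout rule below are hypothetical; the hardware described in the patent also records the source of the prefetch data, which this sketch does not model.

```python
import random
from collections import Counter

class PrefetchSampler:
    """Tracks a random sample of prefetches and classifies each one (illustrative only)."""

    def __init__(self, sample_rate, timeout_cycles, rng=None):
        self.sample_rate = sample_rate          # fraction of prefetches to track
        self.timeout_cycles = timeout_cycles    # cycles before a prefetch counts as unused
        self.rng = rng if rng is not None else random.Random()
        self.tracked = {}                       # address -> cycle the prefetch was issued
        self.history = Counter()                # "effective" / "ineffective" tallies

    def on_prefetch(self, address, cycle):
        # Select prefetches for tracking at random.
        if address not in self.tracked and self.rng.random() < self.sample_rate:
            self.tracked[address] = cycle

    def on_demand_load(self, address, cycle):
        # A demand load that hits a tracked prefetch marks it as effective.
        if self.tracked.pop(address, None) is not None:
            self.history["effective"] += 1

    def on_tick(self, cycle):
        # Tracked prefetches with no demand load before the timeout are ineffective.
        expired = [a for a, c in self.tracked.items() if cycle - c > self.timeout_cycles]
        for address in expired:
            del self.tracked[address]
            self.history["ineffective"] += 1

    def effectiveness(self):
        total = self.history["effective"] + self.history["ineffective"]
        return self.history["effective"] / total if total else None
```

    A consumer of the history could, for example, throttle a prefetch stream whose effectiveness() stays below some chosen threshold; that policy is outside this sketch.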


    DETERMINING EACH STALL REASON FOR EACH STALLED INSTRUCTION WITHIN A GROUP OF INSTRUCTIONS DURING A PIPELINE STALL
    4.
    Patent application
    Legal status: Lapsed

    Publication No.: US20120278595A1

    Publication date: 2012-11-01

    Application No.: US13097284

    Filing date: 2011-04-29

    IPC classification: G06F9/38

    Abstract: During a pipeline stall in an out-of-order processor, until a next to complete instruction group completes, a monitoring unit receives, from a completion unit of a processor, a next to finish indicator indicating the finish of an oldest previously unfinished instruction from among a plurality of instructions of a next to complete instruction group. The monitoring unit receives, from a plurality of functional units of the processor, a plurality of finish reports including completion reasons for a plurality of separate instructions. The monitoring unit determines at least one stall reason from among multiple stall reasons for the oldest instruction from a selection of completion reasons from a selection of finish reports aligned with the next to finish indicator from among the plurality of finish reports. Once the monitoring unit receives a complete indicator from the completion unit, indicating the completion of the next to complete instruction group, the monitoring unit stores each determined stall reason aligned with each next to finish indicator in memory.
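    One way to picture the bookkeeping in this abstract is the sketch below. It is a software analogy under stated assumptions: instruction tags, the per-cycle interface, and the reason strings are hypothetical, and the "alignment" of a finish report with the next to finish indicator is reduced to a simple tag match.

```python
class StallMonitor:
    """Collects one stall reason per next-to-finish instruction (illustrative only)."""

    def __init__(self):
        self.pending = []   # reasons gathered while the current group is still completing
        self.stored = []    # one list of (tag, reason) pairs per completed group

    def on_cycle(self, next_to_finish_tag, finish_reports):
        """next_to_finish_tag: tag of the oldest instruction that just finished, or None.
        finish_reports: dict mapping instruction tag -> completion reason this cycle."""
        if next_to_finish_tag is not None and next_to_finish_tag in finish_reports:
            # Keep only the report that lines up with the next-to-finish indicator.
            self.pending.append((next_to_finish_tag, finish_reports[next_to_finish_tag]))

    def on_group_complete(self):
        # The complete indicator arrives: commit the reasons gathered for this group.
        self.stored.append(list(self.pending))
        self.pending.clear()
```

    Reports for instructions that are not the oldest unfinished one are ignored, which mirrors the abstract's point that only reports aligned with the indicator contribute a stall reason.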


    SYSTEM AND METHOD FOR EXECUTION BASED FILTERING OF INSTRUCTIONS OF A PROCESSOR TO MANAGE DYNAMIC CODE OPTIMIZATION
    5.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20120084537A1

    Publication date: 2012-04-05

    Application No.: US12894762

    Filing date: 2010-09-30

    IPC classification: G06F9/30

    Abstract: A filter executing on a processor monitors instructions executing on the processor to identify instructions that will benefit from performance tuning. Filtering instructions before analysis for performance tuning reduces overhead by identifying candidates for performance tuning with low-cost monitoring before expending resources on analysis so that only instructions that will have performance tuning are analyzed. Reducing overhead for performance tuning makes performance tuning practical in a dynamic optimization environment in which instructions and their effective addresses change over time.
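    The filter-before-analysis idea can be sketched as follows. The abstract does not state the filtering criterion, so the event-count threshold used here is purely an assumption, as are the function name and the shape of the event stream.

```python
from collections import Counter

def filter_candidates(event_addresses, threshold):
    """Cheap first pass: keep only effective addresses seen at least `threshold` times.

    event_addresses: iterable of effective addresses, one per monitored event
    (for example, one per sampled stall). Only the survivors would be handed
    to the expensive tuning analysis.
    """
    counts = Counter(event_addresses)
    return sorted(addr for addr, n in counts.items() if n >= threshold)
```

    Counting is cheap compared with full analysis, so the costly step runs only on the addresses returned by filter_candidates.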


    Method And Apparatus For Integrated Circuit Design Model Performance Evaluation Using Basic Block Vector Clustering And Fly-By Vector Clustering
    6.
    Patent application
    Legal status: In force

    Publication No.: US20090276191A1

    Publication date: 2009-11-05

    Application No.: US12112035

    Filing date: 2008-04-30

    IPC classification: G06F17/50

    Abstract: A test system or simulator includes an enhanced IC test application sampling software program that executes test application software on a semiconductor die IC design model. The enhanced test application sampling software may include trace, simulation point, CPI error, clustering, instruction budgeting, and other programs. The enhanced test application sampling software generates basic block vectors (BBVs) and fly-by vectors (FBVs) from instruction trace analysis of test application software workloads. The enhanced test application sampling software utilizes the microarchitecture dependent information to generate the FBVs to select representative instruction intervals from the test application software. The enhanced test application sampling software generates a reduced representative test application software program from the BBV and FBV data utilizing a global instruction budgeting analysis method. Designers use the test system with enhanced test application sampling software to evaluate IC design models by using the representative test application software program.
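    For readers unfamiliar with basic block vectors, the sketch below shows one common way to build them from an instruction trace and to compare two intervals. It covers BBVs only, not the fly-by vectors or the instruction budgeting step; the trace format, the interval rule, and the function names are assumptions rather than details taken from the patent.

```python
from collections import Counter

def _normalize(counts):
    """Turn per-block instruction counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {block: n / total for block, n in counts.items()}

def basic_block_vectors(trace, interval_size):
    """Build one BBV per interval of roughly `interval_size` executed instructions.

    trace: list of (block_id, instruction_count) pairs in execution order.
    Each vector weights a block by the instructions it contributed to the interval.
    """
    vectors, current, executed = [], Counter(), 0
    for block_id, n_instr in trace:
        current[block_id] += n_instr
        executed += n_instr
        if executed >= interval_size:
            vectors.append(_normalize(current))
            current, executed = Counter(), 0
    if current:  # trailing partial interval
        vectors.append(_normalize(current))
    return vectors

def manhattan(u, v):
    """Manhattan distance between two BBVs; 0 means identical block mix."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))
```

    Intervals with a small distance between their vectors execute a similar mix of code, so a clustering step can pick one of them as representative of the rest.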


    Prioritizing instructions based on the number of delay cycles
    7.
    Granted patent
    Legal status: In force

    Publication No.: US09405548B2

    Publication date: 2016-08-02

    Application No.: US13314052

    Filing date: 2011-12-07

    IPC classification: G06F9/38

    Abstract: Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register.
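    The counter, threshold register, and delay register named in this abstract can be modeled cycle by cycle. The sketch is a software analogy only: the class name and the single-method interface are hypothetical, and it stores one effective address per delay that exceeds the threshold.

```python
class DelayIdentificationUnit:
    """Cycle-level model of the delay counter and threshold compare (illustrative only)."""

    def __init__(self, threshold_cycles):
        self.threshold = threshold_cycles   # stands in for the threshold register
        self.counter = 0                    # stands in for the delay counter
        self.counting = False
        self.delay_register = []            # effective addresses of long-delayed groups

    def cycle(self, group_delayed, effective_address):
        """Call once per cycle with the delay status of the current instruction group."""
        if group_delayed:
            if not self.counting:
                # Delay detected: start the counter from zero.
                self.counting = True
                self.counter = 0
            self.counter += 1
        elif self.counting:
            # Group is no longer delayed: stop the counter and compare.
            self.counting = False
            if self.counter > self.threshold:
                self.delay_register.append(effective_address)
```

    With a threshold of 3, a group delayed for five consecutive cycles has its address recorded, while a group delayed for two cycles does not.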


    DELAY IDENTIFICATION IN DATA PROCESSING SYSTEMS
    8.
    Patent application
    Legal status: In force

    Publication No.: US20130151816A1

    Publication date: 2013-06-13

    Application No.: US13314052

    Filing date: 2011-12-07

    IPC classification: G06F9/30 G06F9/312

    Abstract: Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register.


    ASYNCHRONOUS ASSIST THREAD INITIATION

    Publication No.: US20120204011A1

    Publication date: 2012-08-09

    Application No.: US13447961

    Filing date: 2012-04-16

    IPC classification: G06F9/38

    Abstract: A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.
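    A loose software analogy of this abstract is sketched below. In the patent the processor hardware detects the asynchronous event and starts the assist thread; here the controlling loop polls a threading.Event instead, and the function names are hypothetical. CPython threads also interleave rather than run truly in parallel, so this only illustrates the control flow.

```python
import threading

def assist_thread(log):
    """Work started in response to the event; runs alongside the controlling thread."""
    log.append(("assist", "started"))

def controlling_thread(event, log, steps):
    """Run `steps` units of work; start one assist thread once `event` is observed."""
    assist = None
    for step in range(steps):
        log.append(("control", step))
        if assist is None and event.is_set():
            # The event was raised elsewhere (for example by a timer or another thread).
            assist = threading.Thread(target=assist_thread, args=(log,))
            assist.start()
    if assist is not None:
        assist.join()
    return assist is not None
```

    The controlling thread keeps doing its own work after starting the assist thread, and the return value reports whether an assist thread was started at all.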

    ASYNCHRONOUS ASSIST THREAD INITIATION
    10.
    Patent application
    Legal status: In force

    Publication No.: US20120036339A1

    Publication date: 2012-02-09

    Application No.: US12849903

    Filing date: 2010-08-04

    IPC classification: G06F9/30

    Abstract: A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.
