Store-to-load forwarding mechanism for processor runahead mode operation
    1.
    Invention Grant
    Store-to-load forwarding mechanism for processor runahead mode operation (Expired)

    Publication Number: US08639886B2

    Publication Date: 2014-01-28

    Application Number: US12364984

    Filing Date: 2009-02-03

    CPC classification number: G06F12/0875 G06F9/3826 G06F9/3834 G06F9/3857

    Abstract: A system and method to optimize runahead operation for a processor without use of a separate explicit runahead cache structure. Rather than simply dropping store instructions in processor runahead mode, store instructions write their results into the existing processor store queue, although they are not allowed to update the processor caches or system memory. Holding store results in the store queue during runahead mode allows younger runahead load instructions to search the retired store queue entries for matching addresses and use data from the retired, but still searchable, store instructions. The retired store instructions may be either store instructions retired during runahead or store instructions that retired before entering runahead mode.

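    The store-queue behavior the abstract describes can be sketched as follows. This is a minimal illustrative model, not the patented hardware: the queue capacity, entry layout, and eviction policy are assumptions. The key idea it shows is that runahead stores retire into the queue without draining to the cache or memory, and younger loads search the queue, youngest entry first, for a matching address.

```python
# Minimal sketch, assuming a small fixed-capacity store queue whose
# retired entries stay searchable during runahead mode.

class StoreQueue:
    def __init__(self, capacity=8):
        self.entries = []          # oldest first: {"addr", "data", "retired"}
        self.capacity = capacity

    def store(self, addr, data, runahead):
        if len(self.entries) == self.capacity:
            self.entries.pop(0)    # evict the oldest entry to make room
        self.entries.append({"addr": addr, "data": data, "retired": False})
        if runahead:
            # a runahead store retires into the queue instead of
            # updating the processor caches or system memory
            self.entries[-1]["retired"] = True

    def forward_load(self, addr):
        # a younger load searches entries youngest-first, including
        # retired-but-still-searchable ones, for a matching address
        for entry in reversed(self.entries):
            if entry["addr"] == addr:
                return entry["data"]
        return None                # no match: the load would go to the cache

sq = StoreQueue()
sq.store(0x100, 42, runahead=False)   # store executed before runahead
sq.entries[0]["retired"] = True       # ...and since retired
sq.store(0x200, 7, runahead=True)     # store executed during runahead
print(sq.forward_load(0x200))  # 7
print(sq.forward_load(0x100))  # 42
print(sq.forward_load(0x300))  # None
```

A real store queue would also compare store and load widths and handle partial overlaps; the address-equality match above is the simplest case.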

    DATA REORGANIZATION IN NON-UNIFORM CACHE ACCESS CACHES
    2.
    Invention Application
    DATA REORGANIZATION IN NON-UNIFORM CACHE ACCESS CACHES (In Force)

    Publication Number: US20100274973A1

    Publication Date: 2010-10-28

    Application Number: US12429754

    Filing Date: 2009-04-24

    CPC classification number: G06F12/0846 G06F12/0811

    Abstract: Embodiments that dynamically reorganize data of cache lines in non-uniform cache access (NUCA) caches are contemplated. Various embodiments comprise a computing device having one or more processors coupled with one or more NUCA cache elements. The NUCA cache elements may comprise one or more banks of cache memory, wherein ways of the cache are horizontally distributed across multiple banks. To reduce the access latency of the data, the computing devices may dynamically migrate cache lines into banks closer to the processors that use them. To accomplish such dynamic reorganization, embodiments may maintain “direction” bits for cache lines. The direction bits may indicate the processor toward which the data should be moved. Further, embodiments may use the direction bits to make cache line movement decisions.

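    The direction-bit mechanism can be illustrated with a toy model. The bank count, the two-processor layout (CPU0 adjacent to bank 0, CPU1 adjacent to the last bank), and the one-bank-per-migration policy are assumptions for illustration, not details from the patent.

```python
# Toy sketch: each cache line records which processor last touched it
# (the "direction" bit) and is migrated one bank toward that processor.

NUM_BANKS = 4   # banks 0..3; assume CPU0 sits near bank 0, CPU1 near bank 3

class CacheLine:
    def __init__(self, bank):
        self.bank = bank
        self.direction = None    # 0 = move toward CPU0, 1 = move toward CPU1

    def access(self, cpu):
        self.direction = cpu     # remember which processor uses the line

    def migrate(self):
        # cache-line movement decision driven by the direction bit
        if self.direction == 0 and self.bank > 0:
            self.bank -= 1
        elif self.direction == 1 and self.bank < NUM_BANKS - 1:
            self.bank += 1

line = CacheLine(bank=2)
line.access(cpu=0)
line.migrate()
print(line.bank)   # 1: one bank closer to CPU0
```

In the patent's setting the ways of a set span multiple banks, so a migration would swap the line into a way of a nearer bank rather than simply decrementing an index; the sketch keeps only the decision logic.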

    Architectural level throughput based power modeling methodology and apparatus for pervasively clock-gated processor cores
    6.
    Invention Application
    Architectural level throughput based power modeling methodology and apparatus for pervasively clock-gated processor cores (In Force)

    Publication Number: US20060080625A1

    Publication Date: 2006-04-13

    Application Number: US10960730

    Filing Date: 2004-10-07

    CPC classification number: G06F17/5022 G06F2217/78

    Abstract: A method, system, and apparatus for estimating the power dissipated by a processor core processing a workload. The method includes analyzing a reference test case to generate a reference workload characteristic, analyzing an actual workload to generate an actual workload characteristic, performing a power analysis for the reference test case to establish a reference power dissipation value, and estimating an actual workload power dissipation value responsive to the actual and reference workload characteristics and the reference power dissipation value.

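    The estimation step can be sketched in one line. The choice of throughput (e.g. instructions per cycle) as the workload characteristic, and the linear scaling of reference power by the throughput ratio, are illustrative assumptions consistent with the title's "throughput based" framing, not the patent's exact model.

```python
# Hedged sketch: scale a reference power-analysis result by the ratio of
# the actual workload characteristic to the reference characteristic.

def estimate_power(ref_power_w, ref_ipc, actual_ipc):
    """Estimate actual workload power from a reference test case.

    ref_power_w : power dissipation measured for the reference test case (W)
    ref_ipc     : reference workload characteristic (throughput, e.g. IPC)
    actual_ipc  : actual workload characteristic (throughput, e.g. IPC)
    """
    return ref_power_w * (actual_ipc / ref_ipc)

# Reference test case: 10 W at IPC 2.0; actual workload runs at IPC 1.5
print(estimate_power(10.0, 2.0, 1.5))   # 7.5
```

For pervasively clock-gated cores this kind of scaling is plausible because dynamic power tracks switching activity, which in turn tracks throughput; a fuller model would add a static-power floor that does not scale.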

    Performance monitor design for counting events generated by thread groups
    7.
    Invention Grant
    Performance monitor design for counting events generated by thread groups (In Force)

    Publication Number: US08589922B2

    Publication Date: 2013-11-19

    Application Number: US12900992

    Filing Date: 2010-10-08

    CPC classification number: G06F9/45558 G06F2009/45591

    Abstract: A number of hypervisor register fields are set to specify which processor cores are allowed to generate a number of performance events for a particular thread group. A plurality of threads for an application running in the computing environment are assigned to a plurality of thread groups, as configured by a plurality of thread group fields in a plurality of control registers. A number of additional hypervisor register fields specify which counter sets are allowed to count thread group events originating from one of a shared resource and a shared cache.

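    A common way to realize such per-core permissions is a bitmask register, one bit per core. The field layout below (bit N enables core N for the group) is an illustrative assumption, not the patent's register format.

```python
# Hypothetical bitmask encoding of the hypervisor register fields: each
# thread group has a register in which bit N says whether core N may
# generate performance events for that group.

def core_allowed(hyp_reg, core_id):
    """Check the per-core permission bit for a thread group's events."""
    return bool((hyp_reg >> core_id) & 1)

def count_event(counters, group_id, core_id, hyp_regs):
    # count the event only if the originating core is enabled for the group
    if core_allowed(hyp_regs[group_id], core_id):
        counters[group_id] = counters.get(group_id, 0) + 1

hyp_regs = {0: 0b0011}        # thread group 0: cores 0 and 1 may generate events
counters = {}
count_event(counters, 0, 1, hyp_regs)   # core 1 is enabled: counted
count_event(counters, 0, 3, hyp_regs)   # core 3 is not enabled: filtered out
print(counters)   # {0: 1}
```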

    Sharing sampled instruction address registers for efficient instruction sampling in massively multithreaded processors
    8.
    Invention Grant
    Sharing sampled instruction address registers for efficient instruction sampling in massively multithreaded processors (Expired)

    Publication Number: US08489787B2

    Publication Date: 2013-07-16

    Application Number: US12902491

    Filing Date: 2010-10-12

    CPC classification number: G06F11/3404 G06F11/3466 G06F2201/865 G06F2201/88

    Abstract: Sampled instruction address registers are shared among multiple threads executing on a plurality of processor cores. Each of a plurality of sampled instruction address registers is assigned to a particular thread running for an application on the plurality of processor cores. Each of the sampled instruction address registers is configured by storing, in a thread identification field, a thread identification of the particular thread and, in a processor identification field, a processor identification of the particular processor on which the particular thread is running.

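    The register configuration the abstract describes — a thread-ID field plus a processor-ID field packed into one shared register — can be sketched as bitfield packing. The 8-bit field widths are assumptions for illustration.

```python
# Illustrative sketch of configuring a shared sampled-instruction-address
# register with a thread identification field and a processor
# identification field. Field widths are assumed, not from the patent.

THREAD_ID_BITS = 8

def configure_siar(thread_id, proc_id):
    # pack the processor ID above the thread ID
    return (proc_id << THREAD_ID_BITS) | thread_id

def siar_fields(reg):
    # unpack (thread_id, proc_id) from the register
    return reg & ((1 << THREAD_ID_BITS) - 1), reg >> THREAD_ID_BITS

reg = configure_siar(thread_id=5, proc_id=3)
print(siar_fields(reg))   # (5, 3)
```

With this encoding, a profiler reading a shared register can attribute each sampled instruction address back to the exact thread and core that produced it, which is what lets a small pool of registers serve a massively multithreaded machine.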

    Effective prefetching with multiple processors and threads
    9.
    Invention Grant
    Effective prefetching with multiple processors and threads (Expired)

    Publication Number: US08200905B2

    Publication Date: 2012-06-12

    Application Number: US12192072

    Filing Date: 2008-08-14

    CPC classification number: G06F12/0862 G06F12/0831 G06F2212/6026

    Abstract: A processing system includes a memory and a first core configured to process applications. The first core includes a first cache. The processing system includes a mechanism configured to capture a sequence of addresses of the application that miss the first cache in the first core and to place the sequence of addresses in a storage array, and a second core configured to process at least one software algorithm. The at least one software algorithm utilizes the sequence of addresses from the storage array to generate a sequence of prefetch addresses. The second core issues prefetch requests for the sequence of prefetch addresses to the memory to obtain prefetched data, and the prefetched data is provided to the first core if requested.

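    The two-core arrangement can be sketched as a pipeline: the first core's cache misses land in a storage array, and a software algorithm on the second core turns them into prefetch addresses. The simple constant-stride detector below is an illustrative stand-in for whatever algorithm the second core actually runs; it is not the patented method.

```python
# Hedged sketch of miss capture plus software prefetch-address generation.

def capture_misses(accesses, cache):
    """Record the access addresses that miss the first core's cache."""
    return [addr for addr in accesses if addr not in cache]

def generate_prefetches(miss_array, depth=3):
    # second core: detect a constant stride in the captured miss stream
    # and project it forward to produce a sequence of prefetch addresses
    if len(miss_array) < 2:
        return []
    stride = miss_array[-1] - miss_array[-2]
    return [miss_array[-1] + stride * i for i in range(1, depth + 1)]

# three consecutive 64-byte-line misses at 0x100, 0x140, 0x180
misses = capture_misses([0x100, 0x140, 0x180], cache=set())
print([hex(a) for a in generate_prefetches(misses)])   # ['0x1c0', '0x200', '0x240']
```

Running the generator on a spare core, as the patent does, lets the prefetch algorithm be arbitrarily sophisticated software rather than fixed-function hardware.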

    Power Management for Systems On a Chip
    10.
    Invention Application
    Power Management for Systems On a Chip (In Force)

    Publication Number: US20110191603A1

    Publication Date: 2011-08-04

    Application Number: US12700513

    Filing Date: 2010-02-04

    CPC classification number: G06F1/00 Y02D10/124

    Abstract: A system for controlling a multitasking microprocessor system includes an interconnect and a plurality of processing units connected to the interconnect, forming a single-source, single-sink flow network in which the processing units pass data between one another from the single source to the single sink. A monitor connected to the interconnect tracks the portion of a resource consumed by each of the processing units and controls the processing units according to a predetermined budget for the resource so as to control a data overflow condition, wherein the monitor controls the performance and power modes of the plurality of processing units.

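    The monitor's control loop can be sketched as a budget check over per-unit resource consumption. The throttling policy below (demote the heaviest consumer to a low-power mode when the total exceeds the budget) is an assumption chosen for illustration; the patent leaves the specific policy open.

```python
# Minimal sketch of the budget-based monitor: processing units on the
# interconnect report resource consumption, and the monitor assigns
# performance/power modes to keep the total within a predetermined budget.

class Monitor:
    def __init__(self, budget):
        self.budget = budget

    def control(self, usage):
        """usage: per-unit resource consumption; returns per-unit modes."""
        total = sum(usage.values())
        if total <= self.budget:
            return {unit: "performance" for unit in usage}
        # over budget: throttle the heaviest consumer to head off a
        # data overflow condition downstream in the flow network
        heaviest = max(usage, key=usage.get)
        return {unit: ("low-power" if unit == heaviest else "performance")
                for unit in usage}

mon = Monitor(budget=10)
print(mon.control({"source": 3, "filter": 4, "sink": 2}))   # all "performance"
print(mon.control({"source": 6, "filter": 5, "sink": 2}))   # "source" throttled
```

In a single-source, single-sink flow network, slowing the unit that produces data fastest relieves pressure on every stage downstream, which is why throttling the heaviest consumer is a natural first policy to sketch.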
