Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher
    45.
    Invention Application
    Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher (Pending, Published)

    Publication No.: US20130185515A1

    Publication Date: 2013-07-18

    Application No.: US13350909

    Filing Date: 2012-01-16

    IPC Class: G06F12/08

    CPC Class: G06F12/0862 G06F2212/6026

    Abstract: Systems and methods for populating a cache using a hardware prefetcher are disclosed. A method for prefetching cache entries includes determining an initial stride value based on at least a first and second demand miss address in the cache, verifying the initial stride value based on a third demand miss address in the cache, prefetching a predetermined number of cache entries based on the verified initial stride value, determining an expected next miss address in the cache based on the verified initial stride value and addresses of the prefetched cache entries, and confirming the verified initial stride value based on comparing the expected next miss address to a next demand miss address in the cache. If the verified initial stride value is confirmed, additional cache entries are prefetched. If the verified initial stride value is not confirmed, further prefetching is stalled and an alternate stride value is determined.
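
    The confirm/stall loop described in the abstract can be illustrated with a small behavioural model. This is only a sketch under assumed details: the class and method names, the prefetch degree of four, and the convention that the expected next miss lies one stride beyond the prefetched window are illustrative, not taken from the patent.

```python
# Behavioural model of stride prefetching with negative feedback from
# unexpected miss addresses. Names and the prefetch degree are assumptions.

PREFETCH_DEGREE = 4  # "predetermined number of cache entries" (assumed value)


class StridePrefetcherModel:
    def __init__(self):
        self.miss_history = []        # recent demand-miss addresses
        self.stride = None            # verified stride value, if any
        self.expected_next_miss = None

    def on_demand_miss(self, addr):
        """Return the list of addresses to prefetch for this demand miss."""
        self.miss_history.append(addr)

        if self.stride is None:
            # Training: a first and second miss give an initial stride,
            # a third miss verifies it.
            if len(self.miss_history) < 3:
                return []
            a, b, c = self.miss_history[-3:]
            initial_stride = b - a
            if initial_stride == 0 or c - b != initial_stride:
                return []             # not verified; keep observing
            self.stride = initial_stride
            return self._prefetch_from(c)

        # Stride already verified: confirm it against the expected miss.
        if addr == self.expected_next_miss:
            return self._prefetch_from(addr)

        # Negative feedback: unexpected miss address, so stall further
        # prefetching and start determining an alternate stride.
        self.stride = None
        self.expected_next_miss = None
        self.miss_history = [addr]
        return []

    def _prefetch_from(self, base):
        lines = [base + self.stride * i for i in range(1, PREFETCH_DEGREE + 1)]
        # Expected next demand miss lies just beyond the prefetched window.
        self.expected_next_miss = lines[-1] + self.stride
        return lines


if __name__ == "__main__":
    pf = StridePrefetcherModel()
    for miss in (0x100, 0x140, 0x180, 0x2C0, 0x500):
        print(hex(miss), "->", [hex(a) for a in pf.on_demand_miss(miss)])
```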

    Thread allocation and clock cycle adjustment in an interleaved multi-threaded processor
    46.
    Invention Grant
    Thread allocation and clock cycle adjustment in an interleaved multi-threaded processor (In Force)

    Publication No.: US08397238B2

    Publication Date: 2013-03-12

    Application No.: US12632873

    Filing Date: 2009-12-08

    IPC Class: G06F9/46

    Abstract: Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method allocates software threads to hardware threads. A number of software threads to be allocated is identified. It is determined whether the number of software threads is less than the number of hardware threads. When the number of software threads is less than the number of hardware threads, at least two of the software threads are allocated to non-sequential hardware threads. A clock signal to be applied to the hardware threads is adjusted responsive to the non-sequential hardware threads allocated.
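
    As an illustration of the allocation step, the following sketch spreads a smaller number of software threads across non-sequential hardware-thread slots and derives a clock adjustment from the resulting allocation. The even-spacing rule and the divider policy are assumptions chosen for illustration, not the patented method.

```python
# Illustrative allocation of software threads to non-sequential hardware
# threads and a clock adjustment derived from it. The spacing rule and
# divider policy are assumptions, not the patented method.

def allocate_threads(num_sw_threads, num_hw_threads):
    """Map each software thread to a hardware-thread slot, spreading the
    software threads over non-sequential slots when there are fewer of
    them than there are hardware threads."""
    if num_sw_threads >= num_hw_threads:
        # Enough work to keep every slot busy: pack sequentially.
        return {sw: sw % num_hw_threads for sw in range(num_sw_threads)}

    # Non-sequential allocation: space software threads evenly, e.g. two
    # software threads on a four-thread core land on slots 0 and 2.
    spacing = num_hw_threads // num_sw_threads
    return {sw: sw * spacing for sw in range(num_sw_threads)}


def clock_divider(allocation, num_hw_threads):
    """Illustrative clock adjustment: slow the thread-select clock in
    proportion to the number of unused hardware-thread slots."""
    used_slots = len(set(allocation.values()))
    return max(1, num_hw_threads // used_slots)


if __name__ == "__main__":
    alloc = allocate_threads(num_sw_threads=2, num_hw_threads=4)
    print("allocation:", alloc)                       # {0: 0, 1: 2}
    print("clock divider:", clock_divider(alloc, 4))  # 2
```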

    Use of Loop and Addressing Mode Instruction Set Semantics to Direct Hardware Prefetching
    47.
    Invention Application
    Use of Loop and Addressing Mode Instruction Set Semantics to Direct Hardware Prefetching (Pending, Published)

    Publication No.: US20130185516A1

    Publication Date: 2013-07-18

    Application No.: US13350914

    Filing Date: 2012-01-16

    IPC Class: G06F12/12

    Abstract: Systems and methods for prefetching cache lines into a cache coupled to a processor are disclosed. A hardware prefetcher is configured to recognize a memory access instruction as an auto-increment-address (AIA) memory access instruction, infer a stride value from the increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop and a remaining loop count as the difference between the maximum loop count and the number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate the actual number of cache lines to prefetch to be less than or equal to the remaining loop count when the remaining loop count is less than the selected number of cache lines.
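
    The two hints can be sketched as follows. The AIAInstruction record, the line size, and the default burst of four lines are illustrative assumptions; only the stride inference from the increment field and the truncation to the remaining loop count follow the abstract.

```python
# Illustrative sketch of the two prefetch hints: a stride inferred from an
# auto-increment-address (AIA) instruction's increment field, and a burst
# truncated to the remaining hardware-loop iterations. The AIAInstruction
# record, line size, and default burst length are assumptions.

from dataclasses import dataclass


@dataclass
class AIAInstruction:
    base_address: int   # address accessed by this instance of the load/store
    increment: int      # increment field of the AIA instruction, in bytes


def prefetch_addresses(inst, max_loop_count, completed_iterations,
                       selected_lines=4, line_size=64):
    """Return cache-line addresses to prefetch for an AIA access that is
    executing inside a hardware loop."""
    # Stride is inferred directly from the instruction's increment field.
    stride = inst.increment

    # Remaining trip count of the hardware loop.
    remaining = max_loop_count - completed_iterations

    # Truncate the burst so it never runs past the end of the loop.
    count = min(selected_lines, max(0, remaining))

    lines = []
    for i in range(1, count + 1):
        addr = inst.base_address + i * stride
        lines.append(addr - (addr % line_size))   # align to a cache-line boundary
    return lines


if __name__ == "__main__":
    inst = AIAInstruction(base_address=0x1000, increment=64)
    # A 10-iteration loop with 8 iterations done: only 2 lines are prefetched.
    print([hex(a) for a in prefetch_addresses(inst, 10, 8)])
```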

    Hybrid Write-Through/Write-Back Cache Policy Managers, and Related Systems and Methods
    48.
    Invention Application
    Hybrid Write-Through/Write-Back Cache Policy Managers, and Related Systems and Methods (In Force)

    Publication No.: US20130185511A1

    Publication Date: 2013-07-18

    Application No.: US13470643

    Filing Date: 2012-05-14

    IPC Class: G06F12/08

    Abstract: Embodiments disclosed in the detailed description include hybrid write-through/write-back cache policy managers, and related systems and methods. A cache write policy manager is configured to determine whether at least two caches among a plurality of parallel caches are active. If none of the one or more other caches is active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-back cache policy. In this manner, the cache write policy manager may conserve power and/or increase performance of a singly active processor core. If any of the one or more other caches are active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-through cache policy. In this manner, the cache write policy manager facilitates data coherency among the parallel caches when multiple processor cores are active.
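
    A minimal sketch of the policy selection follows. The enum, function name, and set-based view of active caches are assumptions; the decision rule (write-back when no other parallel cache is active, write-through otherwise) is taken from the abstract.

```python
# Behavioural sketch of the hybrid policy selection: write-back when a cache
# is the only active one, write-through when any other parallel cache is
# active. Enum and function names are illustrative.

from enum import Enum


class WritePolicy(Enum):
    WRITE_BACK = "write-back"        # no other active cache; save power/bandwidth
    WRITE_THROUGH = "write-through"  # other caches active; keep shared memory coherent


def select_policy(cache_id, active_caches):
    """Return the policy the manager would instruct `cache_id` to apply,
    given the set of currently active parallel caches."""
    if cache_id not in active_caches:
        raise ValueError("a write policy is only selected for an active cache")
    other_caches_active = bool(active_caches - {cache_id})
    return WritePolicy.WRITE_THROUGH if other_caches_active else WritePolicy.WRITE_BACK


if __name__ == "__main__":
    # Single active core: its cache may run write-back.
    print(select_policy("L1-core0", {"L1-core0"}))
    # A second core's cache becomes active: switch to write-through for coherency.
    print(select_policy("L1-core0", {"L1-core0", "L1-core1"}))
```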

    System and method of selectively accessing a register file
    49.
    Invention Grant
    System and method of selectively accessing a register file (In Force)

    Publication No.: US07979681B2

    Publication Date: 2011-07-12

    Application No.: US11943190

    Filing Date: 2007-11-20

    IPC Class: G06F9/00

    Abstract: In a particular embodiment, a method is disclosed that includes identifying a first block of bits within a result to be written to a destination register by an execution unit. The result includes a plurality of bits having the first block of bits and a second block of bits. The first block of bits has a value of zero. The method further includes providing an encoded bit value representing the first block of bits to a control register and selectively writing the second block of bits, but not the first block of bits, to the destination register. The destination register is sized to receive the first and second blocks of bits.
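
    The selective write can be modelled in a few lines. The 64-bit width, the split into upper and lower halves, and the single-bit encoding are assumptions chosen for illustration; the abstract only requires that a zero block be represented by an encoded value in a control register rather than written to the destination register.

```python
# Illustrative model of the selective register-file write: a zero block of
# the result is represented by an encoded value in a control register
# instead of being written. The 64-bit width, the upper/lower split, and
# the one-bit encoding are assumptions.

REG_WIDTH = 64
HALF = REG_WIDTH // 2
HALF_MASK = (1 << HALF) - 1


def selective_write(result):
    """Split `result` into two blocks and decide what to write.

    Returns (control_bit, bits_written); control_bit = 1 encodes "the upper
    block is all zeros and only the lower block was written"."""
    upper = (result >> HALF) & HALF_MASK
    lower = result & HALF_MASK
    if upper == 0:
        # Zero block: encode it in the control register and write only the
        # non-zero block to the destination register.
        return 1, lower
    return 0, result            # both blocks written normally


def read_back(control_bit, stored_bits):
    """Reconstruct the full-width value using the control register."""
    return stored_bits & HALF_MASK if control_bit else stored_bits


if __name__ == "__main__":
    for value in (0x0000_0000_DEAD_BEEF, 0x1234_5678_DEAD_BEEF):
        ctrl, written = selective_write(value)
        assert read_back(ctrl, written) == value
        print(f"value={value:#018x} control={ctrl} bits_written={written:#x}")
```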

    System and Method of Data Forwarding Within An Execution Unit
    50.
    Invention Application
    System and Method of Data Forwarding Within An Execution Unit (In Force)

    Publication No.: US20090216993A1

    Publication Date: 2009-08-27

    Application No.: US12037300

    Filing Date: 2008-02-26

    IPC Class: G06F9/312 G06F12/10

    Abstract: In an embodiment, a method is disclosed that includes comparing, during a write-back stage at an execution unit, a write identifier associated with a result to be written to a register file from execution of a first instruction to a read identifier associated with a second instruction at an execution pipeline within an interleaved multi-threaded (IMT) processor having multiple execution units. When the write identifier matches the read identifier, the method further includes storing the result at a local memory of the execution unit for use by the execution unit in the subsequent read stage.
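
    A toy model of the write-back comparison is shown below. The register-name identifiers and the dictionary-based local storage are assumptions; the point being illustrated is that a match between a write identifier and a pending read identifier triggers a local copy that the read stage can consume.

```python
# Toy model of the write-back comparison: when the register a result is
# about to write matches a register a pending instruction will read, the
# execution unit also keeps a local copy so its read stage can consume it
# directly. Register names and the dict-based storage are assumptions.

class ExecutionUnitForwarding:
    def __init__(self):
        self.register_file = {}    # simplified shared register file
        self.local_forward = {}    # execution unit's local forwarding storage

    def write_back(self, write_id, result, pending_read_ids):
        """Write-back stage: write the result and, when a pending
        instruction reads the same register, keep a local copy."""
        self.register_file[write_id] = result
        if write_id in pending_read_ids:
            self.local_forward[write_id] = result

    def read_operand(self, read_id):
        """Read stage: prefer the locally forwarded value when present."""
        if read_id in self.local_forward:
            return self.local_forward.pop(read_id)
        return self.register_file[read_id]


if __name__ == "__main__":
    eu = ExecutionUnitForwarding()
    # The first instruction writes r3 while a second instruction in the
    # pipeline reads r3, so the result is captured for local forwarding.
    eu.write_back(write_id="r3", result=42, pending_read_ids={"r3", "r7"})
    print(eu.read_operand("r3"))   # 42, taken from the local forwarding copy
```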
