Data stream prefetching in a microprocessor
    2.
    Invention application
    Data stream prefetching in a microprocessor (Expired)

    Publication No.: US20060179239A1

    Publication date: 2006-08-10

    Application No.: US11054889

    Filing date: 2005-02-10

    IPC classification: G06F12/00

    CPC classification: G06F12/0862 G06F2212/6028

    Abstract: A method of prefetching data in a microprocessor includes identifying a data stream associated with a process and determining a depth for that stream based upon prefetch factors, including the number of currently concurrent data streams and the data consumption rates associated with those streams. Data prefetch requests are allocated to the data stream to reflect its determined depth. Allocating data prefetch requests may include allocating prefetch requests for a number of cache lines ahead of the cache line currently being referenced, wherein the number of cache lines is equal to the determined depth. The method may include, responsive to determining the depth associated with a data stream, configuring prefetch hardware to reflect the determined depth for the identified data stream. Prefetch control bits in an instruction executed by the processor control the prefetch hardware configuration.

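As a rough illustration of the depth heuristic this abstract describes, the sketch below deepens prefetch for fast-consuming streams and shallows it when many streams compete. The function names, the scaling formula, and the clamping bounds are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch (not the patented algorithm): choose how many cache
# lines ahead of the current reference to prefetch for one data stream,
# based on how many streams are active and how fast this stream consumes data.

def prefetch_depth(num_concurrent_streams, lines_per_cycle,
                   max_depth=8, min_depth=1):
    """Deeper prefetch for fast consumers; shallower when many streams
    compete for prefetch resources. All constants are assumptions."""
    if num_concurrent_streams < 1:
        raise ValueError("need at least one stream")
    # Budget the maximum depth across all concurrent streams...
    budget = max(max_depth // num_concurrent_streams, min_depth)
    # ...then scale by this stream's consumption rate (cache lines per cycle).
    depth = round(budget * lines_per_cycle)
    return max(min_depth, min(max_depth, depth))

def prefetch_addresses(current_line_addr, depth, line_size=128):
    """Addresses of the next `depth` cache lines past the current one."""
    return [current_line_addr + i * line_size for i in range(1, depth + 1)]
```

With a single fast stream, `prefetch_depth(1, 1.0)` yields the full depth of 8; with four slower streams, `prefetch_depth(4, 0.5)` drops to 1, mirroring the abstract's point that depth reflects both concurrency and consumption rate.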

    Store stream prefetching in a microprocessor
    3.
    Invention application
    Store stream prefetching in a microprocessor (Expired)

    Publication No.: US20060179238A1

    Publication date: 2006-08-10

    Application No.: US11054871

    Filing date: 2005-02-10

    IPC classification: G06F13/28

    Abstract: In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses encompassing multiple cache blocks, where the window of addresses is derived from the received address. Specifically, the prefetch engine compares entries in the prefetch queue to a window of 2M contiguous cache blocks. The prefetch engine suppresses allocation of a new entry when any entry in the prefetch queue falls within the address window, and further suppresses allocation when the data address of the store instruction equals an address in a border area of the address window.

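The window-based suppression above can be sketched as follows. The window size, border width, and block size here are assumptions for illustration (the abstract's "2M contiguous cache blocks" is left unresolved), not the patent's actual parameters.

```python
# Illustrative sketch of window-based allocation filtering; the window
# size, border width, and cache block size are assumed values.

LINE = 128          # cache block size in bytes (assumed)
WINDOW_BLOCKS = 8   # blocks per comparison window (assumed)
BORDER_BLOCKS = 1   # border area just outside the window (assumed)

def block(addr):
    """Cache block index containing a byte address."""
    return addr // LINE

def should_allocate(store_addr, queue_addrs):
    """Allocate a new stream entry only if no existing prefetch queue
    entry falls inside (or on the border of) the address window derived
    from the store's address."""
    b = block(store_addr)
    lo, hi = b - WINDOW_BLOCKS // 2, b + WINDOW_BLOCKS // 2
    for a in queue_addrs:
        eb = block(a)
        # Suppress when an existing entry lies inside the window...
        if lo <= eb <= hi:
            return False
        # ...or in the border area just outside it (conservative filter
        # to avoid allocating duplicate streams for nearby stores).
        if lo - BORDER_BLOCKS <= eb < lo or hi < eb <= hi + BORDER_BLOCKS:
            return False
    return True
```

A store near an already-tracked stream is filtered out, while a store to a distant region allocates a fresh entry.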

    DATA SHIFT CAPABILITY FOR SCANNABLE REGISTER
    5.
    Invention application
    DATA SHIFT CAPABILITY FOR SCANNABLE REGISTER (In force)

    Publication No.: US20070240023A1

    Publication date: 2007-10-11

    Application No.: US11278439

    Filing date: 2006-04-03

    IPC classification: G01R31/28

    CPC classification: G01R31/318541

    Abstract: A circuit permits a user to present signals that control the flow of data from a first-type cell to a second-type cell. The circuit allows each cell to be loaded individually, as well as loading cells by scanning input serially through a low-order cell to a higher-order cell. The circuit may be replicated as a series of cells wherein a bit held in each first-type cell is copied to the next higher second-type cell.

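A minimal behavioral model of the two loading modes described above, serial scan and individual (parallel) load, is sketched below. The class and method names, the cell count, and the bit ordering are illustrative assumptions.

```python
# Minimal model of a scan chain: serially shifted bits enter the low-order
# cell and ripple toward higher-order cells; any cell can also be loaded
# directly. This is a software sketch, not the patented circuit.

class ScanRegister:
    def __init__(self, width):
        self.cells = [0] * width  # index 0 = low-order cell

    def scan_shift(self, scan_in):
        """One scan clock: each cell copies its lower-order neighbour and
        the low-order cell takes the serial input. Returns the bit
        shifted out of the high-order end."""
        scan_out = self.cells[-1]
        for i in range(len(self.cells) - 1, 0, -1):
            self.cells[i] = self.cells[i - 1]
        self.cells[0] = scan_in & 1
        return scan_out

    def parallel_load(self, index, bit):
        """Load one cell individually, bypassing the scan path."""
        self.cells[index] = bit & 1
```

Shifting the sequence 1, 0, 1 into a three-cell register leaves the pattern `[1, 0, 1]` in the chain, with the oldest bit in the highest-order cell.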

    Branch encoding before instruction cache write
    6.
    Invention application
    Branch encoding before instruction cache write (In force)

    Publication No.: US20060174095A1

    Publication date: 2006-08-03

    Application No.: US11050350

    Filing date: 2005-02-03

    IPC classification: G06F9/44

    CPC classification: G06F9/322 G06F9/382

    Abstract: Method, system, and computer program product for determining the targets of branches in a data processing system. A method for determining the target of a branch includes performing at least one pre-calculation relating to determining the target of the branch prior to writing the branch into a Level 1 (L1) cache, thereby providing a pre-decoded branch, and then writing the pre-decoded branch into the L1 cache. By pre-calculating matters relating to branch targets before the branches are written into the L1 cache, for example by re-encoding relative branches as absolute branches, a reduction in branch redirect delay can be achieved, providing a substantial improvement in overall processor performance.

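The relative-to-absolute re-encoding example the abstract mentions can be sketched at cache-fill time. The instruction encoding (`"brel"`/`"babs"` opcodes) and the dict-based cache model are invented for illustration.

```python
# Sketch of pre-decoding a relative branch before it is written into the
# L1 instruction cache: the target is computed once at fill time and
# stored as an absolute address, so a later redirect needs no adder.
# The instruction encoding here is invented for illustration.

def predecode_branch(fetch_addr, opcode, operand):
    """Return (opcode, operand). Relative branches ('brel') are
    re-encoded as absolute ('babs'); other opcodes pass through."""
    if opcode == "brel":
        return ("babs", fetch_addr + operand)
    return (opcode, operand)

def fill_icache(icache, fetch_addr, insn):
    """Write a (possibly re-encoded) instruction into a dict-based L1 model."""
    opcode, operand = insn
    icache[fetch_addr] = predecode_branch(fetch_addr, opcode, operand)
```

A relative branch at 0x2000 with displacement 0x40 is stored as an absolute branch to 0x2040; non-branch instructions are written unchanged.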

    Load Lookahead Prefetch for Microprocessors
    8.
    Invention application
    Load Lookahead Prefetch for Microprocessors (In force)

    Publication No.: US20080077776A1

    Publication date: 2008-03-27

    Application No.: US11950495

    Filing date: 2007-12-05

    IPC classification: G06F9/38

    Abstract: The present invention allows a microprocessor to identify and speculatively execute future load instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition, which would otherwise cause the microprocessor or thread of execution to be idle. The data for such future load instructions can be prefetched from a distant cache or main memory such that when the load instruction is re-executed (non-speculatively) after the stall condition expires, its data will either reside in the L1 cache or be en route to the processor, resulting in reduced execution latency. When an extended stall condition is detected, load lookahead prefetch is started, allowing speculative execution of instructions that would normally have been stalled. In this speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities that are not available in speculative execution mode, or speculative instruction results that are not available via forwarding and are not written to the architected registers. A set of status bits is used to dynamically keep track of the dependencies between instructions in the pipeline, and a bit vector tracks invalid architected facilities with respect to the speculative instruction stream. Both sources of information are used to identify load instructions with valid operands for calculating the load address. If the operands are valid, a load prefetch operation is started to retrieve data from the cache ahead of time, so that it is available when the load instruction is non-speculatively executed.

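The validity-tracking idea above can be sketched as a speculative walk over the instruction stream that only issues prefetches for loads whose address operands are still known. The instruction format, register-file model, and invalidation rules below are simplifying assumptions, not the patent's mechanism.

```python
# Sketch of load-lookahead prefetch during a stall: speculatively scan
# ahead, track which architected registers still hold valid values, and
# issue a prefetch only for loads whose base register is valid.
# The instruction encoding is invented for illustration.

def lookahead_prefetch(instructions, regfile, valid):
    """instructions: list of dicts, either
         {'op': 'load', 'dst': r, 'base': r, 'offset': n}  or
         {'op': 'alu',  'dst': r, 'srcs': [r, ...]}
    regfile: current register values; valid: bit vector of registers
    whose values are known in speculative mode. Returns prefetch
    addresses issued during the lookahead pass."""
    valid = list(valid)  # do not disturb the caller's copy
    prefetches = []
    for insn in instructions:
        if insn['op'] == 'load':
            if valid[insn['base']]:
                prefetches.append(regfile[insn['base']] + insn['offset'])
            # The speculative load does not actually return data, so its
            # destination is invalid for later speculative instructions.
            valid[insn['dst']] = False
        else:  # simple ALU op: result valid only if every source is valid
            valid[insn['dst']] = all(valid[s] for s in insn['srcs'])
    return prefetches
```

Note how invalidity propagates: a load poisons its destination, an ALU op consuming that destination poisons its own result, and a later load using the poisoned register issues no prefetch.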

    Method of implementing precise, localized hardware-error workarounds under centralized control
    9.
    Invention application
    Method of implementing precise, localized hardware-error workarounds under centralized control (Pending, published)

    Publication No.: US20060184770A1

    Publication date: 2006-08-17

    Application No.: US11056878

    Filing date: 2005-02-12

    IPC classification: G06F9/40

    CPC classification: G06F11/0793 G06F11/0721

    Abstract: In a processor, a localized workaround is activated upon sensing a problematic condition occurring on the processor, after which control of the workaround's deactivation is superseded by a centralized controller. In a preferred embodiment, the centralized controller monitors the forward progress of the processor and keeps the workaround active until a threshold level of forward progress has occurred. Optionally, the localized workaround may be re-activated while under centralized control, resetting the notion of forward progress. Using the present invention, localized workarounds perform effectively while having minimal impact on processor performance.

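The control split described above, local activation, centralized deactivation gated on forward progress, can be modeled in a few lines. The class shape, the progress counter, and the threshold are assumptions for illustration only.

```python
# Sketch of centralized workaround control: a local detector activates the
# workaround, but only the central controller deactivates it, after a
# threshold of forward progress. Mechanics here are assumed, not the patent's.

class WorkaroundController:
    def __init__(self, progress_threshold=100):
        self.threshold = progress_threshold
        self.active = False
        self.progress = 0

    def local_activate(self):
        """Called by the local error detector when the problematic
        condition is sensed; re-activation resets forward progress."""
        self.active = True
        self.progress = 0

    def instructions_completed(self, n):
        """Central controller: accumulate forward progress and deactivate
        the workaround once the threshold is reached."""
        if self.active:
            self.progress += n
            if self.progress >= self.threshold:
                self.active = False
```

Because `local_activate` resets the counter, a workaround re-triggered under centralized control stays active until a fresh run of forward progress completes, matching the "resetting the notion of forward progress" behavior.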

    Method using hazard vector to enhance issue throughput of dependent instructions in a microprocessor
    10.
    Invention application
    Method using hazard vector to enhance issue throughput of dependent instructions in a microprocessor (Expired)

    Publication No.: US20060179282A1

    Publication date: 2006-08-10

    Application No.: US11054289

    Filing date: 2005-02-09

    IPC classification: G06F9/30

    Abstract: A method and related apparatus are provided for a processor having a number of registers, wherein instructions are issued sequentially to move through a sequence of execution stages, from an initial stage to a final writeback stage. As a method, an embodiment includes issuing a first instruction, such as an FMA instruction, to move through the sequence of execution stages, the first instruction being directed to a specified one of the registers. The method further includes issuing a second instruction to move through the execution stages, the second instruction being issued after the first instruction has issued but before the first instruction reaches the final writeback stage. The second instruction is likewise directed to the specified register and comprises either a store instruction or a load instruction, selectively. R and W bits corresponding to the specified register are used to ensure that a store instruction does not read data from, and a load instruction does not write data to, the specified register before the first instruction reaches the final writeback stage.

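The per-register R and W bits described above can be sketched as a small scoreboard that gates issue of younger loads and stores against an in-flight producer. The class and method names, and the exact gating rules, are illustrative assumptions rather than the patented design.

```python
# Sketch of per-register R/W hazard bits: while a long-latency producer
# (e.g. an FMA) targeting register r is in flight, a younger store must
# not read r and a younger load must not write r until writeback.

class HazardBits:
    def __init__(self, num_regs=32):
        self.w = [False] * num_regs  # a write to the register is pending
        self.r = [False] * num_regs  # a read of the register is pending

    def issue_producer(self, reg):
        """An FMA (or similar) targeting `reg` issues: write now pending."""
        self.w[reg] = True

    def can_issue_store(self, reg):
        """A store reads `reg`; it must wait while a write is pending."""
        return not self.w[reg]

    def can_issue_load(self, reg):
        """A load writes `reg`; it must wait while an older instruction's
        read or write of `reg` is still pending."""
        return not self.w[reg] and not self.r[reg]

    def writeback(self, reg):
        """Producer reached the final writeback stage: clear the hazard."""
        self.w[reg] = False
```

Until the producer's writeback, both a dependent store and a dependent load to the same register are held; registers untouched by the producer issue freely.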