Branch lookahead prefetch for microprocessors
    1.
    发明授权
    Branch lookahead prefetch for microprocessors 有权
    用于微处理器的分支前瞻预取

    公开(公告)号:US07877580B2

    公开(公告)日:2011-01-25

    申请号:US11953799

    申请日:2007-12-10

    IPC分类号: G06F9/00

    CPC分类号: G06F9/3842 G06F9/3861

    摘要: A method of handling program instructions in a microprocessor which reduces delays associated with mispredicted branch instructions, by detecting the occurrence of a stall condition during execution of the program instructions, speculatively executing one or more pending instructions which include at least one branch instruction during the stall condition, and determining the validity of data utilized by the speculative execution. Dispatch logic determines the validity of the data by marking one or more registers of an instruction dispatch unit to indicate which results of the pending instructions are invalid. The speculative execution of instructions can occur across multiple pipeline stages of the microprocessor, and the validity of the data is tracked during their execution in the multiple pipeline stages while monitoring a dependency of the speculatively executed instructions relative to one another during their execution in the multiple pipeline stages.

    摘要翻译: 一种处理微处理器中的程序指令的方法,其通过在执行程序指令期间检测到失速状态的发生来减少与错误预测的分支指令相关联的延迟,推测性地执行一个或多个未决指令,其中包括在失速期间包括至少一个分支指令 条件,并确定投机执行使用的数据的有效性。 调度逻辑通过标记指令调度单元的一个或多个寄存器来指示待处理指令的哪些结果无效来确定数据的有效性。 指令的推测执行可以在微处理器的多个流水线阶段发生,并且在多个流水线阶段的执行期间跟踪数据的有效性,同时在多个流水线阶段的执行期间监视推测性执行的指令相对于彼此的依赖性 流水线阶段

    Using a modified value GPR to enhance lookahead prefetch
    2.
    发明授权
    Using a modified value GPR to enhance lookahead prefetch 失效
    使用修改值GPR来增强前瞻预取

    公开(公告)号:US07421567B2

    公开(公告)日:2008-09-02

    申请号:US11016206

    申请日:2004-12-17

    IPC分类号: G06F9/30 G06F9/40 G06F15/00

    摘要: The present invention allows a microprocessor to identify and speculatively execute future instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition which would otherwise cause the microprocessor or thread of execution to be idle. The execution of such future instructions can initiate a prefetch of data or instructions from a distant cache or main memory, or otherwise make forward progress through the instruction stream. In this manner, when the instructions are re-executed (non speculatively executed) after the stall condition expires, they will execute with a reduced execution latency; e.g. by accessing data prefetched into the L1 cache, or enroute to the processor, or by executing the target instructions following a speculatively resolved mispredicted branch. In speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities not available in speculative execution mode, or due to speculative instruction results that are not available. Dependency and dirty (i.e. invalid result) bits are tracked and used to determine which speculative instructions are valid for execution. A modified value register storage and bit vector are used to improve the availability of speculative results that would otherwise be discarded once they leave the execution pipeline because they cannot be written to the architected registers. The modified general purpose registers are used to store speculative results when the corresponding instruction reaches writeback and the modified bit vector tracks the results that have been stored there. Younger speculative instructions that do not bypass directly from older instructions will then use this modified data when the corresponding bit in the modified bit vector indicates the data has been modified. Otherwise, data from the architected registers will be used.

    摘要翻译: 本发明允许微处理器在失速状态期间识别和推测地执行未来的指令。 这允许在停顿条件期间通过指令流进行正向进展,否则将导致微处理器或执行线程空闲。 这样的未来指令的执行可以启动来自远程高速缓存或主存储器的数据或指令的预取,或以其他方式通过指令流进行进展。 以这种方式,当在停止条件到期之后重新执行(不推测地执行)指令时,它们将以降低的执行延迟执行; 例如 通过访问预取到L1高速缓存中的数据,或者进入处理器,或通过在推测性地解决的误预测分支之后执行目标指令。 在推测模式中,由于缺少L1缓存的源加载,在推测执行模式下不可用的设备,或由于不可用的推测指令结果,指令操作数可能无效。 跟踪依赖关系和脏(即无效结果)位,并用于确定哪些推测指令对执行有效。 改进的值寄存器存储和位向量被用于提高推测结果的可用性,否则,由于不能将其写入到架构化的寄存器,否则将抛弃执行流水线。 修改后的通用寄存器用于在对应指令到达回写时存储推测结果,修改后的位向量跟踪存储在其中的结果。 当修改的位向量中的相应位指示数据已被修改时,不直接从旧指令旁路的较小的推测指令将使用该修改的数据。 否则,将使用来自架构化寄存器的数据。

    Using a modified value GPR to enhance lookahead prefetch
    3.
    发明授权
    Using a modified value GPR to enhance lookahead prefetch 失效
    使用修改值GPR来增强前瞻预取

    公开(公告)号:US07620799B2

    公开(公告)日:2009-11-17

    申请号:US12061290

    申请日:2008-04-02

    IPC分类号: G06F9/30 G06F9/40 G06F15/00

    摘要: Mechanisms to identify and speculatively execute future instructions during a stall condition are provided. In speculative mode, instruction operands may be invalid due to a number of reasons. Dependency and dirty bits are tracked and used to determine which speculative instructions are valid for execution. A modified value register storage and bit vector are used to improve the availability of speculative results that would otherwise be discarded once they leave the execution pipeline because they cannot be written to the architected registers. The modified general purpose registers are used to store speculative results when the corresponding instruction reaches writeback and the modified bit vector tracks the results that have been stored there. Younger speculative instructions that do not bypass directly from older instructions use this modified data when the corresponding bit in the modified bit vector indicates the data has been modified. Otherwise, data from the architected registers is used.

    摘要翻译: 提供了在失速状态下识别和推测执行未来指令的机制。 在推测模式下,指令操作数可能因无数原因而无效。 跟踪依赖关系和脏位,并用于确定哪些推测指令对执行有效。 改进的值寄存器存储和位向量被用于提高推测结果的可用性,否则,由于不能将其写入到架构化的寄存器,否则将抛弃执行流水线。 修改后的通用寄存器用于在对应指令到达回写时存储推测结果,修改后的位向量跟踪存储在其中的结果。 当修改的位向量中的相应位指示数据已被修改时,不直接从旧指令中绕过的较小的推测指令使用该修改的数据。 否则,将使用来自架构寄存器的数据。

    Branch lookahead prefetch for microprocessors
    4.
    发明授权
    Branch lookahead prefetch for microprocessors 失效
    用于微处理器的分支前瞻预取

    公开(公告)号:US07552318B2

    公开(公告)日:2009-06-23

    申请号:US11016200

    申请日:2004-12-17

    IPC分类号: G06F9/00

    CPC分类号: G06F9/3842 G06F9/3861

    摘要: A method of handling program instructions in a microprocessor which reduces delays associated with mispredicted branch instructions, by detecting the occurrence of a stall condition during execution of the program instructions, speculatively executing one or more pending instructions which include at least one branch instruction during the stall condition, and determining the validity of data utilized by the speculative execution. Dispatch logic determines the validity of the data by marking one or more registers of an instruction dispatch unit to indicate which results of the pending instructions are invalid. The speculative execution of instructions can occur across multiple pipeline stages of the microprocessor, and the validity of the data is tracked during their execution in the multiple pipeline stages while monitoring a dependency of the speculatively executed instructions relative to one another during their execution in the multiple pipeline stages.

    摘要翻译: 一种处理微处理器中的程序指令的方法,其通过在执行程序指令期间检测到失速状态的发生来减少与错误预测的分支指令相关联的延迟,推测性地执行一个或多个未决指令,其中包括在失速期间包括至少一个分支指令 条件,并确定投机执行使用的数据的有效性。 调度逻辑通过标记指令调度单元的一个或多个寄存器来指示待处理指令的哪些结果无效来确定数据的有效性。 指令的推测执行可以在微处理器的多个流水线阶段发生,并且在多个流水线阶段的执行期间跟踪数据的有效性,同时在多个流水线阶段的执行期间监视推测性执行的指令相对于彼此的依赖性 流水线阶段

    Load lookahead prefetch for microprocessors
    5.
    发明授权
    Load lookahead prefetch for microprocessors 有权
    加载微处理器的前瞻预取

    公开(公告)号:US07594096B2

    公开(公告)日:2009-09-22

    申请号:US11950495

    申请日:2007-12-05

    IPC分类号: G06F9/30 G06F9/40 G06F15/00

    摘要: The present invention allows a microprocessor to identify and speculatively execute future load instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition which would otherwise cause the microprocessor or thread of execution to be idle. The data for such future load instructions can be prefetched from a distant cache or main memory such that when the load instruction is re-executed (non speculative executed) after the stall condition expires, its data will reside either in the L1 cache, or will be enroute to the processor, resulting in a reduced execution latency. When an extended stall condition is detected, load lookahead prefetch is started allowing speculative execution of instructions that would normally have been stalled. In this speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities not available in speculative execution mode, or due to speculative instruction results that are not available via forwarding and are not written to the architected registers. A set of status bits are used to dynamically keep track of the dependencies between instructions in the pipeline and a bit vector tracks invalid architected facilities with respect to the speculative instruction stream. Both sources of information are used to identify load instructions with valid operands for calculating the load address. If the operands are valid, then a load prefetch operation is started to retrieve data from the cache ahead of time such that it can be available for the load instruction when it is non-speculatively executed.

    摘要翻译: 本发明允许微处理器在失速状态期间识别并推测性地执行未来的加载指令。 这允许在停顿条件期间通过指令流进行正向进展,否则将导致微处理器或执行线程空闲。 可以从远程高速缓存或主存储器预取这样的未来加载指令的数据,使得当停止条件到期之后,当加载指令被重新执行(不推测执行)时,其数据将驻留在L1高速缓存中,或者将 进入处理器,导致执行时间缩短。 当检测到扩展失速条件时,启动加载前瞻预取,允许推测执行通常已经停止的指令。 在这种推测模式中,由于缺少L1高速缓存的源负载,设备在推测执行模式下不可用的设备,或由于不能通过转发而不能使用并且未写入到架构化寄存器的推测性指令结果,指令操作数可能无效。 一组状态位用于动态地跟踪流水线中的指令之间的依赖关系,并且位向量相对于推测性指令流跟踪无效的架构设施。 两个信息来源用于识别加载指令,其中包含用于计算加载地址的有效操作数。 如果操作数有效,则启动加载预取操作以提前从高速缓存中检索数据,使得当非推测性地执行加载指令时,可以对加载指令可用。

    Load lookahead prefetch for microprocessors
    6.
    发明授权
    Load lookahead prefetch for microprocessors 失效
    加载微处理器的前瞻预取

    公开(公告)号:US07444498B2

    公开(公告)日:2008-10-28

    申请号:US11016236

    申请日:2004-12-17

    摘要: The present invention allows a microprocessor to identify and speculatively execute future load instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition which would otherwise cause the microprocessor or thread of execution to be idle. The data for such future load instructions can be prefetched from a distant cache or main memory such that when the load instruction is re-executed (non speculative executed) after the stall condition expires, its data will reside either in the L1 cache, or will be enroute to the processor, resulting in a reduced execution latency. When an extended stall condition is detected, load lookahead prefetch is started allowing speculative execution of instructions that would normally have been stalled. In this speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities not available in speculative execution mode, or due to speculative instruction results that are not available via forwarding and are not written to the architected registers. A set of status bits are used to dynamically keep track of the dependencies between instructions in the pipeline and a bit vector tracks invalid architected facilities with respect to the speculative instruction stream. Both sources of information are used to identify load instructions with valid operands for calculating the load address. If the operands are valid, then a load prefetch operation is started to retrieve data from the cache ahead of time such that it can be available for the load instruction when it is non-speculatively executed.

    摘要翻译: 本发明允许微处理器在失速状态期间识别并推测性地执行未来的加载指令。 这允许在停顿条件期间通过指令流进行正向进展,否则将导致微处理器或执行线程空闲。 可以从远程高速缓存或主存储器预取这样的未来加载指令的数据,使得当停止条件到期之后,当加载指令被重新执行(不推测执行)时,其数据将驻留在L1高速缓存中,或者将 进入处理器,导致执行时间缩短。 当检测到扩展失速条件时,启动加载前瞻预取,允许推测执行通常已经停止的指令。 在这种推测模式中,由于缺少L1高速缓存的源负载,设备在推测执行模式下不可用的设备,或由于不能通过转发而不能使用并且未写入到架构化寄存器的推测性指令结果,指令操作数可能无效。 一组状态位用于动态地跟踪流水线中的指令之间的依赖关系,并且位向量相对于推测性指令流跟踪无效的架构设施。 两个信息来源用于识别加载指令,其中包含用于计算加载地址的有效操作数。 如果操作数有效,则启动加载预取操作以提前从高速缓存中检索数据,使得当非推测性地执行加载指令时,可以对加载指令可用。

    Using a Modified Value GPR to Enhance Lookahead Prefetch
    7.
    发明申请
    Using a Modified Value GPR to Enhance Lookahead Prefetch 失效
    使用修改值GPR来增强前瞻预取

    公开(公告)号:US20080250230A1

    公开(公告)日:2008-10-09

    申请号:US12061290

    申请日:2008-04-02

    IPC分类号: G06F9/30

    摘要: The present invention allows a microprocessor to identify and speculatively execute future instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition which would otherwise cause the microprocessor or thread of execution to be idle. The execution of such future instructions can initiate a prefetch of data or instructions from a distant cache or main memory, or otherwise make forward progress through the instruction stream. In this manner, when the instructions are re-executed (non speculatively executed) after the stall condition expires, they will execute with a reduced execution latency; e.g. by accessing data prefetched into the L1 cache, or enroute to the processor, or by executing the target instructions following a speculatively resolved mispredicted branch. In speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities not available in speculative execution mode, or due to speculative instruction results that are not available. Dependency and dirty (i.e. invalid result) bits are tracked and used to determine which speculative instructions are valid for execution. A modified value register storage and bit vector are used to improve the availability of speculative results that would otherwise be discarded once they leave the execution pipeline because they cannot be written to the architected registers. The modified general purpose registers are used to store speculative results when the corresponding instruction reaches writeback and the modified bit vector tracks the results that have been stored there. Younger speculative instructions that do not bypass directly from older instructions will then use this modified data when the corresponding bit in the modified bit vector indicates the data has been modified. Otherwise, data from the architected registers will be used.

    摘要翻译: 本发明允许微处理器在失速状态期间识别和推测地执行未来的指令。 这允许在停顿条件期间通过指令流进行正向进展,否则将导致微处理器或执行线程空闲。 这样的未来指令的执行可以启动来自远程高速缓存或主存储器的数据或指令的预取,或以其他方式通过指令流进行进展。 以这种方式,当在停止条件到期之后重新执行(不推测地执行)指令时,它们将以降低的执行延迟执行; 例如 通过访问预取到L1高速缓存中的数据,或者进入处理器,或通过在推测性地解决的误预测分支之后执行目标指令。 在推测模式中,由于缺少L1缓存的源加载,在推测执行模式下不可用的设备,或由于不可用的推测指令结果,指令操作数可能无效。 跟踪依赖关系和脏(即无效结果)位,并用于确定哪些推测指令对执行有效。 改进的值寄存器存储和位向量被用于提高推测结果的可用性,否则,由于不能将其写入到架构化的寄存器,否则将抛弃执行流水线。 修改后的通用寄存器用于在对应指令到达回写时存储推测结果,修改后的位向量跟踪存储在其中的结果。 当修改的位向量中的相应位指示数据已被修改时,不直接从旧指令旁路的较小的推测指令将使用该修改的数据。 否则,将使用来自架构化寄存器的数据。

    Multi-Mode Register Rename Mechanism for a Highly Threaded Simultaneous Multi-Threaded Microprocessor
    8.
    发明申请
    Multi-Mode Register Rename Mechanism for a Highly Threaded Simultaneous Multi-Threaded Microprocessor 有权
    多线程同时多线程微处理器的多模式寄存器重命名机制

    公开(公告)号:US20080250226A1

    公开(公告)日:2008-10-09

    申请号:US11696363

    申请日:2007-04-04

    IPC分类号: G06F15/00

    摘要: A multi-mode register rename mechanism which allows a simultaneous multi-threaded processor to support full out-of-order thread execution when the number of threads is low and in-order thread execution when the number of threads increases. Responsive to changing an execution mode of a processor to operate in in-order thread execution mode, the illustrative embodiments switch a physical register in the data processing system to an architected facility, thereby forming a switched physical register. When an instruction is issued to an execution unit, wherein the issued instruction comprises a thread bit, the thread bit is examined to determine if the instruction accesses an architected facility. If the issued instruction accesses an architected facility, the instruction is executed, and the results of the executed instruction are written to the switched physical register.

    摘要翻译: 多模式寄存器重命名机制,允许同时多线程处理器在线程数量低时支持完全无序的线程执行,并且当线程数增加时按顺序执行线程。 响应于改变处理器的执行模式以按顺序执行线程执行模式,所述说明性实施例将数据处理系统中的物理寄存器切换到架构设施,从而形成切换的物理寄存器。 当向执行单元发出指令时,其中发出的指令包括一个线程位,检查该线程位以确定该指令是否访问一个架构设施。 如果发出的指令访问架构设施,则执行该指令,并且将所执行的指令的结果写入切换的物理寄存器。

    Multi-mode register rename mechanism that augments logical registers by switching a physical register from the register rename buffer when switching between in-order and out-of-order instruction processing in a simultaneous multi-threaded microprocessor
    9.
    发明授权
    Multi-mode register rename mechanism that augments logical registers by switching a physical register from the register rename buffer when switching between in-order and out-of-order instruction processing in a simultaneous multi-threaded microprocessor 有权
    多模式寄存器重命名机制,通过在同时多线程微处理器中的顺序和无序指令处理之间切换时,通过从寄存器重命名缓冲器切换物理寄存器来增加逻辑寄存器

    公开(公告)号:US08347068B2

    公开(公告)日:2013-01-01

    申请号:US11696363

    申请日:2007-04-04

    IPC分类号: G06F9/30

    摘要: A multi-mode register rename mechanism which allows a simultaneous multi-threaded processor to support full out-of-order thread execution when the number of threads is low and in-order thread execution when the number of threads increases. Responsive to changing an execution mode of a processor to operate in in-order thread execution mode, the illustrative embodiments switch a physical register in the data processing system to an architected facility, thereby forming a switched physical register. When an instruction is issued to an execution unit, wherein the issued instruction comprises a thread bit, the thread bit is examined to determine if the instruction accesses an architected facility. If the issued instruction accesses an architected facility, the instruction is executed, and the results of the executed instruction are written to the switched physical register.

    摘要翻译: 多模式寄存器重命名机制,允许同时多线程处理器在线程数量低时支持完全无序的线程执行,并且当线程数增加时按顺序执行线程。 响应于改变处理器的执行模式以按顺序执行线程执行模式,所述说明性实施例将数据处理系统中的物理寄存器切换到架构设施,从而形成切换的物理寄存器。 当向执行单元发出指令时,其中发出的指令包括一个线程位,检查该线程位以确定该指令是否访问一个架构设施。 如果发出的指令访问架构设施,则执行该指令,并且将所执行的指令的结果写入切换的物理寄存器。

    Selective flush of shared and other pipeline stages in a multithread processor
    10.
    发明授权
    Selective flush of shared and other pipeline stages in a multithread processor 有权
    多线程处理器中共享和其他流水线阶段的选择性刷新

    公开(公告)号:US06694425B1

    公开(公告)日:2004-02-17

    申请号:US09564930

    申请日:2000-05-04

    IPC分类号: G06F940

    CPC分类号: G06F9/3851 G06F9/3867

    摘要: In a simultaneous multithread processor, a flush mechanism of a shared pipeline stage is disclosed. In the preferred embodiment, the shared pipeline stage happens to be one or all of the fetch stage, the decode stage, and/or the dispatch stage and the flush mechanism flushes instructions at the dispatch stage and earlier stages. The dispatch flush mechanism detects when an instruction of a particular thread is stalled at the dispatch stage of the pipelined processor. Subsequent instructions of that thread are flushed from all pipeline stages of the processor up to and including the dispatch stage. The dispatch stage is distinguished as being the stage in which all resources necessary for the successful dispatch of the instruction to the issue queues are checked. If a resource required only by that instruction is unavailable, then a dispatch flush is performed. Flush prioritization logic is available to determine if other flush conditions, including a previous dispatch flush, exist for that particular thread. If so, the flush prioritization logic will determine which flush, if any, should proceed. Those resources necessary for the successful dispatch and issuance of the instruction to the execution units but which are unavailable may be private or separate registers for each thread, may be special purpose or other non-renamed registers, or may be instructions for synchronized access to memory. This dispatch flush mechanism is more efficient than existing flush mechanisms which must flush throughout the processor pipelines up to and including the issue queues and execution units and result registers.

    摘要翻译: 在同时多线程处理器中,公开了共享流水线级的冲洗机构。 在优选实施例中,共享流水线阶段恰好是提取阶段,解码阶段和/或调度阶段中的一个或全部,并且刷新机制在调度阶段和早期阶段刷新指令。 调度刷新机制检测在流水线处理器的调度阶段什么时候特定线程的指令停止。 该线程的后续指令从处理器的所有流水线阶段冲洗到并包括调度阶段。 调度阶段被区分为检查成功发送指令到发布队列所需的所有资源的阶段。 如果该指令所需的资源不可用,则执行调度刷新。 刷新优先级逻辑可用于确定是否存在针对该特定线程的其他刷新条件(包括先前的调度刷新)。 如果是这样,刷新优先级逻辑将确定哪个刷新(如果有的话)应该进行。 成功发送执行单元但不可用的指令所需的资源可以是每个线程的专用或单独的寄存器,可以是特殊目的或其他未重命名的寄存器,或者可以是同步访问存储器的指令 。 这种调度刷新机制比现有的刷新机制更有效,这些机制必须遍及整个处理器管道,直到并包括问题队列和执行单元和结果寄存器。