Using a modified value GPR to enhance lookahead prefetch
    52.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07421567B2

    Publication date: 2008-09-02

    Application No.: US11016206

    Filing date: 2004-12-17

    IPC classification: G06F9/30 G06F9/40 G06F15/00

    Abstract: The present invention allows a microprocessor to identify and speculatively execute future instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition, which would otherwise cause the microprocessor or thread of execution to be idle. The execution of such future instructions can initiate a prefetch of data or instructions from a distant cache or main memory, or otherwise make forward progress through the instruction stream. In this manner, when the instructions are re-executed (non-speculatively executed) after the stall condition expires, they will execute with a reduced execution latency, e.g., by accessing data prefetched into the L1 cache or en route to the processor, or by executing the target instructions following a speculatively resolved mispredicted branch. In speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities not available in speculative execution mode, or speculative instruction results that are not available. Dependency and dirty (i.e., invalid result) bits are tracked and used to determine which speculative instructions are valid for execution. A modified value register storage and bit vector are used to improve the availability of speculative results that would otherwise be discarded once they leave the execution pipeline because they cannot be written to the architected registers. The modified general purpose registers are used to store speculative results when the corresponding instruction reaches writeback, and the modified bit vector tracks the results that have been stored there. Younger speculative instructions that do not bypass directly from older instructions will then use this modified data when the corresponding bit in the modified bit vector indicates the data has been modified. Otherwise, data from the architected registers will be used.
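
The operand-selection rule in the last sentences of this abstract can be sketched as a toy software model. This is only an illustration, not the patented hardware: the class name, method names, the 32-register size, and the choice to clear the bit vector when speculative mode ends are all assumptions made for the example.

```python
class SpeculativeRegisterFile:
    """Toy model: architected GPRs plus a modified-value copy and a bit vector."""

    def __init__(self, num_regs=32):  # register count is an assumption
        self.architected = [0] * num_regs  # committed state, never written speculatively
        self.modified = [0] * num_regs     # speculative results parked at writeback
        self.modified_bits = 0             # bit i set => modified[i] holds a result

    def speculative_writeback(self, reg, value):
        # A speculative result reaches writeback: store it in the modified
        # copy and record that fact in the bit vector.
        self.modified[reg] = value
        self.modified_bits |= 1 << reg

    def read(self, reg):
        # Younger speculative instructions use the modified value when the
        # bit is set; otherwise they read the architected register.
        if self.modified_bits & (1 << reg):
            return self.modified[reg]
        return self.architected[reg]

    def exit_speculative_mode(self):
        # Assumption: leaving speculative mode simply invalidates all
        # modified values by clearing the bit vector.
        self.modified_bits = 0


rf = SpeculativeRegisterFile()
rf.architected[3] = 100
before = rf.read(3)               # architected value
rf.speculative_writeback(3, 164)
during = rf.read(3)               # modified value
rf.exit_speculative_mode()
after = rf.read(3)                # architected value again
```

The architected list is never touched by `speculative_writeback`, which mirrors the abstract's point that speculative results cannot be written to the architected registers.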

    Apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue
    53.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07302553B2

    Publication date: 2007-11-27

    Application No.: US10351556

    Filing date: 2003-01-23

    IPC classification: G06F9/30 G06F9/40 G06F15/00

    Abstract: An apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue of a processor are provided. In particular, instructions are stored in the non-moving queue one at a time, one per clock cycle. At every clock cycle, a present status of the instructions in the queue is recorded. Using the present status of the instructions in the queue in conjunction with previously recorded statuses of the instructions, the oldest instruction in the queue is determined. The status of the instructions in the queue includes whether or not the instruction has been issued for execution as well as whether or not it is known that the issued instruction has been accepted for execution.

    Instruction group formation and mechanism for SMT dispatch
    54.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07237094B2

    Publication date: 2007-06-26

    Application No.: US10965143

    Filing date: 2004-10-14

    IPC classification: G06F9/38

    Abstract: A more efficient method of handling instructions in a computer processor, by associating resource fields with respective program instructions wherein the resource fields indicate which of the processor hardware resources are required to carry out the program instructions, calculating resource requirements for merging two or more program instructions based on their resource fields, and determining resource availability for simultaneously executing the merged program instructions based on the calculated resource requirements. Resource vectors indicative of the required resources may be encoded into the resource fields, and the resource fields decoded at a later stage to derive the resource vectors. The resource fields can be stored in the instruction cache associated with the respective program instructions. The processor may operate in a simultaneous multithreading mode with different program instructions being part of different hardware threads. When the resource availability equals or exceeds the resource requirements for a group of instructions, those instructions can be dispatched simultaneously to the hardware resources. A start bit may be inserted in one of the program instructions to define the instruction group. The hardware resources may in particular be execution units such as a fixed-point unit, a load/store unit, a floating-point unit, or a branch processing unit.
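
The dispatch check described in this abstract (merge the resource requirements of several instructions, then compare against availability) can be illustrated with a small sketch. The tuple layout, the unit abbreviations, the function names, and the example unit counts below are assumptions for illustration; the abstract does not specify how resource vectors are encoded.

```python
# Hypothetical vector layout: one count per execution-unit type named in the abstract.
UNITS = ("FXU", "LSU", "FPU", "BRU")  # fixed-point, load/store, floating-point, branch


def merge_requirements(vectors):
    """Sum per-unit requirements across the instructions being merged."""
    return tuple(sum(column) for column in zip(*vectors))


def can_dispatch(vectors, available):
    """True when availability equals or exceeds the merged requirement for every unit."""
    needed = merge_requirements(vectors)
    return all(have >= need for have, need in zip(available, needed))


# Example instructions, possibly from different hardware threads.
add_insn = (1, 0, 0, 0)   # needs one fixed-point unit
load_insn = (0, 1, 0, 0)  # needs one load/store unit
fmul_insn = (0, 0, 1, 0)  # needs one floating-point unit

available = (2, 1, 1, 1)  # assumed free units this cycle

group_ok = can_dispatch([add_insn, load_insn, fmul_insn], available)
group_too_big = can_dispatch([load_insn, load_insn], available)
```

Here the first group fits within the available units, while two loads together exceed the single load/store unit assumed to be free.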

    Instruction completion logic distributed among execution units for improving completion efficiency
    55.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US6134645A

    Publication date: 2000-10-17

    Application No.: US87886

    Filing date: 1998-06-01

    Applicant: Dung Quoc Nguyen

    Inventor: Dung Quoc Nguyen

    IPC classification: G06F9/38 G06F9/00

    Abstract: Each execution unit within a superscalar processor has an associated completion table that contains a copy of the status of all instructions dispatched but not completed. A central completion table maintains the status of every dispatched instruction as reported by the dispatch unit and the individual execution units. Execution units send finish signals to the completion table responsible for retiring a particular type of instruction. The central completion table retires instructions that may cause an interrupt and instructions whose results may target the same register. The execution units' associated completion tables retire the balance of the instructions, and the execution units send instruction status to the central completion table and to each execution unit. This reduces the number of instructions that are retired by the central completion table, increasing the number of instructions retired per clock cycle.

    Processor register recovery after flush operation
    56.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08245018B2

    Publication date: 2012-08-14

    Application No.: US12347924

    Filing date: 2008-12-31

    Applicant: Dung Quoc Nguyen

    Inventor: Dung Quoc Nguyen

    IPC classification: G06F9/30

    Abstract: An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation that an exception, such as a branch misprediction, causes. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation.

    PROCESSOR REGISTER RECOVERY AFTER FLUSH OPERATION
    57.
    Type: Invention patent application
    Status: In force

    Publication No.: US20100169622A1

    Publication date: 2010-07-01

    Application No.: US12347924

    Filing date: 2008-12-31

    Applicant: Dung Quoc Nguyen

    Inventor: Dung Quoc Nguyen

    IPC classification: G06F9/30

    Abstract: An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation that an exception, such as a branch misprediction, causes. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation.

    Using a modified value GPR to enhance lookahead prefetch
    58.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07620799B2

    Publication date: 2009-11-17

    Application No.: US12061290

    Filing date: 2008-04-02

    IPC classification: G06F9/30 G06F9/40 G06F15/00

    Abstract: Mechanisms to identify and speculatively execute future instructions during a stall condition are provided. In speculative mode, instruction operands may be invalid for a number of reasons. Dependency and dirty bits are tracked and used to determine which speculative instructions are valid for execution. A modified value register storage and bit vector are used to improve the availability of speculative results that would otherwise be discarded once they leave the execution pipeline because they cannot be written to the architected registers. The modified general purpose registers are used to store speculative results when the corresponding instruction reaches writeback, and the modified bit vector tracks the results that have been stored there. Younger speculative instructions that do not bypass directly from older instructions use this modified data when the corresponding bit in the modified bit vector indicates the data has been modified. Otherwise, data from the architected registers is used.

    Branch lookahead prefetch for microprocessors
    59.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07552318B2

    Publication date: 2009-06-23

    Application No.: US11016200

    Filing date: 2004-12-17

    IPC classification: G06F9/00

    CPC classification: G06F9/3842 G06F9/3861

    Abstract: A method of handling program instructions in a microprocessor which reduces delays associated with mispredicted branch instructions, by detecting the occurrence of a stall condition during execution of the program instructions, speculatively executing one or more pending instructions which include at least one branch instruction during the stall condition, and determining the validity of data utilized by the speculative execution. Dispatch logic determines the validity of the data by marking one or more registers of an instruction dispatch unit to indicate which results of the pending instructions are invalid. The speculative execution of instructions can occur across multiple pipeline stages of the microprocessor, and the validity of the data is tracked during their execution in the multiple pipeline stages while monitoring a dependency of the speculatively executed instructions relative to one another during their execution in the multiple pipeline stages.
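
The validity tracking this abstract describes can be illustrated with a simplified dirty-marking pass. It is a sketch under stated assumptions, not the patent's dispatch logic: the function name, the `(dest, sources)` instruction encoding, the use of `None` as the destination of a branch, and the rule that a valid result clears an earlier dirty mark are all hypothetical.

```python
def run_speculative(instructions, initially_dirty):
    """Mark which speculatively executed instructions have valid operands.

    instructions: list of (dest, sources); dest is None for a branch.
    initially_dirty: registers whose values are invalid when the stall
    begins (for example, the target of the load that missed the cache).
    """
    dirty = set(initially_dirty)
    valid = []
    for dest, sources in instructions:
        if any(src in dirty for src in sources):
            # An invalid operand makes the result invalid as well.
            if dest is not None:
                dirty.add(dest)
            valid.append(False)
        else:
            # Assumption: a valid result overwrites any earlier dirty mark.
            if dest is not None:
                dirty.discard(dest)
            valid.append(True)
    return valid, dirty


program = [
    ("r2", ["r1"]),  # depends on the missed load, so its result is invalid
    ("r3", ["r4"]),  # independent, so its result is valid
    (None, ["r3"]),  # branch on valid data: can be resolved speculatively
    (None, ["r2"]),  # branch on invalid data: cannot be resolved
]
valid, dirty = run_speculative(program, {"r1"})
```

Only the third instruction, a branch whose operand is valid, could be used to detect a misprediction early during the stall.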

    Apparatus and Method for Providing Multiple Reads/Writes Using a 2Read/2Write Register File Array
    60.
    Type: Invention patent application
    Status: In force

    Publication No.: US20080239860A1

    Publication date: 2008-10-02

    Application No.: US12134537

    Filing date: 2008-06-06

    IPC classification: G11C8/00

    CPC classification: G06F9/30141

    Abstract: An apparatus and method are provided for reading a plurality of consecutive entries and writing a plurality of consecutive entries with only one read address and one write address using a 2Read/2Write register file. In one exemplary embodiment, a 64-entry register file array is partitioned into four sub-arrays. Each sub-array contains sixteen entries having one or more 2Read/2Write SRAM cells. The apparatus and method provide a mechanism to write the consecutive entries by only having a 4-to-16 decode of one address. In addition, the apparatus and method provide a mechanism for reading data from the register file array using a starting read word address and two read word lines generated based on the starting read word address. The two read word lines are used to access the two read ports of the entries in the sub-arrays.
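
One way to picture how a single starting address can yield several consecutive entries is the sketch below. The abstract does not say how entries map onto the four sub-arrays, so the interleaved mapping (entry `i` lives in sub-array `i % 4`, row `i // 4`), the group size of four, the wrap-around at entry 63, and all function names are assumptions chosen for illustration only.

```python
NUM_SUBARRAYS = 4   # from the abstract: four sub-arrays
ROWS = 16           # from the abstract: sixteen entries per sub-array


def read_word_lines(start):
    """Derive two row selects (word lines) from one starting address."""
    row = start // NUM_SUBARRAYS
    return row, (row + 1) % ROWS  # assumed wrap-around past the last row


def read_consecutive(regfile, start):
    """Read four consecutive entries; regfile is indexed as regfile[sub][row]."""
    first_row, second_row = read_word_lines(start)
    offset = start % NUM_SUBARRAYS
    out = []
    for k in range(NUM_SUBARRAYS):
        sub = (offset + k) % NUM_SUBARRAYS
        # Sub-arrays at or past the offset use the first word line;
        # the ones that wrapped around use the second.
        row = first_row if sub >= offset else second_row
        out.append(regfile[sub][row])
    return out


# Fill each cell with its own entry number so reads are easy to check.
regfile = [[row * NUM_SUBARRAYS + sub for row in range(ROWS)]
           for sub in range(NUM_SUBARRAYS)]

aligned = read_consecutive(regfile, 8)
unaligned = read_consecutive(regfile, 6)
```

Under this assumed mapping, an aligned start needs only the first word line, while an unaligned start touches two adjacent rows, which is why two word lines per read are sufficient.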