BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS
    1.
    Patent application
    BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS (in force)

    Publication No.: US20100169611A1

    Publication date: 2010-07-01

    Application No.: US12346349

    Filing date: 2008-12-30

    IPC classification: G06F9/312

    CPC classification: G06F9/3844 G06F9/3863

    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
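
    Illustrative sketch: the timer condition described in this abstract can be modeled in a few lines of Python. This is a behavioral toy under stated assumptions, not the patented circuit. The class name `EarlyFlushMonitor`, its method names, and the per-cycle `tick` interface are all hypothetical. The model starts a timer when a misprediction is detected, resets it if the branch retires first, and otherwise reports the refetch address (the oldest instruction's address) once the threshold is reached, while keeping the stored correct outcome and branch ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyFlushMonitor:
    """Hypothetical behavioral model of the timer-based flush condition."""
    threshold: int                         # assumed cycle count that triggers a flush
    timer: int = 0
    timing: bool = False                   # True while the timer is running
    branch_id: Optional[int] = None        # identification info stored with the outcome
    correct_taken: Optional[bool] = None   # stored correct branch outcome

    def on_mispredict_detected(self, branch_id: int, correct_taken: bool) -> None:
        # Store the correct outcome and the branch ID, then start the timer.
        self.branch_id = branch_id
        self.correct_taken = correct_taken
        self.timer = 0
        self.timing = True

    def on_retire(self, branch_id: int) -> None:
        # Retirement of the mispredicted branch resets and stops the timer.
        if self.timing and branch_id == self.branch_id:
            self.timer = 0
            self.timing = False

    def tick(self, oldest_pc: int) -> Optional[int]:
        """Advance one cycle. Return the refetch address if a flush fires."""
        if not self.timing:
            return None
        self.timer += 1
        if self.timer < self.threshold:
            return None
        # Threshold reached before retirement: flush the whole pipeline and
        # restart fetch at the oldest instruction that was in the pipeline.
        self.timing = False
        self.timer = 0
        return oldest_pc

    def stored_outcome(self, branch_id: int) -> Optional[bool]:
        # After the flush, the refetched branch is recognized by its ID.
        return self.correct_taken if branch_id == self.branch_id else None
```

    With `threshold=3`, three calls to `tick` after a detected misprediction yield `None`, `None`, and then the oldest instruction's address. If `on_retire` is called for the same branch first, no flush is reported.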

    Branch misprediction recovery mechanism for microprocessors
    2.
    Granted patent
    Branch misprediction recovery mechanism for microprocessors (in force)

    Publication No.: US08099586B2

    Publication date: 2012-01-17

    Application No.: US12346349

    Filing date: 2008-12-30

    IPC classification: G06F9/00

    CPC classification: G06F9/3844 G06F9/3863

    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.

    System and method to manage address translation requests
    3.
    Granted patent
    System and method to manage address translation requests (in force)

    Publication No.: US08301865B2

    Publication date: 2012-10-30

    Application No.: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F9/26 G06F9/34

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER
    4.
    Patent application
    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER (under examination, published)

    Publication No.: US20130297910A1

    Publication date: 2013-11-07

    Application No.: US13463319

    Filing date: 2012-05-03

    IPC classification: G06F9/30 G06F9/38

    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes entries which may be allocated for use by any thread. Control logic detects long latency instructions. Long latency instructions have a latency greater than a given threshold. One example is a load instruction that has a read-after-write (RAW) data dependency on a store instruction that misses a last-level data cache. The long latency instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the long latency instruction are held at a given pipeline stage until the long latency instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the long latency instruction is being serviced.

    System and Method to Manage Address Translation Requests
    5.
    Patent application
    System and Method to Manage Address Translation Requests (in force)

    Publication No.: US20100332787A1

    Publication date: 2010-12-30

    Application No.: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/10 G06F12/00

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

    Processor operating mode for mitigating dependency conditions between instructions having different operand sizes
    6.
    Granted patent
    Processor operating mode for mitigating dependency conditions between instructions having different operand sizes (in force)

    Publication No.: US08504805B2

    Publication date: 2013-08-06

    Application No.: US12428464

    Filing date: 2009-04-22

    IPC classification: G06F7/483

    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes). This operating mode may be employed on a per-thread basis.

    PROCESSOR OPERATING MODE FOR MITIGATING DEPENDENCY CONDITIONS
    7.
    Patent application
    PROCESSOR OPERATING MODE FOR MITIGATING DEPENDENCY CONDITIONS (in force)

    Publication No.: US20100274994A1

    Publication date: 2010-10-28

    Application No.: US12428464

    Filing date: 2009-04-22

    IPC classification: G06F9/30

    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes). This operating mode may be employed on a per-thread basis.

    DYNAMIC TAG ALLOCATION IN A MULTITHREADED OUT-OF-ORDER PROCESSOR
    8.
    Patent application
    DYNAMIC TAG ALLOCATION IN A MULTITHREADED OUT-OF-ORDER PROCESSOR (in force)

    Publication No.: US20100333098A1

    Publication date: 2010-12-30

    Application No.: US12494532

    Filing date: 2009-06-30

    IPC classification: G06F9/46 G06F12/08

    Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that support multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
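
    Illustrative sketch: the shared tag pool in this abstract can be approximated in Python as follows. The abstract does not say how program order is recovered from dynamically allocated tags, so this sketch assumes a per-thread queue of outstanding tags kept in allocation order. The class `TagPool` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class TagPool:
    """Hypothetical shared pool of instruction-group tags."""

    def __init__(self, num_tags: int) -> None:
        self.free = deque(range(num_tags))   # tags available to any thread
        self.in_flight = {}                  # thread id -> deque of tags, oldest first

    def allocate(self, thread_id: int):
        """Assign a free tag to a new instruction group, or None if the pool is empty."""
        if not self.free:
            return None                      # the caller would have to stall
        tag = self.free.popleft()
        self.in_flight.setdefault(thread_id, deque()).append(tag)
        return tag

    def is_older(self, thread_id: int, tag_a: int, tag_b: int) -> bool:
        """True if group tag_a precedes group tag_b in the thread's program order."""
        order = list(self.in_flight[thread_id])
        return order.index(tag_a) < order.index(tag_b)

    def commit_oldest(self, thread_id: int) -> int:
        """Commit the thread's oldest group and return its tag to the shared pool."""
        tag = self.in_flight[thread_id].popleft()
        self.free.append(tag)
        return tag
```

    Because order is tracked by position rather than by tag value, a tag freed by one thread can be reused by another thread without implying anything about age: a numerically smaller tag may belong to a younger group.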

    TLB tag parity checking without CAM read
    9.
    Granted patent
    TLB tag parity checking without CAM read (in force)

    Publication No.: US07366829B1

    Publication date: 2008-04-29

    Application No.: US10882806

    Filing date: 2004-06-30

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    CPC classification: G06F12/1027 G06F11/1064

    Abstract: An apparatus and method for expediting parity checked TLB access operations is described in connection with a multithreaded multiprocessor chip. This parity checking mechanism eliminates the need to read a CAM entry from a TLB during a TLB access by storing the tag parity value in a RAM portion of a TLB, using the CAM key input to generate a tag parity check value for a matched entry, and comparing the generated tag parity check value to the stored tag parity value to determine if there is a parity match or error.
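
    Illustrative sketch: the parity check described in this abstract can be shown with a small Python model. It assumes even parity over the tag bits and a TLB modeled as two parallel lists; the names `Tlb`, `write`, and `lookup` are hypothetical. The point illustrated is that on a CAM match the key equals the stored tag, so parity can be computed from the key and compared with the parity saved in the RAM portion, without reading the CAM entry back.

```python
def parity(value: int) -> int:
    """Even-parity bit of an integer: 1 if the number of set bits is odd."""
    return bin(value).count("1") & 1


class Tlb:
    """Hypothetical TLB with a CAM (tags) and a RAM (data plus stored tag parity)."""

    def __init__(self) -> None:
        self.cam = []   # tag per entry, searched by key
        self.ram = []   # (translation data, tag parity) per entry

    def write(self, tag: int, data: str) -> None:
        # Tag parity is generated at write time and kept in the RAM portion.
        self.cam.append(tag)
        self.ram.append((data, parity(tag)))

    def lookup(self, key: int):
        """Return (data, parity_ok) for a matching entry, or None on a miss."""
        for index, tag in enumerate(self.cam):
            if tag == key:                       # models the CAM match line
                data, stored_parity = self.ram[index]
                # Parity is derived from the key input, not from a CAM read.
                return data, parity(key) == stored_parity
        return None
```

    If a single bit of a stored CAM tag is corrupted, a lookup with the corrupted value still matches, but the parity computed from the key disagrees with the stored parity and the error is flagged.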

    Arbitration of window swap operations
    10.
    Granted patent
    Arbitration of window swap operations (in force)

    Publication No.: US07426630B1

    Publication date: 2008-09-16

    Application No.: US10881151

    Filing date: 2004-06-30

    Abstract: In one embodiment, a processor comprises a register file, register management logic coupled to the register file, and at least two sources of window swap operations coupled to the register management logic. The register management logic is configured to control an interface to the register file to switch register windows in the register file in response to one or more window swap operations. The sources of window swap operations and the register management logic are configured to cooperate according to an arbitration scheme to arbitrate between conflicting window swap operations to be performed using the interface. In one particular implementation, for example, block signals may be used from higher priority sources to lower priority sources to block issuance of window swap operations by the lower priority sources.
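
    Illustrative sketch: the block-signal scheme mentioned at the end of this abstract resembles a fixed-priority arbiter. The Python function below is a simplified model under that assumption. The function name `arbitrate` and the representation of sources as a list of booleans (index 0 being the highest priority) are hypothetical.

```python
from typing import List, Optional


def arbitrate(requests: List[bool]) -> Optional[int]:
    """Pick which source may issue a window swap this cycle.

    requests[i] is True if source i wants to issue; index 0 has the
    highest priority. Returns the granted index, or None if idle.
    """
    blocked = False      # block signal seen by the current source
    granted = None
    for index, wants in enumerate(requests):
        if wants and not blocked:
            granted = index
        # A requesting source asserts a block signal to all lower-priority sources.
        blocked = blocked or wants
    return granted
```

    For example, `arbitrate([False, True, True])` grants source 1, because source 0 is idle and source 1 blocks source 2.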
