Unified high-frequency out-of-order pick queue with support for triggering early issue of speculative instructions
    2.
    Granted invention patent
    Status: in force

    Publication No.: US09058180B2

    Publication date: 2015-06-16

    Application No.: US12493743

    Filing date: 2009-06-29

    Abstract: Systems and methods for efficient picking of instructions for out-of-order issue and execution in a processor. In one embodiment, a processor comprises a unified pick queue that is dynamically allocated. Each entry is configured to store age and dependency information relative to other decoded instructions. Each entry also stores a picked field, which, when asserted, indicates that the decoded instruction has already been picked for out-of-order issue and execution. When asserted, a trigger field indicates that a result of the corresponding decoded instruction will be available a predetermined number of clock cycles afterward. A younger instruction dependent on a result of an older instruction is ready to be picked before the result of the older instruction is available. In this case, the older instruction has asserted picked and trigger fields.
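
The readiness rule in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented design: the names `Entry`, `is_ready`, and `pick_oldest_ready` are hypothetical, age is modeled as an integer (smaller means older), producers are assumed to stay in the queue, and the oldest-first choice among ready entries is an assumed policy.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    age: int                                  # smaller value = older instruction
    deps: set = field(default_factory=set)    # names of older producer entries
    picked: bool = False                      # already picked for issue
    trigger: bool = False                     # result arrives a fixed number of cycles from now


def is_ready(entry, queue):
    """An unpicked entry is ready once every producer has picked and trigger asserted."""
    if entry.picked:
        return False
    return all(queue[d].picked and queue[d].trigger for d in entry.deps)


def pick_oldest_ready(queue):
    """Pick the oldest ready entry, mark it picked, and return it (or None)."""
    candidates = [e for e in queue.values() if is_ready(e, queue)]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda e: e.age)
    chosen.picked = True
    return chosen


queue = {e.name: e for e in (Entry("add", 0), Entry("sub", 1, {"add"}))}
first = pick_oldest_ready(queue)      # "add" has no producers, so it is picked
stalled = pick_oldest_ready(queue)    # "sub" waits: "add" has not asserted trigger
queue["add"].trigger = True           # result of "add" is now a known number of cycles away
second = pick_oldest_ready(queue)     # "sub" is picked before the result exists
```

In this sketch the dependent `sub` becomes pickable as soon as `add` has both fields asserted, which mirrors the early-issue condition the abstract describes.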

    Apparatus and method for local operand bypassing for cryptographic instructions
    3.
    Granted invention patent
    Status: in force

    Publication No.: US08356185B2

    Publication date: 2013-01-15

    Application No.: US12575832

    Filing date: 2009-10-08

    IPC classification: G06F9/312 G06F21/00

    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor.

    Dynamic mitigation of thread hogs on a threaded processor
    4.
    Granted invention patent
    Status: in force

    Publication No.: US08347309B2

    Publication date: 2013-01-01

    Application No.: US12511620

    Filing date: 2009-07-29

    IPC classification: G06F9/46 G06F9/30

    Abstract: Systems and methods for efficient thread arbitration in a processor. A processor comprises a multi-threaded resource. The resource may include an array of entries which may be allocated by threads. A thread arbitration table corresponding to a given thread stores a high and a low threshold value in each table entry. A thread history shift register (HSR) indexes the table, wherein each bit of the HSR indicates whether the given thread is a thread hog. When the given thread has more allocated entries in the array than the high threshold of the selected table entry, the given thread is stalled from further allocating array entries. Similarly, when the given thread has fewer allocated entries in the array than the low threshold of the selected table entry, the given thread is permitted to allocate entries. In this manner, threads that hog dynamic resources can be mitigated such that more resources are available to other threads that are not thread hogs. This can result in a significant increase in overall processor performance.
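
The threshold scheme in the abstract above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the class name `ThreadArbiter`, the 2-bit HSR width, the example threshold values, and in particular the hysteresis behavior between the two thresholds (the thread keeps its previous stalled or permitted state), which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class ThreadArbiter:
    """Allocation gate for one thread (illustrative only)."""
    table: list            # (low, high) thresholds, indexed by the HSR value
    hsr_bits: int = 2      # assumed history length
    hsr: int = 0           # history shift register; 1 bits mark "was a hog"
    stalled: bool = False

    def record_hog(self, was_hog: bool) -> None:
        """Shift the newest hog indication into the history register."""
        mask = (1 << self.hsr_bits) - 1
        self.hsr = ((self.hsr << 1) | int(was_hog)) & mask

    def may_allocate(self, allocated: int) -> bool:
        """Apply the thresholds selected by the current history value."""
        low, high = self.table[self.hsr]
        if allocated > high:
            self.stalled = True       # over the high threshold: stall
        elif allocated < low:
            self.stalled = False      # under the low threshold: permit
        return not self.stalled       # in between: keep the previous state


# One (low, high) pair per HSR value; a longer hog history gets tighter limits.
arbiter = ThreadArbiter(table=[(6, 12), (4, 10), (4, 10), (2, 6)])
```

Indexing the table by recent history lets a thread that has repeatedly hogged the resource be throttled at a lower occupancy than a well-behaved one.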

    System and method to manage address translation requests
    5.
    Granted invention patent
    Status: in force

    Publication No.: US08301865B2

    Publication date: 2012-10-30

    Application No.: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F9/26 G06F9/34

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing depending on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

    MULTI-THREADED INSTRUCTION BUFFER DESIGN
    6.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20120233441A1

    Publication date: 2012-09-13

    Application No.: US13041881

    Filing date: 2011-03-07

    IPC classification: G06F9/38

    CPC classification: G06F9/3851 G06F9/3814

    Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that, for instructions received by the instruction buffer, one or more of the plurality of entries of a memory array can be determined as a write destination for the received instructions, and, for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.

    Hybrid instruction buffer
    7.
    Granted invention patent
    Status: in force

    Publication No.: US08225034B1

    Publication date: 2012-07-17

    Application No.: US10881215

    Filing date: 2004-06-30

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: In one embodiment, a storage buffer includes a plurality of storage locations configured to store a plurality of incoming instructions. The storage buffer also includes a shift FIFO that is coupled to the plurality of storage locations. The shift FIFO includes an entry configured to store an instruction that is next in a program order. In response to receiving a shift signal, control functionality that is coupled to the plurality of storage locations and to the shift FIFO may cause the instruction that is next in the program order to be moved from a given location of the plurality of storage locations to the entry of the shift FIFO.
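
As a rough behavioral model of the abstract above, the following Python sketch moves the next instruction in program order from a storage location into a single shift FIFO entry whenever a shift is signaled. The class name `HybridBuffer`, the use of an explicit sequence number to represent program order, and the first-free-slot write policy are all assumptions for illustration; the abstract does not describe how the hardware tracks order.

```python
class HybridBuffer:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots   # storage locations: (seq, instr) or None
        self.fifo_entry = None            # shift FIFO entry: next instruction in program order

    def write(self, seq, instr):
        """Store an incoming instruction in the first free storage location."""
        free = self.slots.index(None)     # raises ValueError when the buffer is full
        self.slots[free] = (seq, instr)

    def shift(self):
        """On a shift signal, move the next-in-order instruction into the FIFO entry."""
        occupied = [i for i, s in enumerate(self.slots) if s is not None]
        if not occupied:
            self.fifo_entry = None
            return None
        oldest = min(occupied, key=lambda i: self.slots[i][0])
        self.fifo_entry = self.slots[oldest][1]
        self.slots[oldest] = None         # the storage location is free again
        return self.fifo_entry


buf = HybridBuffer(4)
buf.write(7, "ld")
buf.write(5, "add")
buf.write(6, "st")
next_instr = buf.shift()   # the instruction with the lowest sequence number, "add"
```

The point of the split is that incoming instructions can land in any free location, while the consumer only ever reads the one FIFO entry.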

    MECHANISM FOR SELECTING INSTRUCTIONS FOR EXECUTION IN A MULTITHREADED PROCESSOR
    8.
    Invention patent application
    Status: in force

    Publication No.: US20110138153A1

    Publication date: 2011-06-09

    Application No.: US13027056

    Filing date: 2011-02-14

    Applicant: Robert T. Golla

    Inventor: Robert T. Golla

    IPC classification: G06F9/38

    CPC classification: G06F9/3851 G06F9/3861

    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. In a given cycle, the pick unit may pick a valid instruction from at least one of the buffers based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
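
The pick-and-cancel behavior in the abstract above might be modeled as follows. This is only a sketch: the abstract does not name the thread selection algorithm, so round-robin is an assumed stand-in, and the class name `PickUnit` and the `cancel` parameter are hypothetical. A cancelled pick is modeled as leaving both the instruction and the round-robin pointer untouched.

```python
from collections import deque


class PickUnit:
    def __init__(self, num_threads):
        self.buffers = [deque() for _ in range(num_threads)]  # one buffer per thread
        self.next_thread = 0                                  # round-robin pointer (assumed policy)

    def pick(self, cancel=False):
        """Pick one valid instruction this cycle, unless the pick is cancelled."""
        n = len(self.buffers)
        for offset in range(n):
            tid = (self.next_thread + offset) % n
            if self.buffers[tid]:
                if cancel:
                    return None               # same-cycle cancel: instruction stays buffered
                self.next_thread = (tid + 1) % n
                return tid, self.buffers[tid].popleft()
        return None                           # no thread has a valid instruction


unit = PickUnit(2)
unit.buffers[0].append("i0")
unit.buffers[1].append("j0")
picked = unit.pick()                 # thread 0 is first in round-robin order
cancelled = unit.pick(cancel=True)   # the pick of "j0" is cancelled this cycle
retried = unit.pick()                # "j0" is still available on the next attempt
```

Because the cancelled instruction is never removed from its buffer, it remains a candidate in the following cycle.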

    Mechanism for selecting instructions for execution in a multithreaded processor
    9.
    Granted invention patent
    Status: in force

    Publication No.: US07890734B2

    Publication date: 2011-02-15

    Application No.: US10881247

    Filing date: 2004-06-30

    Applicant: Robert T. Golla

    Inventor: Robert T. Golla

    IPC classification: G06F9/30

    CPC classification: G06F9/3851 G06F9/3861

    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. In a given cycle, the pick unit may pick a valid instruction from at least one of the buffers based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.

    DEPENDENCY MATRIX FOR THE DETERMINATION OF LOAD DEPENDENCIES
    10.
    Invention patent application
    Status: in force

    Publication No.: US20100332806A1

    Publication date: 2010-12-30

    Application No.: US12495025

    Filing date: 2009-06-30

    IPC classification: G06F9/30

    Abstract: Systems and methods for identification of instructions dependent on speculative load operations in a processor. A processor allocates entries of a unified pick queue for decoded and renamed instructions. Each entry of a corresponding dependency matrix is configured to store a dependency bit for each other instruction in the pick queue. The processor speculates that loads will hit in the data cache, hit in the TLB, and not have a read-after-write (RAW) hazard. For each unresolved load, the pick queue tracks dependent instructions via dependency vectors based upon the dependency matrix. If a load speculation is found to be incorrect, dependent instructions in the pick queue are reset to allow for subsequent picking, and dependent instructions in flight are canceled. On completion of a load miss, dependent operations are re-issued. On resolution of a TLB miss or RAW hazard, the original load is replayed and dependent operations are issued again from the pick queue.
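
To make the dependency-matrix idea in the abstract above concrete, here is a small Python sketch. It assumes each matrix row is an integer bitmask in which bit `j` of row `i` means that entry `i` depends directly on entry `j`, and it derives a load's dependency vector by following those bits transitively. The function names and the iterative closure are assumptions for illustration; the abstract does not say how the hardware builds or updates the vectors.

```python
def dependents_of(matrix, load):
    """Return a bitmask of entries that depend, directly or indirectly, on `load`."""
    vector = 0
    frontier = 1 << load
    while frontier:
        new = 0
        for i, row in enumerate(matrix):
            # Entry i joins the vector if it depends on anything in the frontier.
            if row & frontier and not (vector >> i) & 1:
                new |= 1 << i
        vector |= new
        frontier = new
    return vector


def reset_on_misspeculation(matrix, picked, load):
    """Clear the picked flag of every dependent so it can be picked again later."""
    vector = dependents_of(matrix, load)
    return [p and not (vector >> i) & 1 for i, p in enumerate(picked)]


# Entry 0 is a load; entry 1 uses its result; entry 2 uses entry 1; entry 3 is independent.
matrix = [0b0000, 0b0001, 0b0010, 0b0000]
vector = dependents_of(matrix, 0)
picked = reset_on_misspeculation(matrix, [True, True, True, True], 0)
```

Here `vector` covers entries 1 and 2, so only their picked flags are cleared while the load itself and the independent entry 3 are left alone.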
