Register access protocol in a multithreaded multi-core processor
    1.
    Granted patent (legal status: active)

    Publication No.: US07747771B1

    Publication date: 2010-06-29

    Application No.: US10881178

    Filing date: 2004-06-30

    IPC classification: G06F15/16 G06F15/76 G06F13/00

    CPC classification: G06F15/16

    Abstract: A method and mechanism for managing access to a plurality of registers in a processing device are contemplated. A processing device includes multiple nodes coupled to a ring bus, each of which includes one or more registers which may be accessed by processes executing within the device. Also coupled to the ring bus is a ring control unit which is configured to initiate transactions targeted to nodes on the ring bus. Each of the nodes is configured to receive and process a bus transaction with a fixed latency, whether or not the transaction is targeted to the receiving node. The ring control unit is configured to periodically convey idle transactions on the ring bus in order to allow nodes responding to indeterminate transactions to gain access to the bus.
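
The ring-bus behavior in this abstract can be illustrated with a minimal Python sketch. It is not the patented design: the one-cycle `NODE_LATENCY`, the single register per node, and all class and function names are assumptions made for illustration. It shows two things, that every transaction costs the same number of cycles regardless of its target, and that a node holding a late result can claim an idle transaction to return it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NODE_LATENCY = 1  # assumed: cycles every node spends on every transaction


@dataclass
class Transaction:
    kind: str                      # "read", "write", "idle", or "response"
    node_id: Optional[int] = None  # target for requests, sender for responses
    data: Optional[int] = None


@dataclass
class Node:
    node_id: int
    register: int = 0                       # one register per node, for brevity
    pending_response: Optional[int] = None  # late result waiting for a bus slot

    def process(self, txn: Transaction) -> Transaction:
        """Handle one transaction and pass it (or a response) downstream."""
        if txn.kind == "idle" and self.pending_response is not None:
            # Claim the idle slot to return an indeterminate-latency result.
            txn = Transaction("response", self.node_id, self.pending_response)
            self.pending_response = None
        elif txn.kind == "write" and txn.node_id == self.node_id:
            self.register = txn.data
        elif txn.kind == "read" and txn.node_id == self.node_id:
            txn = Transaction("response", self.node_id, self.register)
        return txn


def send_around_ring(nodes: List[Node], txn: Transaction) -> Tuple[Transaction, int]:
    """Pass txn through every node in ring order; return (result, cycles)."""
    cycles = 0
    for node in nodes:
        txn = node.process(txn)
        cycles += NODE_LATENCY  # same cost whether or not this node was the target
    return txn, cycles
```

With four nodes, a write, a read, and an idle transaction each take four cycles in this model, which is the fixed-latency property the abstract describes.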

    System and method to manage address translation requests
    2.
    Granted patent (legal status: active)

    Publication No.: US08301865B2

    Publication date: 2012-10-30

    Application No.: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F9/26 G06F9/34

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from a pick queue. The system may select an ITLB- or DTLB-related entry for servicing depending on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
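
The abstract says PRQ selection depends on prior selections but does not state the policy. The Python sketch below assumes the simplest such policy, strict alternation between the ITLB and DTLB portions when both have entries. The class name, method names, and tuple layout are hypothetical.

```python
from collections import deque


class PendingRequestQueue:
    """Toy PRQ with an instruction-related and a data-related portion."""

    def __init__(self):
        self.portions = {"itlb": deque(), "dtlb": deque()}
        # Arbitrary starting point so that the first pick favors the ITLB portion.
        self.last_selected = "dtlb"

    def allocate(self, portion, thread_id, vaddr):
        """Add an entry for a (potential) miss to the named portion."""
        self.portions[portion].append((thread_id, vaddr))

    def select(self):
        """Pick the next entry, preferring the portion not chosen last time."""
        first = "itlb" if self.last_selected == "dtlb" else "dtlb"
        second = "dtlb" if first == "itlb" else "itlb"
        for portion in (first, second):
            if self.portions[portion]:
                self.last_selected = portion
                return portion, self.portions[portion].popleft()
        return None  # nothing pending

    def flush_thread(self, thread_id):
        """Deallocate every entry that belongs to the flushed thread."""
        for name, queue in list(self.portions.items()):
            self.portions[name] = deque(e for e in queue if e[0] != thread_id)
```

Alternation keeps either miss type from starving the other. A real design could weight the choice differently, and the abstract does not say which approach is used.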

    System and Method to Manage Address Translation Requests
    3.
    Patent application (legal status: active)

    Publication No.: US20100332787A1

    Publication date: 2010-12-30

    Application No.: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/10 G06F12/00

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from a pick queue. The system may select an ITLB- or DTLB-related entry for servicing depending on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

    BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS
    4.
    Patent application (legal status: active)

    Publication No.: US20100169611A1

    Publication date: 2010-07-01

    Application No.: US12346349

    Filing date: 2008-12-30

    IPC classification: G06F9/312

    CPC classification: G06F9/3844 G06F9/3863

    Abstract: A system and method for reducing the branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
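
The timer example from the abstract can be sketched in Python as follows. This is a behavioral model only: the class name `EarlyFlushTimer`, the `(branch_tag, correct_target)` tuple used for the stored outcome, and the one-increment-per-cycle counting are assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EarlyFlushTimer:
    threshold: int                                   # cycles to wait before flushing early
    timer: int = 0
    counting: bool = False
    saved_outcome: Optional[Tuple[int, int]] = None  # (branch_tag, correct_target)

    def on_mispredict(self, branch_tag: int, correct_target: int) -> None:
        """Store the correct outcome with its identifying tag and start the timer."""
        self.saved_outcome = (branch_tag, correct_target)
        self.timer = 0
        self.counting = True

    def tick(self) -> bool:
        """Advance one cycle; return True when the early pipeline flush should fire."""
        if not self.counting:
            return False
        self.timer += 1
        if self.timer >= self.threshold:
            self.counting = False
            return True
        return False

    def on_retire(self, branch_tag: int) -> None:
        """Reset the timer when the mispredicted branch retires first."""
        if self.saved_outcome is not None and self.saved_outcome[0] == branch_tag:
            self.timer = 0
            self.counting = False
```

If the branch retires before the threshold is reached, `tick()` never returns True and recovery proceeds at retirement as usual. Otherwise the saved outcome survives the flush, so the refetched branch can be resolved correctly.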

    Branch misprediction recovery mechanism for microprocessors
    5.
    Granted patent (legal status: active)

    Publication No.: US08099586B2

    Publication date: 2012-01-17

    Application No.: US12346349

    Filing date: 2008-12-30

    IPC classification: G06F9/00

    CPC classification: G06F9/3844 G06F9/3863

    Abstract: A system and method for reducing the branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.

    System and method for balancing instruction loads between multiple execution units using assignment history
    6.
    Granted patent (legal status: active)

    Publication No.: US09122487B2

    Publication date: 2015-09-01

    Application No.: US12490005

    Filing date: 2009-06-23

    IPC classification: G06F9/38

    Abstract: A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window.
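
A bias counter with a bounded history window might look like the Python sketch below. The two-slot setup, the +1/-1 encoding, the tie-break toward slot 0, and the names `SlotAssigner` and `allowed` are all assumptions chosen for illustration; the abstract does not specify them.

```python
from collections import deque


class SlotAssigner:
    """Balance instructions over two slots using a windowed bias counter."""

    def __init__(self, window=8):
        self.window = window
        self.bias = 0           # > 0: slot 0 got more recently; < 0: slot 1 did
        self.history = deque()  # actual assignments inside the window

    def assign(self, allowed=(0, 1)):
        """Choose a slot; `allowed` models instruction types tied to one slot."""
        if len(allowed) == 1:
            slot = allowed[0]                 # the type dictates the slot
        else:
            slot = 1 if self.bias > 0 else 0  # steer against the current bias
        self._record(slot)
        return slot

    def _record(self, slot):
        self.history.append(slot)
        self.bias += 1 if slot == 0 else -1
        if len(self.history) > self.window:
            # The oldest assignment leaves the window: undo its contribution.
            oldest = self.history.popleft()
            self.bias -= 1 if oldest == 0 else -1
```

The separate `history` deque is what lets the counter forget old assignments. Without it, the counter could only grow or shrink cumulatively and would never reflect a window.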

    THREAD FAIRNESS ON A MULTI-THREADED PROCESSOR WITH MULTI-CYCLE CRYPTOGRAPHIC OPERATIONS
    7.
    Patent application (legal status: active)

    Publication No.: US20110276783A1

    Publication date: 2011-11-10

    Application No.: US12773278

    Filing date: 2010-05-04

    IPC classification: G06F9/38

    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking-type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
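
The trade-off described in this abstract can be made concrete with a small cycle-count model in Python. The stall cadence (`stall_every`), the rule "allow stalls only when at most one thread has a blocking instruction", and all function names are assumptions. The abstract gives no numbers.

```python
def blocking_op_cycles(work_cycles, stall_every, allow_stalls):
    """Cycles one blocking instruction occupies the shared unit.

    When stalls are allowed, one stall cycle is inserted after every
    `stall_every` work cycles (but not after the final one), giving
    other threads a chance to use the unit.
    """
    if not allow_stalls:
        return work_cycles
    stalls = (work_cycles - 1) // stall_every
    return work_cycles + stalls


def stalls_allowed(threads_with_blocking_ops):
    """Assumed mode rule: keep stall insertion only if one thread is blocking."""
    return threads_with_blocking_ops <= 1


def total_cycles(blocking_ops, stall_every=4):
    """Time to run the given blocking instructions back to back."""
    allow = stalls_allowed(len(blocking_ops))
    return sum(blocking_op_cycles(w, stall_every, allow) for w in blocking_ops)
```

In this model a single 16-cycle operation takes 19 cycles with stalls inserted, while two such operations queued by different threads run in 32 cycles because stall insertion is disabled.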

    Apparatus and method for local operand bypassing for cryptographic instructions
    8.
    Granted patent (legal status: active)

    Publication No.: US08356185B2

    Publication date: 2013-01-15

    Application No.: US12575832

    Filing date: 2009-10-08

    IPC classification: G06F9/312 G06F21/00

    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor.
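
To see why the local bypass matters for a chain of dependent cryptographic instructions, consider the Python sketch below. It rests on an assumption the abstract does not state: that without the local bypass, a result would become visible to a dependent instruction only after the longer non-cryptographic latency. The function name and the latency values in the usage note are hypothetical.

```python
def issue_cycles(chain_length, crypto_latency, noncrypto_latency, local_bypass):
    """Issue cycle of each instruction in a chain where each depends on the last.

    With the local bypass, a result is forwarded after `crypto_latency` cycles.
    Without it, this model assumes the dependent instruction has to wait for
    the longer `noncrypto_latency`.
    """
    ready_after = crypto_latency if local_bypass else noncrypto_latency
    cycles = []
    cycle = 0
    for _ in range(chain_length):
        cycles.append(cycle)
        cycle += ready_after  # the next instruction waits for this result
    return cycles
```

For a four-instruction chain with latencies of 3 and 5 cycles, the model issues at cycles 0, 3, 6, 9 with the bypass and at 0, 5, 10, 15 without it.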

    Apparatus and method for fine-grained multithreading in a multipipelined processor core
    9.
    Granted patent (legal status: active)

    Publication No.: US07401206B2

    Publication date: 2008-07-15

    Application No.: US10880488

    Filing date: 2004-06-30

    IPC classification: G06F9/34

    Abstract: An apparatus and method for fine-grained multithreading in a multipipelined processor core. According to one embodiment, a processor may include instruction fetch logic configured to assign a given one of a plurality of threads to a corresponding one of a plurality of thread groups, where each of the plurality of thread groups may comprise a subset of the plurality of threads, to issue a first instruction from one of the plurality of threads during one execution cycle, and to issue a second instruction from another one of the plurality of threads during a successive execution cycle. The processor may further include a plurality of execution units, each configured to execute instructions issued from a respective thread group.
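
The thread-group arrangement can be sketched in Python as below. The modulo assignment of threads to groups and the round-robin issue order within a group are assumptions. The abstract only requires that each group hold a subset of the threads and that successive cycles issue from different threads.

```python
def assign_thread_groups(thread_ids, n_groups):
    """Split threads into groups; each group feeds its own execution unit."""
    groups = [[] for _ in range(n_groups)]
    for index, thread_id in enumerate(thread_ids):
        groups[index % n_groups].append(thread_id)  # assumed: modulo placement
    return groups


def issue_schedule(groups, n_cycles):
    """For each cycle, list the thread that each group's unit issues from.

    Threads inside a group take turns (round-robin), so consecutive cycles
    issue from different threads whenever a group has more than one thread.
    Every group is assumed to be non-empty.
    """
    return [
        [group[cycle % len(group)] for group in groups]
        for cycle in range(n_cycles)
    ]
```

With eight threads and two groups, the first three cycles issue from threads (0, 1), then (2, 3), then (4, 5), one instruction per execution unit per cycle.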

    APPARATUS AND METHOD FOR LOCAL OPERAND BYPASSING FOR CRYPTOGRAPHIC INSTRUCTIONS
    10.
    Patent application (legal status: active)

    Publication No.: US20110087895A1

    Publication date: 2011-04-14

    Application No.: US12575832

    Filing date: 2009-10-08

    IPC classification: G06F21/00 G06F9/30 G06F9/312

    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor.