Systems and methods for variable control of power dissipation in a pipelined processor
    1.
    Type: granted invention patent
    Status: in force

    Publication No.: US06651176B1

    Publication date: 2003-11-18

    Application No.: US09457169

    Filing date: 1999-12-08

    IPC classification: G06F1/26

    Abstract: The invention controls maximum average power dissipation by stalling high power instructions in the pipeline of a pipelined processor. A power dissipation controller stalls the high power instructions in order to control the processor's maximum average power dissipation. Preferably, the controller is modeled after a capacitive system with a constant output rate and a throttled input rate: the output rate represents the steady state maximum average power dissipation, while the input rate is stalled based upon current capacity, representing thermal response time. At start-up, the capacity is initialized. For each high power instruction, the capacity increases by a weighted value; on each clock, the capacity is also decreased by a variable output rate. In particular, a low power operation is inserted into the stage execution circuit where the stall is desired, creating a low power state for that circuit. This stall effectively creates a “hole” at that pipeline stage, thus temporarily reducing power dissipation. The invention takes advantage of the fact that the presence of an instruction at any stage execution circuit dissipates power and that the absence of an instruction (i.e., a “hole”) at any stage dissipates less power. By controlling where and when a hole occurs within the pipeline, the maximum average power dissipation of the processor is controlled.
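
The capacity model in this abstract can be sketched as a small counter that fills when a high power instruction issues and drains every clock. This is only an illustration under stated assumptions: the class name `PowerThrottle`, the threshold `limit`, and the numeric values of `weight` and `drain` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class PowerThrottle:
    """Toy capacity counter; every name and number here is an assumption."""
    limit: int = 8      # at or above this capacity, high power instructions stall
    weight: int = 4     # capacity added per issued high power instruction
    drain: int = 1      # capacity removed on every clock (the output rate)
    capacity: int = 0   # initialized at start-up

    def step(self, wants_high_power: bool) -> bool:
        """Advance one clock; return True if a high power instruction issued."""
        issued = wants_high_power and self.capacity < self.limit
        if issued:
            self.capacity += self.weight
        # A stalled slot is a "hole": nothing is added, the counter only drains.
        self.capacity = max(0, self.capacity - self.drain)
        return issued


def run(cycles: int) -> list:
    """Offer a high power instruction on every clock and record what issues."""
    throttle = PowerThrottle()
    return [throttle.step(True) for _ in range(cycles)]
```

With these assumed values the long-run issue rate settles at `drain / weight`, one high power instruction every four clocks, which corresponds to the steady state limit that the output rate represents.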

    Method and system for detecting dropped micro-packets

    Publication No.: US07047437B2

    Publication date: 2006-05-16

    Application No.: US10021170

    Filing date: 2001-12-12

    IPC classification: G06F11/00

    CPC classification: H04L1/1642

    Abstract: A system and method are disclosed for detecting and correcting errors in the transmission of multiple flits between sending and receiving agents connected together in a network or computer interconnect environment. The method comprises embedding a sequence identifier in each flit prior to transmission, sending each flit to a connected receiving agent, examining the sequence identifier of each flit received, and requesting the sending agent to resend a flit if the sequence identifier for that flit is determined to be incorrect. In a preferred embodiment of the present invention, the sequence identifier is embedded in the control portion of the flit and comprises a sequence number that is incremented or otherwise changed in a predictable manner, so that the order of flits being received is predicted. If the sequence number for a flit is different than expected, the receiving agent requests that it be resent.
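
The receive-side check described here can be sketched as follows. This is a minimal illustration, assuming an incrementing sequence number that wraps at `SEQ_MODULUS`; the names `Flit`, `Receiver`, and the 8-bit field width are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass

SEQ_MODULUS = 256  # assumed width of the sequence field (8 bits)


@dataclass
class Flit:
    seq: int       # sequence identifier carried in the control portion
    payload: str


class Receiver:
    def __init__(self):
        self.expected = 0     # next sequence number the receiver predicts
        self.delivered = []   # payloads accepted in order

    def receive(self, flit):
        """Return None if the flit is accepted, else the sequence number to resend."""
        if flit.seq != self.expected:
            # A flit was dropped or reordered: ask the sender to resend.
            return self.expected
        self.delivered.append(flit.payload)
        self.expected = (self.expected + 1) % SEQ_MODULUS
        return None
```

For example, after accepting flits 0 and 1, a flit numbered 3 is rejected and `receive` returns 2, the number the sending agent would be asked to resend.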

    Core parallel execution with different optimization characteristics to decrease dynamic execution path

    Publication No.: US07028167B2

    Publication date: 2006-04-11

    Application No.: US10091084

    Filing date: 2002-03-04

    IPC classification: G06F11/00

    Abstract: The invention provides a processor with two or more parallel instruction paths for processing instructions. The instruction paths may be implemented with a plurality of cores on a common die. Instructions of the invention are preferably processed within a bundle of two or more instructions of a common program thread, and each of the instruction paths preferably forms a cluster to process bundled instructions. Each of the instruction paths has an array of pipelined execution units. Initially, two or more of the parallel instruction paths process the same program thread (one or more bundles) through the execution units, but with different optimization characteristics set for each path. Assessment logic monitors the processing of the initial program thread through the execution units and selects the heuristics defining which path is in the lead. The other instruction paths are then reallocated, or synchronized, with the optimization characteristics of the lead instruction path, or with similarly optimized characteristics, to process other bundles of the program thread; preferably, the lead path continues processing the initial thread without being disturbed. For other program threads, the process may repeat by processing like bundles through multiple instruction paths to identify the preferred heuristics, and then synchronizing the multiple instruction paths to the optimized characteristics to improve per-thread performance.

    Multithreaded hardware systems and methods
    4.
    Type: granted invention patent
    Status: lapsed

    Publication No.: US07600101B2

    Publication date: 2009-10-06

    Application No.: US11034464

    Filing date: 2005-01-13

    IPC classification: G06F9/312

    CPC classification: G06F9/462

    Abstract: Multithreaded hardware systems and methods are disclosed. One embodiment of a system may comprise a multithreaded processor comprising a register file having N hardware threads, where N is an integer greater than or equal to one, and an offline storage structure having M hardware threads, where M is an integer greater than or equal to one. The multithreaded processor system may further comprise a thread control that transfers register values associated with at least one of the N hardware threads to registers of at least one of the M hardware threads and transfers register values of at least one of the M hardware threads to registers of at least one of the N hardware threads.
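
The two-way transfer between the register file and the offline storage structure can be sketched as a simple exchange. This is an illustration only: modeling the transfer as a swap, the class name `ThreadControl`, and the parameter `regs_per_thread` are assumptions, not details from the patent.

```python
class ThreadControl:
    """Toy model of N active threads and M offline threads (names are assumed)."""

    def __init__(self, n_active, m_offline, regs_per_thread=4):
        # Register file: one register set per active hardware thread.
        self.register_file = [[0] * regs_per_thread for _ in range(n_active)]
        # Offline storage: one register set per parked hardware thread.
        self.offline = [[0] * regs_per_thread for _ in range(m_offline)]

    def swap(self, active_slot, offline_slot):
        """Exchange the register values of one active and one offline thread."""
        rf, off = self.register_file, self.offline
        rf[active_slot], off[offline_slot] = off[offline_slot], rf[active_slot]
```

Calling `swap(0, 0)` parks the values of active thread 0 in offline slot 0 and brings the previously parked values into the register file; other active threads are untouched.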

    Register renaming to reduce bypass and increase apparent physical register size
    5.
    Type: granted invention patent
    Status: lapsed

    Publication No.: US06944751B2

    Publication date: 2005-09-13

    Application No.: US10074098

    Filing date: 2002-02-11

    IPC classification: G06F9/30 G06F9/38 G06F9/312

    Abstract: The invention provides a processor architecture that bypasses data hazards. The architecture has an array of pipelines and a register file. Each of the pipelines includes an array of execution units. The register file has a first section of n registers (e.g., 128 registers) and a second section of m registers (e.g., 16 registers). A write mux couples speculative data from the execution units to the second section of m registers and non-speculative data from a write-back stage of the execution units to the first section of n registers. A read mux couples the speculative data from the second section of m registers to the execution units to bypass data hazards within the execution units. The register file preferably includes column decode logic for each of the registers in the second section of m registers to architect speculative data without moving data. The decode logic first decodes, and then selects, an age of the producer of the speculative state; the newest producer enables the decode.

    System and method for dynamic processor core and cache partitioning on large-scale multithreaded, multiprocessor integrated circuits
    7.
    Type: granted invention patent
    Status: in force

    Publication No.: US06871264B2

    Publication date: 2005-03-22

    Application No.: US10092645

    Filing date: 2002-03-06

    CPC classification: G06F12/0848 G06F12/0811

    Abstract: A processor integrated circuit capable of executing more than one instruction stream has two or more processors. Each processor accesses instructions and data through a cache controller. There are multiple blocks of cache memory. Some blocks of cache memory may optionally be directly attached to particular cache controllers. The cache controllers access at least some of the multiple blocks of cache memory through a high speed interconnect, these blocks being dynamically allocable to more than one cache controller. A resource allocation controller determines which cache controller has access to each dynamically allocable cache memory block. In an embodiment, the cache controllers and cache memory blocks are associated with a second level cache; each processor accesses the second level cache controllers upon missing in a first level cache of fixed size.
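
The role of the resource allocation controller can be sketched as a table that maps each dynamically allocable block to the cache controller currently granted access. This is an illustration under assumptions: the class name `ResourceAllocator`, its methods, and the controller labels are hypothetical, and the policy for when to reassign a block is left out because the abstract does not state one.

```python
class ResourceAllocator:
    """Toy ownership table for dynamically allocable cache blocks (names assumed)."""

    def __init__(self, num_blocks):
        # None means the block is not currently assigned to any controller.
        self.owner = {block: None for block in range(num_blocks)}

    def assign(self, block, controller):
        """Grant one cache controller access to a block, replacing any prior owner."""
        self.owner[block] = controller

    def blocks_of(self, controller):
        """List the blocks a given cache controller may currently access."""
        return sorted(b for b, c in self.owner.items() if c == controller)
```

Reassigning a block with a second `assign` call moves it from one controller to another, which is the dynamic partitioning the abstract describes at a high level.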

    Local stall/hazard detect in superscalar, pipelined microprocessor
    8.
    Type: granted invention patent
    Status: in force

    Publication No.: US06591360B1

    Publication date: 2003-07-08

    Application No.: US09484138

    Filing date: 2000-01-18

    IPC classification: G06F9/30

    Abstract: A method and apparatus generate a simplified, localized version (“a local stall”) of a global stall to improve the performance of a pipelined microprocessor. The local stall is generated when a data-dependency hazard is detected for a local consumer. Because it reuses circuitry from the pipelined microprocessor's data-forwarding circuitry, the local stall is generated with a relatively minor increase in circuitry. The local stall is generated much sooner than the global stall and arrives much sooner in a local pipeline. The local pipeline uses the local stall to override the global stall, when appropriate, to ensure that correct data is read for a local consumer, and to operate more efficiently than a standard pipeline without a local stall.

    Using thread urgency in determining switch events in a temporal multithreaded processor unit
    10.
    Type: granted invention patent
    Status: lapsed

    Publication No.: US07213134B2

    Publication date: 2007-05-01

    Application No.: US10092670

    Filing date: 2002-03-06

    IPC classification: G06F9/48

    CPC classification: G06F9/3851

    Abstract: A processing unit of the invention has multiple instruction pipelines for processing multi-threaded instructions. Each thread may have an urgency associated with its program instructions. The processing unit has a thread switch controller to monitor processing of instructions through the various pipelines. The thread controller also controls switch events to move from one thread to another within the pipelines. The controller may modify the urgency of any thread, such as by issuing an additional instruction. The thread controller preferably utilizes certain heuristics in making switch event decisions. A time slice expiration unit may also monitor expiration of threads for a given time slice.
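
One way to picture urgency-driven switching is the sketch below. The abstract does not state the heuristics, so the rule used here (on time slice expiry, switch to the most urgent other thread if it is at least as urgent as the current one) is an assumption chosen for illustration, as are the names `Thread`, `SwitchController`, and `tick`.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    urgency: int   # higher means more urgent; may be modified while running


class SwitchController:
    """Toy thread switch controller with a time slice counter (names assumed)."""

    def __init__(self, threads, time_slice):
        self.threads = list(threads)
        self.time_slice = time_slice
        self.current = self.threads[0]
        self.elapsed = 0

    def tick(self):
        """Advance one cycle and return the name of the thread now running."""
        self.elapsed += 1
        if self.elapsed >= self.time_slice:
            # Time slice expired: treat this as a switch event.
            self.elapsed = 0
            others = [t for t in self.threads if t is not self.current]
            if others:
                best = max(others, key=lambda t: t.urgency)
                # Assumed heuristic: switch only to an equally or more urgent thread.
                if best.urgency >= self.current.urgency:
                    self.current = best
        return self.current.name
```

Raising a thread's `urgency` between ticks models the controller modifying urgency; the change takes effect at the next time slice expiry.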
