Method and system for recoding noneffective instructions within a data processing system
    61.
    Granted Patent
    Method and system for recoding noneffective instructions within a data processing system (Expired)

    Publication No.: US5619408A

    Publication Date: 1997-04-08

    Application No.: US387145

    Filing Date: 1995-02-10

    IPC Classes: G06F9/30 G06F9/318 G05B15/00

    CPC Classes: G06F9/3017 G06F9/30145

    Abstract: A method and system are disclosed for processing instructions within a data processing system including a processor having a plurality of execution units. According to the method of the present invention, a number of instructions stored within a memory within the data processing system are retrieved from memory. A selected instruction among the number of instructions is decoded to determine if the selected instruction would be noneffective if executed by the processor. In a preferred embodiment of the present invention, noneffective instructions include instructions with invalid opcodes and instructions that would not change the value of any data register within the processor. In response to determining that the selected instruction would be noneffective if executed by the processor, the selected instruction is recoded into a specified instruction format prior to dispatching the selected instruction to one of the number of execution units. Detecting noneffective instructions prior to dispatch reduces the decode logic required within the dispatcher and enhances processor performance.
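
    A minimal sketch of the pre-dispatch recoding idea follows. The instruction tuple layout, the opcode names, and the canonical NOP encoding are illustrative assumptions, not the patent's actual instruction formats.

        # Pre-dispatch recoding sketch: detect instructions that would have no
        # effect and rewrite them into one canonical format before dispatch.
        VALID_OPCODES = {"ADD", "ORI", "LOAD", "STORE"}
        CANONICAL_NOP = ("NOP", None, None, None)   # the "specified instruction format"

        def is_noneffective(instr):
            """Return True if the instruction would not change architected state."""
            opcode, dest, src, imm = instr
            if opcode not in VALID_OPCODES:          # invalid opcode
                return True
            # e.g. ORI r5, r5, 0 rewrites a register with its own value
            if opcode == "ORI" and dest == src and imm == 0:
                return True
            return False

        def recode_before_dispatch(instructions):
            """Recode noneffective instructions so the dispatcher sees one simple format."""
            return [CANONICAL_NOP if is_noneffective(i) else i for i in instructions]

        if __name__ == "__main__":
            stream = [("ADD", "r1", "r2", None), ("ORI", "r5", "r5", 0), ("XYZ", "r0", "r0", 0)]
            print(recode_before_dispatch(stream))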


    System and method for out-of-order resource allocation and deallocation in a threaded machine
    62.
    Granted Patent

    Publication No.: US09690625B2

    Publication Date: 2017-06-27

    Application No.: US12485608

    Filing Date: 2009-06-16

    Applicant: Robert T. Golla

    Inventor: Robert T. Golla

    IPC Classes: G06F9/46 G06F9/50

    CPC Classes: G06F9/5011 G06F2209/507

    Abstract: A system and method for managing the dynamic sharing of processor resources between threads in a multi-threaded processor are disclosed. Out-of-order allocation and deallocation may be employed to efficiently use the various resources of the processor. Each element of an allocate vector may indicate whether a corresponding resource is available for allocation. A search of the allocate vector may be performed to identify resources available for allocation. Upon allocation of a resource, a thread identifier associated with the thread to which the resource is allocated may be associated with the allocate vector entry corresponding to the allocated resource. Multiple instances of a particular resource type may be allocated or deallocated in a single processor execution cycle. Each element of a deallocate vector may indicate whether a corresponding resource is ready for deallocation. Examples of resources that may be dynamically shared between threads are reorder buffers, load buffers and store buffers.
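
    The allocate/deallocate vector bookkeeping can be sketched as follows. The class and method names are assumptions for illustration; real hardware would use bit vectors and priority encoders rather than Python lists.

        # Shared-resource pool with an allocate vector (free/used per entry),
        # a per-entry thread id, and out-of-order deallocation.
        class ResourcePool:
            def __init__(self, num_entries):
                self.available = [True] * num_entries   # allocate vector: True = free
                self.thread_id = [None] * num_entries   # owning thread per entry

            def allocate(self, tid, count=1):
                """Allocate up to `count` entries for thread `tid` in one pass
                (modeling multiple allocations in a single cycle)."""
                got = []
                for idx, free in enumerate(self.available):
                    if free:
                        self.available[idx] = False
                        self.thread_id[idx] = tid
                        got.append(idx)
                        if len(got) == count:
                            break
                return got

            def deallocate(self, ready_vector):
                """Free every entry whose bit is set in the deallocate vector,
                regardless of allocation order."""
                for idx, ready in enumerate(ready_vector):
                    if ready:
                        self.available[idx] = True
                        self.thread_id[idx] = None

        if __name__ == "__main__":
            rob = ResourcePool(8)
            print(rob.allocate(tid=0, count=3))   # e.g. [0, 1, 2]
            rob.deallocate([False, True, False, False, False, False, False, False])
            print(rob.available[:4])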

    Dependency matrix for the determination of load dependencies
    63.
    Granted Patent
    Dependency matrix for the determination of load dependencies (In Force)

    Publication No.: US09262171B2

    Publication Date: 2016-02-16

    Application No.: US12495025

    Filing Date: 2009-06-30

    Abstract: Systems and methods for identification of dependent instructions on speculative load operations in a processor. A processor allocates entries of a unified pick queue for decoded and renamed instructions. Each entry of a corresponding dependency matrix is configured to store a dependency bit for each other instruction in the pick queue. The processor speculates that loads will hit in the data cache, hit in the TLB and not have a read after write (RAW) hazard. For each unresolved load, the pick queue tracks dependent instructions via dependency vectors based upon the dependency matrix. If a load speculation is found to be incorrect, dependent instructions in the pick queue are reset to allow for subsequent picking, and dependent instructions in flight are canceled. On completion of a load miss, dependent operations are re-issued. On resolution of a TLB miss or RAW hazard, the original load is replayed and dependent operations are issued again from the pick queue.
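
    The dependency-matrix bookkeeping lends itself to a toy model like the one below, where a set bit at row i, column j means "entry i depends on entry j". The structure and method names are illustrative assumptions, not the patented implementation.

        # Pick-queue model with an NxN dependency matrix; on load
        # misspeculation, all transitive dependents are reset for re-pick.
        class PickQueue:
            def __init__(self, size):
                self.size = size
                self.dep = [[False] * size for _ in range(size)]  # dependency matrix
                self.picked = [False] * size                      # already issued?

            def add_dependency(self, consumer, producer):
                self.dep[consumer][producer] = True

            def dependents_of(self, entry):
                """Column scan: every entry whose row has a bit set for `entry`."""
                return [i for i in range(self.size) if self.dep[i][entry]]

            def on_load_misspeculation(self, load_entry):
                """Reset direct and transitive dependents so they can be picked again."""
                work, seen = [load_entry], set()
                while work:
                    e = work.pop()
                    for d in self.dependents_of(e):
                        if d not in seen:
                            seen.add(d)
                            self.picked[d] = False   # allow subsequent re-pick
                            work.append(d)
                return sorted(seen)

        if __name__ == "__main__":
            pq = PickQueue(4)
            pq.add_dependency(1, 0)   # entry 1 consumes the load in entry 0
            pq.add_dependency(2, 1)   # entry 2 consumes entry 1
            print(pq.on_load_misspeculation(0))   # -> [1, 2]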


    Perceptron-based branch prediction mechanism for predicting conditional branch instructions on a multithreaded processor
    64.
    Granted Patent
    Perceptron-based branch prediction mechanism for predicting conditional branch instructions on a multithreaded processor (In Force)

    Publication No.: US08904156B2

    Publication Date: 2014-12-02

    Application No.: US12578859

    Filing Date: 2009-10-14

    IPC Classes: G06F9/38

    Abstract: A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.
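
    The two-step index generation can be sketched roughly as below: one table is indexed from the fetch address alone, and each remaining table from that index combined with a different slice of the direction history. The table count, index width, and sign-based prediction are illustrative assumptions.

        # Hashed multi-table prediction sketch: per-table indices plus a
        # perceptron-style signed sum whose sign gives the direction prediction.
        TABLE_BITS = 10
        NUM_TABLES = 4

        def table_indices(fetch_addr, history_bits):
            """history_bits: an int whose low bits are the global direction history."""
            base = (fetch_addr >> 2) & ((1 << TABLE_BITS) - 1)      # index for table 0
            indices = [base]
            for t in range(1, NUM_TABLES):
                slice_ = (history_bits >> (t * TABLE_BITS)) & ((1 << TABLE_BITS) - 1)
                indices.append(base ^ slice_)                        # combine with history
            return indices

        def predict(weights, fetch_addr, history_bits):
            """Sum one signed weight per storage and take the sign (True = taken)."""
            total = sum(weights[t][i]
                        for t, i in enumerate(table_indices(fetch_addr, history_bits)))
            return total >= 0

        if __name__ == "__main__":
            weights = [[0] * (1 << TABLE_BITS) for _ in range(NUM_TABLES)]
            print(predict(weights, fetch_addr=0x4000, history_bits=0b1011))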


    Mechanism for selecting instructions for execution in a multithreaded processor
    65.
    Granted Patent
    Mechanism for selecting instructions for execution in a multithreaded processor (In Force)

    Publication No.: US08769246B2

    Publication Date: 2014-07-01

    Application No.: US13027056

    Filing Date: 2011-02-14

    Applicant: Robert T. Golla

    Inventor: Robert T. Golla

    IPC Classes: G06F9/30

    CPC Classes: G06F9/3851 G06F9/3861

    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
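
    A small sketch of such a pick unit is given below, using a least-recently-picked thread selection policy and a same-cycle cancel signal; both the policy and the API names are assumptions for illustration.

        # Pick unit with per-thread buffers; a cancel indication in the same
        # cycle undoes the pick so the instruction stays buffered.
        from collections import deque

        class PickUnit:
            def __init__(self, num_threads):
                self.buffers = [deque() for _ in range(num_threads)]
                self.order = list(range(num_threads))    # front = least recently picked

            def push(self, tid, instr):
                self.buffers[tid].append(instr)

            def pick(self, cancel=False):
                """Pick one valid instruction for this cycle, or None."""
                for tid in self.order:
                    if self.buffers[tid]:
                        instr = self.buffers[tid].popleft()
                        if cancel:
                            self.buffers[tid].appendleft(instr)   # pick is cancelled
                            return None
                        self.order.remove(tid)     # rotate: this thread becomes
                        self.order.append(tid)     # most recently picked
                        return (tid, instr)
                return None

        if __name__ == "__main__":
            pu = PickUnit(2)
            pu.push(0, "add r1,r2,r3")
            pu.push(1, "ld r4,[r5]")
            print(pu.pick())             # thread 0 goes first
            print(pu.pick(cancel=True))  # cancelled, nothing picked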


    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER
    66.
    Patent Application
    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER (Pending, Published)

    Publication No.: US20130297910A1

    Publication Date: 2013-11-07

    Application No.: US13463319

    Filing Date: 2012-05-03

    IPC Classes: G06F9/30 G06F9/38

    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes entries which may be allocated for use by any thread. Control logic detects long latency instructions. Long latency instructions have a latency greater than a given threshold. One example is a load instruction that has a read-after-write (RAW) data dependency on a store instruction that misses a last-level data cache. The long latency instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the long latency instruction are held at a given pipeline stage until the long latency instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the long latency instruction is being serviced.
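
    The timeout-based detection can be modeled schematically as below; the threshold value and the structure names are tuning and naming assumptions, not values from the application.

        # General load/store timeout counter: once the tracked memory op has
        # been outstanding longer than the threshold, it is classified as a
        # long-latency instruction and its thread is flushed and replayed.
        TIMEOUT_THRESHOLD = 64   # cycles; a tuning assumption

        class LoadStoreTimeout:
            def __init__(self):
                self.counter = 0
                self.tracked = None    # (thread_id, instr_tag) of the op being timed

            def track(self, thread_id, instr_tag):
                self.tracked = (thread_id, instr_tag)
                self.counter = 0

            def tick(self):
                """Advance one cycle; return the (thread, tag) to flush-and-replay
                once the tracked memory op is deemed long latency, else None."""
                if self.tracked is None:
                    return None
                self.counter += 1
                if self.counter > TIMEOUT_THRESHOLD:
                    hog = self.tracked
                    self.tracked, self.counter = None, 0
                    return hog        # caller flushes this thread and replays here
                return None

        if __name__ == "__main__":
            t = LoadStoreTimeout()
            t.track(thread_id=3, instr_tag=0x2a)
            hit = None
            for _ in range(TIMEOUT_THRESHOLD + 1):
                hit = t.tick() or hit
            print(hit)    # (3, 42): thread 3 is flushed and replayed from this load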


    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR
    67.
    Patent Application
    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR (In Force)

    Publication No.: US20130290675A1

    Publication Date: 2013-10-31

    Application No.: US13457055

    Filing Date: 2012-04-26

    IPC Classes: G06F9/312 G06F9/30

    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
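
    The "hold at a given pipeline stage" behaviour can be sketched as a simple allocation gate, as below; all names and the set-based bookkeeping are illustrative assumptions.

        # While a thread's long-latency load is outstanding, its replayed
        # younger instructions are parked instead of consuming shared entries.
        class AllocationGate:
            def __init__(self):
                self.blocked_threads = set()    # threads with an outstanding long-latency load

            def load_miss(self, tid):
                self.blocked_threads.add(tid)

            def load_complete(self, tid):
                self.blocked_threads.discard(tid)

            def try_allocate(self, tid, shared_free_list):
                """Allocate a shared entry unless the thread is being held."""
                if tid in self.blocked_threads or not shared_free_list:
                    return None                  # held at this stage; no entry consumed
                return shared_free_list.pop()

        if __name__ == "__main__":
            gate = AllocationGate()
            free = [0, 1, 2]
            gate.load_miss(tid=1)
            print(gate.try_allocate(1, free))   # None: thread 1 is held
            print(gate.try_allocate(0, free))   # other threads still allocate
            gate.load_complete(tid=1)
            print(gate.try_allocate(1, free))   # hold released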


    Dynamic tag allocation in a multithreaded out-of-order processor
    68.
    Granted Patent
    Dynamic tag allocation in a multithreaded out-of-order processor (In Force)

    Publication No.: US08429386B2

    Publication Date: 2013-04-23

    Application No.: US12494532

    Filing Date: 2009-06-30

    IPC Classes: G06F15/00 G06F9/30 G06F9/40

    Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that support multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
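
    Tag allocation from a shared pool can be sketched as below; the pool size and API names are assumptions, and a hardware implementation would use a free list or bit vector rather than Python containers.

        # Tags are not partitioned by thread: a freed tag can immediately be
        # reused by any thread's next group of instructions.
        class TagPool:
            def __init__(self, num_tags):
                self.free_tags = list(range(num_tags))   # shared across all threads
                self.owner = {}                          # tag -> thread id (bookkeeping)

            def allocate(self, thread_id):
                """Assign a tag to a group of instructions from `thread_id`."""
                if not self.free_tags:
                    return None                          # stall: no tags available
                tag = self.free_tags.pop(0)
                self.owner[tag] = thread_id
                return tag

            def free(self, tag):
                """Called when the tagged group commits; the tag becomes reusable."""
                del self.owner[tag]
                self.free_tags.append(tag)

        if __name__ == "__main__":
            pool = TagPool(2)
            t0 = pool.allocate(thread_id=0)
            t1 = pool.allocate(thread_id=1)
            pool.free(t0)
            print(t0, t1, pool.allocate(thread_id=1))   # thread 1 reuses the freed tag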


    Dynamic allocation of resources in a threaded, heterogeneous processor
    69.
    Granted Patent
    Dynamic allocation of resources in a threaded, heterogeneous processor (In Force)

    Publication No.: US08335911B2

    Publication Date: 2012-12-18

    Application No.: US12570642

    Filing Date: 2009-09-30

    IPC Classes: G06F9/00

    Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
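
    The per-thread quota check can be modeled as below: a thread may take a shared entry only if enough free entries remain to cover every other active thread's guaranteed minimum. The quota value and names are illustrative assumptions.

        # Dynamically shared resource with a guaranteed per-thread quota so no
        # active thread can be starved by another.
        class QuotaSharedResource:
            def __init__(self, total_entries, quota):
                self.total = total_entries
                self.quota = quota             # guaranteed entries per active thread
                self.used = {}                 # thread id -> entries held

            def can_allocate(self, tid, active_threads):
                free = self.total - sum(self.used.values())
                # Entries that must stay reserved for other active threads still
                # below their guaranteed quota.
                reserved = sum(max(0, self.quota - self.used.get(t, 0))
                               for t in active_threads if t != tid)
                return free - reserved >= 1

            def allocate(self, tid, active_threads):
                if not self.can_allocate(tid, active_threads):
                    return False
                self.used[tid] = self.used.get(tid, 0) + 1
                return True

        if __name__ == "__main__":
            res = QuotaSharedResource(total_entries=8, quota=2)
            active = [0, 1]
            # Thread 0 can grab entries until only thread 1's quota remains reserved.
            grabs = sum(res.allocate(0, active) for _ in range(10))
            print(grabs, res.used)   # 6 entries for thread 0; 2 stay reserved for thread 1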


    PERCEPTRON-BASED BRANCH PREDICTION MECHANISM FOR PREDICTING CONDITIONAL BRANCH INSTRUCTIONS ON A MULTITHREADED PROCESSOR
    70.
    Patent Application
    PERCEPTRON-BASED BRANCH PREDICTION MECHANISM FOR PREDICTING CONDITIONAL BRANCH INSTRUCTIONS ON A MULTITHREADED PROCESSOR (In Force)

    Publication No.: US20110087866A1

    Publication Date: 2011-04-14

    Application No.: US12578859

    Filing Date: 2009-10-14

    IPC Classes: G06F9/38

    Abstract: A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.
