1. Hybrid branch prediction device with sparse and dense prediction caches
    Granted patent, in force

    Publication No.: US08181005B2

    Publication Date: 2012-05-15

    Application No.: US12205429

    Filing Date: 2008-09-05

    IPC Classification: G06F9/32 G06F9/38

    CPC Classification: G06F9/3844 G06F9/3806

    Abstract: A system and method for branch prediction in a microprocessor. A hybrid device stores, in a sparse cache, branch prediction information for no more than a common, smaller number of branches within each entry of the instruction cache. For the less common case wherein an i-cache line comprises additional branches, the device stores the corresponding branch prediction information in a dense cache. Each entry of the sparse cache stores a bit vector indicating whether or not a corresponding instruction cache line includes additional branch instructions. This indication may also be used to select an entry in the dense cache for storage. A second sparse cache stores entire entries evicted from the first sparse cache.

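    The sketch below (C++) shows one way the lookup described above could behave: a sparse entry covers the common, small number of branches per i-cache line, and its bit vector both flags the portions that overflowed and selects the dense-cache entries to consult. The class and field names, line size, branch count, and portion count are illustrative assumptions, not details taken from the patent.

        #include <array>
        #include <cstdint>
        #include <unordered_map>
        #include <vector>

        // Illustrative parameters, not taken from the patent text.
        constexpr int kSparseBranchesPerLine = 2;  // common case handled by the sparse cache
        constexpr int kPortionsPerLine = 4;        // granularity of the "additional branches" bit vector
        constexpr int kLineShift = 6;              // assume 64-byte i-cache lines

        struct BranchInfo {
            uint64_t branch_pc;
            uint64_t predicted_target;
            bool     predicted_taken;
        };

        struct SparseEntry {
            std::vector<BranchInfo> branches;                 // at most kSparseBranchesPerLine entries
            std::array<bool, kPortionsPerLine> dense_hint{};  // portions whose extra branches live in the dense cache
        };

        class HybridPredictor {
        public:
            // Gather prediction info for every branch in the i-cache line containing fetch_pc.
            std::vector<BranchInfo> lookup(uint64_t fetch_pc) const {
                std::vector<BranchInfo> out;
                const uint64_t line = fetch_pc >> kLineShift;
                const auto s = sparse_.find(line);
                if (s == sparse_.end()) return out;

                out = s->second.branches;                    // common case: sparse cache alone suffices
                for (int p = 0; p < kPortionsPerLine; ++p) {
                    if (!s->second.dense_hint[p]) continue;  // bit vector says no extra branches here
                    // The same indication selects the dense-cache entry for this portion.
                    const auto d = dense_.find(line * kPortionsPerLine + p);
                    if (d != dense_.end())
                        out.insert(out.end(), d->second.begin(), d->second.end());
                }
                return out;
            }

            // On eviction from the first sparse cache, the whole entry moves to a second sparse cache.
            void evict_sparse(uint64_t line) {
                const auto s = sparse_.find(line);
                if (s == sparse_.end()) return;
                victim_sparse_[line] = s->second;
                sparse_.erase(s);
            }

        private:
            std::unordered_map<uint64_t, SparseEntry> sparse_;
            std::unordered_map<uint64_t, std::vector<BranchInfo>> dense_;
            std::unordered_map<uint64_t, SparseEntry> victim_sparse_;  // the second sparse cache
        };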

2. CLASSIFYING AND SEGREGATING BRANCH TARGETS
    Patent application, pending (published)

    Publication No.: US20110093658A1

    Publication Date: 2011-04-21

    Application No.: US12581878

    Filing Date: 2009-10-19

    IPC Classification: G06F9/38 G06F12/08

    CPC Classification: G06F9/3844 G06F9/3806

    Abstract: A system and method for branch prediction in a microprocessor. A branch prediction unit stores an indication of the location of a branch target instruction relative to its corresponding branch instruction. For example, a target instruction may be located within the same first region of memory as the branch instruction. Alternatively, the target instruction may be located outside the first region, but within a larger second region. The prediction unit comprises a branch target array corresponding to each region. Each array stores a bit range of a branch target address, wherein the stored bit range is based upon the location of the target instruction relative to the branch instruction. The prediction unit constructs a predicted branch target address by concatenating the bits stored in the branch target arrays.

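    A minimal sketch of target reconstruction by concatenation, assuming hypothetical bit widths for the two regions: each array stores only the low-order bits of the target, and the remaining upper bits are taken from the branch's own address. The widths, names, and region scheme below are illustrative.

        #include <cstdint>

        // Illustrative widths only: suppose a "near" target shares the branch's address
        // bits above bit 12, and a "far" target shares them above bit 20.
        constexpr int kNearBits = 12;  // low-order bits kept in the near target array
        constexpr int kFarBits  = 20;  // low-order bits kept in the far target array

        struct StoredTarget {
            bool     is_far;    // which region (and therefore which array) holds the entry
            uint32_t low_bits;  // only the bit range below the shared prefix is stored
        };

        // Rebuild the predicted target by concatenating the branch PC's upper bits with
        // the low-order bits read out of the per-region branch target array.
        uint64_t predict_target(uint64_t branch_pc, const StoredTarget& t) {
            const int kept = t.is_far ? kFarBits : kNearBits;
            const uint64_t prefix = (branch_pc >> kept) << kept;  // shared upper bits
            const uint64_t mask = (uint64_t{1} << kept) - 1;
            return prefix | (t.low_bits & mask);
        }

    Storing only the bits that can differ keeps each array entry narrow, so targets close to their branch cost fewer prediction-storage bits than distant ones.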

3. HYBRID BRANCH PREDICTION DEVICE WITH SPARSE AND DENSE PREDICTION CACHES
    Patent application, in force

    Publication No.: US20100064123A1

    Publication Date: 2010-03-11

    Application No.: US12205429

    Filing Date: 2008-09-05

    IPC Classification: G06F9/38

    CPC Classification: G06F9/3844 G06F9/3806

    Abstract: A system and method for branch prediction in a microprocessor. A hybrid device stores, in a sparse cache, branch prediction information for no more than a common, smaller number of branches within each entry of the instruction cache. For the less common case wherein an i-cache line comprises additional branches, the device stores the corresponding branch prediction information in a dense cache. Each entry of the sparse cache stores a bit vector indicating whether or not a corresponding instruction cache line includes additional branch instructions. This indication may also be used to select an entry in the dense cache for storage. A second sparse cache stores entire entries evicted from the first sparse cache.


4. REPLAY OF DETECTED PATTERNS IN PREDICTED INSTRUCTIONS
    Patent application, in force

    Publication No.: US20120117362A1

    Publication Date: 2012-05-10

    Application No.: US12943859

    Filing Date: 2010-11-10

    IPC Classification: G06F9/38

    CPC Classification: G06F9/3848 G06F9/381

    Abstract: Techniques are disclosed relating to improving the performance of branch prediction in processors. In one embodiment, a processor is disclosed that includes a branch prediction unit configured to predict a sequence of instructions to be issued by the processor for execution. The processor also includes a pattern detection unit configured to detect a pattern in the predicted sequence of instructions, where the pattern includes a plurality of predicted instructions. In response to the pattern detection unit detecting the pattern, the processor is configured to switch from issuing instructions predicted by the branch prediction unit to issuing the plurality of instructions. In some embodiments, the processor includes a replay unit that is configured to replay fetch addresses to an instruction fetch unit to cause the plurality of predicted instructions to be issued.

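    A toy model of the detect-and-replay behaviour: once a window of predicted fetch addresses repeats exactly, a replay structure supplies fetch addresses to the fetch unit instead of the branch prediction unit. The window size and the exact-repetition test are illustrative choices, not the patent's detection logic.

        #include <cstdint>
        #include <deque>
        #include <vector>

        class PatternReplayer {
        public:
            explicit PatternReplayer(size_t window) : window_(window) {}

            // Feed each fetch address produced by the branch prediction unit.
            void observe(uint64_t fetch_addr) {
                history_.push_back(fetch_addr);
                if (history_.size() > 2 * window_) history_.pop_front();
                if (history_.size() < 2 * window_) return;

                // Pattern detected when the two most recent windows match exactly.
                for (size_t i = 0; i < window_; ++i)
                    if (history_[i] != history_[i + window_]) return;
                replay_.assign(history_.begin(), history_.begin() + window_);
                replay_pos_ = 0;
            }

            bool replaying() const { return !replay_.empty(); }

            // While a pattern is active, the replay unit (not the predictor) supplies
            // the next fetch address to the instruction fetch unit.
            uint64_t next_fetch_address() {
                const uint64_t addr = replay_[replay_pos_];
                replay_pos_ = (replay_pos_ + 1) % replay_.size();
                return addr;
            }

        private:
            size_t window_;
            std::deque<uint64_t> history_;
            std::vector<uint64_t> replay_;  // the detected plurality of fetch addresses
            size_t replay_pos_ = 0;
        };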

5. Bypass circuitry for use in a pipelined processor

    Publication No.: US07093107B2

    Publication Date: 2006-08-15

    Application No.: US09751377

    Filing Date: 2000-12-29

    Applicant: Anthony X. Jarvis

    Inventor: Anthony X. Jarvis

    IPC Classification: G06F15/00

    Abstract: There is disclosed a data processor that uses bypass circuitry to transfer result data from late pipeline stages to earlier pipeline stages in an efficient manner and with a minimum amount of wiring. The data processor comprises: 1) an instruction execution pipeline comprising a) a read stage; b) a write stage; and c) a first execution stage comprising E execution units that produce data results from data operands. The data processor also comprises: 2) a register file comprising a plurality of data registers, each of the data registers being read by the read stage of the instruction pipeline via at least one of R read ports of the register file and each of the data registers being written by the write stage of the instruction pipeline via at least one of W write ports of the register file; and 3) bypass circuitry for receiving data results from output channels of source devices in at least one of the write stage and the first execution stage, the bypass circuitry comprising a first plurality of bypass tristate line drivers having input channels coupled to first output channels of a first plurality of source devices and tristate output channels coupled to a first common read data channel in the read stage.
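
    A behavioural sketch of operand bypassing: before pulling an operand from the register file, the read stage checks whether a newer value for that register is still in flight in the execute or write stage. The register count and the set of forwarding sources are assumptions; in the hardware described, this selection is made by tristate line drivers sharing a common read data channel rather than by software comparisons.

        #include <array>
        #include <cstdint>
        #include <optional>

        struct InFlightResult {
            int      dest_reg;  // destination register of the in-flight instruction
            uint64_t value;     // its result data
        };

        struct Pipeline {
            std::array<uint64_t, 64> regfile{};         // architectural register file
            std::optional<InFlightResult> execute_out;  // result leaving the execution stage
            std::optional<InFlightResult> write_out;    // result waiting in the write stage

            // Read stage: prefer the youngest in-flight value (execute stage), then the
            // write stage, and fall back to the register file.
            uint64_t read_operand(int reg) const {
                if (execute_out && execute_out->dest_reg == reg) return execute_out->value;
                if (write_out && write_out->dest_reg == reg) return write_out->value;
                return regfile[reg];
            }
        };

    Giving priority to the execute-stage result ensures the read stage always sees the most recent value written to a register, without waiting for it to reach the register file.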

6. System and method for supporting precise exceptions in a data processor having a clustered architecture
    Granted patent, in force

    Publication No.: US06807628B2

    Publication Date: 2004-10-19

    Application No.: US09751330

    Filing Date: 2000-12-29

    IPC Classification: G06F9/38

    Abstract: There is disclosed a data processor having a clustered architecture that comprises a plurality of clusters and an interrupt and exception controller. Each of the clusters comprises an instruction execution pipeline having N processing stages. Each of the N processing stages is capable of performing at least one of a plurality of execution steps associated with instructions being executed by the clusters. The interrupt and exception controller operates to (i) detect an exception condition associated with one of the executing instructions, wherein this executing instruction was issued at time t0, and (ii) generate an exception in response to the exception condition upon completed execution of earlier ones of the executing instructions, these earlier instructions having been issued at times preceding t0.

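    A simplified model of precise exception delivery: an exception condition is recorded against the instruction that raised it, but the exception is only generated once every instruction issued before that instruction's t0 has completed. The queue-based bookkeeping below is an illustrative stand-in for the clustered hardware, not its implementation.

        #include <cstdint>
        #include <deque>
        #include <optional>

        struct InFlight {
            uint64_t issue_time;           // t0 of this instruction
            bool completed = false;
            std::optional<int> exception;  // exception condition detected during execution
        };

        class ExceptionController {
        public:
            // Instructions enter the window in issue order.
            void issue(uint64_t t0) { window_.push_back({t0}); }

            // Mark an in-flight instruction complete, optionally with an exception condition.
            void complete(size_t idx, std::optional<int> exc = std::nullopt) {
                window_[idx].completed = true;
                window_[idx].exception = exc;
            }

            // Retire in issue order; an exception is generated only when the faulting
            // instruction is the oldest remaining one, i.e. every instruction issued
            // before its t0 has already finished executing.
            std::optional<int> retire_step() {
                while (!window_.empty() && window_.front().completed) {
                    const InFlight inst = window_.front();
                    window_.pop_front();
                    if (inst.exception) return inst.exception;  // precise: older work retired first
                }
                return std::nullopt;
            }

        private:
            std::deque<InFlight> window_;  // oldest (earliest t0) at the front
        };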

7. Detecting branch direction and target address pattern and supplying fetch address by replay unit instead of branch prediction unit
    Granted patent, in force

    Publication No.: US08667257B2

    Publication Date: 2014-03-04

    Application No.: US12943859

    Filing Date: 2010-11-10

    IPC Classification: G06F9/38

    CPC Classification: G06F9/3848 G06F9/381

    Abstract: Techniques are disclosed relating to improving the performance of branch prediction in processors. In one embodiment, a processor is disclosed that includes a branch prediction unit configured to predict a sequence of instructions to be issued by the processor for execution. The processor also includes a pattern detection unit configured to detect a pattern in the predicted sequence of instructions, where the pattern includes a plurality of predicted instructions. In response to the pattern detection unit detecting the pattern, the processor is configured to switch from issuing instructions predicted by the branch prediction unit to issuing the plurality of instructions. In some embodiments, the processor includes a replay unit that is configured to replay fetch addresses to an instruction fetch unit to cause the plurality of predicted instructions to be issued.


8. System and method for executing variable latency load operations in a date processor
    Granted patent, in force

    Publication No.: US07757066B2

    Publication Date: 2010-07-13

    Application No.: US09751372

    Filing Date: 2000-12-29

    IPC Classification: G06F9/30

    Abstract: There is disclosed a data processor that executes variable latency load operations using bypass circuitry that allows load word operations to avoid stalls caused by shifting circuitry. The processor comprises: 1) an instruction execution pipeline comprising N processing stages, each of the N processing stages for performing one of a plurality of execution steps associated with a pending instruction being executed by the instruction execution pipeline; 2) a data cache for storing data values used by the pending instruction; 3) a plurality of registers for receiving the data values from the data cache; 4) a load store unit for transferring a first one of the data values from the data cache to a target one of the plurality of registers during execution of a load operation; 5) a shifter circuit associated with the load store unit for shifting the first data value prior to loading the first data value into the target register; and 6) bypass circuitry associated with the load store unit for transferring the first data value from the data cache directly to the target register without processing the first data value in the shifter circuit.

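    A behavioural sketch of the shifter bypass: sub-word loads go through an alignment/shift step, while an aligned load-word takes the direct path from the data cache to the target register and skips the shifter. Little-endian byte order and naturally aligned accesses are assumed; the sizes and names are illustrative.

        #include <cstdint>
        #include <cstring>
        #include <vector>

        enum class LoadSize { Byte, Half, Word };

        uint32_t load(const std::vector<uint8_t>& dcache_line, uint32_t offset, LoadSize size) {
            // Pull the aligned word that contains the addressed bytes out of the cache line.
            uint32_t word;
            std::memcpy(&word, &dcache_line[offset & ~3u], sizeof(word));

            if (size == LoadSize::Word) {
                return word;  // bypass path: a load word needs no alignment shift
            }

            // Shifter path: right-align the addressed byte or halfword within the word.
            const uint32_t shift = (offset & 3u) * 8;
            const uint32_t shifted = word >> shift;
            return size == LoadSize::Byte ? (shifted & 0xFFu) : (shifted & 0xFFFFu);
        }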

9. Circuit and method for instruction compression and dispersal in wide-issue processors
    Granted patent, in force

    Publication No.: US07143268B2

    Publication Date: 2006-11-28

    Application No.: US09751674

    Filing Date: 2000-12-29

    IPC Classification: G06F9/30

    Abstract: A data processor includes execution clusters, an instruction cache, an instruction issue unit, and alignment and dispersal circuitry. Each execution cluster includes an instruction execution pipeline having a number of processing stages, and each execution pipeline is a number of lanes wide. The processing stages execute instruction bundles, where each instruction bundle has one or more syllables. Each lane is capable of receiving one of the syllables of an instruction bundle. The instruction cache includes a number of cache lines. The instruction issue unit receives fetched cache lines and issues complete instruction bundles toward the execution clusters. The alignment and dispersal circuitry receives the complete instruction bundles from the instruction issue unit and routes each received complete instruction bundle to a correct one of the execution clusters. The complete instruction bundles are routed as a function of at least one address bit associated with each complete instruction bundle.

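    A minimal sketch of address-based dispersal: one or more bits of a bundle's address select the execution cluster it is routed to. The bit position and cluster count below are hypothetical, and the cluster count is assumed to be a power of two so the selection reduces to a mask.

        #include <cstdint>
        #include <vector>

        constexpr int kClusters = 2;   // assumed power-of-two number of execution clusters
        constexpr int kSelectBit = 5;  // hypothetical address bit that steers the bundle

        struct Bundle {
            uint64_t address;                 // address of the bundle in the instruction cache
            std::vector<uint32_t> syllables;  // one or more syllables per bundle
        };

        // Route a complete instruction bundle as a function of its address bits.
        int route_to_cluster(const Bundle& b) {
            return static_cast<int>((b.address >> kSelectBit) & (kClusters - 1));
        }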

10. Instruction fetch apparatus for wide issue processors and method of operation

    Publication No.: US07028164B2

    Publication Date: 2006-04-11

    Application No.: US09751679

    Filing Date: 2000-12-29

    IPC Classification: G06F9/40

    CPC Classification: G06F9/3816 G06F9/30149

    Abstract: There is disclosed a data processor containing an instruction issue unit that efficiently transfers instruction bundles from a cache to an instruction pipeline. The data processor comprises 1) an instruction pipeline comprising N processing stages; and 2) an instruction issue unit for issuing into the instruction pipeline instructions fetched from the instruction cache, each of the fetched instructions comprising from one to S syllables. The instruction issue unit comprises: a) a first buffer comprising S storage locations for storing up to S syllables associated with the fetched instructions, each of the S storage locations storing one of the one to S syllables of each fetched instruction; b) a second buffer comprising S storage locations for storing up to S syllables associated with the fetched instructions, each of the S storage locations for storing one of the one to S syllables of each fetched instruction; and c) a controller for determining if a first one of the S storage locations in the first buffer is full, wherein the controller, in response to such a determination, stores a corresponding syllable of an incoming fetched instruction in one of the S storage locations in the second buffer.
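
    A sketch of the two-buffer syllable staging, under one reading of the controller rule: when a storage location in the first buffer is still occupied, the corresponding syllable of the incoming fetched instruction is parked in the matching location of the second buffer. S and the per-slot policy are illustrative assumptions.

        #include <array>
        #include <cstdint>
        #include <optional>

        constexpr int S = 4;  // assumed maximum syllables per fetched instruction

        struct IssueBuffers {
            std::array<std::optional<uint32_t>, S> first;   // primary staging buffer
            std::array<std::optional<uint32_t>, S> second;  // overflow staging buffer

            // Accept one fetched instruction of 'count' syllables (1..S).
            void accept(const std::array<uint32_t, S>& syllables, int count) {
                for (int i = 0; i < count; ++i) {
                    if (!first[i]) {
                        first[i] = syllables[i];   // normal case: location i is free
                    } else {
                        second[i] = syllables[i];  // location i still full: use the second buffer
                    }
                }
            }
        };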