Apparatus and method for performing branch predictions using dual branch history tables and for updating such branch history tables
    51.
    Invention Grant
    Status: Expired

    Publication No.: US06823446B1

    Publication Date: 2004-11-23

    Application No.: US09549154

    Filing Date: 2000-04-13

    Applicant: Balaram Sinharoy

    Inventor: Balaram Sinharoy

    IPC Class: G06F9/00

    CPC Class: G06F9/3848

    Abstract: A branch prediction method includes the step of retrieving prediction values from a local branch history table and a global branch history table. A branch prediction operation is selectively performed using the value retrieved from the local branch history table when the value from the local branch history table falls within first predetermined limits. A branch prediction operation is selectively performed using the value retrieved from the global branch history table when the value from the global branch history table falls within second predetermined limits.

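    A minimal C sketch of the selection the abstract describes, assuming 3-bit saturating counters; the table sizes and the confidence thresholds are illustrative assumptions, not parameters taken from the patent.

        /* Dual-BHT selection sketch: the extreme counter values are treated
         * as "within the predetermined limits", i.e. confident enough to
         * drive the prediction; otherwise the other table is consulted. */
        #include <stdint.h>
        #include <stdbool.h>

        #define LOCAL_ENTRIES  1024
        #define GLOBAL_ENTRIES 4096

        static uint8_t local_bht[LOCAL_ENTRIES];   /* indexed by branch address          */
        static uint8_t global_bht[GLOBAL_ENTRIES]; /* indexed by address XOR global hist */

        static bool within_limits(uint8_t ctr) { return ctr <= 1 || ctr >= 6; }

        bool predict_taken(uint32_t branch_addr, uint32_t global_history)
        {
            uint8_t local_ctr  = local_bht[branch_addr % LOCAL_ENTRIES];
            uint8_t global_ctr = global_bht[(branch_addr ^ global_history) % GLOBAL_ENTRIES];

            if (within_limits(local_ctr))      /* first predetermined limits  */
                return local_ctr >= 4;
            if (within_limits(global_ctr))     /* second predetermined limits */
                return global_ctr >= 4;
            return true;                       /* fall back to a static guess */
        }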

    Prefetching instructions in mis-predicted path for low confidence branches
    52.
    Invention Grant
    Status: In force

    Publication No.: US06766441B2

    Publication Date: 2004-07-20

    Application No.: US09765163

    Filing Date: 2001-01-19

    Applicant: Balaram Sinharoy

    Inventor: Balaram Sinharoy

    IPC Class: G06F9/38

    Abstract: In a first aspect of the present invention, a method for prefetching instructions in a superscalar processor is disclosed. The method comprises the steps of fetching a set of instructions along a predicted path, prefetching a predetermined number of instructions if a low-confidence branch is fetched, and storing the predetermined number of instructions in a prefetch buffer. In a second aspect of the present invention, a system for prefetching instructions in a superscalar processor is disclosed. The system comprises a cache for fetching a set of instructions along a predicted path, a prefetching mechanism coupled to the cache for prefetching a predetermined number of instructions if a low-confidence branch is fetched, and a prefetch buffer coupled to the prefetching mechanism for storing the predetermined number of instructions. Through the use of the method and system in accordance with the present invention, existing prefetching algorithms are improved with minimal additional hardware cost.

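    A behavioral C sketch of the idea in the abstract, not the patented hardware: when a fetched branch has low prediction confidence, a fixed number of instructions from the alternate (not-predicted) path are staged in a prefetch buffer. The helpers fetch_insn(), branch_low_confidence(), and alternate_target(), the depth of 8, and the 4-byte instruction size are assumptions.

        #include <stdint.h>
        #include <stdbool.h>

        #define PREFETCH_DEPTH 8   /* predetermined number of instructions */

        typedef struct {
            uint32_t addr[PREFETCH_DEPTH];
            uint32_t insn[PREFETCH_DEPTH];
            int      count;
        } prefetch_buffer_t;

        extern uint32_t fetch_insn(uint32_t addr);            /* i-cache access   */
        extern bool     branch_low_confidence(uint32_t insn); /* confidence check */
        extern uint32_t alternate_target(uint32_t addr, uint32_t insn);

        void fetch_and_prefetch(uint32_t pc, prefetch_buffer_t *pb)
        {
            uint32_t insn = fetch_insn(pc);   /* fetch along the predicted path */

            if (branch_low_confidence(insn)) {
                /* Low-confidence branch: stage instructions from the other
                 * path so a misprediction can restart from the buffer. */
                uint32_t alt = alternate_target(pc, insn);
                pb->count = 0;
                for (int i = 0; i < PREFETCH_DEPTH; i++) {
                    pb->addr[pb->count] = alt + 4u * (uint32_t)i;
                    pb->insn[pb->count] = fetch_insn(pb->addr[pb->count]);
                    pb->count++;
                }
            }
        }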

    Processor and method for partially flushing a dispatched instruction group including a mispredicted branch
    53.
    Invention Grant
    Status: In force

    Publication No.: US09489207B2

    Publication Date: 2016-11-08

    Application No.: US12423495

    Filing Date: 2009-04-14

    IPC Class: G06F9/30 G06F9/38

    Abstract: Mechanisms are provided for partial flush handling with multiple branches per instruction group. The instruction fetch unit sorts instructions into groups. A group may include a floating branch instruction and a boundary branch instruction. For each group of instructions, the instruction sequencing unit creates an entry in a global completion table (GCT), which may also be referred to herein as a group completion table. The instruction sequencing unit uses the GCT to manage completion of instructions within each outstanding group. Because each group may include up to two branches, the instruction sequencing unit may dispatch instructions beyond the first branch, i.e., the floating branch. Therefore, if the floating branch results in a misprediction, the processor performs a partial flush of that group, as well as a flush of every group younger than that group.

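    An illustrative C sketch of the partial flush over a simple array-based GCT; the field names, group size, entry count, and age-based youth test are assumptions, not the patent's implementation.

        #include <stdint.h>
        #include <stdbool.h>

        #define GROUP_SIZE  5
        #define GCT_ENTRIES 32

        typedef struct {
            bool    valid;
            bool    insn_valid[GROUP_SIZE];  /* per-slot valid bits         */
            int     floating_branch_slot;    /* -1 if the group has none    */
            uint8_t age;                     /* smaller value = older group */
        } gct_entry_t;

        static gct_entry_t gct[GCT_ENTRIES];

        /* Mispredicted floating branch: drop the slots after it within its
         * own group, then flush every group younger than that group. */
        void partial_flush(int group_tag)
        {
            gct_entry_t *g = &gct[group_tag];

            for (int s = g->floating_branch_slot + 1; s < GROUP_SIZE; s++)
                g->insn_valid[s] = false;

            for (int i = 0; i < GCT_ENTRIES; i++)
                if (gct[i].valid && gct[i].age > g->age)
                    gct[i].valid = false;
        }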

    Block driven computation using a caching policy specified in an operand data structure
    55.
    Invention Grant
    Status: In force

    Publication No.: US08458439B2

    Publication Date: 2013-06-04

    Application No.: US12336350

    Filing Date: 2008-12-16

    IPC Class: G06F12/00

    CPC Class: G06F9/383 G06F2212/6028

    Abstract: A processor has an associated memory hierarchy including a cache memory. The processor includes an instruction sequencing unit that fetches instructions for processing, an operand data structure including a plurality of entries corresponding to operands of operations to be performed by the processor, and a computation engine. A first entry among the plurality of entries in the operand data structure specifies a first caching policy for a first operand, and a second entry specifies a second caching policy for a second operand. The computation engine computes and stores operands in the memory hierarchy in accordance with the caching policies indicated within the operand data structure.

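    A minimal C sketch of an operand-data-structure entry that carries a per-operand caching policy, as the abstract describes; the policy names and the three store helpers are illustrative assumptions.

        #include <stdint.h>
        #include <stddef.h>

        typedef enum {
            CACHE_POLICY_BYPASS,         /* write to memory only            */
            CACHE_POLICY_WRITE_THROUGH,
            CACHE_POLICY_WRITE_BACK
        } cache_policy_t;

        typedef struct {
            uint64_t       operand_addr;
            size_t         length;
            cache_policy_t policy;       /* caching policy for this operand */
        } operand_entry_t;

        extern void store_bypass(uint64_t addr, const void *data, size_t len);
        extern void store_write_through(uint64_t addr, const void *data, size_t len);
        extern void store_write_back(uint64_t addr, const void *data, size_t len);

        /* The computation engine consults the entry's policy on each store. */
        void store_operand(const operand_entry_t *e, const void *data)
        {
            switch (e->policy) {
            case CACHE_POLICY_BYPASS:        store_bypass(e->operand_addr, data, e->length);        break;
            case CACHE_POLICY_WRITE_THROUGH: store_write_through(e->operand_addr, data, e->length); break;
            case CACHE_POLICY_WRITE_BACK:    store_write_back(e->operand_addr, data, e->length);    break;
            }
        }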

    Thread priority method for ensuring processing fairness in simultaneous multi-threading microprocessors
    56.
    Invention Grant
    Status: Expired

    Publication No.: US08418180B2

    Publication Date: 2013-04-09

    Application No.: US12129876

    Filing Date: 2008-05-30

    IPC Class: G06F9/46

    Abstract: A method, apparatus, and computer program product are disclosed for ensuring processing fairness in simultaneous multi-threading (SMT) microprocessors. A clock cycle priority is assigned to a first thread and to a second thread during a standard selection state, which lasts for an expected number of clock cycles, by selecting the first thread to be a primary thread and the second thread to be a secondary thread. If a condition exists that requires overriding, an override state is executed by selecting the second thread to be the primary thread and the first thread to be the secondary thread. The override state is forced to be executed for an override period of time that equals the expected number of clock cycles plus a forced number of clock cycles. The forced number of clock cycles is granted to the first thread in response to the first thread again becoming the primary thread.

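    A rough C model of the fairness scheme in the abstract: thread 0 is normally primary for an expected number of cycles, an override condition can promote thread 1 for the expected period plus a forced extension, and the forced cycles are granted back to thread 0 when it becomes primary again. The cycle counts and the override test are assumptions.

        #include <stdbool.h>

        #define EXPECTED_CYCLES 16   /* length of the standard selection state */
        #define FORCED_CYCLES    4   /* extra cycles appended to an override   */

        typedef struct {
            int primary;       /* 0 or 1: thread currently given priority   */
            int cycles_left;   /* cycles remaining in the current state     */
            int owed_to_0;     /* forced cycles to grant back to thread 0   */
        } smt_select_t;

        extern bool override_needed(void);   /* condition requiring an override */

        int select_thread(smt_select_t *s)
        {
            if (s->cycles_left == 0) {
                if (s->primary == 0 && override_needed()) {
                    /* Override state: thread 1 becomes primary and is forced
                     * to run for the expected period plus the forced cycles. */
                    s->primary = 1;
                    s->cycles_left = EXPECTED_CYCLES + FORCED_CYCLES;
                    s->owed_to_0 = FORCED_CYCLES;
                } else {
                    /* Standard selection state: thread 0 is primary again
                     * and receives any forced cycles it is owed. */
                    s->primary = 0;
                    s->cycles_left = EXPECTED_CYCLES + s->owed_to_0;
                    s->owed_to_0 = 0;
                }
            }
            s->cycles_left--;
            return s->primary;
        }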

    Cache management during asynchronous memory move operations
    57.
    Invention Grant
    Status: In force

    Publication No.: US08327101B2

    Publication Date: 2012-12-04

    Application No.: US12024526

    Filing Date: 2008-02-01

    IPC Class: G06F12/02

    Abstract: A data processing system includes a mechanism for completing an asynchronous memory move (AMM) operation in which the processor receives an AMM ST instruction and performs a processor-level move of the data in virtual address space, and an asynchronous memory mover then completes a physical move of the data within the real address space (memory). A status/control field of the AMM ST instruction includes an indication of the requested treatment of the lower-level cache(s) on completion of the AMM operation. When the status/control field indicates that an update to at least one cache should be performed, the asynchronous memory mover automatically forwards a copy of the data from the data move to the lower-level cache and triggers an update of the coherency state of the cache line in which the copy of the data is placed.

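    A hedged C sketch of the completion step of an asynchronous memory move: a status/control field carried with the request says whether the lower-level cache should receive a copy of the moved data and have its coherency state updated. The flag name, the l2_install_line() helper, and the coherency states are illustrative assumptions.

        #include <stdint.h>
        #include <stddef.h>

        #define AMM_CTRL_UPDATE_CACHE 0x1   /* install a copy in the lower-level cache */

        typedef enum { COHERENCY_INVALID, COHERENCY_SHARED, COHERENCY_MODIFIED } coherency_t;

        typedef struct {
            uint64_t src_real;      /* source real (physical) address          */
            uint64_t dst_real;      /* destination real (physical) address     */
            size_t   length;
            uint32_t status_ctrl;   /* requested cache treatment on completion */
        } amm_request_t;

        extern void memory_copy(uint64_t dst, uint64_t src, size_t len);
        extern void l2_install_line(uint64_t addr, coherency_t state);

        void amm_complete(const amm_request_t *req)
        {
            /* Physical move of the data within the real address space. */
            memory_copy(req->dst_real, req->src_real, req->length);

            /* If requested, forward a copy to the lower-level cache and
             * update the coherency state of the line that receives it. */
            if (req->status_ctrl & AMM_CTRL_UPDATE_CACHE)
                l2_install_line(req->dst_real, COHERENCY_MODIFIED);
        }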

    Specifying an addressing relationship in an operand data structure
    58.
    Invention Grant
    Status: In force

    Publication No.: US08281106B2

    Publication Date: 2012-10-02

    Application No.: US12336342

    Filing Date: 2008-12-16

    IPC Class: G06F12/00

    CPC Class: G06F9/345

    Abstract: A processor includes at least one execution unit that executes instructions, at least one register file, coupled to the at least one execution unit, that buffers operands for access by the at least one execution unit, and an instruction sequencing unit that fetches instructions for execution by the execution unit. The processor further includes an operand data structure and an address generation accelerator. The operand data structure specifies a first relationship between addresses of sequential accesses within a first address region and a second relationship between addresses of sequential accesses within a second address region. The address generation accelerator computes a first address of a first memory access in the first address region by reference to the first relationship, and a second address of a second memory access in the second address region by reference to the second relationship.

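    A minimal sketch in C, assuming each operand-data-structure entry records the relationship between sequential accesses in one region as a base address and a stride, so the address generation accelerator can produce the nth access address directly; the field names and example values are assumptions.

        #include <stdint.h>

        typedef struct {
            uint64_t base;     /* first address in the region              */
            int64_t  stride;   /* relationship between sequential accesses */
        } region_entry_t;

        /* Address generation accelerator: address of the nth access. */
        static inline uint64_t gen_address(const region_entry_t *r, uint64_t n)
        {
            return r->base + (uint64_t)((int64_t)n * r->stride);
        }

        int main(void)
        {
            region_entry_t a = { .base = 0x1000, .stride = 8 };    /* 8-byte elements */
            region_entry_t b = { .base = 0x8000, .stride = 128 };  /* 128-byte lines  */
            uint64_t a3 = gen_address(&a, 3);   /* 0x1018 */
            uint64_t b2 = gen_address(&b, 2);   /* 0x8100 */
            return (a3 == 0x1018 && b2 == 0x8100) ? 0 : 1;
        }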

    Sourcing differing amounts of prefetch data in response to data prefetch requests
    59.
    Invention Grant
    Status: Expired

    Publication No.: US08250307B2

    Publication Date: 2012-08-21

    Application No.: US12024165

    Filing Date: 2008-02-01

    IPC Class: G06F12/00

    Abstract: According to a method of data processing, a memory controller receives a prefetch load request from a processor core of a data processing system. The prefetch load request specifies a requested line of data. In response to receipt of the prefetch load request, the memory controller determines, by reference to a stream of demand requests, how much data is to be supplied to the processor core in response to the prefetch load request. In response to the memory controller determining to provide less than all of the requested line of data, the memory controller provides less than all of the requested line of data to the processor core.

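    A software sketch in C of the decision the abstract describes: the memory controller consults the recent demand-request stream and may return less than the full requested line for a prefetch load. The averaging heuristic, the 32-byte granule, and the line size are assumptions.

        #include <stddef.h>

        #define LINE_BYTES  128
        #define HISTORY_LEN  16

        typedef struct {
            size_t demand_bytes[HISTORY_LEN];  /* bytes consumed per recent demand */
            int    count;
        } demand_stream_t;

        /* Decide how many bytes of the requested line to source for a prefetch. */
        size_t prefetch_supply_size(const demand_stream_t *s)
        {
            if (s->count == 0)
                return LINE_BYTES;             /* no history: send the full line */

            size_t total = 0;
            for (int i = 0; i < s->count; i++)
                total += s->demand_bytes[i];
            size_t avg = total / (size_t)s->count;

            /* Round the average demand up to a 32-byte granule, at least one
             * granule and at most a full line. */
            size_t granule = (avg + 31) & ~(size_t)31;
            if (granule < 32)
                granule = 32;
            return granule < LINE_BYTES ? granule : LINE_BYTES;
        }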

    Data prefetching using indirect addressing
    60.
    Invention Grant
    Status: In force

    Publication No.: US08166277B2

    Publication Date: 2012-04-24

    Application No.: US12024186

    Filing Date: 2008-02-01

    IPC Class: G06F13/00

    CPC Class: G06F12/0862 G06F2212/6028

    Abstract: A technique for performing indirect data prefetching includes determining a first memory address of a pointer associated with a data prefetch instruction. The content of the memory at the first memory address is then fetched. A second memory address is determined from the content of the memory at the first memory address. Finally, a data block (e.g., a cache line) including the data at the second memory address is fetched (e.g., from the memory or another memory).

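    The four steps in the abstract transcribed as a C sketch: take the pointer's address, load the pointer, treat its content as the second address, and fetch the data block containing it. fetch_line() is an assumed stand-in for the cache fill, and the 128-byte line size is an assumption.

        #include <stdint.h>

        #define LINE_BYTES 128

        extern void fetch_line(uintptr_t addr, uint8_t out[LINE_BYTES]);

        void indirect_prefetch(const void *pointer_location, uint8_t line[LINE_BYTES])
        {
            /* 1. First memory address: where the pointer itself lives.      */
            uintptr_t first_addr = (uintptr_t)pointer_location;

            /* 2. Fetch the memory content at that address (the pointer).    */
            uintptr_t second_addr = *(const uintptr_t *)first_addr;

            /* 3-4. The content is the second memory address; fetch the data
             *      block (e.g. a cache line) that contains it.              */
            fetch_line(second_addr & ~(uintptr_t)(LINE_BYTES - 1), line);
        }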