Method and system for preventing livelock due to competing updates of prediction information
    51.
    Invention Application
    Method and system for preventing livelock due to competing updates of prediction information (Pending, Published)

    Publication No.: US20070277025A1

    Publication Date: 2007-11-29

    Application No.: US11440554

    Filing Date: 2006-05-25

    IPC Class: G06F9/44

    Abstract: A system to prevent livelock. An outcome of an event is predicted to form an event outcome prediction. The event outcome prediction is compared with a correct value for a datum to be accessed. When the outcome of the event is mispredicted, the instruction is appended with the real event outcome to form an appended instruction. A prediction override bit is set on the appended instruction. Then, the appended instruction is executed with the real event outcome.
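    The override mechanism the abstract describes can be illustrated with a small sketch. All names (`Instruction`, `predict_outcome`, `resolve`) and the toy majority-vote predictor are assumptions for illustration, not taken from the patent:

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Instruction:
        opcode: str
        appended_outcome: Optional[bool] = None  # real outcome, appended on mispredict
        override: bool = False                   # prediction override bit

    def predict_outcome(history: list) -> bool:
        # Toy predictor: majority vote over recent outcomes (assumption).
        return sum(history) * 2 >= len(history)

    def resolve(instr: Instruction, history: list, real_outcome: bool) -> bool:
        """Return the outcome the instruction executes with."""
        if instr.override and instr.appended_outcome is not None:
            return instr.appended_outcome          # forced: no further mispredict loop
        prediction = predict_outcome(history)
        if prediction != real_outcome:             # misprediction detected
            instr.appended_outcome = real_outcome  # append the real event outcome
            instr.override = True                  # set the prediction override bit
        return prediction

    # First pass mispredicts; the replayed instruction is forced to the real outcome.
    instr = Instruction("branch")
    history = [True, True, False]                         # predictor will say True
    first = resolve(instr, history, real_outcome=False)   # mispredicts
    replay = resolve(instr, history, real_outcome=False)  # override takes effect
    ```

    Because the override bit bypasses the predictor entirely on replay, competing updates to the prediction state can no longer cause the same instruction to mispredict indefinitely.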


    Context look ahead storage structures
    52.
    Invention Application
    Context look ahead storage structures (Expired)

    Publication No.: US20050120193A1

    Publication Date: 2005-06-02

    Application No.: US10724815

    Filing Date: 2003-12-01

    IPC Class: G06F9/00 G06F9/38

    CPC Class: G06F9/3806

    Abstract: A memory storage structure includes a memory storage device, and a first meta-structure having a first size and operating at a first speed. The first speed is faster than a second speed for storing meta-information based on information stored in a memory. A second meta-structure is hierarchically associated with the first meta-structure. The second meta-structure has a second size larger than the first size and operates at the second speed, such that faster and more accurate prefetching is provided by coaction of the first and second meta-structures. A method is provided to assemble the meta-information in the first meta-structure, copy this information to the second meta-structure, and prefetch the stored information from the second meta-structure to the first meta-structure ahead of its use.
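    A minimal sketch of the two-level arrangement described above: a small, fast first-level meta-structure backed by a larger second-level one, with meta-information assembled in the first, copied to the second, and prefetched back ahead of use. The sizes, the dict-based storage, and the FIFO eviction are assumptions for illustration:

    ```python
    from collections import OrderedDict

    class TwoLevelMeta:
        def __init__(self, l1_size=4, l2_size=16):
            self.l1 = OrderedDict()  # small, fast first meta-structure
            self.l2 = OrderedDict()  # larger, slower second meta-structure
            self.l1_size, self.l2_size = l1_size, l2_size

        def record(self, key, meta):
            """Assemble meta-information in L1 and copy it to L2."""
            self.l1[key] = meta
            self.l2[key] = meta
            if len(self.l1) > self.l1_size:
                self.l1.popitem(last=False)  # evict oldest from the small level
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)

        def prefetch(self, key):
            """Pull meta-information from L2 back into L1 ahead of its use."""
            if key not in self.l1 and key in self.l2:
                self.record(key, self.l2[key])
                return True
            return False

    m = TwoLevelMeta()
    for i in range(6):
        m.record(i, f"meta{i}")
    # keys 0 and 1 have aged out of the small level but survive in the large one
    hit = m.prefetch(0)
    ```

    The larger structure retains history the small one cannot hold, which is what lets the pair prefetch meta-information back before it is needed.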


    Relaxation of synchronization for iterative convergent computations
    53.
    Granted Invention
    Relaxation of synchronization for iterative convergent computations (In Force)

    Publication No.: US09069545B2

    Publication Date: 2015-06-30

    Application No.: US13184718

    Filing Date: 2011-07-18

    Abstract: Systems and methods are disclosed that allow atomic updates to global data to be at least partially eliminated to reduce synchronization overhead in parallel computing. A compiler analyzes the data to be processed to selectively permit unsynchronized data transfer for at least one type of data. A programmer may provide a hint to expressly identify the type of data that are candidates for unsynchronized data transfer. In one embodiment, the synchronization overhead is reducible by generating an application program that selectively substitutes codes for unsynchronized data transfer for a subset of codes for synchronized data transfer. In another embodiment, the synchronization overhead is reducible by employing a combination of software and hardware by using relaxation data registers and decoders that collectively convert a subset of commands for synchronized data transfer into commands for unsynchronized data transfer.
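    The software embodiment can be sketched as a hint-driven choice between a locked and an unlocked update path. The `RELAXED` hint set, field names, and `update` helper are illustrative assumptions; the point is only that updates to hinted data skip synchronization because an iterative convergent computation tolerates occasionally stale values:

    ```python
    import threading

    lock = threading.Lock()
    RELAXED = {"rank_estimate"}  # programmer hint: races tolerated for these fields
    state = {"rank_estimate": 0.0, "iteration_count": 0}

    def update(field_name, delta):
        if field_name in RELAXED:
            state[field_name] += delta  # unsynchronized path: no lock taken
        else:
            with lock:                  # synchronized path kept for exact data
                state[field_name] += delta

    update("rank_estimate", 0.5)   # convergent estimate: relaxed
    update("iteration_count", 1)   # exact counter: synchronized
    ```

    A compiler performing the substitution described in the abstract would effectively emit the first branch for hinted data and the second for everything else.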


    METHODS OF CACHE PRELOADING ON A PARTITION OR A CONTEXT SWITCH
    54.
    Invention Application
    METHODS OF CACHE PRELOADING ON A PARTITION OR A CONTEXT SWITCH (In Force)

    Publication No.: US20140019689A1

    Publication Date: 2014-01-16

    Application No.: US13545304

    Filing Date: 2012-07-10

    IPC Class: G06F12/12

    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
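    A rough sketch of the region-grouping idea: block addresses are mapped to coarse-grain regions, the regions a virtual machine touches are remembered per VM, and on a switch back to that VM the remembered regions are expanded into block addresses to prefetch. The 4 KiB region size, 64-byte blocks, and set-based bookkeeping are assumptions, not the patent's actual structures:

    ```python
    REGION_BITS = 12  # assume 4 KiB coarse-grain regions
    BLOCK = 64        # assume 64-byte cache blocks

    def region_of(addr):
        return addr >> REGION_BITS

    class Recap:
        def __init__(self):
            self.per_vm = {}  # VM id -> set of regions predicted useful

        def observe(self, vm, addr):
            """Record that this VM touched a block in this region."""
            self.per_vm.setdefault(vm, set()).add(region_of(addr))

        def prefetch_list(self, vm):
            """On a context switch back to `vm`, expand each remembered region
            into block addresses (a crude stand-in for the compressed
            per-region information the abstract alludes to)."""
            blocks = []
            for r in sorted(self.per_vm.get(vm, ())):
                base = r << REGION_BITS
                blocks.extend(range(base, base + (1 << REGION_BITS), BLOCK))
            return blocks

    r = Recap()
    r.observe(1, 0x1000)
    r.observe(1, 0x1040)  # same region as 0x1000: no new state
    r.observe(1, 0x9000)
    warm_set = r.prefetch_list(1)  # two regions' worth of blocks
    ```

    Tracking regions rather than individual blocks is what keeps the metadata small enough to survive across context switches.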


    Write-through cache optimized for dependence-free parallel regions
    55.
    Granted Invention
    Write-through cache optimized for dependence-free parallel regions (In Force)

    Publication No.: US08516197B2

    Publication Date: 2013-08-20

    Application No.: US13025706

    Filing Date: 2011-02-11

    IPC Class: G06F12/00

    CPC Class: G06F12/0837

    Abstract: An apparatus, method, and computer program product for improving performance of a parallel computing system. A first hardware local cache controller, associated with a first local cache memory device of a first processor, detects an occurrence of false sharing of a first cache line by a second processor running the program code, and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating of a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller, and subsequent updating of a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
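    The distinction the abstract relies on, two caches writing disjoint portions of one line (false sharing) versus overlapping bytes (true sharing), can be modeled with simple line/offset arithmetic. The 64-byte line size and the directory-of-written-offsets model are illustrative assumptions:

    ```python
    LINE = 64  # assumed cache line size in bytes

    def line_of(addr):
        return addr // LINE

    def offset_of(addr):
        return addr % LINE

    class LocalCacheController:
        def __init__(self, cid, directory):
            self.cid = cid
            self.directory = directory  # line -> {controller id: set of offsets written}

        def write(self, addr):
            writers = self.directory.setdefault(line_of(addr), {})
            writers.setdefault(self.cid, set()).add(offset_of(addr))

    def is_false_sharing(directory, line):
        """True when several caches wrote the same line but at disjoint offsets."""
        writers = directory.get(line, {})
        if len(writers) < 2:
            return False
        seen = set()
        for offsets in writers.values():
            if seen & offsets:
                return False  # true sharing: the same bytes were written
            seen |= offsets
        return True

    directory = {}
    c0 = LocalCacheController(0, directory)
    c1 = LocalCacheController(1, directory)
    c0.write(0)   # line 0, bytes 0..
    c1.write(8)   # line 0, different bytes: false sharing, safe to allow
    ```

    In a write-through design the line's backing memory stays current, which is why disjoint updates like these can be allowed to proceed without an invalidation ping-pong.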


    MULTI-THREADED PROCESSOR INSTRUCTION BALANCING THROUGH INSTRUCTION UNCERTAINTY
    56.
    Invention Application
    MULTI-THREADED PROCESSOR INSTRUCTION BALANCING THROUGH INSTRUCTION UNCERTAINTY (In Force)

    Publication No.: US20130205118A1

    Publication Date: 2013-08-08

    Application No.: US13366999

    Filing Date: 2012-02-06

    IPC Class: G06F9/30 G06F9/38

    CPC Class: G06F9/3844 G06F9/3851

    Abstract: A computer system for instruction execution includes a processor having a pipeline. The system is configured to perform a method that includes: fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions; assigning a branch uncertainty to each of the plurality of branch instructions; assigning, to each of the plurality of instructions, an instruction uncertainty that is a summation of the branch uncertainties of older unresolved branches; and balancing the instructions in the pipeline based on a current summation of instruction uncertainty.
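    The uncertainty metric can be sketched directly from the abstract: each instruction inherits the summed uncertainty of all older, still-unresolved branches ahead of it in program order, and the thread carrying the least total uncertainty gets priority. The tuple encoding, the uncertainty values, and the two-thread scheduler are assumptions for illustration:

    ```python
    def instruction_uncertainties(instrs):
        """instrs: list of ('branch', uncertainty) or ('op', None) tuples,
        oldest first. Returns each instruction's inherited uncertainty."""
        total, out = 0.0, []
        for kind, u in instrs:
            out.append(total)        # sum of older unresolved branches' uncertainty
            if kind == 'branch':
                total += u           # this branch now shadows everything younger
        return out

    def pick_thread(threads):
        """Favor the thread whose pipeline carries the least total uncertainty."""
        return min(threads, key=lambda t: sum(instruction_uncertainties(threads[t])))

    threads = {
        "t0": [('op', None), ('branch', 0.9), ('op', None)],   # hard-to-predict branch
        "t1": [('branch', 0.1), ('op', None), ('op', None)],   # confident branch
    }
    favored = pick_thread(threads)
    ```

    The intuition is that instructions behind a low-confidence branch are likely to be flushed, so fetch bandwidth is better spent on the thread whose in-flight work is more certain to retire.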


    RELAXATION OF SYNCHRONIZATION FOR ITERATIVE CONVERGENT COMPUTATIONS
    57.
    Invention Application
    RELAXATION OF SYNCHRONIZATION FOR ITERATIVE CONVERGENT COMPUTATIONS (In Force)

    Publication No.: US20130024662A1

    Publication Date: 2013-01-24

    Application No.: US13184718

    Filing Date: 2011-07-18

    IPC Class: G06F9/30

    Abstract: Systems and methods are disclosed that allow atomic updates to global data to be at least partially eliminated to reduce synchronization overhead in parallel computing. A compiler analyzes the data to be processed to selectively permit unsynchronized data transfer for at least one type of data. A programmer may provide a hint to expressly identify the type of data that are candidates for unsynchronized data transfer. In one embodiment, the synchronization overhead is reducible by generating an application program that selectively substitutes codes for unsynchronized data transfer for a subset of codes for synchronized data transfer. In another embodiment, the synchronization overhead is reducible by employing a combination of software and hardware by using relaxation data registers and decoders that collectively convert a subset of commands for synchronized data transfer into commands for unsynchronized data transfer.


    PREDICTING CACHE MISSES USING DATA ACCESS BEHAVIOR AND INSTRUCTION ADDRESS
    58.
    Invention Application
    PREDICTING CACHE MISSES USING DATA ACCESS BEHAVIOR AND INSTRUCTION ADDRESS (In Force)

    Publication No.: US20120284463A1

    Publication Date: 2012-11-08

    Application No.: US13099178

    Filing Date: 2011-05-02

    IPC Class: G06F12/08

    Abstract: In a decode stage of a hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses. Additional steps include scheduling at least one additional one of the plurality of instructions in accordance with the predicting; and updating the pattern history table and the global history buffer, subsequent to actual execution of the particular instruction in an execution stage of the hardware processor pipeline, to reflect whether the predicting was accurate.
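    The pattern-history-table lookup the abstract describes can be sketched in software: a global history buffer of recent hit/miss outcomes forms an index into a table of predictions, and both structures are updated once the access actually resolves. The 4-bit history, the 2-bit saturating counters, and their thresholds are assumptions, not the patent's parameters:

    ```python
    HIST_LEN = 4  # assumed global history length

    class CacheMissPredictor:
        def __init__(self):
            self.ghb = [0] * HIST_LEN         # global history buffer: 1 = miss, 0 = hit
            self.pht = [1] * (1 << HIST_LEN)  # pattern history table of 2-bit counters

        def _index(self):
            idx = 0
            for bit in self.ghb:
                idx = (idx << 1) | bit        # history pattern selects the PHT entry
            return idx

        def predict_miss(self):
            return self.pht[self._index()] >= 2   # counter high half predicts a miss

        def update(self, was_miss):
            """After execution resolves, train the PHT and shift the history."""
            i = self._index()
            if was_miss:
                self.pht[i] = min(3, self.pht[i] + 1)
            else:
                self.pht[i] = max(0, self.pht[i] - 1)
            self.ghb = self.ghb[1:] + [1 if was_miss else 0]

    p = CacheMissPredictor()
    cold = p.predict_miss()     # untrained: weakly predicts a hit
    for _ in range(8):
        p.update(True)          # a run of misses trains the all-miss pattern
    trained = p.predict_miss()  # now predicts a miss for this history
    ```

    A decode-stage scheduler could then use `predict_miss()` to hoist independent instructions ahead of a load it expects to stall.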


    Method and system for preventing livelock due to competing updates of prediction information
    60.
    Granted Invention
    Method and system for preventing livelock due to competing updates of prediction information (In Force)

    Publication No.: US07979682B2

    Publication Date: 2011-07-12

    Application No.: US12051322

    Filing Date: 2008-03-19

    IPC Class: G06F9/30

    Abstract: A system to prevent livelock. An outcome of an event is predicted to form an event outcome prediction. The event outcome prediction is compared with a correct value for a datum to be accessed. When the outcome of the event is mispredicted, the instruction is appended with the real event outcome to form an appended instruction. A prediction override bit is set on the appended instruction. Then, the appended instruction is executed with the real event outcome.
