Handling permanent and transient errors using a SIMD unit
    1.
    Invention application
    Handling permanent and transient errors using a SIMD unit (status: under examination, published)

    Publication No.: US20060190700A1

    Publication date: 2006-08-24

    Application No.: US11063122

    Filing date: 2005-02-22

    IPC classification: G06F15/00

    CPC classification: G06F11/1641

    Abstract: A method for handling permanent and transient errors in a microprocessor is disclosed. The method includes reading a scalar value and a scalar operation from an execution unit of the microprocessor. The method further includes writing a copy of the scalar value into each of a plurality of elements of a vector register of a Single Instruction Multiple Data (SIMD) unit of the microprocessor and executing the scalar operation on each scalar value in each of the plurality of elements of the vector register of the SIMD unit using a vector operation. The method further includes comparing each result of the scalar operation on each scalar value in each of the plurality of elements of the vector register and detecting a permanent or transient error if the results are not all identical.
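
The replicate, execute, and compare steps in this abstract can be sketched in software as follows. This is only an illustrative model, not the patented hardware: the lane count `LANES`, the helper names, and the `faulty_lane` fault-injection parameter are all assumptions introduced here.

```python
from typing import Callable, List, Optional, Tuple

LANES = 4  # assumed vector width; the abstract only says "a plurality of elements"


def broadcast(value: int, lanes: int = LANES) -> List[int]:
    """Write a copy of the scalar into every element of a simulated vector register."""
    return [value] * lanes


def vector_apply(op: Callable[[int], int], register: List[int],
                 faulty_lane: Optional[int] = None) -> List[int]:
    """Apply the scalar operation to every lane.

    faulty_lane is a hypothetical test hook: it flips the lowest bit of one
    lane's result to model a permanent or transient hardware fault.
    """
    results = [op(x) for x in register]
    if faulty_lane is not None:
        results[faulty_lane] ^= 1
    return results


def check_redundant(op: Callable[[int], int], value: int,
                    faulty_lane: Optional[int] = None) -> Tuple[List[int], bool]:
    """Return the per-lane results and whether an error was detected."""
    results = vector_apply(op, broadcast(value), faulty_lane)
    # An error is flagged when the lane results are not all identical.
    error_detected = any(r != results[0] for r in results)
    return results, error_detected
```

Calling `check_redundant(lambda v: v * 3 + 1, 7)` yields identical lanes and no error, while passing `faulty_lane=2` makes one lane disagree and the check report an error.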

    Methods of cache preloading on a partition or a context switch
    2.
    Granted invention patent
    Methods of cache preloading on a partition or a context switch (status: in force)

    Publication No.: US09092341B2

    Publication date: 2015-07-28

    Application No.: US13545304

    Filing date: 2012-07-10

    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
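
The grouping of cache blocks into coarse-grain regions can be illustrated with a small sketch. It is a simplification under stated assumptions: the block and region sizes are invented, every touched region is treated as useful (the abstract's prediction step is not modeled), and a per-region bitmask stands in for the compression technique, which the abstract does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

BLOCK_BYTES = 64        # assumed cache block size
BLOCKS_PER_REGION = 16  # assumed region size (16 blocks = 1 KiB)


def record_regions(addresses: Iterable[int]) -> Dict[int, int]:
    """Summarize accessed addresses as {region index: bitmask of touched blocks}."""
    regions: Dict[int, int] = defaultdict(int)
    for addr in addresses:
        block = addr // BLOCK_BYTES
        region, offset = divmod(block, BLOCKS_PER_REGION)
        regions[region] |= 1 << offset
    return dict(regions)


def prefetch_list(regions: Dict[int, int]) -> List[int]:
    """Expand the region bitmasks back into block addresses to preload."""
    out: List[int] = []
    for region in sorted(regions):
        mask = regions[region]
        for offset in range(BLOCKS_PER_REGION):
            if mask & (1 << offset):
                out.append((region * BLOCKS_PER_REGION + offset) * BLOCK_BYTES)
    return out
```

In this model, `record_regions` would run while a virtual machine executes and `prefetch_list` would run when it is scheduled again; storing one bitmask per region rather than one address per block is what keeps the saved state small.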

    Write-through cache optimized for dependence-free parallel regions
    3.
    Granted invention patent
    Write-through cache optimized for dependence-free parallel regions (status: in force)

    Publication No.: US08627010B2

    Publication date: 2014-01-07

    Application No.: US13604349

    Filing date: 2012-09-05

    IPC classification: G06F12/00

    CPC classification: G06F12/0837

    Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequently updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.
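
The false-sharing condition described here (two processors updating different portions of the same cache line) can be stated as a small predicate. This is a sketch for illustration only: the `Write` record, the 128-byte line size, and the assumption that a write never straddles a line boundary are not taken from the patent.

```python
from dataclasses import dataclass

LINE_BYTES = 128  # assumed cache line size


@dataclass(frozen=True)
class Write:
    cpu: int   # id of the processor issuing the update
    addr: int  # byte address of the first byte written
    size: int  # number of bytes written (assumed not to cross a line boundary)


def line_of(addr: int) -> int:
    """Index of the cache line containing addr."""
    return addr // LINE_BYTES


def is_false_sharing(first: Write, second: Write) -> bool:
    """True when two processors update disjoint portions of the same cache line."""
    if first.cpu == second.cpu:
        return False
    if line_of(first.addr) != line_of(second.addr):
        return False
    # Overlapping byte ranges mean the processors really share data.
    overlap = (first.addr < second.addr + second.size
               and second.addr < first.addr + first.size)
    return not overlap
```

For example, `is_false_sharing(Write(0, 0, 8), Write(1, 64, 8))` is true, because both writes hit line 0 but touch different bytes.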

    Adaptive multi-bit error correction in endurance limited memories
    4.
    Granted invention patent
    Adaptive multi-bit error correction in endurance limited memories (status: lapsed)

    Publication No.: US08589762B2

    Publication date: 2013-11-19

    Application No.: US13176092

    Filing date: 2011-07-05

    IPC classification: G11C29/00

    Abstract: Multi-bit stuck-at fault error recovery can be enabled by an adaptive multi-bit error correction method, in which the overhead of error correction hardware is reduced without affecting the lifetime of the memory device. Error correction logic hardware is decoupled from memory blocks. An error correction logic block is partitioned such that error correction logic entries support different numbers of error correction capabilities based on the probability of occurrence of different numbers of errors in different memory blocks. Faulty memory blocks are mapped to appropriate error correction logic entries. The mapping can be one-to-one or many-to-one depending on embodiments. The adaptive partitioning of the error correction logic entries can be configured to match the projected statistical distribution of errors in logic blocks, and can reduce the total error correction logic overhead, provide sufficient error correction, and/or extend the lifetime of the memory device.
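
One way to picture the decoupled, partitioned error correction entries is a pool keyed by correction capability, from which each faulty block receives the weakest entry that still covers its fault count. The sketch below assumes a one-to-one mapping and a hypothetical `EccPool` class; the actual partitioning and mapping policy of the patent may differ.

```python
from typing import Dict, Optional


class EccPool:
    """Pool of error correction entries partitioned by correction capability."""

    def __init__(self, entries_per_capability: Dict[int, int]) -> None:
        # capability (bits correctable) -> number of free entries of that strength
        self.free: Dict[int, int] = dict(entries_per_capability)
        # faulty block id -> capability of the entry assigned to it
        self.mapping: Dict[int, int] = {}

    def assign(self, block: int, faults: int) -> Optional[int]:
        """Map a faulty block to the weakest free entry that covers its faults.

        Returns the capability of the chosen entry, or None if no adequate
        entry is free (the block cannot be recovered in this model).
        """
        for capability in sorted(self.free):
            if capability >= faults and self.free[capability] > 0:
                self.free[capability] -= 1
                self.mapping[block] = capability
                return capability
        return None
```

Sizing the pool, for instance `EccPool({1: 2, 2: 1, 4: 1})`, is where the projected error distribution would enter: many weak entries for the common single-bit case and few strong ones for rare multi-bit faults.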

    ADAPTIVE MULTI-BIT ERROR CORRECTION IN ENDURANCE LIMITED MEMORIES
    6.
    Invention application
    ADAPTIVE MULTI-BIT ERROR CORRECTION IN ENDURANCE LIMITED MEMORIES (status: lapsed)

    Publication No.: US20130013977A1

    Publication date: 2013-01-10

    Application No.: US13176092

    Filing date: 2011-07-05

    IPC classification: H03M13/05 G06F11/10

    Abstract: Multi-bit stuck-at fault error recovery can be enabled by an adaptive multi-bit error correction method, in which the overhead of error correction hardware is reduced without affecting the lifetime of the memory device. Error correction logic hardware is decoupled from memory blocks. An error correction logic block is partitioned such that error correction logic entries support different numbers of error correction capabilities based on the probability of occurrence of different numbers of errors in different memory blocks. Faulty memory blocks are mapped to appropriate error correction logic entries. The mapping can be one-to-one or many-to-one depending on embodiments. The adaptive partitioning of the error correction logic entries can be configured to match the projected statistical distribution of errors in logic blocks, and can reduce the total error correction logic overhead, provide sufficient error correction, and/or extend the lifetime of the memory device.

    PREDICTING OUT-OF-ORDER INSTRUCTION LEVEL PARALLELISM OF THREADS IN A MULTI-THREADED PROCESSOR
    7.
    Invention application
    PREDICTING OUT-OF-ORDER INSTRUCTION LEVEL PARALLELISM OF THREADS IN A MULTI-THREADED PROCESSOR (status: in force)

    Publication No.: US20130007423A1

    Publication date: 2013-01-03

    Application No.: US13172218

    Filing date: 2011-06-29

    IPC classification: G06F9/38

    CPC classification: G06F9/3836 G06F9/3851

    Abstract: Systems and methods for predicting out-of-order instruction-level parallelism (ILP) of threads being executed in a multi-threaded processor and prioritizing scheduling thereof are described herein. One aspect provides for tracking completion of instructions using a global completion table having a head segment and a tail segment; storing prediction values for each instruction in a prediction table indexed via instruction identifiers associated with each instruction, a prediction value being configured to indicate that an instruction is predicted to issue from one of: the head segment and the tail segment; and predicting that threads with more instructions issuing from the tail segment have a higher degree of out-of-order instruction-level parallelism. Other embodiments and aspects are also described herein.
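
The prediction table and the per-thread comparison can be modeled roughly as below. Everything beyond the abstract is an assumption made for illustration: the head segment is taken to be the first `head_size` positions of the global completion table, the table is trained with the position each instruction last issued from, and the class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

HEAD, TAIL = 0, 1  # prediction values: which segment an instruction issues from


class IlpPredictor:
    def __init__(self, head_size: int) -> None:
        self.head_size = head_size       # assumed length of the head segment
        self.table: Dict[int, int] = {}  # instruction identifier -> HEAD or TAIL

    def train(self, instr_id: int, gct_position: int) -> None:
        """Record which segment the instruction issued from."""
        self.table[instr_id] = HEAD if gct_position < self.head_size else TAIL

    def rank_threads(self, fetched: Iterable[Tuple[int, int]]) -> List[int]:
        """Rank threads by predicted tail issues, highest predicted ILP first.

        fetched holds (thread_id, instr_id) pairs; instructions with no table
        entry are assumed to issue from the head segment.
        """
        tail_counts: Counter = Counter()
        for thread, instr in fetched:
            tail_counts[thread] += self.table.get(instr, HEAD)
        ordered = sorted(tail_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [thread for thread, _ in ordered]
```

A scheduler could then favor the threads at the front of the returned list, since more tail-segment issues are taken as a sign of higher out-of-order ILP.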

    WRITE-THROUGH CACHE OPTIMIZED FOR DEPENDENCE-FREE PARALLEL REGIONS
    8.
    Invention application
    WRITE-THROUGH CACHE OPTIMIZED FOR DEPENDENCE-FREE PARALLEL REGIONS (status: in force)

    Publication No.: US20120331232A1

    Publication date: 2012-12-27

    Application No.: US13604349

    Filing date: 2012-09-05

    IPC classification: G06F12/08

    CPC classification: G06F12/0837

    Abstract: An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequently updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.

    Limiting entries in load issued premature part of load reorder queue searched to detect invalid retrieved values to between store safe and snoop safe pointers for the congruence class
    10.
    Granted invention patent
    Limiting entries in load issued premature part of load reorder queue searched to detect invalid retrieved values to between store safe and snoop safe pointers for the congruence class (status: in force)

    Publication No.: US07971033B2

    Publication date: 2011-06-28

    Application No.: US12172521

    Filing date: 2008-07-14

    IPC classification: G06F9/312

    CPC classification: G06F9/3834

    Abstract: A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to the contents of a snoop_safe register; ANDing all the load received data bits if the LSQN is greater in magnitude than the snoop_safe; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger than the snoop_safe was found.
