HOST-LEVEL ERROR DETECTION AND FAULT CORRECTION

    Publication number: US20230409426A1

    Publication date: 2023-12-21

    Application number: US17841864

    Application date: 2022-06-16

    CPC classification number: G06F11/1004 G06F11/102 G06F11/1068 G06F11/0772

    Abstract: A processing system includes a processing device coupled to a memory configured to check for and correct faults in requested data. In response to correcting the faults of the requested data, the memory sends the corrected data and unused check bits to the processing device as a plurality of fetch returns. The memory also sends a parity fetch based on the corrected data and one or more operations to the processing device. After receiving the plurality of fetch returns and the unused check bits, the processing device checks each fetch return for faults based on the unused check bits. In response to determining that a fetch return includes a fault, the processing device erases the fetch return and reconstructs the fetch return based on one or more other received fetch returns and the parity fetch.
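    The host-side check and rebuild described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which check code or which "one or more operations" are used, so the sketch assumes a CRC32 per fetch return as the check bits and a bytewise XOR as the parity operation. The names `make_parity` and `check_and_repair` are hypothetical.

```python
import zlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fetches):
    """Assumed parity operation: XOR of all (equal-length) fetch returns."""
    return reduce(xor_bytes, fetches)


def check_and_repair(fetches, checks, parity):
    """Check each fetch return against its check value (assumed: CRC32).

    A fetch return that fails the check is treated as erased and rebuilt
    by XOR-ing the parity fetch with all other fetch returns.
    """
    bad = [i for i, (f, c) in enumerate(zip(fetches, checks))
           if zlib.crc32(f) != c]
    if not bad:
        return list(fetches)
    if len(bad) > 1:
        raise ValueError("a single XOR parity can rebuild only one fetch return")
    erased = bad[0]
    survivors = [f for j, f in enumerate(fetches) if j != erased]
    repaired = list(fetches)
    repaired[erased] = reduce(xor_bytes, survivors, parity)
    return repaired
```

    With a single XOR parity only one faulty fetch return per request can be rebuilt; a stronger erasure code would be needed to recover more.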

    WATERFALL COUNTERS AND AN APPLICATION TO ARCHITECTURAL VULNERABILITY FACTOR ESTIMATION

    Publication number: US20180181492A1

    Publication date: 2018-06-28

    Application number: US15389573

    Application date: 2016-12-23

    Abstract: Described herein are waterfall counters and an application to architectural vulnerability factor (AVF) estimation. Waterfall counters count events that are generated at event generation logic. The waterfall counters are a combination of small, fast counters local to the event generation logic, and larger, global counters in fast memory. The local counters can be saturation or oscillation counters. When a local counter is saturated or evicted, the value from the local counter is added to the global counter. This addition can be done using logic local to the local or global counter. The waterfall counters provide a full-accuracy event count without the high bandwidth that is needed to maintain the global counters. An AVF estimation can be determined based on ratios from counts of read events, write events, and total events using the waterfall counters.
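    A minimal software model of a waterfall counter, assuming a saturating local counter of a few bits that spills into an unbounded global counter. The class name, the `local_bits` parameter, and the `event_ratios` helper are hypothetical. The abstract does not give the AVF formula, so the helper only computes the read and write ratios it mentions.

```python
class WaterfallCounter:
    """Small local counter that spills into a larger global counter."""

    def __init__(self, local_bits=4):
        self.local_max = (1 << local_bits) - 1  # saturation point
        self.local = 0
        self.global_count = 0

    def event(self):
        """Count one event locally; spill to the global counter on saturation."""
        self.local += 1
        if self.local == self.local_max:
            self.flush()

    def flush(self):
        """Add the local value to the global counter (on saturation or eviction)."""
        self.global_count += self.local
        self.local = 0

    def total(self):
        """Full-accuracy count: global value plus what is still held locally."""
        return self.global_count + self.local


def event_ratios(reads, writes, total):
    """Read and write fractions of all events, the inputs to an AVF estimate."""
    t = total.total()
    return reads.total() / t, writes.total() / t
```

    The global counter is touched only once per `local_max` events, which is where the bandwidth saving comes from, while `total()` still returns the exact count.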

    Using redundant transactions to verify the correctness of program code execution
    34.
    Granted invention patent (in force)

    Publication number: US09448933B2

    Publication date: 2016-09-20

    Application number: US14013252

    Application date: 2013-08-29

    CPC classification number: G06F12/0811

    Abstract: In the described embodiments, a processor core (e.g., a GPU core) receives a section of program code to be executed in a transaction from another entity in a computing device. The processor core sends the section of program code to one or more compute units in the processor core to be executed in a first transaction and concurrently executed in a second transaction, thereby creating a “redundant transaction pair.” When the first transaction and the second transaction are completed, the processor core compares a read-set of the first transaction to a read-set of the second transaction and compares a write-set of the first transaction to a write-set of the second transaction. When the read-sets and the write-sets match and no transactional error condition has occurred, the processor core allows results from the first transaction to be committed to an architectural state of the computing device.
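    The commit rule for a redundant transaction pair can be modeled in a few lines. This sketch runs the two transactions one after the other rather than concurrently and does not model transactional error conditions. All names are hypothetical, and `second_run` is a fault-injection hook added purely for illustration.

```python
def run_transaction(code, memory):
    """Run `code` speculatively, recording its read-set and write-set."""
    read_set, write_set = {}, {}

    def load(addr):
        if addr in write_set:          # read-your-own-write
            return write_set[addr]
        value = memory[addr]
        read_set[addr] = value
        return value

    def store(addr, value):
        write_set[addr] = value        # buffered, not yet architectural

    code(load, store)
    return read_set, write_set


def run_redundant_pair(code, memory, second_run=None):
    """Execute the same code twice and commit only if both runs agree."""
    r1, w1 = run_transaction(code, memory)
    r2, w2 = run_transaction(second_run or code, memory)
    if r1 == r2 and w1 == w2:
        memory.update(w1)              # commit results of the first transaction
        return True
    return False                       # mismatch: nothing is committed
```

    Because stores are buffered in the write-set, a mismatch leaves `memory` untouched and the pair can simply be re-executed.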

    Spare memory external to protected memory
    35.
    Granted invention patent (in force)

    Publication number: US09406403B2

    Publication date: 2016-08-02

    Application number: US13926155

    Application date: 2013-06-25

    CPC classification number: G11C29/76 G11C11/401

    Abstract: A memory subsystem employs spare memory cells external to one or more memory devices. In some embodiments, a processing system uses the spare memory cells to replace individual selected cells at the protected memory, whereby the selected cells are replaced on a cell-by-cell basis, rather than exclusively on a row-by-row, column-by-column, or block-by-block basis. This allows faulty memory cells to be replaced efficiently, thereby improving memory reliability and manufacturing yields, without requiring large blocks of spare memory cells.
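    Cell-by-cell replacement can be pictured as a small remap table consulted on every access. The sketch below is a software model only. The class `SparedMemory`, its methods, and the one-value-per-cell layout are assumptions, not details taken from the patent.

```python
class SparedMemory:
    """Memory whose individual faulty cells are redirected to external spares."""

    def __init__(self, size, spare_count):
        self.cells = [0] * size            # the protected memory
        self.spares = [0] * spare_count    # spare cells held outside it
        self.remap = {}                    # faulty address -> spare index

    def mark_faulty(self, addr):
        """Redirect one faulty cell to the next free spare cell."""
        if addr in self.remap:
            return
        if len(self.remap) >= len(self.spares):
            raise RuntimeError("no spare cells left")
        self.remap[addr] = len(self.remap)

    def write(self, addr, value):
        if addr in self.remap:
            self.spares[self.remap[addr]] = value
        else:
            self.cells[addr] = value

    def read(self, addr):
        if addr in self.remap:
            return self.spares[self.remap[addr]]
        return self.cells[addr]
```

    Only the cells listed in `remap` consume spare capacity, in contrast to row, column, or block sparing, where one bad cell retires a whole row, column, or block.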

    Hardware based redundant multi-threading inside a GPU for improved reliability
    36.
    Granted invention patent (in force)

    Publication number: US09026847B2

    Publication date: 2015-05-05

    Application number: US13724968

    Application date: 2012-12-21

    CPC classification number: G06F11/0778 G06F11/1482

    Abstract: A system and method for verifying computation output using computer hardware are provided. Instances of computation are generated and processed on hardware-based processors. As instances of computation are processed, each instance of computation receives a load accessible to other instances of computation. Instances of output are generated by processing the instances of computation. The instances of output are verified against each other in a hardware based processor to ensure accuracy of the output.

    Software Only Inter-Compute Unit Redundant Multithreading for GPUs
    37.
    Invention application (in force)

    Publication number: US20140373028A1

    Publication date: 2014-12-18

    Application number: US13920524

    Application date: 2013-06-18

    Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
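    The software mapping and signature comparison can be illustrated with a sequential model. It assumes that hardware work-groups 2k and 2k+1 are both mapped to logical identifier k, and that the "signature variable" is simply the value each group is about to store. The `fault_in_group` parameter is a hypothetical fault-injection hook, and the dictionary stands in for the synchronization mechanism.

```python
def kernel(chunk):
    """Example computation run by every work-group (sum of squares)."""
    return sum(x * x for x in chunk)


def run_with_rmt(chunks, kernel, fault_in_group=None):
    """Launch two work-groups per chunk and compare their signatures."""
    signatures = {}   # stand-in for the communication buffer between a pair
    results = {}
    for hw_group in range(2 * len(chunks)):
        logical_id = hw_group // 2            # software-only identifier mapping
        value = kernel(chunks[logical_id])    # same code, same data
        if hw_group == fault_in_group:
            value ^= 1                        # injected single-bit error
        if logical_id not in signatures:
            signatures[logical_id] = value    # first group publishes its signature
        elif signatures[logical_id] != value:
            raise RuntimeError(f"mismatch in redundant pair {logical_id}")
        else:
            results[logical_id] = value       # comparison point passed
    return [results[i] for i in range(len(chunks))]
```

    Comparing only at the point where results become visible keeps the checking overhead low, at the cost of detecting an error later than a per-instruction check would.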

    N-WAY FAULT TOLERANT PROCESSING SYSTEM
    38.
    Invention publication

    Publication number: US20240320050A1

    Publication date: 2024-09-26

    Application number: US18126139

    Application date: 2023-03-24

    CPC classification number: G06F9/5044 G06F9/3861 G06F9/5077

    Abstract: A processor includes two or more core dies each including one or more processor cores. A first core die of the processor is associated with a first operating system and the processor cores of the first core die execute a set of instructions according to the first operating system to produce a first result. A second core die of the processor is associated with a second operating system and the processor cores of the second core die execute the set of instructions according to the second operating system to produce a second result. The first and second core dies provide the first and second results to voting circuitry that generates an output based on the first and second results.
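    The abstract does not say how the voting circuitry combines the results. One common choice, assumed here, is a strict majority vote over the N results. The function name `vote` is hypothetical, and results are assumed to be hashable values.

```python
from collections import Counter


def vote(results):
    """Return the value produced by a strict majority of the core dies."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among the results")
    return value
```

    With only two core dies a strict majority means both must agree, so the scheme detects a fault but cannot mask it. Three or more dies allow a single faulty result to be outvoted.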
