Page retirement techniques for multi-page DRAM faults

    Publication No.: US12216539B2

    Publication Date: 2025-02-04

    Application No.: US17977001

    Application Date: 2022-10-31

    Abstract: A processing system employs techniques for enhancing dynamic random access memory (DRAM) page retirement to facilitate identification and retirement of pages affected by multi-page DRAM faults. In response to detecting an uncorrectable error at a first page of DRAM, the processing system identifies a second page of the DRAM for potential retirement based on one or more of physical proximity to the first page, inclusion in a range of addresses stored at a fault map that tracks addresses of DRAM pages having detected faults, and predicting a set of pages to check for faults based on misses at a translation lookaside buffer (TLB).
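    Illustration (not taken from the patent): a minimal Python sketch of the first two candidate-selection criteria, physical proximity and fault-map ranges. It assumes 4 KiB pages and treats pages with adjacent page numbers as physically adjacent. Real DRAM address mapping is more complex than this. It also assumes the fault map is a list of inclusive address ranges, and that every page in a range becomes a candidate once the failing page falls inside that range. PAGE_SIZE, candidate_pages, and the radius parameter are hypothetical names. The TLB-miss prediction criterion is not modeled.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages (hypothetical constant)


def page_of(addr: int) -> int:
    """Return the page number containing a physical address."""
    return addr // PAGE_SIZE


def candidate_pages(failed_addr: int, fault_map: list[tuple[int, int]],
                    radius: int = 1) -> list[int]:
    """Return sorted page numbers to check for retirement after an
    uncorrectable error at failed_addr.

    fault_map: inclusive (start_addr, end_addr) ranges of pages with known
        faults (an assumed representation of the fault map).
    radius: number of neighboring pages on each side treated as physically
        close (assumption: adjacent page numbers are physically adjacent).
    """
    failed_page = page_of(failed_addr)
    candidates = set()
    # Criterion 1: physical proximity to the failed page.
    for offset in range(-radius, radius + 1):
        page = failed_page + offset
        if page >= 0:
            candidates.add(page)
    # Criterion 2: if the failed page lies in a tracked fault-map range,
    # every page in that range becomes a candidate.
    for start, end in fault_map:
        if page_of(start) <= failed_page <= page_of(end):
            candidates.update(range(page_of(start), page_of(end) + 1))
    candidates.discard(failed_page)  # the failed page itself is retired anyway
    return sorted(candidates)
```

    For example, an uncorrectable error at 0x5000 (page 5) with a fault-map range of 0x4000 to 0x7FFF yields candidates [4, 6, 7]: the two neighbors plus the rest of the tracked range.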

    Devices, systems, and methods for injecting fabricated errors into machine check architectures

    Publication No.: US12135625B2

    Publication Date: 2024-11-05

    Application No.: US18089135

    Application Date: 2022-12-27

    Abstract: An exemplary system includes and/or represents an agent and a machine check architecture. In one example, the machine check architecture includes and/or represents at least one circuit configured to report errors via at least one reporting register. In this example, the machine check architecture also includes and/or represents at least one error-injection register configured to cause the circuit to inject at least one fabricated error report into the reporting register in response to a write operation performed by the agent on at least one bit of the error-injection register. Various other devices, systems, and methods are also disclosed.
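    Illustration (not taken from the patent): a toy Python model of a single machine-check bank. An agent writes the error-injection register, and setting its trigger bit makes the bank place a fabricated report in the reporting register. That report goes through the same report() path as a genuine error, so software reading the reporting register sees it the way it would see a real one. The register layout is entirely assumed: trigger in bit 0, error code in bits 16 to 31, valid flag in bit 63. The same goes for the self-clearing trigger and all names. None of it is a real MCA register definition.

```python
TRIGGER_BIT = 1 << 0     # assumed: writing 1 here fabricates an error report
ERROR_CODE_SHIFT = 16    # assumed: fabricated error code carried in bits 16-31
VALID_BIT = 1 << 63      # assumed: "report valid" flag in the reporting register


class MachineCheckBank:
    """Toy bank with one reporting register and one error-injection register."""

    def __init__(self) -> None:
        self.status = 0      # reporting register read by error-handling software
        self.inject_ctl = 0  # error-injection register written by an agent

    def report(self, error_code: int) -> None:
        """Log an error in the reporting register (real or fabricated)."""
        self.status = VALID_BIT | (error_code & 0xFFFF)

    def write_inject(self, value: int) -> None:
        """Agent write to the injection register; the trigger bit fabricates a report."""
        self.inject_ctl = value
        if value & TRIGGER_BIT:
            self.report((value >> ERROR_CODE_SHIFT) & 0xFFFF)
            self.inject_ctl &= ~TRIGGER_BIT  # assumed self-clearing trigger bit
```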

    Method for a reliability, availability, and serviceability-conscious huge page support

    Publication No.: US11237928B2

    Publication Date: 2022-02-01

    Application No.: US16700993

    Application Date: 2019-12-02

    Abstract: A method includes reserving memory capacity in a first memory device as patch memory region for backing faulted memory, receiving a memory error indication indicating an uncorrectable error in a faulted segment in a second memory device and, in response to the memory error indication, associating in a remapping table the faulted segment with a patch segment in the patch memory region. The faulted segment is smaller than a memory page size of the second memory device. The method also includes, in response to receiving a memory access request directed to the faulted memory segment, servicing the memory access request from the patch segment by querying the remapping table to determine a patch segment address corresponding to a requested memory address, where the patch segment address identifies the location of the patch segment, and based on the patch segment address, performing the requested memory access at the patch segment.
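    Illustration (not taken from the patent): a minimal Python sketch of the remapping table only. When an uncorrectable error arrives, the faulted segment that contains the address is bound to a free patch segment. Later accesses to any address in that segment are redirected to the matching offset inside the patch segment. The 64-byte segment size, the dictionary used as the remapping table, and the first-free allocation order are assumptions. The sketch does not model the two physical memory devices or any data movement.

```python
SEGMENT_SIZE = 64  # assumed faulted-segment size in bytes (smaller than a page)


class PatchMemory:
    """Remapping table from faulted sub-page segments to patch segments."""

    def __init__(self, patch_base: int, num_patch_segments: int) -> None:
        self.patch_base = patch_base                  # start of the reserved patch region
        self.free_slots = list(range(num_patch_segments))
        self.remap: dict[int, int] = {}               # faulted segment -> patch segment

    def on_uncorrectable_error(self, addr: int) -> int:
        """Back the faulted segment containing addr with a patch segment."""
        seg = addr - addr % SEGMENT_SIZE
        if seg not in self.remap:
            if not self.free_slots:
                raise MemoryError("patch memory region exhausted")
            slot = self.free_slots.pop(0)
            self.remap[seg] = self.patch_base + slot * SEGMENT_SIZE
        return self.remap[seg]

    def translate(self, addr: int) -> int:
        """Return the address that should service an access to addr."""
        seg = addr - addr % SEGMENT_SIZE
        patch = self.remap.get(seg)
        return addr if patch is None else patch + (addr - seg)
```

    Because remapping happens per segment rather than per page, one faulty 64-byte region here costs 64 bytes of patch memory instead of a whole retired page.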

    Waterfall counters and an application to architectural vulnerability factor estimation

    Publication No.: US10331537B2

    Publication Date: 2019-06-25

    Application No.: US15389573

    Application Date: 2016-12-23

    Abstract: Described herein are waterfall counters and an application to architectural vulnerability factor (AVF) estimation. Waterfall counters count events that are generated at event generation logic. The waterfall counters are a combination of small, fast counters local to the event generation logic, and larger, global counters in fast memory. The local counters can be saturation or oscillation counters. When a local counter is saturated or evicted, the value from the local counter is added to the global counter. This addition can be done using logic local to the local or global counter. The waterfall counters provide a full-accuracy event count without the high bandwidth that is needed to maintain the global counters. An AVF estimation can be determined based on ratios from counts of read events, write events, and total events using the waterfall counters.
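    Illustration (not taken from the patent): a minimal Python sketch of a waterfall counter built from a saturating local counter. The small local counter absorbs individual events and adds its value to the large global counter only when it saturates or is evicted (flush()). Combining the two still gives an exact total. The 4-bit local width is an assumption. The abstract does not give the AVF formula, so event_ratios (a hypothetical helper) only computes the read and write fractions of all events that such an estimate could be based on.

```python
class WaterfallCounter:
    """Small local counter that spills into a large global counter."""

    def __init__(self, local_bits: int = 4) -> None:
        self.local_max = (1 << local_bits) - 1  # saturation value of the local counter
        self.local = 0                          # fast counter next to the event logic
        self.global_count = 0                   # large counter held in fast memory

    def increment(self) -> None:
        """Count one event; spill to the global counter when the local one saturates."""
        self.local += 1
        if self.local == self.local_max:
            self.flush()

    def flush(self) -> None:
        """Add the local value to the global counter (on saturation or eviction)."""
        self.global_count += self.local
        self.local = 0

    def total(self) -> int:
        """Exact count: global value plus whatever is still held locally."""
        return self.global_count + self.local


def event_ratios(reads: WaterfallCounter, writes: WaterfallCounter,
                 total: WaterfallCounter) -> tuple[float, float]:
    """Read and write fractions of all counted events."""
    n = total.total()
    if n == 0:
        return 0.0, 0.0
    return reads.total() / n, writes.total() / n
```

    With a 4-bit local counter, 100 events cause six spills of 15, so the global counter holds 90, the local counter holds 10, and total() still reports 100.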

    Method and apparatus for providing distributed checkpointing

    Publication No.: US10073746B2

    Publication Date: 2018-09-11

    Application No.: US15207943

    Application Date: 2016-07-12

    Abstract: Methods and apparatus presented herein provide distributed checkpointing in a multi-node system, such as a network of servers in a data center. When checkpointing of application state data is needed in a node, the methods and apparatus determine whether checkpoint memory space is available in the node for checkpointing the application state data. If not enough checkpoint memory space is available in the node, the methods and apparatus request and find additional checkpoint memory space from other nodes in the system. In this manner, the methods and apparatus can checkpoint the application state data into available checkpoint memory spaces distributed among a plurality of nodes. This allows for high bandwidth and low latency checkpointing operations in the multi-node system.
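    Illustration (not taken from the patent): a minimal Python sketch of the placement decision. It fills the local node's free checkpoint space first and requests the shortfall from other nodes. The abstract says only that additional space is found on other nodes, so the rest of the policy is assumed. Other nodes are visited in sorted name order, a checkpoint may be split across nodes, and free_space is a simple node-to-bytes dictionary. The function name and inputs are hypothetical, and no network protocol is modeled.

```python
def place_checkpoint(size: int, local_node: str,
                     free_space: dict[str, int]) -> list[tuple[str, int]]:
    """Plan where to store `size` bytes of checkpoint data.

    free_space maps node name -> free checkpoint bytes (hypothetical input).
    The local node is used first; any shortfall is requested from the other
    nodes in sorted name order (assumed policy). free_space is not modified.
    """
    plan = []
    remaining = size
    others = sorted(node for node in free_space if node != local_node)
    for node in [local_node] + others:
        if remaining == 0:
            break
        take = min(remaining, free_space.get(node, 0))
        if take > 0:
            plan.append((node, take))
            remaining -= take
    if remaining > 0:
        raise MemoryError(f"{remaining} bytes of checkpoint data could not be placed")
    return plan
```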

    Spare memory external to protected memory

    Invention patent application; legal status: in force

    Publication No.: US20140376320A1

    Publication Date: 2014-12-25

    Application No.: US13926155

    Application Date: 2013-06-25

    CPC classification number: G11C29/76 G11C11/401

    Abstract: A memory subsystem employs spare memory cells external to one or more memory devices. In some embodiments, a processing system uses the spare memory cells to replace individual selected cells at the protected memory, whereby the selected cells are replaced on a cell-by-cell basis, rather than exclusively on a row-by-row, column-by-column, or block-by-block basis. This allows faulty memory cells to be replaced efficiently, thereby improving memory reliability and manufacturing yields, without requiring large blocks of spare memory cells.
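    Illustration (not taken from the patent): a minimal Python sketch of cell-by-cell sparing. Each faulty cell in the protected array is remapped individually to its own cell in a small spare pool outside that array. A single bad cell therefore uses one spare cell rather than a whole spare row, column, or block. The class name, the sizes, and the in-order spare allocation are assumptions.

```python
class SpareCellMemory:
    """Protected memory array backed by a small, separate pool of spare cells."""

    def __init__(self, size: int, num_spares: int) -> None:
        self.cells = [0] * size          # protected memory cells
        self.spares = [0] * num_spares   # spare cells external to the protected memory
        self.spare_map: dict[int, int] = {}  # faulty cell address -> spare cell index

    def mark_faulty(self, addr: int) -> None:
        """Assign the next free spare cell to one faulty cell."""
        if addr in self.spare_map:
            return
        if len(self.spare_map) >= len(self.spares):
            raise MemoryError("no spare cells left")
        self.spare_map[addr] = len(self.spare_map)

    def write(self, addr: int, value: int) -> None:
        """Write to the spare cell if addr was replaced, else to the protected cell."""
        idx = self.spare_map.get(addr)
        if idx is None:
            self.cells[addr] = value
        else:
            self.spares[idx] = value

    def read(self, addr: int) -> int:
        """Read from the spare cell if addr was replaced, else from the protected cell."""
        idx = self.spare_map.get(addr)
        return self.cells[addr] if idx is None else self.spares[idx]
```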

    Error reporting for non-volatile memory modules

    Publication No.: US11797369B2

    Publication Date: 2023-10-24

    Application No.: US17864804

    Application Date: 2022-07-14

    CPC classification number: G06F11/0772 G06F3/0679 G06F11/073 G06F11/141

    Abstract: A memory controller includes a memory channel controller adapted to receive memory access requests and dispatch associated commands addressable in a system memory address space to a non-volatile storage class memory (SCM) module. A non-volatile error reporting circuit identifies error conditions associated with the non-volatile SCM module and maps the error conditions from a first number of possible error conditions associated with the non-volatile SCM module to a second, smaller number of virtual error types for reporting to an error monitoring module of a host operating system, the mapping based at least on a classification that the error condition will or will not have a deleterious effect on an executable process running on the host operating system.
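    Illustration (not taken from the patent): a minimal Python sketch of the many-to-few mapping. Each module-specific error condition is classified by whether it can harm a process running on the host OS, and that flag picks one of a smaller set of virtual error types. The specific condition names, the choice of exactly two virtual types, and the conservative handling of unknown conditions are all hypothetical.

```python
from enum import Enum


class VirtualErrorType(Enum):
    """Reduced set of error types reported to the host OS (assumed: two types)."""
    AFFECTS_PROCESS = "affects_process"      # may harm a running process
    NO_PROCESS_IMPACT = "no_process_impact"  # can be logged without intervention


# Hypothetical SCM-module error conditions, each classified by whether it can
# have a deleterious effect on a process running on the host OS.
SCM_CONDITION_AFFECTS_PROCESS = {
    "media_uncorrectable_read": True,
    "poisoned_data_returned": True,
    "write_retry_succeeded": False,
    "spare_block_consumed": False,
    "thermal_throttle": False,
}


def to_virtual_error(condition: str) -> VirtualErrorType:
    """Map a module-specific error condition to a virtual error type.

    Unknown conditions are treated conservatively as affecting the process.
    """
    affects = SCM_CONDITION_AFFECTS_PROCESS.get(condition, True)
    return VirtualErrorType.AFFECTS_PROCESS if affects else VirtualErrorType.NO_PROCESS_IMPACT
```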