-
Publication No.: US12216539B2
Publication Date: 2025-02-04
Application No.: US17977001
Application Date: 2022-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Sudhanva Gurumurthi, Vilas Sridharan, Majed Valad Beigi
IPC: G06F11/30, G06F11/07, G06F11/10, G06F12/1027
Abstract: A processing system employs techniques for enhancing dynamic random access memory (DRAM) page retirement to facilitate identification and retirement of pages affected by multi-page DRAM faults. In response to detecting an uncorrectable error at a first page of DRAM, the processing system identifies a second page of the DRAM for potential retirement based on one or more of physical proximity to the first page, inclusion in a range of addresses stored at a fault map that tracks addresses of DRAM pages having detected faults, and predicting a set of pages to check for faults based on misses at a translation lookaside buffer (TLB).
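This is a minimal Python sketch of how the first two heuristics in this abstract could select retirement candidates. It makes several assumptions that are not in the source:
- Pages are page frame numbers.
- "Physical proximity" means adjacent frame numbers. Real DRAM adjacency depends on the address-to-row mapping.
- The fault map stores inclusive page ranges, and a range counts only if it contains the failed page.

The TLB-miss prediction is not modeled, and `retirement_candidates` is a hypothetical name.

```python
from collections.abc import Iterable


def retirement_candidates(
    failed_page: int,
    fault_map: Iterable[tuple[int, int]],
    neighbor_radius: int = 1,
) -> set[int]:
    """Return pages, other than failed_page, to check for possible retirement.

    Hypothetical model: pages are page frame numbers, "physical proximity"
    means within neighbor_radius frames of the failed page, and fault_map
    holds inclusive (first_page, last_page) ranges of pages with known faults.
    """
    candidates: set[int] = set()
    # Heuristic 1: pages physically close to the failed page.
    for offset in range(-neighbor_radius, neighbor_radius + 1):
        page = failed_page + offset
        if page >= 0 and page != failed_page:
            candidates.add(page)
    # Heuristic 2: other pages in any fault-map range that contains the failed page.
    for first, last in fault_map:
        if first <= failed_page <= last:
            candidates.update(p for p in range(first, last + 1) if p != failed_page)
    return candidates


# Usage: page 100 fails; the fault map already holds the range 98..103.
print(sorted(retirement_candidates(100, [(98, 103), (500, 510)])))
# prints [98, 99, 101, 102, 103]
```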
-
Publication No.: US12135625B2
Publication Date: 2024-11-05
Application No.: US18089135
Application Date: 2022-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Vilas Sridharan, Hanbing Liu, Francisco L. Duran
Abstract: An exemplary system includes and/or represents an agent and a machine check architecture. In one example, the machine check architecture includes and/or represents at least one circuit configured to report errors via at least one reporting register. In this example, the machine check architecture also includes and/or represents at least one error-injection register configured to cause the circuit to inject at least one fabricated error report into the reporting register in response to a write operation performed by the agent on at least one bit of the error-injection register. Various other devices, systems, and methods are also disclosed.
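This is a toy Python model of the injection path described above. The register layout is an assumption made for illustration:
- Bit 0 of the injection register (`INJECT_BIT`) acts as the trigger.
- The trigger bit clears itself once the report is produced.

`McaBank`, `ErrorReport`, and `agent_write_injection` are hypothetical names, not a real MCA register interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorReport:
    status: int      # hypothetical error status code
    injected: bool   # True if fabricated through the injection register


class McaBank:
    """Toy machine-check bank: a reporting register plus an error-injection register."""

    INJECT_BIT = 0x1  # hypothetical position of the trigger bit

    def __init__(self) -> None:
        self.reporting_register: Optional[ErrorReport] = None
        self.injection_register = 0

    def report(self, status: int, injected: bool = False) -> None:
        """Reporting circuit: latch an error report into the reporting register."""
        self.reporting_register = ErrorReport(status, injected)

    def agent_write_injection(self, value: int, status: int) -> None:
        """Agent writes the injection register; setting the trigger bit fabricates a report."""
        self.injection_register = value
        if value & self.INJECT_BIT:
            self.report(status, injected=True)
            # Assumption: the trigger bit clears itself once the report is produced.
            self.injection_register &= ~self.INJECT_BIT


bank = McaBank()
bank.agent_write_injection(0x1, status=0x42)
print(bank.reporting_register)  # prints ErrorReport(status=66, injected=True)
```

Modeling the fabricated report as flowing through the same `report` path as a real error reflects the point of the abstract: software that consumes the reporting register can be exercised without a genuine hardware fault.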
-
Publication No.: US11874739B2
Publication Date: 2024-01-16
Application No.: US17033398
Application Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Sudhanva Gurumurthi, Vilas Sridharan, Shaizeen Aga, Nuwan Jayasena, Michael Ignatowski, Shrikanth Ganapathy, John Kalamatianos
CPC classification number: G06F11/1076, G06F21/602, H04L9/32
Abstract: A memory module includes one or more programmable ECC engines that may be programmed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected, etc. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
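This is a hedged Python sketch of the programmability idea. The engine holds no fixed scheme and runs whatever encode/check pair the host installs. The class and method names are hypothetical. The per-byte parity scheme is only a stand-in for a real ECC implementation, and the secure programming step from the abstract is not modeled.

```python
from typing import Callable, Optional

Encoder = Callable[[bytes], bytes]
Checker = Callable[[bytes, bytes], bool]


class ProgrammableEccEngine:
    """Toy in-module ECC engine whose scheme is supplied by the host at run time."""

    def __init__(self) -> None:
        self._encode: Optional[Encoder] = None
        self._check: Optional[Checker] = None

    def program(self, encode: Encoder, check: Checker) -> None:
        # The abstract says programming is done securely; authentication is omitted here.
        self._encode, self._check = encode, check

    def write(self, data: bytes) -> tuple[bytes, bytes]:
        """Return the data together with the check bits to store beside it."""
        if self._encode is None:
            raise RuntimeError("ECC engine has not been programmed")
        return data, self._encode(data)

    def read(self, data: bytes, check_bits: bytes) -> bool:
        """Return True when no error is detected."""
        if self._check is None:
            raise RuntimeError("ECC engine has not been programmed")
        return self._check(data, check_bits)


# One host-chosen scheme: an even-parity bit per byte (detects, cannot correct).
def parity_encode(data: bytes) -> bytes:
    return bytes(bin(b).count("1") & 1 for b in data)


def parity_check(data: bytes, check_bits: bytes) -> bool:
    return parity_encode(data) == check_bits


engine = ProgrammableEccEngine()
engine.program(parity_encode, parity_check)
data, ecc = engine.write(b"\x0f\x01")
print(engine.read(data, ecc), engine.read(b"\x0e\x01", ecc))  # prints True False
```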
-
Publication No.: US11237928B2
Publication Date: 2022-02-01
Application No.: US16700993
Application Date: 2019-12-02
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Michael Ignatowski, Vilas Sridharan
Abstract: A method includes reserving memory capacity in a first memory device as patch memory region for backing faulted memory, receiving a memory error indication indicating an uncorrectable error in a faulted segment in a second memory device and, in response to the memory error indication, associating in a remapping table the faulted segment with a patch segment in the patch memory region. The faulted segment is smaller than a memory page size of the second memory device. The method also includes, in response to receiving a memory access request directed to the faulted memory segment, servicing the memory access request from the patch segment by querying the remapping table to determine a patch segment address corresponding to a requested memory address, where the patch segment address identifies the location of the patch segment, and based on the patch segment address, performing the requested memory access at the patch segment.
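This is a minimal Python sketch of the remapping-table lookup. It assumes a 64-byte segment and a 4096-byte page; the abstract only requires the segment to be smaller than a page. It also treats both memory devices as one flat address space. `PatchRemapper` and its methods are hypothetical names.

```python
SEGMENT_SIZE = 64   # hypothetical sub-page segment size in bytes
PAGE_SIZE = 4096    # the abstract only requires SEGMENT_SIZE < PAGE_SIZE
assert SEGMENT_SIZE < PAGE_SIZE


class PatchRemapper:
    """Toy remapping of faulted sub-page segments into a reserved patch region."""

    def __init__(self, patch_base: int, patch_segments: int) -> None:
        self.patch_base = patch_base                 # start of the reserved region
        self.free = list(range(patch_segments))      # unused patch segment indices
        self.remap: dict[int, int] = {}              # faulted segment -> patch segment

    def on_uncorrectable_error(self, addr: int) -> None:
        """Associate the faulted segment containing addr with a free patch segment."""
        seg = addr - addr % SEGMENT_SIZE
        if seg not in self.remap:
            index = self.free.pop(0)  # raises IndexError when the patch region is full
            self.remap[seg] = self.patch_base + index * SEGMENT_SIZE

    def translate(self, addr: int) -> int:
        """Query the remapping table: return the address that services addr."""
        seg = addr - addr % SEGMENT_SIZE
        patch = self.remap.get(seg)
        return addr if patch is None else patch + addr % SEGMENT_SIZE


pm = PatchRemapper(patch_base=0x1000_0000, patch_segments=8)
pm.on_uncorrectable_error(0x2043)
print(hex(pm.translate(0x2050)), hex(pm.translate(0x2080)))  # prints 0x10000010 0x2080
```

Only the 64-byte segment containing the error is redirected. The rest of the 4 KiB page keeps being served from its original location, which is the advantage over retiring the whole page.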
-
Publication No.: US11221902B2
Publication Date: 2022-01-11
Application No.: US16715302
Application Date: 2019-12-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Sudhanva Gurumurthi, Vilas Sridharan
Abstract: Error handling for resilient software includes: receiving data indicating a region of resilient memory; detecting an error associated with a region of memory; and preventing raising an exception for the error in response to the region of memory falling within the region of resilient memory by preventing the region of memory from being identified as including the error.
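This is a minimal Python sketch of the suppression decision. It assumes regions are half-open byte ranges and that an error is suppressed only when it falls entirely inside a registered resilient region. `ResilientErrorHandler` and its methods are hypothetical; a real handler would sit in the firmware or OS error path.

```python
class ResilientErrorHandler:
    """Toy handler that suppresses exceptions for memory errors in resilient regions."""

    def __init__(self) -> None:
        self.resilient: list[tuple[int, int]] = []   # half-open [start, end) ranges
        self.marked_bad: list[tuple[int, int]] = []  # regions recorded as erroneous

    def register_resilient(self, start: int, size: int) -> None:
        """Software supplies a region of resilient memory."""
        self.resilient.append((start, start + size))

    def on_memory_error(self, start: int, size: int) -> bool:
        """Return True if an exception should be raised for this error."""
        end = start + size
        if any(lo <= start and end <= hi for lo, hi in self.resilient):
            return False  # not marked as erroneous, so no exception is raised
        self.marked_bad.append((start, end))
        return True


h = ResilientErrorHandler()
h.register_resilient(0x1000, 0x1000)
print(h.on_memory_error(0x1800, 64), h.on_memory_error(0x3000, 64))  # prints False True
```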
-
Publication No.: US10331537B2
Publication Date: 2019-06-25
Application No.: US15389573
Application Date: 2016-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Manish Gupta, Vilas Sridharan, David A. Roberts
IPC: G06F11/00, G06F11/34, G06F12/0891
Abstract: Described herein are waterfall counters and an application to architectural vulnerability factor (AVF) estimation. Waterfall counters count events that are generated at event generation logic. The waterfall counters are a combination of small, fast counters local to the event generation logic, and larger, global counters in fast memory. The local counters can be saturation or oscillation counters. When a local counter is saturated or evicted, the value from the local counter is added to the global counter. This addition can be done using logic local to the local or global counter. The waterfall counters provide a full-accuracy event count without the high bandwidth that is needed to maintain the global counters. An AVF estimation can be determined based on ratios from counts of read events, write events, and total events using the waterfall counters.
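This is a minimal Python sketch of one waterfall counter using the saturation variant. It assumes a 4-bit local counter, and the global counter is a plain integer rather than a location in fast memory. The abstract does not give the AVF formula, so only the counting is shown; `WaterfallCounter` is a hypothetical name.

```python
class WaterfallCounter:
    """Small local saturating counter that spills into a larger global counter."""

    def __init__(self, local_bits: int = 4) -> None:
        self.local_max = (1 << local_bits) - 1  # e.g. 15 for a 4-bit local counter
        self.local = 0
        self.global_count = 0  # stands in for the global counter kept in fast memory

    def increment(self) -> None:
        self.local += 1
        if self.local == self.local_max:  # saturated: add the value to the global counter
            self.flush()

    def flush(self) -> None:
        """Spill the local value; the same step applies when the counter is evicted."""
        self.global_count += self.local
        self.local = 0

    def total(self) -> int:
        """Exact event count: nothing is lost between the two levels."""
        return self.global_count + self.local


reads = WaterfallCounter()
for _ in range(40):
    reads.increment()
print(reads.global_count, reads.local, reads.total())  # prints 30 10 40
```

With a 4-bit local counter, the global counter is updated once every 15 events, which is where the bandwidth saving comes from, while `total()` stays exact.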
-
Publication No.: US10073746B2
Publication Date: 2018-09-11
Application No.: US15207943
Application Date: 2016-07-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Taniya Siddiqua, Vilas Sridharan
CPC classification number: G06F3/0604, G06F3/0631, G06F3/067, G06F11/2058, G06F11/2069, G06F2201/84
Abstract: Methods and apparatus presented herein provide distributed checkpointing in a multi-node system, such as a network of servers in a data center. When checkpointing of application state data is needed in a node, the methods and apparatus determine whether checkpoint memory space is available in the node for checkpointing the application state data. If not enough checkpoint memory space is available in the node, the methods and apparatus request and find additional checkpoint memory space from other nodes in the system. In this manner, the methods and apparatus can checkpoint the application state data into available checkpoint memory spaces distributed among a plurality of nodes. This allows for high bandwidth and low latency checkpointing operations in the multi-node system.
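This is a minimal Python sketch of the placement decision only. It assumes each node advertises a byte count of free checkpoint memory and that remote nodes are tried in order of most free space; the abstract does not specify a selection policy. `place_checkpoint` is a hypothetical name, and the inter-node request protocol and data movement are not modeled.

```python
def place_checkpoint(size: int, local: str, free: dict[str, int]) -> dict[str, int]:
    """Plan where to store `size` bytes of checkpoint data.

    free maps node name -> available checkpoint memory in bytes (not modified).
    The local node is used first; remote nodes are then tried in order of most
    free space (an assumed policy). Raises MemoryError if the system lacks space.
    """
    placement: dict[str, int] = {}
    remaining = size
    remote = sorted((n for n in free if n != local), key=lambda n: free[n], reverse=True)
    for node in [local, *remote]:
        if remaining == 0:
            break
        take = min(free.get(node, 0), remaining)
        if take > 0:
            placement[node] = take
            remaining -= take
    if remaining:
        raise MemoryError("not enough checkpoint memory in the system")
    return placement


print(place_checkpoint(100, "a", {"a": 30, "b": 50, "c": 40}))
# prints {'a': 30, 'b': 50, 'c': 20}
```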
-
Publication No.: US20140376320A1
Publication Date: 2014-12-25
Application No.: US13926155
Application Date: 2013-06-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel H. Loh, Vilas Sridharan, James M. O'Connor
IPC: G11C29/04
CPC classification number: G11C29/76, G11C11/401
Abstract: A memory subsystem employs spare memory cells external to one or more memory devices. In some embodiments, a processing system uses the spare memory cells to replace individual selected cells at the protected memory, whereby the selected cells are replaced on a cell-by-cell basis, rather than exclusively on a row-by-row, column-by-column, or block-by-block basis. This allows faulty memory cells to be replaced efficiently, thereby improving memory reliability and manufacturing yields, without requiring large blocks of spare memory cells.
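This is a toy Python model of cell-granular sparing. It assumes one value per cell and a simple table mapping each faulty cell address to a spare index. `SpareCellMemory` and `replace_cell` are hypothetical names; a real design would perform this redirection in the memory subsystem hardware.

```python
class SpareCellMemory:
    """Toy memory in which individual faulty cells are redirected to external spares."""

    def __init__(self, size: int, spare_cells: int) -> None:
        self.cells = [0] * size            # protected memory, one value per cell
        self.spares = [0] * spare_cells    # spare cells outside the memory device
        self.remap: dict[int, int] = {}    # faulty cell address -> spare index

    def replace_cell(self, addr: int) -> None:
        """Replace one cell, not a whole row, column, or block."""
        if addr not in self.remap:
            if len(self.remap) == len(self.spares):
                raise MemoryError("spare cells exhausted")
            self.remap[addr] = len(self.remap)

    def write(self, addr: int, value: int) -> None:
        if addr in self.remap:
            self.spares[self.remap[addr]] = value
        else:
            self.cells[addr] = value

    def read(self, addr: int) -> int:
        if addr in self.remap:
            return self.spares[self.remap[addr]]
        return self.cells[addr]


mem = SpareCellMemory(size=16, spare_cells=2)
mem.replace_cell(5)
mem.write(5, 7)
print(mem.read(5), mem.cells[5], mem.spares)  # prints 7 0 [7, 0]
```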
-
Publication No.: US20240111622A1
Publication Date: 2024-04-04
Application No.: US17957948
Application Date: 2022-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Siddharth K. Shah, Vilas Sridharan, Amitabh Mehra, Anil Harwani, William Fischofer
IPC: G06F11/07
CPC classification number: G06F11/0793, G06F11/0721, G06F11/079
Abstract: A disclosed method can include (i) reporting, by a microcontroller, detection of a violation of a physical infrastructure constraint to a machine check architecture, (ii) triggering, by the machine check architecture in response to the reporting, a machine-check exception such that the violation of the physical infrastructure constraint is recorded, and (iii) performing a corrective action based on the triggering of the machine-check exception. Various other apparatuses, systems, and methods are also disclosed.
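This is a toy Python model of steps (i) to (iii). The monitored constraint (a temperature limit) and the corrective action (throttling) are hypothetical examples chosen for illustration. The machine-check exception is modeled as a plain callback, and `MachineCheckArchitecture` and `microcontroller_poll` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MachineCheckArchitecture:
    """Toy MCA: records each reported violation, then invokes the exception handler."""

    handler: Callable[[str], None]  # stands in for the machine-check exception handler
    log: list[str] = field(default_factory=list)

    def report(self, violation: str) -> None:
        self.log.append(violation)  # (ii) the violation is recorded...
        self.handler(violation)     # ...and the "exception" reaches the handler


def microcontroller_poll(temp_c: float, limit_c: float, mca: MachineCheckArchitecture) -> None:
    """(i) Hypothetical constraint: a temperature limit watched by a microcontroller."""
    if temp_c > limit_c:
        mca.report(f"temperature {temp_c} C exceeds limit {limit_c} C")


actions: list[str] = []
# (iii) Hypothetical corrective action: throttle the processor.
mca = MachineCheckArchitecture(handler=lambda v: actions.append("throttle"))
microcontroller_poll(95.0, 90.0, mca)
microcontroller_poll(80.0, 90.0, mca)
print(mca.log, actions)
# prints ['temperature 95.0 C exceeds limit 90.0 C'] ['throttle']
```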
-
Publication No.: US11797369B2
Publication Date: 2023-10-24
Application No.: US17864804
Application Date: 2022-07-14
Applicant: Advanced Micro Devices, Inc.
Inventor: James R. Magro, Kedarnath Balakrishnan, Vilas Sridharan
CPC classification number: G06F11/0772, G06F3/0679, G06F11/073, G06F11/141
Abstract: A memory controller includes a memory channel controller adapted to receive memory access requests and dispatch associated commands addressable in a system memory address space to a non-volatile storage class memory (SCM) module. The non-volatile error reporting circuit identifies error conditions associated with the non-volatile SCM module and maps the error conditions from a first number of possible error conditions associated with the non-volatile SCM module to a second, smaller number of virtual error types for reporting to an error monitoring module of a host operating system, the mapping based at least on a classification that the error condition will or will not have a deleterious effect on an executable process running on the host operating system.
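This is a minimal Python sketch of the many-to-few mapping. The SCM error-condition names and the two virtual types are hypothetical. Treating unknown conditions as deleterious is an assumed conservative default, not something stated in the abstract.

```python
from enum import Enum


class VirtualErrorType(Enum):
    """Reduced set of error types reported to the host OS (hypothetical names)."""
    BENIGN = "does not affect running processes"
    DELETERIOUS = "may harm a running process"


# Hypothetical SCM-module error conditions -> will it harm a running process?
SCM_CONDITIONS = {
    "media_error_corrected": False,
    "thermal_warning": False,
    "spare_blocks_low": False,
    "media_error_uncorrectable": True,
    "persistence_write_failed": True,
}


def map_error(condition: str) -> VirtualErrorType:
    """Collapse many module-specific conditions into a few virtual error types."""
    # Assumption: unknown conditions are treated conservatively as deleterious.
    if SCM_CONDITIONS.get(condition, True):
        return VirtualErrorType.DELETERIOUS
    return VirtualErrorType.BENIGN


print(map_error("thermal_warning").name, map_error("media_error_uncorrectable").name)
# prints BENIGN DELETERIOUS
```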