-
Publication No.: US11868204B1
Publication date: 2024-01-09
Application No.: US17548270
Filing date: 2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Ofer Naaman, Osnat Katz, Nir Bar-Or, Adi Habusha
CPC classification number: G06F11/079, G06F11/073, G06F11/0751
Abstract: A system includes an obsolete cache-line vector having a plurality of memory elements, wherein each memory element has a one-to-one correspondence to a cache line entry of a cache memory. The vector can capture cache line errors that occur at different times from an error detection logic associated with the cache memory. A counter can be coupled to the obsolete cache-line vector for tracking how many of the memory elements in the vector are activated. When a predetermined threshold is reached, a threshold comparator can release a trigger for further analysis. An error events logger can be used to track all of the errors that occurred. The error events logger can also use a time stamp, which can assist the RAS system in analyzing a correlation between the errors, such as patterns that occur and time differences between the errors.
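The mechanism in this abstract can be modeled in software to make the data flow concrete. The following is a minimal sketch, not the patented hardware design: the class name `ObsoleteLineVector`, its methods, and the use of floating-point timestamps are all hypothetical choices made for illustration. It assumes the counter counts distinct activated elements (so a repeated error on the same line does not increment it), while the logger records every error event with its time stamp.

```python
class ObsoleteLineVector:
    """Illustrative model: one flag per cache line, a counter, and an event log."""

    def __init__(self, num_lines, threshold):
        self.bits = [False] * num_lines   # one memory element per cache line entry
        self.count = 0                    # how many elements are currently activated
        self.threshold = threshold        # assumed "predetermined threshold"
        self.log = []                     # (timestamp, line) for every reported error

    def report_error(self, line, timestamp):
        """Record an error on `line`; return True once the threshold is reached."""
        self.log.append((timestamp, line))    # the logger keeps all events
        if not self.bits[line]:
            self.bits[line] = True
            self.count += 1                   # only newly activated elements count
        return self.count >= self.threshold  # threshold comparator output

    def time_deltas(self):
        """Time differences between consecutive logged errors, in log order."""
        stamps = [t for t, _ in self.log]
        return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


vector = ObsoleteLineVector(num_lines=8, threshold=2)
vector.report_error(3, 0.0)   # first error on line 3
vector.report_error(3, 1.5)   # repeat on line 3: logged, but count stays at 1
fired = vector.report_error(5, 2.0)  # second distinct line reaches the threshold
```

Keeping the vector (which lines failed) separate from the log (when each failure happened) mirrors the split the abstract draws between the counter-based trigger and the time-stamped analysis of correlations between errors.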
-
Publication No.: US11720444B1
Publication date: 2023-08-08
Application No.: US17548190
Filing date: 2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Ofer Naaman, Osnat Katz, Nir Bar-Or, Adi Habusha
IPC: G06F11/07, G06F11/10, G06F12/02, G06F12/0891
CPC classification number: G06F11/1068, G06F11/076, G06F11/0772, G06F12/0238, G06F12/0891
Abstract: A system captures errors and stores an obsolete line bit qualifier per cache entry that can be used to dynamically mark a specific cache entry as obsolete. For example, the cache entry can be marked as obsolete after detecting repetitive single-bit errors on the same cache entry within a predetermined period of time. For cache lines marked as obsolete, a cache controller can ensure that the cache line entry remains unused. The detection of a repetitive single-bit error can be accomplished by implementing a counter per cache entry and a timer. The counter counts errors within a timer window, and a repetitive error is reported if the counter reaches a threshold level. By catching repetitive single-bit errors before such errors develop into multi-bit errors, the system can increase the life span of the server computer.
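The counter-and-timer scheme in this abstract can be sketched as a small software model. This is an illustration under stated assumptions, not the patented implementation: the name `RepetitiveErrorDetector` and its methods are hypothetical, and the sketch assumes a single shared timer whose expiry clears every per-entry counter and restarts the window at the time of the next error. The abstract does not specify these details.

```python
class RepetitiveErrorDetector:
    """Illustrative model: per-entry error counters reset by a shared timer window."""

    def __init__(self, num_entries, threshold, window):
        self.threshold = threshold             # errors per window that count as repetitive
        self.window = window                   # timer window length (arbitrary time units)
        self.counters = [0] * num_entries      # one counter per cache entry
        self.obsolete = [False] * num_entries  # obsolete line bit qualifier per entry
        self.window_start = 0.0

    def on_single_bit_error(self, entry, now):
        """Count an error on `entry`; return True if the entry is now obsolete."""
        if now - self.window_start >= self.window:
            # Assumed behavior: timer expiry clears all counters and opens a new window.
            self.counters = [0] * len(self.counters)
            self.window_start = now
        self.counters[entry] += 1
        if self.counters[entry] >= self.threshold:
            self.obsolete[entry] = True        # repetitive error: mark entry obsolete
        return self.obsolete[entry]

    def usable(self, entry):
        """A cache controller would skip entries for which this returns False."""
        return not self.obsolete[entry]


detector = RepetitiveErrorDetector(num_entries=4, threshold=3, window=10.0)
detector.on_single_bit_error(1, 0.0)
detector.on_single_bit_error(1, 4.0)
detector.on_single_bit_error(1, 12.0)  # window expired: counters restart at 1
```

In this model, two errors followed by a long gap do not mark the entry obsolete, because the window expires before the third error arrives. Only errors clustered within one window reach the threshold, which matches the abstract's focus on repetitive errors within a predetermined period.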
-