摘要:
Channel marking is provided in a memory system that includes a first memory channel, a second memory channel, and error correction code (ECC) logic. The memory system is configured to perform a method that includes receiving a request to apply a first channel mark to the first memory channel and determining a priority level of the first channel mark. A request is received to apply a second channel mark to the second memory channel, and a priority level of the second mark is determined. It is determined that the priority level of the first channel mark is higher than the priority level of the second channel mark. The first channel mark is supplied to the ECC logic while blocking the second channel mark from the ECC logic.
摘要:
Marking memory chips as faulty when a fault is detected in data from the memory chip. Upon detecting that a plurality of memory chips are faulty, determining which of a plurality of memory channels contains the faulty memory chips. Marking one of a plurality of memory channels as failing in response to determining that the number of failing memory chips has exceeded a threshold.
摘要:
Marking memory chips as faulty when a fault is detected in data from the memory chip. Upon detecting that a plurality of memory chips are faulty, determining which of a plurality of memory channels contains the faulty memory chips. Marking one of a plurality of memory channels as failing in response to determining that the number of failing memory chips has exceeded a threshold.
摘要:
Channel marking is provided in a memory system that includes a first memory channel, a second memory channel, and error correction code (ECC) logic. The memory system is configured to perform a method that includes receiving a request to apply a first channel mark to the first memory channel and determining a priority level of the first channel mark. A request is received to apply a second channel mark to the second memory channel, and a priority level of the second mark is determined. It is determined that the priority level of the first channel mark is higher than the priority level of the second channel mark. The first channel mark is supplied to the ECC logic while blocking the second channel mark from the ECC logic.
摘要:
Channel marking is provided in a memory system that includes a memory channel with a plurality of memory devices. The memory devices are arranged into a first group of memory devices and a second group of memory devices. The memory system is configured to perform a method that includes determining that more than a threshold number of memory devices in the first group are failing. An error correction code (ECC) is configured to compensate for errors associated with memory devices in the first group on the memory channel and to perform error correction on errors associated with memory devices in the second group on the memory channel.
摘要:
Providing homogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for blocking off new operations from starting on the memory channels, for completing any pending operations on the memory channels, for performing a recovery operation on the memory channels and for starting the new operations on at least a first subset of the memory channels. The memory system is capable of operating with the first subset of the memory channels.
摘要:
Providing heterogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for performing a recovery operation on the failing memory channel while other memory channels are performing normal system operations, for bringing the recovered channel back into operational mode with the other memory channels for store operations, for continuing to mark the recovered channel to guard against stale data, for removing any stale data after the recovery operation is complete, and for removing the mark on the recovered channel to allow the normal system operations with all of the memory channels, the removing in response to the removing any stale data being complete.
摘要:
Correcting memory device (chip) and memory channel failures in the presence of known memory device failures. A memory channel failure is located and corrected, or alternatively up to c chip failures are corrected and up to d chip failures are detected in the presence of up to u chips that are marked as suspect. A first stage of decoding is performed that results in recovering an estimate of correctable errors affecting the data or in declaring an uncorrectable error state. When an uncorrectable error state is declared, a second stage of decoding is performed to attempt to correct u erasures and a channel error in M iterations where the channel location is changed in each iteration. A correctable error is declared in response to exactly one of the M iterations being successful.
摘要:
Error correction and detection in a redundant memory system including a a computer implemented method that includes receiving data including error correction code (ECC) bits, the receiving from a plurality of channels, each channel comprising a plurality of memory devices at memory device locations. The method also includes computing syndromes of the data; receiving a channel identifier of one of the channels; and removing a contribution of data received on the channel from the computed syndromes, the removing resulting in channel adjusted syndromes. The channel adjusted syndromes are decoded resulting in channel adjusted memory device locations of failing memory devices, the channel adjusted memory device locations corresponding to memory device locations.
摘要:
A computer-implemented method for verifying a RAIM/ECC design using a hierarchical injection scheme that includes selecting marks for generating an error mask, selecting a fixed bit mask based on the selected marks, determining whether to inject errors into at least one of a marked channel and at least one marked chip of a channel; and randomly injecting errors into the at least one of the marked channel and the at least one marked chip when determined.