摘要:
Dynamic graduated memory device protection in redundant array of independent memory (RAIM) systems that include a plurality of memory devices is provided. A first severity level of a first failing memory device in the plurality of memory devices is determined. The first failing memory device is associated with an identifier used to communicate a location of the first failing memory device to an error correction code (ECC). A second severity level of a second failing memory device in the plurality of memory devices is determined. It is determined that the second severity level is higher than the first severity level. The identifier from the first failing memory device is removed based on determining that the second severity level is higher than the first severity level. The identifier is applied to the second failing memory device based on determining that the second severity level is higher than the first severity level.
摘要:
Dynamic graduated memory device protection in redundant array of independent memory (RAIM) systems that include a plurality of memory devices is provided. A first severity level of a first failing memory device in the plurality of memory devices is determined. The first failing memory device is associated with an identifier used to communicate a location of the first failing memory device to an error correction code (ECC). A second severity level of a second failing memory device in the plurality of memory devices is determined. It is determined that the second severity level is higher than the first severity level. The identifier from the first failing memory device is removed based on determining that the second severity level is higher than the first severity level. The identifier is applied to the second failing memory device based on determining that the second severity level is higher than the first severity level.
摘要:
Marking memory chips as faulty when a fault is detected in data from the memory chip. Upon detecting that a plurality of memory chips are faulty, determining which of a plurality of memory channels contains the faulty memory chips. Marking one of a plurality of memory channels as failing in response to determining that the number of failing memory chips has exceeded a threshold.
摘要:
Channel marking is provided in a memory system that includes a first memory channel, a second memory channel, and error correction code (ECC) logic. The memory system is configured to perform a method that includes receiving a request to apply a first channel mark to the first memory channel and determining a priority level of the first channel mark. A request is received to apply a second channel mark to the second memory channel, and a priority level of the second mark is determined. It is determined that the priority level of the first channel mark is higher than the priority level of the second channel mark. The first channel mark is supplied to the ECC logic while blocking the second channel mark from the ECC logic.
摘要:
Marking memory chips as faulty when a fault is detected in data from the memory chip. Upon detecting that a plurality of memory chips are faulty, determining which of a plurality of memory channels contains the faulty memory chips. Marking one of a plurality of memory channels as failing in response to determining that the number of failing memory chips has exceeded a threshold.
摘要:
Channel marking is provided in a memory system that includes a first memory channel, a second memory channel, and error correction code (ECC) logic. The memory system is configured to perform a method that includes receiving a request to apply a first channel mark to the first memory channel and determining a priority level of the first channel mark. A request is received to apply a second channel mark to the second memory channel, and a priority level of the second mark is determined. It is determined that the priority level of the first channel mark is higher than the priority level of the second channel mark. The first channel mark is supplied to the ECC logic while blocking the second channel mark from the ECC logic.
摘要:
Channel marking is provided in a memory system that includes a memory channel with a plurality of memory devices. The memory devices are arranged into a first group of memory devices and a second group of memory devices. The memory system is configured to perform a method that includes determining that more than a threshold number of memory devices in the first group are failing. An error correction code (ECC) is configured to compensate for errors associated with memory devices in the first group on the memory channel and to perform error correction on errors associated with memory devices in the second group on the memory channel.
摘要:
Correcting memory device (chip) and memory channel failures in the presence of known memory device failures. A memory channel failure is located and corrected, or alternatively up to c chip failures are corrected and up to d chip failures are detected in the presence of up to u chips that are marked as suspect. A first stage of decoding is performed that results in recovering an estimate of correctable errors affecting the data or in declaring an uncorrectable error state. When an uncorrectable error state is declared, a second stage of decoding is performed to attempt to correct u erasures and a channel error in M iterations where the channel location is changed in each iteration. A correctable error is declared in response to exactly one of the M iterations being successful.
摘要:
Error correction and detection in a redundant memory system including a a computer implemented method that includes receiving data including error correction code (ECC) bits, the receiving from a plurality of channels, each channel comprising a plurality of memory devices at memory device locations. The method also includes computing syndromes of the data; receiving a channel identifier of one of the channels; and removing a contribution of data received on the channel from the computed syndromes, the removing resulting in channel adjusted syndromes. The channel adjusted syndromes are decoded resulting in channel adjusted memory device locations of failing memory devices, the channel adjusted memory device locations corresponding to memory device locations.
摘要:
Providing homogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for blocking off new operations from starting on the memory channels, for completing any pending operations on the memory channels, for performing a recovery operation on the memory channels and for starting the new operations on at least a first subset of the memory channels. The memory system is capable of operating with the first subset of the memory channels.