摘要:
Correcting memory device (chip) and memory channel failures in the presence of known memory device failures. A memory channel failure is located and corrected, or alternatively up to c chip failures are corrected and up to d chip failures are detected in the presence of up to u chips that are marked as suspect. A first stage of decoding is performed that results in recovering an estimate of correctable errors affecting the data or in declaring an uncorrectable error state. When an uncorrectable error state is declared, a second stage of decoding is performed to attempt to correct u erasures and a channel error in M iterations where the channel location is changed in each iteration. A correctable error is declared in response to exactly one of the M iterations being successful.
摘要:
Error correction and detection in a redundant memory system including a a computer implemented method that includes receiving data including error correction code (ECC) bits, the receiving from a plurality of channels, each channel comprising a plurality of memory devices at memory device locations. The method also includes computing syndromes of the data; receiving a channel identifier of one of the channels; and removing a contribution of data received on the channel from the computed syndromes, the removing resulting in channel adjusted syndromes. The channel adjusted syndromes are decoded resulting in channel adjusted memory device locations of failing memory devices, the channel adjusted memory device locations corresponding to memory device locations.
摘要:
Correcting memory device (chip) and memory channel failures in the presence of known memory device failures. A memory channel failure is located and corrected, or alternatively up to c chip failures are corrected and up to d chip failures are detected in the presence of up to u chips that are marked as suspect. A first stage of decoding is performed that results in recovering an estimate of correctable errors affecting the data or in declaring an uncorrectable error state. When an uncorrectable error state is declared, a second stage of decoding is performed to attempt to correct u erasures and a channel error in M iterations where the channel location is changed in each iteration. A correctable error is declared in response to exactly one of the M iterations being successful.
摘要:
Error correction and detection in a redundant memory system including a a computer implemented method that includes receiving data including error correction code (ECC) bits, the receiving from a plurality of channels, each channel comprising a plurality of memory devices at memory device locations. The method also includes computing syndromes of the data; receiving a channel identifier of one of the channels; and removing a contribution of data received on the channel from the computed syndromes, the removing resulting in channel adjusted syndromes. The channel adjusted syndromes are decoded resulting in channel adjusted memory device locations of failing memory devices, the channel adjusted memory device locations corresponding to memory device locations.
摘要:
A system to improve error code decoding using historical information. An example system includes storage partitioned into memory ranks, and a table to record symbols having failures for each memory rank. The system generates a memory rank score for each memory rank. The system also includes an error control decoder that uses the memory rank score when each memory rank is accessed in order to determine whether an error should be corrected or not.
摘要:
A system to improve error correction may include a fast decoder to process data packets until the fast decoder finds an uncorrectable error in a data packet at which point a request for at least two data packets is generated. The system may also include a slow decoder to possibly correct the uncorrectable error in a data packet based upon the at least two data packets.
摘要:
A system to improve error control coding may include memory chips of at least two different kinds. The system may also include error control encoder circuitry to substantially encode data for storage in any memory rank. The system may further include error control decoder circuitry to substantially decode encoded data received from any memory rank. The error decoder circuitry is comprised of a slow decoder and a fast decoder.
摘要:
A system to improve memory reliability in computer systems that may include memory chips, and may rely on a error control encoder to send codeword symbols for storage in each of the memory chips. At least two symbols from a codeword are assigned to each memory chip and therefore failure of any of the memory chips could affect two symbols or more. The system may also include a table to record failures and partial failures of the codeword symbols for each of the memory chips so the error control encoder can correct subsequent partial failures based upon the previous partial failures. The error control coder is capable of correcting and/or detecting more errors if only a fraction of a chip is noted in the table as having a failure as opposed to a full chip noted as having a failure.
摘要:
A system to improve error code decoding with retries may include a processing unit that requests data packets, and a queue to hold the data packets for the processing unit. The system may also include a decoder to determine a processing time for each data packet in the queue based upon any errors in each data packet, and if the processing time for a particular data packet is greater than a threshold, then to renew any requests for the data packets that are in the queue.
摘要:
A system to improve error control coding. An example system includes memory chips of at least two different kinds. The system also includes error control encoder circuitry to substantially encode data for storage in any memory rank. The system further includes error control decoder circuitry to substantially decode encoded data received from any memory rank. The error decoder circuitry is comprised of a slow decoder and a fast decoder.