Abstract:
Systems and methods are provided that address the problem of failed storage integrated circuits (ICs) in a solid state drive (SSD) by combining a fault-tolerant architecture with a single error correction code (ECC) mechanism for random and burst error correction and an L-fold interleaving mechanism. The systems and methods described herein keep the SSD operational when one or more ICs fail, allow previously stored data to be recovered from the failed ICs, and allow random and burst errors to be corrected in the remaining operational ICs. These systems and methods replace the failed ICs with fully functional ICs, referred to herein as spare ICs. Furthermore, they improve I/O performance in terms of the maximum achievable read/write data rate.
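The abstract pairs ECC with L-fold interleaving but does not say how the interleaving is laid out. The minimal sketch below assumes the textbook symbol-level scheme: symbol i of each of L codewords is written in turn, so any contiguous burst of at most L corrupted symbols lands at most once in each codeword. The function names, the 8-bit symbol size and the example data are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def interleave(codewords: Sequence[Sequence[int]]) -> List[int]:
    """L-fold interleave: emit symbol 0 of every codeword, then symbol 1, and so on."""
    n = len(codewords[0])
    if any(len(cw) != n for cw in codewords):
        raise ValueError("all codewords must have the same length")
    return [codewords[j][i] for i in range(n) for j in range(len(codewords))]


def deinterleave(stream: Sequence[int], L: int) -> List[List[int]]:
    """Inverse of interleave: stream position k belongs to codeword k % L."""
    if len(stream) % L:
        raise ValueError("stream length must be a multiple of L")
    return [list(stream[j::L]) for j in range(L)]


def burst_errors_per_codeword(codewords: Sequence[Sequence[int]], start: int, length: int) -> List[int]:
    """Corrupt a contiguous burst of the interleaved stream and count the
    symbol errors that land in each codeword after deinterleaving."""
    stream = interleave(codewords)
    for k in range(start, start + length):
        stream[k] ^= 0xFF  # corrupt every bit of the (hypothetical) 8-bit symbol
    received = deinterleave(stream, len(codewords))
    return [sum(a != b for a, b in zip(cw, rx)) for cw, rx in zip(codewords, received)]


# Four codewords of six 8-bit symbols each (L = 4).
codewords = [[(10 * j + i) & 0xFF for i in range(6)] for j in range(4)]
print(burst_errors_per_codeword(codewords, start=8, length=4))  # [1, 1, 1, 1]
```

A burst of four symbols becomes one error per codeword, which a per-codeword ECC that corrects a single symbol error can repair. A burst of 2L symbols would place two errors in each codeword.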
Abstract:
A system for detecting and correcting errors in a data block includes a check bits generation unit that receives and encodes the data to be protected. The check bits generation unit partitions the data into a plurality of logical groups, generates a parity bit for each logical group, and additionally generates a pair of global error correction codes. An error correction unit is configured to generate a parity error bit for each logical group, based on the received data and the original parity bits, and to generate first and second syndrome codes.
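The per-group part of this scheme can be sketched in a few lines. The sketch assumes equal, contiguous logical groups and even parity, neither of which the abstract specifies. It leaves out the pair of global error correction codes and the syndrome codes because their construction is not described. All names are illustrative.

```python
from typing import List


def group_parities(data: List[int], num_groups: int) -> List[int]:
    """Partition the data bits into num_groups equal, contiguous logical groups
    and return the even-parity bit of each group."""
    if len(data) % num_groups:
        raise ValueError("data length must be divisible by num_groups")
    size = len(data) // num_groups
    return [sum(data[g * size:(g + 1) * size]) % 2 for g in range(num_groups)]


def parity_error_bits(received: List[int], original_parity: List[int]) -> List[int]:
    """Recompute each group's parity from the received data and XOR it with the
    stored parity bit; a 1 flags a group containing an odd number of bit errors."""
    recomputed = group_parities(received, len(original_parity))
    return [r ^ p for r, p in zip(recomputed, original_parity)]


data = [1, 0, 1, 1,  0, 0, 0, 1,  1, 1, 1, 1,  0, 1, 0, 0]
parity = group_parities(data, 4)            # [1, 1, 0, 1]
received = data.copy()
received[5] ^= 1                            # single-bit error in group 1
print(parity_error_bits(received, parity))  # [0, 1, 0, 0]
```

The parity error bits only locate a group. In the described system, identifying the failing bit within that group would be the job of the global codes and their syndromes.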
Abstract:
The data in a storage system are, for the most part, protected using EDC processes. The storage system according to the invention is structured so that multi-bit errors are more readily detected by the EDC process.
Abstract:
The present disclosure includes methods and devices for stripe-based memory operation. One method embodiment includes writing data in a first stripe across a storage volume of a plurality of memory devices. A portion of the first stripe is updated by writing updated data in a portion of a second stripe across the storage volume of the plurality of memory devices. The portion of the first stripe is invalidated. The invalid portion of the first stripe and a remainder of the first stripe are maintained until the first stripe is reclaimed. Other methods and devices are also disclosed.
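The stripe life cycle in this abstract (write, partial update into a new stripe, invalidate, retain until reclaim) can be modeled as a small data structure. The sketch below is hypothetical. It assumes one chunk per memory device per stripe, and it assumes that reclaiming a stripe first relocates its still-valid chunks into a new stripe. The abstract does not say how valid data is handled at reclamation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Slot:
    logical: int        # logical chunk number held in this slot
    data: bytes
    valid: bool = True


class StripeStore:
    """Hypothetical model: each stripe places at most one chunk on each of
    num_devices memory devices (slot index == device index)."""

    def __init__(self, num_devices: int):
        self.num_devices = num_devices
        self.stripes: Dict[int, List[Slot]] = {}     # stripe id -> slots
        self.where: Dict[int, Tuple[int, int]] = {}  # logical chunk -> (stripe id, device)
        self._next_id = 0

    def write(self, chunks: Dict[int, bytes]) -> int:
        """Write chunks as a new stripe. Older copies are marked invalid but
        left in place until their stripe is reclaimed."""
        if len(chunks) > self.num_devices:
            raise ValueError("a stripe holds at most one chunk per device")
        sid = self._next_id
        self._next_id += 1
        slots = []
        for dev, (logical, data) in enumerate(chunks.items()):
            if logical in self.where:  # invalidate the stale copy, do not erase it
                old_sid, old_dev = self.where[logical]
                self.stripes[old_sid][old_dev].valid = False
            slots.append(Slot(logical, data))
            self.where[logical] = (sid, dev)
        self.stripes[sid] = slots
        return sid

    def read(self, logical: int) -> bytes:
        sid, dev = self.where[logical]
        return self.stripes[sid][dev].data

    def reclaim(self, sid: int) -> Optional[int]:
        """Move the stripe's still-valid chunks into a new stripe, then free it."""
        survivors = {s.logical: s.data for s in self.stripes[sid] if s.valid}
        new_sid = self.write(survivors) if survivors else None
        del self.stripes[sid]
        return new_sid


store = StripeStore(num_devices=4)
first = store.write({0: b"A0", 1: b"A1", 2: b"A2", 3: b"A3"})
second = store.write({1: b"B1"})                 # update a portion of the first stripe
print([s.valid for s in store.stripes[first]])   # [True, False, True, True]
```

Deferring the erase until `reclaim` keeps updates cheap, because only the changed chunk is rewritten. The cost is that stale copies occupy space until the whole stripe is reclaimed.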
Abstract:
The bits of a data block are logically partitioned into an array with one column per memory device and one row per bit of the data block stored in each memory device. Each memory device contributes one bit to each row. In one embodiment, the bits from a given memory device occupy the same column position in every row. One check bit is associated with each row, and each check bit is assigned zero or one column. If a check bit has a column assigned to it, the check bit is generated by computing the parity of the associated row together with the assigned column. If the check bit has no column assigned to it, the check bit is generated by computing the parity of the associated row only. Each column is assigned to at least four check bits and to an even number of check bits.
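The check-bit rule can be made concrete with a small array. This minimal sketch makes two assumptions of its own. It reads "parity of the associated row and the column" as parity over the union of the two, so the shared bit is counted once. It also uses an arbitrary layout of 4 devices with 16 bits each, and assigns column c to check bits 4c to 4c+3 so that each column gets four check bits. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def check_bits(block: Sequence[Sequence[int]], assigned: Sequence[Optional[int]]) -> List[int]:
    """block[r][c] is the bit that memory device c contributes to row r.
    assigned[r] is the column assigned to check bit r, or None.
    Check bit r is the parity over row r united with its assigned column
    (the shared bit at (r, col) is counted once)."""
    rows, cols = len(block), len(block[0])
    if len(assigned) != rows:
        raise ValueError("exactly one check bit per row")
    bits = []
    for r, col in enumerate(assigned):
        positions = {(r, c) for c in range(cols)}
        if col is not None:
            positions |= {(i, col) for i in range(rows)}
        bits.append(sum(block[i][j] for i, j in positions) % 2)
    return bits


def assignment_is_valid(assigned: Sequence[Optional[int]], cols: int) -> bool:
    """Each column must be assigned to at least four, and an even number of, check bits."""
    counts = [0] * cols
    for col in assigned:
        if col is not None:
            counts[col] += 1
    return all(n >= 4 and n % 2 == 0 for n in counts)


# 4 memory devices (columns), 16 bits per device (rows); column c -> check bits 4c..4c+3.
ROWS, COLS = 16, 4
assigned = [r // 4 for r in range(ROWS)]
zero = [[0] * COLS for _ in range(ROWS)]
flipped = [row.copy() for row in zero]
flipped[0][1] = 1                       # single-bit error from device 1, row 0
syndrome = [r for r, b in enumerate(check_bits(flipped, assigned)) if b]
print(syndrome)  # [0, 4, 5, 6, 7]
```

A single-bit error flips the check bit of its own row plus every check bit assigned to its column. The resulting pattern therefore identifies both the row and the failing memory device.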
Abstract:
Error correction circuitry (101) attempts to detect and correct, on-the-fly, erroneous words from RAM (102) within a computer system. Correctable errors are scrubbed without delaying the memory access cycle. The address of the section or row of RAM containing the correctable error is latched (130) for later use by a firmware-implemented interrupt-driven scrub routine (104) that reads and rewrites each word within the indicated memory section, resulting in the erroneous word being corrected on-the-fly and rewritten correctly. If the memory section size exceeds a threshold, the scrub process is divided into smaller subprocesses that are distributed in time using a delayed interrupt mechanism. Subprocess duration is kept short enough to avoid impairing the computer system response time. System management interrupts (120) and firmware (104) make the scrub routine independent of and transparent to the operating systems that may be run on the computer system.
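Two steps of the scrub routine lend themselves to a short sketch: splitting an oversized latched section into bounded subprocesses, and reading and rewriting each word in a subrange. The sketch below is a software-only illustration under stated assumptions. The hardware ECC on the read path is stood in for by a toy triple-repetition code (`tmr_encode`/`tmr_correct`, hypothetical names). The delayed-interrupt scheduling is reduced to returning a list of subranges to process later.

```python
from typing import Callable, List, Tuple


def plan_scrub(start: int, size: int, threshold: int) -> List[Tuple[int, int]]:
    """Split the latched section [start, start + size) into subranges of at most
    `threshold` words; a section no larger than the threshold stays whole."""
    end = start + size
    return [(lo, min(lo + threshold, end)) for lo in range(start, end, threshold)]


def scrub_range(memory: List[int], lo: int, hi: int, correct: Callable[[int], int]) -> int:
    """Read every word in [lo, hi) through the corrector and write it back.
    Returns how many stored words were actually changed."""
    changed = 0
    for addr in range(lo, hi):
        good = correct(memory[addr])   # the read path returns corrected data
        if good != memory[addr]:
            changed += 1
        memory[addr] = good            # rewrite so the stored copy is clean
    return changed


def tmr_encode(byte: int) -> int:
    """Toy code: store three copies of a byte in one 24-bit word."""
    return byte | (byte << 8) | (byte << 16)


def tmr_correct(word: int) -> int:
    """Bitwise majority vote over the three copies fixes any single-bit error."""
    a, b, c = word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF
    return tmr_encode((a & b) | (a & c) | (b & c))


memory = [tmr_encode(i) for i in range(10)]
memory[3] ^= 1 << 9                    # correctable error in word 3
memory[7] ^= 1 << 20                   # correctable error in word 7
plan = plan_scrub(start=0, size=10, threshold=4)
print(plan)  # [(0, 4), (4, 8), (8, 10)]
print(sum(scrub_range(memory, lo, hi, tmr_correct) for lo, hi in plan))  # 2
```

In the described system each subrange would run in its own delayed interrupt, so that no single pass is long enough to impair system response time.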
Abstract:
A modularized error correction apparatus for correcting package errors is provided by expanding an N-bit single-error-correction, double-error-detection (SEC-DED) code to cover N packages of M bits each, such that the exclusive-OR of the M single-bit error syndromes of any given package yields a composite syndrome that is unique to that package. See Fig. 2 for the parity matrix H and the matching matrix M for the error correction code.
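The composite-syndrome property can be checked mechanically for any candidate check matrix. In the sketch below, each column of H (the single-bit error syndrome of one bit) is packed into an integer. The sketch XORs the syndromes of each package and tests that the results are nonzero and distinct. The example matrix is a hypothetical toy with distinct odd-weight columns, in the style of a Hsiao SEC-DED matrix. It is not the Fig. 2 matrix, which is not reproduced here.

```python
from functools import reduce
from operator import xor
from typing import List, Sequence


def composite_syndromes(columns: Sequence[int], m: int) -> List[int]:
    """columns[i] is the single-bit error syndrome of bit i (column i of H),
    packed into an int. Return, for each package of m consecutive bits, the
    XOR of its m single-bit syndromes."""
    if len(columns) % m:
        raise ValueError("number of columns must be a multiple of m")
    return [reduce(xor, columns[p:p + m]) for p in range(0, len(columns), m)]


def packages_identifiable(columns: Sequence[int], m: int) -> bool:
    """True if every package's composite syndrome is nonzero and unique."""
    comp = composite_syndromes(columns, m)
    return 0 not in comp and len(set(comp)) == len(comp)


# Toy 4-bit syndromes for 3 packages of 2 bits (not the Fig. 2 matrix).
good_h = [0b0001, 0b0010, 0b0100, 0b1000, 0b0111, 0b1101]
print(composite_syndromes(good_h, 2), packages_identifiable(good_h, 2))  # [3, 12, 10] True
```

Replacing the last column with `0b1011` makes the third package collide with the second (both give 12). That is the kind of assignment the construction must avoid.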
Abstract:
A method begins with a first device obtaining data for transmission to a second device and partitioning the data to produce a plurality of data portions. The method continues with the first device dispersed storage error encoding the plurality of data portions, using a plurality of sets of error coding dispersal storage function parameters, to produce a plurality of sets of encoded data slices, and transmitting the plurality of sets of encoded data slices to the second device via a network. The method continues with the second device receiving, for each set of the plurality of sets of encoded data slices, at least a decode threshold number of encoded data slices and dispersed storage error decoding them to produce a decoded data portion. The method continues with the second device recapturing the data from the plurality of decoded data portions.
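The flow of partitioning, per-portion encoding with its own parameters, and decoding from a threshold number of slices can be sketched end to end. The sketch uses the simplest threshold code it can: each portion is split into k data slices plus one XOR parity slice, so any k of the k + 1 slices (the decode threshold) recover it. This is a deliberate stand-in. Practical dispersed storage typically uses Reed-Solomon or information-dispersal codes with general width/threshold pairs. The per-portion parameter here is only k, and all names are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Sequence, Tuple


def xor_slices(slices: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equal-length slices."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*slices))


def encode_portion(portion: bytes, k: int) -> List[bytes]:
    """Split a portion into k equal data slices (zero-padded) plus one XOR
    parity slice: k + 1 slices, any k of which suffice to decode."""
    size = -(-len(portion) // k)                     # ceiling division
    padded = portion.ljust(size * k, b"\0")
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    return data + [xor_slices(data)]


def decode_portion(received: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild a portion from at least k (the decode threshold) of its slices."""
    if len(received) < k:
        raise ValueError("fewer than the decode threshold of slices")
    data = dict(received)
    missing = [i for i in range(k) if i not in data]
    if missing:   # at most one data slice can be missing when >= k slices arrive
        data[missing[0]] = xor_slices(list(received.values()))
    return b"".join(data[i] for i in range(k))[:length]


def encode_data(data: bytes, portion_size: int, ks: Sequence[int]) -> List[Tuple[int, int, List[bytes]]]:
    """Partition data and encode each portion with its own parameter k."""
    portions = [data[i:i + portion_size] for i in range(0, len(data), portion_size)]
    if len(ks) != len(portions):
        raise ValueError("need one parameter set per portion")
    return [(len(p), k, encode_portion(p, k)) for p, k in zip(portions, ks)]


message = b"hello dispersed storage world!"
sets = encode_data(message, portion_size=12, ks=[3, 4, 2])
lost = [1, 4, 0]                                     # one slice lost per set
recovered = b"".join(
    decode_portion({i: s for i, s in enumerate(slices) if i != drop}, k, length)
    for (length, k, slices), drop in zip(sets, lost)
)
print(recovered == message)  # True
```

Each portion survives the loss of any one of its slices, whether a data slice or the parity slice. The receiver then concatenates the decoded portions to recapture the original data.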