摘要:
A system to improve miscorrection rates in error control code may include an error control decoder with a safe decoding mode that processes at least two data packets. The system may also include a buffer to receive the processed at least two data packets from the error control decoder. The error control decoder may apply a logic OR operation to the uncorrectable error signal related to the processing of the at least two data packets to produce a global uncorrectable error signal. The system may further include a recipient to receive the at least two data packets and the global uncorrectable error signal.
摘要:
A system to improve miscorrection rates in error control code may include an error control decoder with a safe decoding mode that processes at least two data packets. The system may also include a buffer to receive the processed at least two data packets from the error control decoder. The error control decoder may apply a logic OR operation to the uncorrectable error signal related to the processing of the at least two data packets to produce a global uncorrectable error signal. The system may further include a recipient to receive the at least two data packets and the global uncorrectable error signal.
摘要:
A system to improve error code decoding using historical information. An example system includes storage partitioned into memory ranks, and a table to record symbols having failures for each memory rank. The system generates a memory rank score for each memory rank. The system also includes an error control decoder that uses the memory rank score when each memory rank is accessed in order to determine whether an error should be corrected or not.
摘要:
A system to improve memory reliability in computer systems that may include memory chips, and may rely on a error control encoder to send codeword symbols for storage in each of the memory chips. At least two symbols from a codeword are assigned to each memory chip and therefore failure of any of the memory chips could affect two symbols or more. The system may also include a table to record failures and partial failures of the codeword symbols for each of the memory chips so the error control encoder can correct subsequent partial failures based upon the previous partial failures. The error control coder is capable of correcting and/or detecting more errors if only a fraction of a chip is noted in the table as having a failure as opposed to a full chip noted as having a failure.
摘要:
A system to improve error code decoding using historical information may include storage partitioned into memory ranks, and a table to record symbols having failures for each memory rank. The system may also generate a memory rank score for each memory rank. The system may also include an error control decoder that may use the memory rank score when each memory rank is accessed in order to determine whether an error should be corrected or not.
摘要:
A system to improve memory reliability in computer systems that may include memory chips, and may rely on a error control encoder to send codeword symbols for storage in each of the memory chips. At least two symbols from a codeword are assigned to each memory chip and therefore failure of any of the memory chips could affect two symbols or more. The system may also include a table to record failures and partial failures of the codeword symbols for each of the memory chips so the error control encoder can correct subsequent partial failures based upon the previous partial failures. The error control coder is capable of correcting and/or detecting more errors if only a fraction of a chip is noted in the table as having a failure as opposed to a full chip noted as having a failure.
摘要:
A hub device, memory system, and method for providing a cascade interconnect memory system with enhanced reliability. The hub device includes an interface to a high-speed bus for communicating with a memory controller. The memory controller and the hub device are included in a cascade interconnect memory system and the high-speed bus includes bit lanes and one or more clock lanes. The hub device also includes a bi-directional fault signal line in communication with the memory controller and readable by a service interface. The hub device also includes a fault isolation register (FIR) for storing information about failures detected at the hub device, the information including severity levels of the detected failures. In addition, the hub device includes error recovery logic for responding to a failure detected at the hub device.
摘要:
A hub device, memory system, and method for providing a cascade interconnect memory system with enhanced reliability. The hub device includes an interface to a high-speed bus for communicating with a memory controller. The memory controller and the hub device are included in a cascade interconnect memory system and the high-speed bus includes bit lanes and one or more clock lanes. The hub device also includes a bi-directional fault signal line in communication with the memory controller and readable by a service interface. The hub device also includes a fault isolation register (FIR) for storing information about failures detected at the hub device, the information including severity levels of the detected failures. In addition, the hub device includes error recovery logic for responding to a failure detected at the hub device. Responding to the error includes recording a severity level of the failure in the FIR and taking an action at the hub device that is responsive to the severity level of the failure. The action includes one or more of fast clock stop, setting the bi-directional fault indicator, setting cyclical redundancy code (CRC) bits and transmitting them to the memory controller, re-try, sparing out a bit lane and sparing out a clock lane.
摘要:
A system and method for error correction and detection in a memory system. The system includes a memory controller, a plurality of memory modules and a mechanism. The memory modules are in communication with the memory controller and with a plurality of memory devices. The mechanism detects that one of the memory modules has failed possibly coincident with a memory device failure on an other of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the memory device failure.
摘要:
A system and method for providing a high fault tolerant memory system. The system includes a memory system having a memory controller, a plurality of memory modules and a mechanism. The plurality of memory modules are in communication with the memory controller and with a plurality of memory devices. The plurality of memory devices include at least one spare memory device for providing memory device sparing capability. The mechanism is for detecting that one of the memory modules has failed possibly coincident with a memory device failure on an other of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the possible memory device failure.