Abstract:
Homogeneous recovery is provided in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured to detect a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured to receive notification of the failing memory channel, block new operations from starting on the memory channels, complete any pending operations on the memory channels, perform a recovery operation on the memory channels, and start the new operations on at least a first subset of the memory channels. The memory system can operate with only the first subset of the memory channels.
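The recovery sequence above can be sketched as a small state model in Python. This is a hypothetical illustration, not the described implementation: the class HomogeneousRecovery, the recover callback, the five-channel configuration, and the min_channels threshold are all assumptions made for the example.

```python
from typing import Callable


class HomogeneousRecovery:
    """Toy controller state for the homogeneous recovery sequence (hypothetical names)."""

    def __init__(self, num_channels: int, min_channels: int) -> None:
        self.num_channels = num_channels
        self.min_channels = min_channels        # assumed: fewest channels the redundancy can run on
        self.accepting_new_ops = True           # gate that admits new operations
        self.pending_ops: list[str] = []        # operations already in flight
        self.active_channels = set(range(num_channels))
        self.log: list[str] = []

    def on_channel_failure(self, failing: int,
                           recover: Callable[[int], bool]) -> set[int]:
        """Run the recovery sequence; return the channel subset that resumes work."""
        self.log.append(f"notified: channel {failing} failing")
        # 1. Block new operations from starting on any channel.
        self.accepting_new_ops = False
        # 2. Let operations already in flight complete.
        while self.pending_ops:
            self.log.append(f"completed {self.pending_ops.pop(0)}")
        # 3. Apply the recovery operation to every channel (all are quiesced).
        survivors = {ch for ch in range(self.num_channels) if recover(ch)}
        if len(survivors) < self.min_channels:
            raise RuntimeError("too few channels survived recovery")
        # 4. Restart new operations on the surviving subset.
        self.active_channels = survivors
        self.accepting_new_ops = True
        return survivors


ctl = HomogeneousRecovery(num_channels=5, min_channels=4)
ctl.pending_ops = ["read A", "write B"]
# Hypothetical recovery hook: every channel except the failing one comes back.
active = ctl.on_channel_failure(2, recover=lambda ch: ch != 2)
print(sorted(active))  # [0, 1, 3, 4]
```

Recovery is applied to all channels at once because every channel is quiesced first; this is what distinguishes the homogeneous case from recovering one channel while the others keep running.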
Abstract:
Heterogeneous recovery is provided in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured to detect a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured to receive notification of the failing memory channel; perform a recovery operation on the failing memory channel while the other memory channels continue normal system operations; bring the recovered channel back into operational mode with the other memory channels for store operations; keep the recovered channel marked to guard against stale data; remove any stale data after the recovery operation is complete; and, once the stale data has been removed, clear the mark on the recovered channel so that normal system operations resume on all of the memory channels.
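A minimal Python sketch of the heterogeneous sequence follows. It assumes, for brevity, that every channel holds a full mirror copy of memory (real redundant memory typically stripes data with parity), and the class and method names are hypothetical. What it illustrates is the ordering: a recovered channel takes stores while still marked, and the mark is cleared only after its stale data has been rewritten.

```python
class HeterogeneousRecovery:
    """Toy model (hypothetical): each channel holds a full mirror copy of memory.

    A 'marked' channel receives stores but is never trusted for loads until
    every possibly stale location has been rewritten.
    """

    def __init__(self, num_channels: int, addresses: range) -> None:
        self.addresses = addresses
        self.channels = [{a: 0 for a in addresses} for _ in range(num_channels)]
        self.offline: set[int] = set()   # failing channel under recovery
        self.marked: set[int] = set()    # recovered but possibly stale

    def store(self, addr: int, value: int) -> None:
        # Stores reach every channel that is not offline, marked ones included.
        for ch, mem in enumerate(self.channels):
            if ch not in self.offline:
                mem[addr] = value

    def load(self, addr: int) -> int:
        # Loads use only channels that are online and unmarked.
        for ch, mem in enumerate(self.channels):
            if ch not in self.offline and ch not in self.marked:
                return mem[addr]
        raise RuntimeError("no trusted channel available")

    def fail(self, ch: int) -> None:
        self.offline.add(ch)             # other channels keep serving traffic

    def rejoin_for_stores(self, ch: int) -> None:
        # Recovery operation finished: accept stores again, but stay marked.
        self.offline.discard(ch)
        self.marked.add(ch)

    def scrub(self, ch: int) -> None:
        # Remove stale data by rewriting every location from a trusted channel,
        # and only then clear the mark.
        for addr in self.addresses:
            self.channels[ch][addr] = self.load(addr)
        self.marked.discard(ch)


mem = HeterogeneousRecovery(num_channels=3, addresses=range(4))
mem.store(0, 11)
mem.fail(1)                # channel 1 reported failing; recovery runs on it alone
mem.store(1, 22)           # channel 1 misses this store, so address 1 goes stale
mem.rejoin_for_stores(1)
mem.store(2, 33)           # channel 1 receives this store while still marked
mem.scrub(1)
print(mem.channels[1])     # {0: 11, 1: 22, 2: 33, 3: 0}
```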
Abstract:
A memory system with high availability is provided. The memory system includes multiple memory channels. Each memory channel includes at least one memory module with memory devices organized as partial ranks coupled to memory device bus segments. Each partial rank includes a subset of the memory devices that is accessible as a subchannel on a subset of the memory device bus segments. The memory system also includes a memory controller in communication with the multiple memory channels. The memory controller distributes an access request across the memory channels to access a full rank, which includes at least two of the partial ranks on separate memory channels. Partial ranks on the same memory module can be accessed concurrently. The memory modules can use at least one checksum memory device, either as a dedicated checksum memory device or as a checksum memory device shared between at least two of the concurrently accessible partial ranks.
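The split of one access across partial ranks on separate channels can be sketched as follows. The PartialRank fields, the even split of a 64-byte access, and the rule used by can_overlap to decide whether two accesses may proceed concurrently are assumptions for illustration; checksum devices are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialRank:
    channel: int     # memory channel holding this partial rank
    module: int      # memory module on that channel
    subchannel: int  # subset of device bus segments the partial rank uses


def distribute(line: bytes, full_rank: list[PartialRank]) -> dict[PartialRank, bytes]:
    """Split one access evenly across the partial ranks that form a full rank."""
    if len(full_rank) < 2 or len({pr.channel for pr in full_rank}) != len(full_rank):
        raise ValueError("a full rank needs at least two partial ranks on separate channels")
    if len(line) % len(full_rank):
        raise ValueError("access size must divide evenly across the partial ranks")
    step = len(line) // len(full_rank)
    return {pr: line[i * step:(i + 1) * step] for i, pr in enumerate(full_rank)}


def can_overlap(a: list[PartialRank], b: list[PartialRank]) -> bool:
    # Assumption: two full-rank accesses conflict only if they reuse the same
    # subchannel (the same bus segments) of the same module on the same channel.
    return not set(a) & set(b)


rank_a = [PartialRank(0, 0, 0), PartialRank(1, 0, 0)]
rank_b = [PartialRank(0, 0, 1), PartialRank(1, 0, 1)]  # same modules, other subchannel
parts = distribute(bytes(range(64)), rank_a)
print([len(chunk) for chunk in parts.values()])  # [32, 32]
print(can_overlap(rank_a, rank_b))               # True
```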
Abstract:
A method for error detection in a memory system is provided. The method includes calculating one or more signatures associated with data that contains an error and determining whether the error is a potentially correctable error. If it is, the calculated signatures are compared to one or more signatures in a trapping set, which contains signatures associated with uncorrectable errors. An uncorrectable error flag is set in response to determining that at least one of the calculated signatures equals a signature in the trapping set.
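The trapping-set check can be sketched with a toy code. Here the signature of each 7-bit word is its Hamming(7,4) syndrome, the decoder is assumed to treat any nonzero syndrome as a potentially correctable error, and TRAPPING_SET is an arbitrary placeholder rather than a set derived in the source. The example word 0b0000011 shows why such a set can help: a double-bit error at positions 1 and 2 produces syndrome 3, the same signature as a single-bit error at position 3.

```python
def hamming74_syndrome(word: int) -> int:
    """Syndrome of a 7-bit Hamming(7,4) word: XOR of the 1-based positions of set bits."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos
    return syndrome


def uncorrectable_flag(words: list[int], trapping_set: frozenset[int]) -> bool:
    """Return True if the data must be flagged uncorrectable despite looking correctable."""
    signatures = [hamming74_syndrome(w) for w in words]
    # Assumption: the decoder would attempt a correction whenever any signature
    # is nonzero; with no nonzero signature the trapping check does not apply.
    if not any(signatures):
        return False
    # Compare every calculated signature against the trapping set.
    return any(sig in trapping_set for sig in signatures)


TRAPPING_SET = frozenset({3})  # placeholder; not derived from any real code analysis
print(uncorrectable_flag([0b0000000, 0b0000011], TRAPPING_SET))  # True: signature 3 is trapped
print(uncorrectable_flag([0b0010000], TRAPPING_SET))             # False: signature 5 is not
```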
Abstract:
A system to improve memory failure management may include memory and an error control decoder that determines failures in the memory. The system may also include an agent that monitors failures in the memory, as well as a table in which the error control decoder records the failures and which the agent can read and write.
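One way to picture the shared table is sketched below in Python. The (rank, symbol) key, the retired field the agent writes back, the threshold policy in agent_pass, and all names are hypothetical; the lock only reflects that the decoder and the agent may access the table concurrently.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FailureRecord:
    count: int = 0         # failures the decoder has logged at this location
    retired: bool = False  # hypothetical field the agent writes back


class FailureTable:
    """Table shared by the error control decoder (writer) and an agent (reader/writer)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # decoder and agent may run concurrently
        self._records: dict[tuple[int, int], FailureRecord] = defaultdict(FailureRecord)

    def record_failure(self, rank: int, symbol: int) -> None:
        """Called by the decoder each time it detects a failure."""
        with self._lock:
            self._records[(rank, symbol)].count += 1

    def snapshot(self) -> dict[tuple[int, int], FailureRecord]:
        """Consistent copy for the agent to inspect."""
        with self._lock:
            return {key: FailureRecord(r.count, r.retired) for key, r in self._records.items()}

    def mark_retired(self, rank: int, symbol: int) -> None:
        """Agent-side write back into the table."""
        with self._lock:
            self._records[(rank, symbol)].retired = True


def agent_pass(table: FailureTable, threshold: int) -> list[tuple[int, int]]:
    """One monitoring pass: retire locations that failed at least `threshold` times."""
    flagged = [key for key, r in table.snapshot().items()
               if r.count >= threshold and not r.retired]
    for rank, symbol in flagged:
        table.mark_retired(rank, symbol)
    return flagged


table = FailureTable()
for _ in range(3):
    table.record_failure(rank=0, symbol=5)
table.record_failure(rank=1, symbol=2)
print(agent_pass(table, threshold=3))  # [(0, 5)]
print(agent_pass(table, threshold=3))  # [] (already retired)
```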
Abstract:
A system to improve error code decoding using historical information may include storage partitioned into memory ranks and a table that records, for each memory rank, the symbols that have had failures. The system may also generate a memory rank score for each memory rank. The system may further include an error control decoder that uses the memory rank score whenever a memory rank is accessed to determine whether an error should be corrected.
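A rank score and a correction decision might be combined as in the sketch below. Both the scoring rule (the number of distinct failed symbols in the rank) and the policy in should_correct are assumptions chosen for illustration; the source does not specify how the score is computed or applied.

```python
from collections import defaultdict


class RankHistory:
    """Per-rank record of symbols that have failed (hypothetical structure)."""

    def __init__(self) -> None:
        self.failing_symbols: dict[int, set[int]] = defaultdict(set)

    def record(self, rank: int, symbol: int) -> None:
        self.failing_symbols[rank].add(symbol)

    def score(self, rank: int) -> int:
        # Assumed scoring rule: number of distinct symbols that have failed in the rank.
        return len(self.failing_symbols[rank])


def should_correct(history: RankHistory, rank: int, symbol: int, max_score: int) -> bool:
    """Assumed policy consulted by the decoder on each access to `rank`."""
    if symbol in history.failing_symbols[rank]:
        return True  # the error lands on a symbol already known to be bad
    return history.score(rank) < max_score  # otherwise trust only a rank with little history


h = RankHistory()
h.record(0, 7)
h.record(0, 9)
print(should_correct(h, rank=0, symbol=7, max_score=2))  # True: known bad symbol
print(should_correct(h, rank=0, symbol=3, max_score=2))  # False: rank 0 scores 2
print(should_correct(h, rank=1, symbol=3, max_score=2))  # True: clean rank
```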
Abstract:
A computer memory system having a three-level memory hierarchy structure is disclosed. The system includes a memory controller, a volatile memory, and a non-volatile memory. The volatile memory is divided into an uncompressed data region and a compressed data region.
Abstract:
A system and method for low-latency constrained coding for parallel busses are provided. The method includes receiving a number of transfers and a number of possible constrained patterns between adjacent transfer rows. Data to be encoded is received. The data is converted into indices of constrained patterns; the conversion includes a change of number base into a new base. The new base is chosen to minimize the number of operations required for the conversion, subject to the new base being at least as large as the number of possible constrained patterns between adjacent transfer rows. The indices of the constrained patterns are then converted into encoded data, and the encoded data is output.
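The encoding steps can be sketched in Python under a hypothetical constraint: each w-bit bus row must differ from the previous row, which leaves p = 2^w - 1 allowed patterns between adjacent rows, and pattern index i is applied as the XOR mask i + 1. The base-change step is read here as dividing first by a larger base B = p^k (which satisfies B >= p) so that fewer wide divisions are needed; this is one possible reading, and the parameter k merely stands in for the optimization the text describes.

```python
def to_pattern_indices(value: int, p: int, transfers: int, k: int = 1) -> list[int]:
    """Write `value` as `transfers` base-p digits, least significant first.

    Assumed strategy: divide the wide integer by the larger base B = p**k,
    which needs about transfers / k wide divisions, then split each base-B
    digit into k base-p digits using small divisions.
    """
    if not 0 <= value < p ** transfers:
        raise ValueError("value does not fit in the given number of transfers")
    big_base = p ** k
    digits: list[int] = []
    while len(digits) < transfers:
        value, chunk = divmod(value, big_base)   # one wide division
        for _ in range(k):
            chunk, digit = divmod(chunk, p)      # cheap small division
            digits.append(digit)
    return digits[:transfers]  # any surplus high digits are zero


def encode(data: int, width: int, transfers: int, k: int = 1) -> list[int]:
    # Hypothetical constraint: each bus row must differ from the previous row,
    # which leaves p = 2**width - 1 allowed patterns between adjacent rows.
    p = 2 ** width - 1
    rows, prev = [], 0                    # assumed initial bus state: all zeros
    for idx in to_pattern_indices(data, p, transfers, k):
        prev ^= idx + 1                   # index i -> nonzero XOR mask i + 1
        rows.append(prev)
    return rows


def decode(rows: list[int], width: int) -> int:
    p = 2 ** width - 1
    value, prev = 0, 0
    for place, row in enumerate(rows):
        value += ((row ^ prev) - 1) * p ** place
        prev = row
    return value


rows = encode(1000, width=4, transfers=3, k=2)
print(rows)                    # [11, 12, 9]
print(decode(rows, width=4))   # 1000
```

Here p = 15 and 1000 = 10 + 6*15 + 4*15^2, so the pattern indices are [10, 6, 4] whether k is 1 or 2; only the number of wide divisions changes.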