-
Publication No.: US20110320914A1
Publication date: 2011-12-29
Application No.: US12822503
Filing date: 2010-06-24
Applicants: Luiz C. Alves, Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Eldee Stephens
Inventors: Luiz C. Alves, Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Eldee Stephens
IPC classification: G06F11/10
CPC classification: G06F11/1004, G06F11/108
Abstract: Error correction and detection in a redundant memory system that includes a memory controller; a plurality of memory channels in communication with the memory controller, the memory channels including a plurality of memory devices; a cyclical redundancy code (CRC) mechanism for detecting that one of the memory channels has failed, and for marking the memory channel as a failing memory channel; and an error correction code (ECC) mechanism. The ECC is configured for ignoring the marked memory channel and for detecting and correcting additional memory device failures on memory devices located on one or more of the other memory channels, thereby allowing the memory system to continue to run unimpaired in the presence of the memory channel failure.
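The mark-and-ignore idea in this abstract can be illustrated with a minimal sketch. It is not the patented ECC: it swaps in plain XOR parity across channels, which can rebuild one marked channel but cannot correct the additional device failures the abstract describes. The function names, the stripe layout (data channels followed by one parity channel), and the use of `zlib.crc32` as the per-channel CRC are all assumptions made for illustration.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Store each data block plus one XOR parity block, each with its own CRC."""
    blocks = list(data_blocks) + [xor_blocks(data_blocks)]
    return [(block, zlib.crc32(block)) for block in blocks]


def read_stripe(stored, marked=None):
    """Return (data_blocks, marked_channel).

    A channel that is already marked is ignored (its CRC is not even checked).
    A CRC mismatch on an unmarked channel marks that channel. The marked
    channel is then rebuilt from the remaining channels.
    """
    failed = [i for i, (block, crc) in enumerate(stored)
              if i != marked and zlib.crc32(block) != crc]
    if marked is None and failed:
        marked = failed.pop(0)
    if failed:
        # Hypothetical limit of this sketch: XOR parity covers one erasure only.
        raise ValueError("more failing channels than XOR parity can rebuild")
    blocks = [block for block, _ in stored]
    if marked is not None:
        others = [b for i, b in enumerate(blocks) if i != marked]
        blocks[marked] = xor_blocks(others)
    return blocks[:-1], marked


data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
stored = write_stripe(data)
stored[2] = (b"XXXX", stored[2][1])      # simulate a failure on channel 2
recovered, marked = read_stripe(stored)  # channel 2 is marked and rebuilt
```

Passing the returned `marked` value back into later `read_stripe` calls keeps the failed channel ignored, which is the part of the scheme this sketch is meant to show.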
-
Publication No.: US08484529B2
Publication date: 2013-07-09
Application No.: US12822503
Filing date: 2010-06-24
IPC classification: G06F11/00
CPC classification: G06F11/1004, G06F11/108
Abstract: Error correction and detection in a redundant memory system that includes a memory controller; a plurality of memory channels in communication with the memory controller, the memory channels including a plurality of memory devices; a cyclical redundancy code (CRC) mechanism for detecting that one of the memory channels has failed, and for marking the memory channel as a failing memory channel; and an error correction code (ECC) mechanism. The ECC is configured for ignoring the marked memory channel and for detecting and correcting additional memory device failures on memory devices located on one or more of the other memory channels, thereby allowing the memory system to continue to run unimpaired in the presence of the memory channel failure.
-
Publication No.: US20110320869A1
Publication date: 2011-12-29
Application No.: US12822964
Filing date: 2010-06-24
Applicants: Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
Inventors: Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
CPC classification: G06F11/1666, G06F11/1044, G06F11/108, G06F11/141, G06F11/1604, G06F11/20, G06F11/2007, G06F2211/1088
Abstract: Providing homogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for blocking off new operations from starting on the memory channels, for completing any pending operations on the memory channels, for performing a recovery operation on the memory channels and for starting the new operations on at least a first subset of the memory channels. The memory system is capable of operating with the first subset of the memory channels.
-
Publication No.: US20110320864A1
Publication date: 2011-12-29
Application No.: US12822968
Filing date: 2010-06-24
Applicants: Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
Inventors: Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
CPC classification: G06F11/2053, G06F11/073, G06F11/0793, G06F11/1004, G06F11/108, G06F11/141, G06F11/1604, G06F11/1666, G06F11/20, G06F11/2007, G06F2211/1088
Abstract: Providing heterogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for performing a recovery operation on the failing memory channel while other memory channels are performing normal system operations, for bringing the recovered channel back into operational mode with the other memory channels for store operations, for continuing to mark the recovered channel to guard against stale data, for removing any stale data after the recovery operation is complete, and for removing the mark on the recovered channel to allow the normal system operations with all of the memory channels, the removing in response to the removing any stale data being complete.
-
Publication No.: US08898511B2
Publication date: 2014-11-25
Application No.: US12822964
Filing date: 2010-06-24
Applicants: Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens, Lisa C. Gower
Inventors: Kevin C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
CPC classification: G06F11/1666, G06F11/1044, G06F11/108, G06F11/141, G06F11/1604, G06F11/20, G06F11/2007, G06F2211/1088
Abstract: Providing homogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for blocking off new operations from starting on the memory channels, for completing any pending operations on the memory channels, for performing a recovery operation on the memory channels and for starting the new operations on at least a first subset of the memory channels. The memory system is capable of operating with the first subset of the memory channels.
-
Publication No.: US08631271B2
Publication date: 2014-01-14
Application No.: US12822968
Filing date: 2010-06-24
Applicants: Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
Inventors: Kevin C. Gower, Lisa C. Gower, Luis A. Lastras-Montano, Patrick J. Meaney, Vesselina K. Papazova, Eldee Stephens
IPC classification: G06F11/00
CPC classification: G06F11/2053, G06F11/073, G06F11/0793, G06F11/1004, G06F11/108, G06F11/141, G06F11/1604, G06F11/1666, G06F11/20, G06F11/2007, G06F2211/1088
Abstract: Providing heterogeneous recovery in a redundant memory system that includes a memory controller, a plurality of memory channels in communication with the memory controller, an error detection code mechanism configured for detecting a failing memory channel, and an error recovery mechanism. The error recovery mechanism is configured for receiving notification of the failing memory channel, for performing a recovery operation on the failing memory channel while other memory channels are performing normal system operations, for bringing the recovered channel back into operational mode with the other memory channels for store operations, for continuing to mark the recovered channel to guard against stale data, for removing any stale data after the recovery operation is complete, and for removing the mark on the recovered channel to allow the normal system operations with all of the memory channels, the removing in response to the removing any stale data being complete.
-
Publication No.: US08566682B2
Publication date: 2013-10-22
Application No.: US12822498
Filing date: 2010-06-24
IPC classification: H03M13/00
CPC classification: G06F11/10, H03M13/09, H04L1/0061, H04L1/24, H04L2001/0094
Abstract: Failing bus lane detection using syndrome analysis, including a method for receiving a plurality of syndromes of an error detection code, the error detection code associated with a plurality of frames that have been transmitted on a bus that includes a plurality of lanes and is protected by the error detection code. The method includes performing for each of the lanes in each of the syndromes: decoding the syndrome under an assumption that the lane is a failing lane, the decoding outputting a decode result; determining if the decode result is a valid decode; and voting for the lane in response to determining that the decode result is a valid decode. A failing lane is then identified in response to the voting, with the failing lane being characterized by having more votes than at least one other lane on the bus.
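The per-lane decode-and-vote procedure in this abstract can be sketched as follows. The abstract does not say which error detection code is used or what counts as a valid decode, so this sketch assumes a linear code over GF(2) and treats a decode as valid when the syndrome can be produced by errors confined to the assumed lane. `LANE_COLUMNS` is invented toy data (3 lanes, 2 bit positions per lane, 4-bit syndromes), and returning `None` on a tie is likewise an assumption.

```python
from collections import Counter


def in_span(target, vectors):
    """True if target is a GF(2) (XOR) combination of vectors, all given as ints."""
    basis = {}  # highest set bit -> reduced basis vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    while target:
        top = target.bit_length() - 1
        if top not in basis:
            return False
        target ^= basis[top]
    return True


def vote_failing_lane(syndromes, lane_columns):
    """Return the lane with the most valid decodes, or None if there is no clear winner."""
    votes = Counter()
    for syndrome in syndromes:
        if syndrome == 0:
            continue  # a clean frame says nothing about which lane is failing
        for lane, columns in enumerate(lane_columns):
            # "Decode assuming this lane is failing": can errors on this lane
            # alone explain the syndrome?
            if in_span(syndrome, columns):
                votes[lane] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two lanes (assumed policy)
    return ranked[0][0]


# Hypothetical syndrome contribution of each bit position, grouped by lane.
LANE_COLUMNS = [
    [0b0001, 0b0010],  # lane 0
    [0b0100, 0b1000],  # lane 1
    [0b0011, 0b1100],  # lane 2
]

# Three frames, each corrupted only on lane 0.
failing = vote_failing_lane([0b0001, 0b0010, 0b0011], LANE_COLUMNS)
```

A single syndrome can be consistent with more than one lane (here `0b0011` fits both lane 0 and lane 2), which is why votes are accumulated over several frames before a lane is named.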
-
Publication No.: US20110320921A1
Publication date: 2011-12-29
Application No.: US12822498
Filing date: 2010-06-24
IPC classification: G06F11/07
CPC classification: G06F11/10, H03M13/09, H04L1/0061, H04L1/24, H04L2001/0094
Abstract: Failing bus lane detection using syndrome analysis, including a method for receiving a plurality of syndromes of an error detection code, the error detection code associated with a plurality of frames that have been transmitted on a bus that includes a plurality of lanes and is protected by the error detection code. The method includes performing for each of the lanes in each of the syndromes: decoding the syndrome under an assumption that the lane is a failing lane, the decoding outputting a decode result; determining if the decode result is a valid decode; and voting for the lane in response to determining that the decode result is a valid decode. A failing lane is then identified in response to the voting, with the failing lane being characterized by having more votes than at least one other lane on the bus.
-
9.
Publication No.: US08041989B2
Publication date: 2011-10-18
Application No.: US11769936
Filing date: 2007-06-28
Applicants: Luis A. Lastras-Montano, James A. O'Connor, Luiz C. Alves, William J. Clarke, Timothy J. Dell, Thomas J. Dewkett, Kevin C. Gower
Inventors: Luis A. Lastras-Montano, James A. O'Connor, Luiz C. Alves, William J. Clarke, Timothy J. Dell, Thomas J. Dewkett, Kevin C. Gower
IPC classification: G06F11/00
CPC classification: G06F11/1044, G11C5/04, G11C29/4401, G11C29/81, G11C2029/0409, G11C2029/0411
Abstract: A system and method for providing a high fault tolerant memory system. The system includes a memory system having a memory controller, a plurality of memory modules and a mechanism. The plurality of memory modules are in communication with the memory controller and with a plurality of memory devices. The plurality of memory devices include at least one spare memory device for providing memory device sparing capability. The mechanism is for detecting that one of the memory modules has failed possibly coincident with a memory device failure on an other of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the possible memory device failure.
-
10.
Publication No.: US20090006900A1
Publication date: 2009-01-01
Application No.: US11769936
Filing date: 2007-06-28
Applicants: Luis A. Lastras-Montano, James A. O'Connor, Luiz C. Alves, William J. Clarke, Timothy J. Dell, Thomas J. Dewkett, Kevin C. Gower
Inventors: Luis A. Lastras-Montano, James A. O'Connor, Luiz C. Alves, William J. Clarke, Timothy J. Dell, Thomas J. Dewkett, Kevin C. Gower
IPC classification: G06F11/00
CPC classification: G06F11/1044, G11C5/04, G11C29/4401, G11C29/81, G11C2029/0409, G11C2029/0411
Abstract: A system and method for providing a high fault tolerant memory system. The system includes a memory system having a memory controller, a plurality of memory modules and a mechanism. The plurality of memory modules are in communication with the memory controller and with a plurality of memory devices. The plurality of memory devices include at least one spare memory device for providing memory device sparing capability. The mechanism is for detecting that one of the memory modules has failed possibly coincident with a memory device failure on an other of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the possible memory device failure.