Abstract:
In some embodiments, a computer-implemented method includes maintaining two or more error indicators for correctable errors occurring at two or more memory components. Each of the error indicators may be associated with a corresponding memory component. A correctable error may be detected as occurring during a first memory fetch operation at a first memory component. A first error indicator corresponding to the first memory component may be set, responsive to the correctable error at the first memory component. An uncorrectable error may be detected during a second memory fetch operation. It may be detected that the first error indicator is set. The first memory component may be marked, responsive to the uncorrectable error and to detecting that the first error indicator is set. The two or more error indicators for correctable errors may thus determine which memory component to mark due to the uncorrectable error.
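A minimal sketch of the indicator bookkeeping described above. The class and method names are hypothetical, and the rule of marking only when exactly one indicator is set is an assumption; the abstract does not say how ties between several flagged components are resolved.

```python
class ErrorTracker:
    """Keeps one correctable-error (CE) indicator per memory component."""

    def __init__(self, num_components):
        self.ce_indicator = [False] * num_components
        self.marked = [False] * num_components

    def on_correctable_error(self, component):
        # A correctable error can be attributed to a component,
        # so remember which component produced it.
        self.ce_indicator[component] = True

    def on_uncorrectable_error(self):
        # An uncorrectable error does not identify its source, so the
        # CE indicators are used to pick the component to mark.
        # Assumption: mark only when exactly one indicator is set.
        flagged = [i for i, is_set in enumerate(self.ce_indicator) if is_set]
        if len(flagged) == 1:
            self.marked[flagged[0]] = True
            return flagged[0]
        return None
```

For example, after `on_correctable_error(2)`, a later `on_uncorrectable_error()` marks component 2.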
Abstract:
A technique is provided for accumulating failures. A failure of a first row is detected in a group of array macros, the first row having first row address values. A mask has mask bits corresponding to each of the first row address values. The mask bits are initially in active status. A failure of a second row, having second row address values, is detected. When none of the first row address values matches the second row address values, and when mask bits are all in the active status, the array macros are determined to be bad. When at least one of the first row address values matches the second row address values, mask bits that correspond to at least one of the first row address values that match are kept in active status, and mask bits that correspond to non-matching first address values are set to inactive status.
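The mask update can be sketched as follows. The function name and the list-based representation are hypothetical. Two behaviors are assumptions because the abstract does not specify them: a mask bit that is already inactive stays inactive, and the mask is left unchanged when nothing matches but some bits are already inactive.

```python
def accumulate_failure(first_addrs, mask, second_addrs):
    """Return (new_mask, macros_bad) after a second row failure.

    first_addrs / second_addrs: row address values of the two failures.
    mask: list of booleans, True meaning the mask bit is active.
    """
    matches = [a == b for a, b in zip(first_addrs, second_addrs)]
    if not any(matches):
        # No address value matches: with a fully active mask the
        # array macros are determined to be bad.
        # Assumption: otherwise the mask is returned unchanged.
        return list(mask), all(mask)
    # Matching positions keep their status; non-matching go inactive.
    new_mask = [active and hit for active, hit in zip(mask, matches)]
    return new_mask, False
```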
Abstract:
Embodiments relate to reestablishing synchronization across multiple channels in a memory system. One aspect is a system that includes a plurality of channels, each providing communication with a memory buffer chip and a plurality of memory devices. A memory control unit is coupled to the plurality of channels. The memory control unit is configured to perform a method that includes receiving an out-of-synchronization indication associated with at least one of the channels. The memory control unit performs a first stage of reestablishing synchronization that includes selectively stopping new traffic on the plurality of channels, waiting for a first time period to expire, resuming traffic on the plurality of channels based on the first time period expiring, and verifying that synchronization is reestablished for a second time period.
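The first stage can be sketched as a short control routine. Everything named here (`Channel`, `resync_stage_one`, `in_sync`, the polling interval) is hypothetical, and stopping traffic on every channel is a simplifying assumption; the abstract says traffic is stopped selectively without giving the selection rule.

```python
import time


class Channel:
    """Hypothetical stand-in for one memory channel."""

    def __init__(self):
        self.accepting_new_traffic = True


def resync_stage_one(channels, in_sync, wait_s, verify_s,
                     poll_s=0.001, sleep=time.sleep, clock=time.monotonic):
    """Return True if synchronization holds for the verification period."""
    # Stop new traffic (assumption: on all channels).
    for ch in channels:
        ch.accepting_new_traffic = False
    # Wait for the first time period to expire.
    sleep(wait_s)
    # Resume traffic once the first period has expired.
    for ch in channels:
        ch.accepting_new_traffic = True
    # Verify that synchronization holds throughout the second period.
    deadline = clock() + verify_s
    while clock() < deadline:
        if not in_sync():
            return False
        sleep(poll_s)
    return in_sync()
```

The `sleep` and `clock` parameters are injectable so the routine can be exercised with a simulated clock instead of real delays.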
Abstract:
Embodiments include a combined rank and linear memory address incrementing utility. An aspect includes an address incrementing utility suitable for implementation within a memory controller as an integrated subsystem of a central processing unit (CPU) chip. In this type of on-chip embodiment, the address incrementing utility utilizes dedicated hardware, chip-resident firmware, and one or more memory address configuration maps to enhance processing speed, efficiency and accuracy. The combined rank and linear memory address incrementing utility is designed to efficiently increment through all of the individual bit addresses for a large logical memory space divided into a number of ranks on a rank-by-rank basis. The address incrementing utility sequentially generates all of the sequential memory addresses for a selected rank, and then moves to the next rank and sequentially generates all of the memory addresses for that rank, and so forth until all of the ranks have been processed.
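The rank-by-rank traversal can be illustrated in software, although the abstract describes a hardware and firmware utility. The configuration map format below, a list of `(base_address, size)` pairs indexed by rank, is a hypothetical simplification.

```python
def iter_rank_addresses(config_map):
    """Yield (rank, address) pairs, finishing one rank before the next.

    config_map: hypothetical list of (base_address, size) per rank.
    """
    for rank, (base, size) in enumerate(config_map):
        # Walk every address of the selected rank in order...
        for offset in range(size):
            yield rank, base + offset
        # ...then fall through to the next rank.
```

Because the ranks need not be contiguous, the base address of each rank comes from the map rather than from the previous rank's last address.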
Abstract:
Embodiments relate to a computer system for bitline deletion, the system including a cache controller and cache. The system is configured to perform a method including detecting a first error when reading a first cache line, recording a first address of the first error, detecting a second error when reading a second cache line, recording a second address of the second error, comparing the first and second bitline addresses, comparing the first and second wordline addresses, activating a bitline delete mode based on matching first and second bitline addresses and not matching first and second wordline addresses, detecting a third error when reading a third cache line, recording a third bitline address of the third error, comparing the second bitline address to the third bitline address, and deleting a location corresponding to the third cache line based on the activated bitline delete mode and matching third and second bitline addresses.
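The sequence of comparisons can be sketched as a small state machine. The class name and the `(bitline, wordline)` tuple are hypothetical, and the sketch assumes each new error is compared only against the immediately preceding one.

```python
class BitlineDeleteTracker:
    """Tracks the previous error address and the bitline delete mode."""

    def __init__(self):
        self.prev = None          # (bitline, wordline) of the last error
        self.delete_mode = False

    def on_error(self, bitline, wordline):
        """Record an error; return True if its location should be deleted."""
        delete = False
        if self.prev is not None:
            prev_bitline, prev_wordline = self.prev
            if self.delete_mode and bitline == prev_bitline:
                # Mode already active and the bitline matches again.
                delete = True
            elif bitline == prev_bitline and wordline != prev_wordline:
                # Same bitline on a different wordline suggests a
                # bitline fault: activate bitline delete mode.
                self.delete_mode = True
        self.prev = (bitline, wordline)
        return delete
```

With errors at (7, 1), (7, 2), and (7, 5), the second error activates the mode and the third triggers the deletion.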
Abstract:
Marking memory chips as faulty when a fault is detected in data from the memory chip. Upon detecting that a plurality of memory chips are faulty, determining which of a plurality of memory channels contains the faulty memory chips. Marking one of a plurality of memory channels as failing in response to determining that the number of failing memory chips has exceeded a threshold.
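A minimal sketch of the channel decision, assuming a hypothetical `chip_to_channel` mapping and a per-channel count of faulty chips. The abstract does not say whether the threshold applies per channel or in total, so the per-channel reading is an assumption.

```python
from collections import Counter


def channel_to_mark(faulty_chips, chip_to_channel, threshold):
    """Return the channel to mark as failing, or None.

    faulty_chips: iterable of chip ids already marked faulty.
    chip_to_channel: hypothetical dict mapping chip id to channel id.
    """
    # Determine which channel contains the faulty chips.
    counts = Counter(chip_to_channel[chip] for chip in faulty_chips)
    if not counts:
        return None
    channel, count = counts.most_common(1)[0]
    # Mark the channel only once the count exceeds the threshold.
    return channel if count > threshold else None
```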
Abstract:
Aspects of the invention include using a cyclic redundancy code (CRC) multiple-input signature register (MISR) for early warning and fail detection. Received bits are monitored at a receiver for transmission errors. The monitoring includes receiving frames of bits that are a subset of frames of bits used by the transmitter to generate a multi-frame CRC. At least one of the received frames of bits includes payload bits and a source single check bit not included in the multi-frame CRC. It is determined whether a transmission error has occurred in the received frames of bits. The determining includes generating a calculated single check bit based at least in part on bits in the received frames of bits, and comparing the received source single check bit to the calculated single check bit. An error indication is transmitted to the transmitter if the two check bits do not match.
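The check-bit comparison at the receiver can be sketched as below. The abstract does not state how the single check bit is computed, so even parity over the payload bits is used here purely as an assumed stand-in; the function names are hypothetical.

```python
def calc_check_bit(payload_bits):
    """Assumed check function: XOR (even parity) of the payload bits."""
    bit = 0
    for b in payload_bits:
        bit ^= b
    return bit


def frame_has_error(payload_bits, source_check_bit):
    """Compare the received source check bit with the calculated one."""
    return calc_check_bit(payload_bits) != source_check_bit
```

A mismatch lets the receiver flag an error on a single frame without waiting for the multi-frame CRC to complete, which is the early-warning role the abstract describes.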