Abstract:
A method for operating a memory system comprises receiving a write request associated with data, decoding an address of the write request, receiving thermal data indicating a temperature at the address of the write request, determining whether the temperature is above a threshold temperature, and writing the data to the address responsive to determining that the temperature is not above the threshold temperature.
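The thermally gated write described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The 4 KiB region decode, the hypothetical `read_temperature` sensor callback, and the choice to reject (rather than redirect) a write to a hot region are all assumptions, because the abstract does not specify them.

```python
# Minimal sketch of a thermally gated write path. The region decode, the
# sensor callback, and the dict-backed storage are hypothetical stand-ins.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ThermallyGatedMemory:
    threshold_c: float                          # assumed threshold temperature, deg C
    read_temperature: Callable[[int], float]    # hypothetical per-region sensor
    cells: Dict[int, bytes] = field(default_factory=dict)

    def decode(self, address: int) -> int:
        # Assumption: decoding maps an address to a 4 KiB thermal region.
        return address >> 12

    def write(self, address: int, data: bytes) -> bool:
        temperature = self.read_temperature(self.decode(address))
        if temperature > self.threshold_c:
            # The abstract does not say what happens to a blocked write; this
            # sketch just reports failure so the caller can retry or redirect.
            return False
        self.cells[address] = data
        return True
```

Note that a temperature exactly equal to the threshold still permits the write, matching the "not above the threshold" wording.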
Abstract:
An endurance parameter value of a first non-volatile memory included in a non-volatile dual in-line memory module (NVDIMM) can be monitored and compared against a warning threshold value. In response to the endurance parameter exceeding the warning threshold value, a system alert can be generated within a host system of the NVDIMM to inform a system user that the NVDIMM is approaching its end of life. If the endurance parameter exceeds a replacement threshold value greater than the warning threshold value, an upgrade process can be initiated. The upgrade process can include copying data from the first non-volatile memory to a volatile memory of the NVDIMM and, in response to the first non-volatile memory being replaced with a second non-volatile memory, copying the data from the volatile memory to the second non-volatile memory.
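The two-threshold policy and the two-step upgrade can be sketched in Python. This assumes the endurance parameter is a wear figure that only grows (for example, the fraction of rated program/erase cycles consumed) and models each memory as a list of blocks. `check_endurance`, `Nvdimm`, `begin_upgrade`, and `finish_upgrade` are hypothetical names.

```python
# Minimal sketch of NVDIMM endurance monitoring and the two-step upgrade.
# The threshold semantics and the list-of-blocks memory model are assumptions.
from enum import Enum
from typing import List


class Action(Enum):
    NONE = "none"
    ALERT = "alert"        # warning threshold exceeded: notify the user
    UPGRADE = "upgrade"    # replacement threshold exceeded: start upgrade


def check_endurance(value: float, warning: float, replacement: float) -> Action:
    if replacement <= warning:
        raise ValueError("replacement threshold must exceed warning threshold")
    if value > replacement:
        return Action.UPGRADE
    if value > warning:
        return Action.ALERT
    return Action.NONE


class Nvdimm:
    def __init__(self, nvm: List[bytes]) -> None:
        self.nvm = nvm               # first (worn) non-volatile memory
        self.dram: List[bytes] = []  # volatile memory on the same module

    def begin_upgrade(self) -> None:
        # Step 1: preserve the NVM contents in the module's volatile memory.
        self.dram = list(self.nvm)

    def finish_upgrade(self, new_nvm: List[bytes]) -> None:
        # Step 2: after the NVM is physically replaced, restore the data.
        new_nvm[: len(self.dram)] = self.dram
        self.nvm = new_nvm
```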
Abstract:
A method for mirroring in three-dimensional-stacked memory includes receiving a plurality of thermal profiles from a plurality of memory chips. The method also includes ranking the plurality of memory chips in a first ranked list as a function of the plurality of thermal profiles and forming, based on the first ranked list, a first group of memory chips from the plurality of memory chips. The method also includes forming, based on the first ranked list, a second group of memory chips from the plurality of memory chips that is distinct from the first group. The method also includes pairing a first memory chip from the first group with a second memory chip from the second group, and mirroring the paired memory chips.
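The ranking, grouping, and pairing steps can be sketched in Python. The sketch makes three assumptions the abstract leaves open: a thermal profile is reduced to its mean temperature, the groups are the cooler and hotter halves of the ranked list, and chips are paired coolest-with-hottest. `pair_for_mirroring` and `mirrored_write` are hypothetical helpers.

```python
# Minimal sketch of thermal ranking, grouping, pairing, and mirrored writes.
# Reducing a thermal profile to its mean and pairing coolest-with-hottest are
# assumptions made for illustration.
from statistics import mean
from typing import Dict, List, Tuple


def pair_for_mirroring(profiles: Dict[str, List[float]]) -> List[Tuple[str, str]]:
    # First ranked list: chips ordered from coolest to hottest mean temperature.
    ranked = sorted(profiles, key=lambda chip: mean(profiles[chip]))
    half = len(ranked) // 2
    cool_group, hot_group = ranked[:half], ranked[half:]
    # Pair the coolest chip with the hottest, the next coolest with the next
    # hottest, and so on. With an odd chip count the middle chip stays unpaired.
    return list(zip(cool_group, reversed(hot_group)))


def mirrored_write(store: Dict[str, Dict[int, int]],
                   pairs: List[Tuple[str, str]],
                   chip: str, addr: int, value: int) -> None:
    # Every write to a paired chip is duplicated on its mirror partner.
    partner = dict(pairs)
    partner.update((b, a) for a, b in pairs)
    for target in (chip, partner[chip]):
        store.setdefault(target, {})[addr] = value
```

Pairing across the two groups, rather than within one, keeps each mirror pair from consisting of two hot chips.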
Abstract:
According to one aspect, a method for performance optimization of read functions in a memory system includes receiving, at the memory system, a read request including a logical address of target data. The memory system includes a primary memory and a back-up memory that mirrors the primary memory. The method also includes searching a fault monitor table for an entry corresponding to the received logical address. The fault monitor table includes a plurality of entries that indicate physical locations of identified memory failure events in the primary memory and the back-up memory. Based on locating an entry corresponding to the received logical address, the method further includes selecting one of the primary memory and the back-up memory for retrieving the target data. The selection is based on contents of the fault monitor table.
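The table lookup and selection step can be sketched in Python. This assumes, for simplicity, that the table is keyed directly by logical address and stores a failure-event count per memory. The abstract says only that entries record physical locations of failures and that selection depends on the table contents, so the "fewer failures wins" rule is also an assumption.

```python
# Minimal sketch of read-source selection from a fault monitor table.
# Keying the table by logical address and storing per-memory failure counts
# are simplifying assumptions; the abstract says entries record physical
# locations of failure events.
from typing import Dict

PRIMARY, BACKUP = "primary", "backup"

# logical address -> {memory name: number of recorded failure events}
FaultTable = Dict[int, Dict[str, int]]


def select_source(table: FaultTable, logical_address: int) -> str:
    counts = table.get(logical_address)
    if counts is None:
        return PRIMARY  # no entry: read from the primary memory as usual
    # Prefer whichever copy has fewer recorded failures; ties go to primary.
    if counts.get(BACKUP, 0) < counts.get(PRIMARY, 0):
        return BACKUP
    return PRIMARY
```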
Abstract:
A method, system and computer program product are provided for implementing enhanced reliability of memory subsystems utilizing a dual port Dynamic Random Access Memory (DRAM) configuration. The DRAM configuration includes a first buffer and a second buffer, each buffer including a validity counter. The validity counter for a receiving buffer is incremented as each respective data row from a transferring buffer is validated through Error Correction Code (ECC), Reliability, Availability, and Serviceability (RAS) logic and transferred to the receiving buffer, while the validity counter for the transferring buffer is decremented. Data are read from or written to either the first buffer or the second buffer based upon a respective count value of the validity counters.
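The validity-counter bookkeeping can be sketched in Python. A toy XOR check byte stands in for the ECC/RAS validation, and "serve from the buffer with the higher count" is one plausible reading of how the count values drive buffer selection. `Buffer`, `toy_check`, `transfer_row`, and `active_buffer` are hypothetical names.

```python
# Minimal sketch of validity-counter bookkeeping for two DRAM buffers.
# The XOR check byte is a toy stand-in for real ECC/RAS validation.
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass
class Buffer:
    rows: List[Optional[bytes]]
    validity: int = 0   # number of valid rows this buffer currently holds


def toy_check(row: bytes) -> bool:
    # Valid if the last byte equals the XOR of all preceding bytes.
    return len(row) > 0 and reduce(lambda x, y: x ^ y, row[:-1], 0) == row[-1]


def transfer_row(src: Buffer, dst: Buffer, index: int) -> bool:
    row = src.rows[index]
    if row is None or not toy_check(row):
        return False                    # failed validation: nothing moves
    dst.rows[index] = row
    dst.validity += 1                   # receiving buffer gains a valid row
    src.validity -= 1                   # transferring buffer loses one
    return True


def active_buffer(first: Buffer, second: Buffer) -> Buffer:
    # Serve reads and writes from the buffer with the higher validity count.
    return first if first.validity >= second.validity else second
```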
Abstract:
Apparatus and methods are disclosed that enable a cache portion of a memory buffer to be allocated to an on-cache function controller (OFC), which executes processing functions on “main line” data. A particular method may include receiving, at a memory buffer, a request from a memory controller for allocation of a cache portion of the memory buffer. The method may also include acquiring, by an OFC of the memory buffer, the requested cache portion. The method may further include executing, by the OFC, a processing function on data stored at the cache portion of the memory buffer.
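The allocate, acquire, and execute sequence can be sketched in Python. This assumes one integer per cache line and folds the controller's request and the OFC's acquisition into a single call. `MemoryBuffer`, `request_allocation`, `ofc_execute`, and `release` are hypothetical names, not an actual interface.

```python
# Minimal sketch of allocating a cache portion of a memory buffer to an
# on-cache function controller (OFC). The integer-per-line cache model and
# the method names are hypothetical.
from typing import Callable, List, Optional


class MemoryBuffer:
    def __init__(self, cache_lines: int) -> None:
        self.cache: List[int] = [0] * cache_lines
        self.ofc_region: Optional[range] = None   # lines currently held by the OFC

    def request_allocation(self, start: int, length: int) -> bool:
        # Memory controller request; the OFC acquires the lines if they are
        # in range and the OFC does not already hold a portion.
        if self.ofc_region is not None or length <= 0:
            return False
        if start < 0 or start + length > len(self.cache):
            return False
        self.ofc_region = range(start, start + length)
        return True

    def ofc_execute(self, fn: Callable[[int], int]) -> None:
        # Apply a processing function in place to every line the OFC holds.
        if self.ofc_region is None:
            raise RuntimeError("OFC has not acquired a cache portion")
        for line in self.ofc_region:
            self.cache[line] = fn(self.cache[line])

    def release(self) -> None:
        # Return the portion so it can serve as ordinary cache again.
        self.ofc_region = None
```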
Abstract:
A method is disclosed for testing a stacked memory device, which has a plurality of memory chips connected to and arranged on top of a logic chip, for a connection defect. The method may include testing a memory chip by writing a data value into a first location in the memory chip, reading a data value from the first location, detecting a first bit error, and recording the bit number of the first bit error. The method may also include testing the memory chip by writing a data value into a second location in the memory chip, reading a data value from the second location, detecting a second bit error, and recording the bit number of the second bit error. The method may further include replacing a connection common to the first and second bit errors with a spare connection.
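The reasoning behind this test can be sketched in Python. If the same bit number fails at two different locations, the likely culprit is the connection that carries that bit, not an individual cell. The sketch assumes that data bit n uses connection n and uses a toy chip with a stuck-at-0 connection. The all-ones pattern detects only stuck-at-0 faults. `common_failing_bits`, `repair`, and `StuckBitChip` are hypothetical names.

```python
# Minimal sketch of locating a shared defective connection in a stacked chip.
# Assumption: data bit n travels over connection n, so a bit number that fails
# at two different locations points at the connection, not at a cell.
from typing import Callable, Dict, List, Set


def failing_bits(written: int, read: int, width: int) -> Set[int]:
    diff = (written ^ read) & ((1 << width) - 1)
    return {bit for bit in range(width) if (diff >> bit) & 1}


def common_failing_bits(write: Callable[[int, int], None],
                        read: Callable[[int], int],
                        locations: List[int], pattern: int, width: int) -> Set[int]:
    if len(locations) < 2:
        raise ValueError("need at least two locations to isolate a connection")
    common: Set[int] = set(range(width))
    for loc in locations:
        write(loc, pattern)
        common &= failing_bits(pattern, read(loc), width)
    return common


def repair(connection_map: Dict[int, str], bits: Set[int], spares: List[str]) -> None:
    # Reroute each implicated bit to a spare connection.
    for bit in sorted(bits):
        connection_map[bit] = spares.pop(0)


class StuckBitChip:
    """Toy chip whose connection for `stuck_bit` is stuck at 0."""

    def __init__(self, stuck_bit: int) -> None:
        self.mem: Dict[int, int] = {}
        self.stuck_bit = stuck_bit

    def write(self, loc: int, value: int) -> None:
        self.mem[loc] = value & ~(1 << self.stuck_bit)

    def read(self, loc: int) -> int:
        return self.mem[loc]
```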
Abstract:
A method and apparatus are provided for continued operation of a memory module that includes a first and a second memory device when one of the memory devices has failed. The method includes receiving, by a first memory module, a write operation request to write a data word having first and second sections. The memory module may have a first memory device and a second memory device for respectively storing the first and second sections of the data word. A determination is made as to whether one of the first and second memory devices is inoperable. If one of them is inoperable, the write operation is performed by writing both the first and second sections of the data word to the operable memory device.
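The degraded-mode write can be sketched in Python. It assumes a 64-bit data word split into two 32-bit sections and a hypothetical cell layout keyed by `(address, section)`, which lets a surviving device hold both sections at once. `Device`, `write_word`, and `read_word` are illustrative names only.

```python
# Minimal sketch of writing a two-section data word when one device fails.
# Keying cells by (address, section) is a hypothetical layout that lets a
# surviving device hold both sections of a word.
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Device:
    operable: bool = True
    cells: Dict[Tuple[int, int], int] = field(default_factory=dict)


def write_word(dev_a: Device, dev_b: Device, addr: int, word: int,
               half_bits: int = 32) -> None:
    mask = (1 << half_bits) - 1
    first, second = (word >> half_bits) & mask, word & mask
    if dev_a.operable and dev_b.operable:
        dev_a.cells[(addr, 0)] = first      # normal: one section per device
        dev_b.cells[(addr, 1)] = second
    elif dev_a.operable or dev_b.operable:
        survivor = dev_a if dev_a.operable else dev_b
        survivor.cells[(addr, 0)] = first   # degraded: both sections on the
        survivor.cells[(addr, 1)] = second  # operable device
    else:
        raise OSError("both memory devices are inoperable")


def read_word(dev_a: Device, dev_b: Device, addr: int, half_bits: int = 32) -> int:
    first = (dev_a if dev_a.operable else dev_b).cells[(addr, 0)]
    second = (dev_b if dev_b.operable else dev_a).cells[(addr, 1)]
    return (first << half_bits) | second
```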
Abstract:
A method, system and computer program product are provided for implementing ECC (Error Correction Codes) redundancy using reconfigurable logic blocks in a computer system. When a fail is detected during a read from memory, it is determined whether the incorrect data is in the data component or the ECC component of the data. When incorrect data is found in the ECC component and an actionable threshold has not been reached, a predetermined Reliability, Availability, and Serviceability (RAS) action is taken. When the actionable threshold is reached with incorrect data identified in the ECC component, an analysis process is performed to determine whether the ECC logic is faulty. When a fail in the ECC logic is detected, the failed ECC logic is replaced with a spare block of logic.
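The decision flow can be sketched in Python. The sketch takes the error's location (data component or ECC component) as an input; a real system would derive it from the ECC syndrome. The `ecc_logic_faulty` diagnostic hook, the block names, and the fallback when the analysis finds no fault are assumptions, since the abstract does not cover that last case.

```python
# Minimal sketch of the ECC-redundancy decision flow. Where an error lies
# (data vs. ECC component) is passed in; a real system would derive it from
# the ECC syndrome. The diagnostic hook and block names are hypothetical.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EccRedundancyManager:
    threshold: int                        # actionable threshold (ECC-component fails)
    ecc_logic_faulty: Callable[[], bool]  # analysis process for the ECC logic
    spare_blocks: List[str]
    active_block: str = "ecc0"
    ecc_component_fails: int = 0

    def on_read_fail(self, error_in_ecc_component: bool) -> str:
        if not error_in_ecc_component:
            return "data-fail"            # ordinary data correction, not modeled here
        self.ecc_component_fails += 1
        if self.ecc_component_fails < self.threshold:
            return "ras-action"           # predetermined RAS action
        if self.ecc_logic_faulty() and self.spare_blocks:
            self.active_block = self.spare_blocks.pop(0)  # swap in spare logic
            self.ecc_component_fails = 0
            return "replaced"
        # The abstract does not cover this case; the sketch falls back to RAS.
        return "ras-action"
```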
Abstract:
A method and apparatus for operating a memory module to store a data word are provided. The apparatus includes a memory module having a set of paired memory devices: in failure-free operation, a first memory device stores a first section of a data word and a second memory device stores a second section of the data word. The apparatus may further include a first logic module that, upon determining that certain types of failure exist, performs a write operation by writing the first and second sections of the data word to both the first memory device and the second memory device. The determination may include determining that a failure exists in the word section storage of either the first or the second memory device but that no failures exist in equivalent locations of word section storage in the two memory devices.
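The duplication condition can be sketched in Python. If each device has failures but never at the same location as the other, writing the whole word to both devices guarantees that every word survives in at least one of them. Failed locations are modeled as per-device address sets that silently drop writes. The sketch also assumes the module switches modes for new writes only and does not migrate data already written. `PairedDevice`, `can_duplicate`, `write_word`, and `read_word` are hypothetical names.

```python
# Minimal sketch of the paired-device write decision. Failed locations are
# modeled as per-device address sets, and a failed location silently drops
# writes; both are simplifying assumptions.
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class PairedDevice:
    failed: Set[int] = field(default_factory=set)
    cells: Dict[int, Tuple[int, ...]] = field(default_factory=dict)

    def store(self, addr: int, value: Tuple[int, ...]) -> None:
        if addr not in self.failed:      # a failed location cannot hold data
            self.cells[addr] = value


def can_duplicate(a: PairedDevice, b: PairedDevice) -> bool:
    # Some failure exists, but never at the same location in both devices.
    return bool(a.failed | b.failed) and not (a.failed & b.failed)


def write_word(a: PairedDevice, b: PairedDevice, addr: int,
               first: int, second: int) -> None:
    if can_duplicate(a, b):
        a.store(addr, (first, second))   # both sections to both devices
        b.store(addr, (first, second))
    else:
        a.store(addr, (first,))          # otherwise: split sections as in
        b.store(addr, (second,))         # failure-free operation


def read_word(a: PairedDevice, b: PairedDevice, addr: int) -> Tuple[int, ...]:
    if can_duplicate(a, b):
        return (a if addr not in a.failed else b).cells[addr]
    return a.cells[addr] + b.cells[addr]
```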