摘要:
A method, computer program product and system for switching a defective signal line with a spare signal line without shutting down the computer system. A service processor monitors error correction code (ECC) check units configured to detect an error in a signal line. If an ECC check unit detects an error rate that exceeds a threshold, then the signal line with such an error rate may be said to be “defective.” The service processor configures switch control units in the driver/receiver pair associated with the defective signal line to be able to switch the defective signal line with a spare line upon receipt of a command from a memory controller switch control unit. In this manner, the system is not deactivated in order to switch a defective signal line with a spare line thereby reducing the time that the processor cannot send information to the memory buffers.
摘要:
A method, computer program product and system for switching a defective signal line with a spare signal line without shutting down the computer system. A service processor monitors error correction code (ECC) check units configured to detect an error in a signal line. If an ECC check unit detects an error rate that exceeds a threshold, then the signal line with such an error rate may be said to be “defective.” The service processor configures switch control units in the driver/receiver pair associated with the defective signal line to be able to switch the defective signal line with a spare line upon receipt of a command from a memory controller switch control unit. In this manner, the system is not deactivated in order to switch a defective signal line with a spare line thereby reducing the time that the processor cannot send information to the memory buffers.
摘要:
A method, computer program product and system for switching a defective signal line with a spare signal line without shutting down the computer system. A service processor monitors error correction code (ECC) check units configured to detect an error in a signal line. If an ECC check unit detects an error rate that exceeds a threshold, then the signal line with such an error rate may be said to be “defective.” The service processor configures switch control units in the driver/receiver pair associated with the defective signal line to be able to switch the defective signal line with a spare line upon receipt of a command from a memory controller switch control unit. In this manner, the system is not deactivated in order to switch a defective signal line with a spare line thereby reducing the time that the processor cannot send information to the memory buffers.
摘要:
A memory controller for a processing unit provides a memory wrap test mode path which selectively writes data from the write buffer of the controller to the read buffer of the controller, thereby allowing the write and read buffers to substitute for a system memory device during testing of the processing unit. The processing unit can thus be tested without the attached memory device yet still operate under conditions which generate bus traffic and chip noise similar to that generated under actual (end-use) operation. When a processor issues a write operation in test mode, the controller writes the data to an entry of the read buffer which corresponds to the write address. Thereafter, the processor can issue a read operation with the same address and the read buffer will send the data from the corresponding entry.
摘要:
A memory controller for a processing unit provides a memory wrap test mode path which selectively writes data from the write buffer of the controller to the read buffer of the controller, thereby allowing the write and read buffers to substitute for a system memory device during testing of the processing unit. The processing unit can thus be tested without the attached memory device yet still operate under conditions which generate bus traffic and chip noise similar to that generated under actual (end-use) operation. When a processor issues a write operation in test mode, the controller writes the data to an entry of the read buffer which corresponds to the write address. Thereafter, the processor can issue a read operation with the same address and the read buffer will send the data from the corresponding entry.
摘要:
A method, system and computer program product are provided for implementing command timing adjustments to alleviate Dynamic Random Access Memory (DRAM) failures in a computer system. A predefined DRAM failure is detected. Responsive to the detected failure, a set of timers is adjusted for controlling predetermined timings used to access the DRAM. Responsive to the failure being resolved by the adjusted set of timers, checking for a predetermined level of performance is performed.
摘要:
A method, system and computer program product are provided for implementing hardware assisted Dynamic Random Access Memory (DRAM) repair in a computer system that supports ECC. A data register providing DRAM repair is selectively provided in one of the Dynamic Random Access Memory (DRAM), a memory controller, or a memory buffer coupled between the DRAM and the memory controller. The data register is configured to map to any address. Responsive to the configured address being detected, the reads to or the writes from the configured address are routed to the data register.
摘要:
A method, system and computer program product are provided for implementing command timing adjustments to alleviate Dynamic Random Access Memory (DRAM) failures in a computer system. A predefined DRAM failure is detected. Responsive to the detected failure, a set of timers is adjusted for controlling predetermined timings used to access the DRAM. Responsive to the failure being resolved by the adjusted set of timers, checking for a predetermined level of performance is performed.
摘要:
A method and apparatus for continued operation of a memory module, including a first and second memory device, when one of memory devices has failed. The method includes receiving a write operation request to write a data word, having first and second sections, by a first memory module. The memory module may have a first memory device and a second memory device, for respectively storing the first and second sections of the data word. A determination if one of the first and second memory devices is inoperable is made. If one of the first and second memory devices is inoperable, a write operation is performed by writing the first and second sections of the data word to the operable one of the first and second memory devices.
摘要:
A method, system and computer program product implement memory performance management and enhanced memory reliability of a computer system accounting for system thermal conditions. When a primary memory temperature reaches an initial temperature threshold, reads are suspended to the primary memory and reads are provided to a mirrored memory in a mirrored memory pair, and writes are provided to both the primary memory and the mirrored memory. If the primary memory temperature reaches a second temperature threshold, write operations to the primary memory are also stopped and the primary memory is turned off with DRAM power saving modes such as self timed refresh (STR), and the reads and writes are limited to the mirrored memory in the mirrored memory pair. When the primary memory temperature decreases to below the initial temperature threshold, coherency is recovered by writing a coherent copy from the mirrored memory to the primary memory.