Abstract:
There is disclosed in one example a multi-core computing system configured to provide a hot-swappable CPU0, including: a first CPU in a first CPU socket and a second CPU in a second CPU socket; a switch including a first media interface to the first CPU socket and a second media interface to the second CPU socket; and one or more mediums including non-transitory instructions to detect a hot swap event of the first CPU, designate the second CPU as CPU0, determine that a new CPU has replaced the first CPU, operate the switch to communicatively couple the new CPU to a backup initialization code store via the first media interface, initialize the new CPU, and designate the new CPU as CPUN, wherein N≠0.
Abstract:
An example apparatus to monitor memory includes an error manager to compare a first memory location of a first error in the memory to a plurality of memory locations in an error history log, the plurality of memory locations previously identified in the error history log based on errors detected in the memory locations, ones of the memory locations associated with corresponding counters that track the errors detected in the memory locations, and update a first one of the counters corresponding to the first memory location when a first address of the first memory location matches a second address of one of the memory locations in the error history log. The example apparatus further includes a command generator to transmit a command to an error corrector to perform error correction on the first memory location when the first one of the counters satisfies a threshold.
Abstract:
There is disclosed in one example a multi-core computing system configured to provide a hot-swappable CPU0, including: a first CPU in a first CPU socket and a second CPU in a second CPU socket; a switch including a first media interface to the first CPU socket and a second media interface to the second CPU socket; and one or more mediums including non-transitory instructions to detect a hot swap event of the first CPU, designate the second CPU as CPU0, determine that a new CPU has replaced the first CPU, operate the switch to communicatively couple the new CPU to a backup initialization code store via the first media interface, initialize the new CPU, and designate the new CPU as CPUN, wherein N≠0.
Abstract:
Techniques and mechanisms for providing error detection and correction for a platform comprising a memory including one or more spare memory segments. In an embodiment, a memory controller performs first scrubbing operations including detection for errors in a plurality of currently active memory segments. Additional patrol scrubbing is performed for one or more memory segments while the memory segments are each available for activation as a replacement memory segment. In another embodiment, a first handler process (but not a second handler process) is signaled if an uncorrectable error event is detected based on the active segment scrubbing, whereas the second handler process (but not the first handler process) is signaled if an uncorrectable error event is detected based on the spare segment scrubbing. Of the first handler process and the second handler process, only signaling of the first handler process results in a crash event of the platform.
Abstract:
Systems, apparatuses and methods may provide for technology that handles failures in memory hardware (e.g., dynamic random access memory (DRAM)) via runtime post package repair. Such technology may include operations to perform a runtime post package repair in response to a memory hardware failure detected in the memory. In such an example, the runtime post package repair may be done after power up boot operations have been completed.
Abstract:
An example apparatus to monitor memory includes an error manager to compare a first memory location of a first error in the memory to a plurality of memory locations in an error history log, the plurality of memory locations previously identified in the error history log based on errors detected in the memory locations, ones of the memory locations associated with corresponding counters that track the errors detected in the memory locations, and update a first one of the counters corresponding to the first memory location when a first address of the first memory location matches a second address of one of the memory locations in the error history log. The example apparatus further includes a command generator to transmit a command to an error corrector to perform error correction on the first memory location when the first one of the counters satisfies a threshold.
Abstract:
Techniques and mechanisms for providing error detection and correction for a platform comprising a memory including one or more spare memory segments. In an embodiment, a memory controller performs first scrubbing operations including detection for errors in a plurality of currently active memory segments. Additional patrol scrubbing is performed for one or more memory segments while the memory segments are each available for activation as a replacement memory segment. In another embodiment, a first handler process (but not a second handler process) is signaled if an uncorrectable error event is detected based on the active segment scrubbing, whereas the second handler process (but not the first handler process) is signaled if an uncorrectable error event is detected based on the spare segment scrubbing. Of the first handler process and the second handler process, only signaling of the first handler process results in a crash event of the platform.