摘要:
Various exemplary embodiments relate to a cache corruption prevention system and a related method. A cache memory may contain contents that are susceptible to corruption. A cache controller, with the use of a threshold timer, may employ various operations to flush modified cache contents into a main memory and invalidate cache contents so that they are overwritten. Some operations include periodically flushing and invalidating the whole cache memory, periodically flushing and invalidating modified contents, and periodically flushing and invalidating contents based on the time saved in the cache memory. By overwriting cache contents that might otherwise be constantly stored in the cache memory, the system minimizes the probability of cache contents becoming corrupt. The periodic updating of the main memory may also increase the probability of successfully recovering from potential cache parity errors while still maintaining high performance associated with using a cache memory.
摘要:
The invention is directed to monitoring execution of software threads, particularly by detecting a lockup or stall in execution of a software thread and initiating a remedial action in response. Advantageously, some embodiments of the invention automatically detect a lockup or stall in execution of a software thread by periodically sampling information corresponding to the thread, and, in accordance with a determination made using the information, initiate an attempt to recover from such a condition in execution without the need for manual intervention.
摘要:
Apparatus and methods for autonomously identifying and mitigating soft-errors affecting integrated circuit memory storage devices are provided. A soft-error mitigation process is invoked upon finding that an integrated circuit memory device is affected by a parity error. In a staged approach, unused memory regions of the integrated circuit memory device are reinitialized; if a redundant deployment prevails, the subsystem corresponding to the affected integrated circuit memory device is reset; memory regions having copies of contents thereof stored at remote locations are rewritten with obtained copies of the contents; and memory regions storing contents which are generated at run-time are reinitialized. Directed parity error scans are employed at each stage. If the parity error persists, one of the apparatus, and the subsystem corresponding to the affected silicon memory device is reset during a maintenance window. Advantages are derived from a run-time soft-error mitigation process which increases availability, and reduces maintenance overheads and the need for hardware replacement.
摘要:
Information error recovery apparatus and methods are disclosed. Responsive to an error detected in information retrieved from an information store for use by a processor in a software execution flow, the software execution flow of the processor is suspended. Use of the information store by the processor is also disabled. The software execution flow of the processor is allowed to resume using information from a further information store in which the retrieved information is also stored. This allows recovery from errors without resetting the processor. The information store may be reloaded from the further information store and re-enabled for use by the processor. The information store and the further information store are a cache and a main memory, respectively, in one embodiment.
摘要:
The invention is directed to monitoring execution of software threads, particularly by detecting a lockup or stall in execution of a software thread and initiating a remedial action in response. Advantageously, some embodiments of the invention automatically detect a lockup or stall in execution of a software thread by periodically sampling information corresponding to the thread, and, in accordance with a determination made using the information, initiate an attempt to recover from such a condition in execution without the need for manual intervention.