PRESERVING ERROR CONTEXT DURING A REBOOT OF A COMPUTING DEVICE

    Publication No.: US20220318093A1

    Publication date: 2022-10-06

    Application No.: US17221751

    Application date: 2021-04-02

    IPC classification: G06F11/14 G06F11/07

    Abstract: To preserve error context during a reboot of a computing device, firmware within the computing device can be configured to implement a method that includes determining where the error context is stored in volatile memory. The method can also include identifying a plurality of recorder regions in non-volatile memory that have been assigned to store the error context. The plurality of recorder regions can be disaggregated across a plurality of distinct non-volatile memory regions. The method can also include flushing the error context from a plurality of different volatile memory locations to the plurality of recorder regions in response to detecting a trigger. The flushing can occur prior to the reboot of the computing device. The method can also include restoring at least some of the error context to the volatile memory after the computing device has been rebooted.
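The flush-and-restore flow in this abstract can be illustrated with a minimal sketch. It is not the patented firmware: the `RecorderRegion` class, the first-fit placement rule, and names such as `cpu_mca_bank`, `spi_flash`, and `nvdimm` are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecorderRegion:
    """A slot assigned to hold error context (hypothetical model)."""
    nv_region: str      # assumed name of the backing non-volatile region
    capacity: int       # bytes available in this recorder region
    data: bytes = b""
    source: str = ""    # volatile location the stored data came from


def flush_error_context(volatile, recorders):
    """Copy each volatile buffer into the first free recorder region that fits."""
    for location, payload in volatile.items():
        for region in recorders:
            if not region.source and len(payload) <= region.capacity:
                region.data, region.source = payload, location
                break
        else:
            raise RuntimeError(f"no recorder region can hold {location}")


def restore_error_context(recorders):
    """Rebuild the volatile view from whatever the recorder regions hold."""
    return {r.source: r.data for r in recorders if r.source}


# Error context scattered across two volatile locations (assumed names).
volatile = {"cpu_mca_bank": b"\x01\x02", "pcie_aer_log": b"\x03\x04\x05"}
# Recorder regions spread across two distinct non-volatile regions.
recorders = [
    RecorderRegion("spi_flash", capacity=2),
    RecorderRegion("nvdimm", capacity=8),
]

flush_error_context(volatile, recorders)   # trigger detected, before the reboot
volatile = {}                              # the reboot clears volatile memory
restored = restore_error_context(recorders)
```

First-fit placement is only one possible policy; the abstract does not say how error context is mapped to recorder regions.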

    KERNEL SOFT RESET USING NON-VOLATILE RAM
    Patent application

    Publication No.: US20180165101A1

    Publication date: 2018-06-14

    Application No.: US15378406

    Application date: 2016-12-14

    IPC classification: G06F9/44

    Abstract: Technologies are described which permit kernel updates or firmware fixes, and include re-initialization of kernel data structures, without losing user context information that has been created by services, virtual machines, or user applications. Tailored code in a server or other computing system sets a kernel soft reset (KSR) indicator and saves the user context to non-volatile storage. When a KSR is underway, boot code skips the power-on self-test and similar initializations (thereby reducing downtime), loads a kernel image, initializes kernel data structures, restores the user context, and passes control to the initialized kernel to continue computing system operation with the same user context. Device drivers may also be re-initialized. The loaded kernel may use newly fixed firmware, or may have a security patch installed, for instance. The non-volatile storage may operate at RAM speed, e.g., it may include NVDIMM memory. The kernel may be validated before receiving control.
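The boot-path decision described in this abstract can be sketched as follows. This is a simplified model under stated assumptions: the non-volatile store is a plain dictionary, and `request_ksr`, `boot`, the key names, and the step labels are hypothetical, not taken from the application.

```python
def request_ksr(nv_store, user_context):
    """Set the KSR indicator and save the user context to non-volatile storage."""
    nv_store["ksr_indicator"] = True
    nv_store["user_context"] = dict(user_context)


def boot(nv_store, run_post, load_kernel):
    """Return the boot steps taken and the user context handed to the kernel."""
    steps = []
    # Consume the indicator so that the next boot is an ordinary one.
    ksr = nv_store.pop("ksr_indicator", False)
    if not ksr:
        run_post()                      # the power-on self-test runs only on a full boot
        steps.append("post")
    kernel = load_kernel()
    steps.append("load_kernel")
    kernel["data_structures"] = {}      # kernel data structures start fresh
    steps.append("init_kernel_data")
    user_context = {}
    if ksr:
        user_context = nv_store.get("user_context", {})
        steps.append("restore_user_context")
    steps.append("pass_control")
    return steps, user_context


nv_store = {}
request_ksr(nv_store, {"vm1": "running"})
ksr_steps, ksr_context = boot(nv_store, lambda: None, lambda: {"version": "patched"})
cold_steps, cold_context = boot({}, lambda: None, lambda: {"version": "patched"})
```

Clearing the indicator inside `boot` is a design choice of this sketch, so that a later unplanned reset does not skip the self-test; the abstract does not state when the indicator is cleared.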

    TARGETED REPAIR OF HARDWARE COMPONENTS IN A COMPUTING DEVICE

    Publication No.: US20210311833A1

    Publication date: 2021-10-07

    Application No.: US16837885

    Application date: 2020-04-01

    Abstract: A method for targeted repair of a hardware component in a computing device that is part of a cloud computing system includes monitoring a plurality of hardware components in the computing device. At some point, a defective sub-component within the hardware component of the computing device is identified. In addition to the defective sub-component, the hardware component also includes at least one sub-component that is functioning properly and a spare component that can be used in place of the defective sub-component. The method also includes initiating a targeted repair action while the computing device is connected to the cloud computing system. The targeted repair action prevents the defective sub-component from being used by the computing device without preventing sub-components that are functioning properly from being used by the computing device. The targeted repair action causes the spare component to be used in place of the defective sub-component.
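The targeted repair action in this abstract amounts to swapping a spare in for one defective sub-component while leaving the rest in service. A minimal sketch, assuming a hypothetical `HardwareComponent` model with illustrative sub-component names (`row0`, `spare_row0`), might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class HardwareComponent:
    """Hypothetical model: sub-components in use, spares, and mapped-out parts."""
    active: list
    spares: list
    disabled: list = field(default_factory=list)


def targeted_repair(component, defective):
    """Replace one defective sub-component with a spare; others stay in use."""
    if defective not in component.active:
        raise ValueError(f"{defective} is not an active sub-component")
    if not component.spares:
        raise RuntimeError("no spare available for targeted repair")
    slot = component.active.index(defective)
    spare = component.spares.pop(0)
    component.active[slot] = spare          # the spare takes the defective part's place
    component.disabled.append(defective)    # the defective part is no longer used
    return spare


memory = HardwareComponent(active=["row0", "row1", "row2"], spares=["spare_row0"])
used_spare = targeted_repair(memory, "row1")
```

Here `row0` and `row2` remain active throughout, which mirrors the abstract's point that properly functioning sub-components are not taken out of use.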

    SYSTEM FOR CONFIGURABLE ERROR HANDLING
    Patent application

    Publication No.: US20200151048A1

    Publication date: 2020-05-14

    Application No.: US16184003

    Application date: 2018-11-08

    IPC classification: G06F11/07 G06F13/40 G06F13/42

    Abstract: An error-handling system provides detection of an error on an I/O hardware endpoint, triggering of an operating system interrupt in response to the detected error, reception of the interrupt at an operating system component, determination, in response to the received interrupt, whether to handle the error using an operating system handler or a firmware error handler associated with the I/O hardware endpoint, and, if it is determined to handle the error using a firmware runtime error handler associated with the I/O hardware endpoint, triggering of a firmware interrupt associated with the firmware runtime error handler.
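The routing decision in this abstract can be sketched as a small dispatcher that runs in the operating system component receiving the interrupt. The per-endpoint `policy` table, the endpoint names, and the callback signatures are assumptions made for illustration; the abstract does not say how the determination is made.

```python
def handle_endpoint_error(endpoint, policy, os_handler, trigger_firmware_interrupt):
    """Route an endpoint error to the OS handler or to firmware.

    `policy` maps an endpoint name to "os" or "firmware" (hypothetical
    configuration); endpoints without an entry default to the OS handler.
    """
    if policy.get(endpoint, "os") == "firmware":
        # Hand the error to the firmware runtime error handler by
        # triggering the firmware interrupt associated with it.
        trigger_firmware_interrupt(endpoint)
        return "firmware"
    os_handler(endpoint)
    return "os"


os_log, firmware_log = [], []
policy = {"nvme0": "firmware"}   # assumed per-endpoint configuration

first = handle_endpoint_error("nvme0", policy, os_log.append, firmware_log.append)
second = handle_endpoint_error("nic0", policy, os_log.append, firmware_log.append)
```

With this table, the `nvme0` error is forwarded to firmware and the `nic0` error stays with the operating system handler.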