SERIALIZING MACHINE CHECK EXCEPTIONS FOR PREDICTIVE FAILURE ANALYSIS

    公开(公告)号:US20180150345A1

    公开(公告)日:2018-05-31

    申请号:US15362522

    申请日:2016-11-28

    Abstract: Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time causes the remaining error. The MCE can isolate hardware/software associated with the first error to recover from a failure. The MCE can report only the first error to the operating system (OS) or other error management software/hardware. The U-Box may also convert the timestamps into real time to support user debugging.

    Serializing machine check exceptions for predictive failure analysis

    公开(公告)号:US11163623B2

    公开(公告)日:2021-11-02

    申请号:US16866485

    申请日:2020-05-04

    Abstract: Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time causes the remaining error. The MCE can isolate hardware/software associated with the first error to recover from a failure. The MCE can report only the first error to the operating system (OS) or other error management software/hardware. The U-Box may also convert the timestamps into real time to support user debugging.

    DISAMBIGUATION OF ERROR LOGGING DURING SYSTEM RESET

    公开(公告)号:US20180341537A1

    公开(公告)日:2018-11-29

    申请号:US15606799

    申请日:2017-05-26

    CPC classification number: G06F11/0766 G06F11/0787

    Abstract: A mechanism for disambiguation of error logging during a warm reset is disclosed. A system agent detects an error occurring during bootstrapping of a processor package. The error occurs prior to initiation of a machine check system. A wide pulse event is initiated to signal a wide pulse register to store a wide pulse time stamp counter value. The wide pulse event also signals a lap register to store a lap time stamp counter value. The wide pulse register maintains the wide pulse time stamp counter value during a warm reset, and the lap register clears the lap time stamp counter value during the warm reset. The system agent obtains the wide pulse time stamp counter value and the lap time stamp counter value after bootstrapping is complete to determine an order of occurrence of the error relative to the warm reset.

    Delayed error processing
    17.
    发明授权

    公开(公告)号:US10929232B2

    公开(公告)日:2021-02-23

    申请号:US15610067

    申请日:2017-05-31

    Abstract: A computing apparatus, including: a hardware platform including a processor and memory; and a system management interrupt (SMI) handler; first logic configured to provide a first container and a second container via the hardware platform; and second logic configured to: detect an uncorrectable error in the first container; responsive to the detecting, generate a degraded system state; provide a degraded state message to the SMI handler; instruct the second container to seek a recoverable state; determine that the second container has entered a recoverable state; and initiate a recovery operation.

    Apparatus and method for vectored machine check bank reporting

    公开(公告)号:US10824496B2

    公开(公告)日:2020-11-03

    申请号:US15857376

    申请日:2017-12-28

    Abstract: An apparatus and method for machine check bank reporting in a processor. For example, one embodiment includes a processor comprising: one or more cores to execute instructions and process data; a plurality of machine check architecture banks to store errors detected during execution of the instructions; error monitoring circuitry to detect the errors and responsively update the MCA banks; and a first error register (FERR) into which a first error vector is to be stored to identify an MCA bank containing a first error in an error sequence, the error monitoring circuitry to update the first error vector responsive to detecting the first error; and one or more next error registers (NERRs) to store one or more error vectors to one or more other MCA banks containing subsequent errors occurring after the first error.

    Serializing machine check exceptions for predictive failure analysis

    公开(公告)号:US10671465B2

    公开(公告)日:2020-06-02

    申请号:US15362522

    申请日:2016-11-28

    Abstract: Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time causes the remaining error. The MCE can isolate hardware/software associated with the first error to recover from a failure. The MCE can report only the first error to the operating system (OS) or other error management software/hardware. The U-Box may also convert the timestamps into real time to support user debugging.

Patent Agency Ranking