Fault processing method, related apparatus, and computer

    Publication No.: US11360842B2

    Publication date: 2022-06-14

    Application No.: US17187111

    Application date: 2021-02-26

    Inventor: Gang Song

    IPC classification: G06F11/00, G06F11/07

    Abstract: In a fault processing method, when it is determined that a computer crashes, a baseboard management controller in the computer can send a read request message to a processor in the computer, where the read request message is used for requesting reading of first error data recorded by the processor, receive a read response message returned by the processor, and obtain, according to the read response message, the first error data recorded by the processor.
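
The request/response exchange described in this abstract can be sketched roughly as follows. This is a minimal simulation, not the patented implementation: the class names, the dictionary message layout, and the register names `ERR_STATUS_0` and `ERR_ADDR_0` are all hypothetical, and a real baseboard management controller would use a hardware sideband interface rather than a Python method call.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical stand-in for a processor that records error data."""
    error_registers: dict = field(default_factory=dict)

    def handle_read_request(self, request: dict) -> dict:
        # Build a read response carrying the requested register content.
        name = request["register"]
        return {"register": name, "data": self.error_registers.get(name)}


class BaseboardManagementController:
    """Hypothetical BMC model that reads error data without the OS."""

    def __init__(self, processor: Processor):
        self.processor = processor

    def collect_error_data(self, crashed: bool, registers: list) -> dict:
        # Only act once a crash has been determined.
        if not crashed:
            return {}
        collected = {}
        for name in registers:
            # Send a read request and extract the data from the response.
            response = self.processor.handle_read_request({"register": name})
            collected[response["register"]] = response["data"]
        return collected


cpu = Processor(error_registers={"ERR_STATUS_0": 0xBEEF})
bmc = BaseboardManagementController(cpu)
error_data = bmc.collect_error_data(
    crashed=True, registers=["ERR_STATUS_0", "ERR_ADDR_0"]
)
```

In this sketch a register the processor never recorded comes back as `None`, which is an assumption made for illustration only.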

    FAULT PROCESSING METHOD, RELATED APPARATUS, AND COMPUTER

    Publication No.: US20190332453A1

    Publication date: 2019-10-31

    Application No.: US16509218

    Application date: 2019-07-11

    Inventor: Gang Song

    IPC classification: G06F11/07

    Abstract: A fault processing method, a related apparatus, and a computer. When it is determined that a computer crashes, a baseboard management controller in the computer can send a read request message to a processor in the computer, where the read request message is used for requesting reading of error data recorded by the processor, receive a read response message returned by the processor, and obtain, according to the read response message, the error data recorded by the processor. By means of the embodiments of the present invention, an operating system does not need to be used, acquisition of error data in a computer after the computer crashes is implemented using a baseboard management controller, and a problem in the prior art that error data in a computer cannot be acquired after a severe uncorrectable error occurring in the computer causes a system crash is resolved.

    Fault processing method, related apparatus, and computer

    Publication No.: US10353763B2

    Publication date: 2019-07-16

    Application No.: US15385701

    Application date: 2016-12-20

    Inventor: Gang Song

    IPC classification: G06F11/00, G06F11/07

    Abstract: A fault processing method, a related apparatus, and a computer. When it is determined that a computer crashes, a baseboard management controller in the computer can send a read request message to a processor in the computer, where the read request message is used for requesting reading of first error data recorded by the processor, receive a read response message returned by the processor, and obtain, according to the read response message, the first error data recorded by the processor. By means of the embodiments of the present invention, an operating system does not need to be used, acquisition of error data in a computer after the computer crashes is implemented using a baseboard management controller, and a problem in the prior art that error data in a computer cannot be acquired after a severe uncorrectable error occurring in the computer causes a system crash is resolved.

    Memory fault detection
    Granted patent

    Publication No.: US11119874B2

    Publication date: 2021-09-14

    Application No.: US16748274

    Application date: 2020-01-21

    IPC classification: G06F11/22

    Abstract: A memory fault detection method includes: receiving a first interrupt signal sent when a count value of a first leaky bucket counter of a server reaches a first threshold; disabling an interrupt switch of the first leaky bucket counter; enabling the interrupt switch of the first leaky bucket counter after the interrupt switch of the first leaky bucket counter has been disabled for a preset time and the count value of the first leaky bucket counter is reset to zero; receiving a second interrupt signal sent when a count value of a second leaky bucket counter reaches a second threshold; if the second leaky bucket counter and the first leaky bucket counter are a same leaky bucket counter, and the second rank and a first rank are a same rank, determining that a hardware fault occurs in the first rank.
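
The decision logic in this abstract can be sketched as a small state machine. The sketch below is an interpretation under stated assumptions, not the patented implementation: the class name, the explicit `now` timestamps, and the `tick` polling method are hypothetical, and the leaky bucket counting itself (threshold crossing) is assumed to happen in hardware, so only the interrupt handling is modeled.

```python
class RankFaultDetector:
    """Hypothetical model of the interrupt handling in the abstract."""

    def __init__(self, quiet_period: float):
        self.quiet_period = quiet_period  # the "preset time", in seconds
        self.interrupt_enabled = {}       # counter id -> interrupt switch state
        self.disabled_at = {}             # counter id -> time switch was disabled
        self.last_rank = {}               # counter id -> rank of previous interrupt
        self.count = {}                   # counter id -> modeled count value

    def on_interrupt(self, counter_id, rank, now):
        """Return the faulty rank, or None if no fault is determined yet."""
        if not self.interrupt_enabled.get(counter_id, True):
            return None  # switch is off, so the interrupt is not delivered
        if self.last_rank.get(counter_id) == rank:
            # Second interrupt from the same counter and the same rank.
            return rank
        # First interrupt: remember the rank and disable the switch.
        self.last_rank[counter_id] = rank
        self.interrupt_enabled[counter_id] = False
        self.disabled_at[counter_id] = now
        return None

    def tick(self, counter_id, now):
        """Re-enable the switch once the preset time has elapsed."""
        if self.interrupt_enabled.get(counter_id, True):
            return
        if now - self.disabled_at[counter_id] >= self.quiet_period:
            self.count[counter_id] = 0  # count value reset to zero
            self.interrupt_enabled[counter_id] = True


detector = RankFaultDetector(quiet_period=10.0)
first = detector.on_interrupt("bucket0", rank=2, now=0.0)
detector.tick("bucket0", now=10.0)
second = detector.on_interrupt("bucket0", rank=2, now=12.0)
```

Here `first` is `None` and `second` is `2`: the fault is reported only when the same counter fires again for the same rank after the quiet period. How an interrupt for a different rank should be treated is not stated in the abstract; this sketch simply restarts the cycle for the new rank.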

    Memory Fault Detection
    Patent application

    Publication No.: US20200159635A1

    Publication date: 2020-05-21

    Application No.: US16748274

    Application date: 2020-01-21

    IPC classification: G06F11/22

    Abstract: A memory fault detection method includes: receiving a first interrupt signal sent when a count value of a first leaky bucket counter of a server reaches a first threshold; disabling an interrupt switch of the first leaky bucket counter; enabling the interrupt switch of the first leaky bucket counter after the interrupt switch of the first leaky bucket counter has been disabled for a preset time and the count value of the first leaky bucket counter is reset to zero; receiving a second interrupt signal sent when a count value of a second leaky bucket counter reaches a second threshold; if the second leaky bucket counter and the first leaky bucket counter are a same leaky bucket counter, and the second rank and a first rank are a same rank, determining that a hardware fault occurs in the first rank.

    FAULT PROCESSING METHOD, RELATED APPARATUS, AND COMPUTER

    Publication No.: US20170102985A1

    Publication date: 2017-04-13

    Application No.: US15385701

    Application date: 2016-12-20

    Inventor: Gang Song

    IPC classification: G06F11/07

    Abstract: A fault processing method, a related apparatus, and a computer. When it is determined that a computer crashes, a baseboard management controller in the computer can send a read request message to a processor in the computer, where the read request message is used for requesting reading of first error data recorded by the processor, receive a read response message returned by the processor, and obtain, according to the read response message, the first error data recorded by the processor. By means of the embodiments of the present invention, an operating system does not need to be used, acquisition of error data in a computer after the computer crashes is implemented using a baseboard management controller, and a problem in the prior art that error data in a computer cannot be acquired after a severe uncorrectable error occurring in the computer causes a system crash is resolved.

    Fault Processing Method, Related Apparatus, and Computer

    Publication No.: US20210182136A1

    Publication date: 2021-06-17

    Application No.: US17187111

    Application date: 2021-02-26

    Inventor: Gang Song

    IPC classification: G06F11/07

    Abstract: In a fault processing method, when it is determined that a computer crashes, a baseboard management controller in the computer can send a read request message to a processor in the computer, where the read request message is used for requesting reading of first error data recorded by the processor, receive a read response message returned by the processor, and obtain, according to the read response message, the first error data recorded by the processor.

    Troubleshooting method, computer system, baseboard management controller, and system

    Publication No.: US10430260B2

    Publication date: 2019-10-01

    Application No.: US15709824

    Application date: 2017-09-20

    Inventor: Gang Song

    Abstract: A troubleshooting method implemented by a processor device is provided, comprising determining, according to collected information of correctable errors, that a correctable error storm has occurred, disabling a system management interrupt (SMI) of generation modules of correctable errors in a correctable error set, wherein the correctable error set comprises correctable errors related to the correctable error storm, sending SMI-disabled notification information to a baseboard management controller (BMC), receiving enable-SMI notification information that is sent by the BMC after a predetermined time elapses after the SMI-disabled notification information has been received, and enabling the disabled SMI of the generation modules of the correctable errors according to the enable-SMI notification information.
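
The handshake between the processor device and the BMC in this abstract can be sketched as below. This is an illustrative model only: the `Host` and `Bmc` classes, the explicit `now` timestamps, and the storm criterion (a per-module error count reaching `STORM_THRESHOLD`) are assumptions, since the abstract does not say how a correctable error storm is recognized.

```python
from collections import Counter

STORM_THRESHOLD = 3  # assumed storm criterion, not taken from the patent


class Bmc:
    """Hypothetical BMC that answers after a predetermined delay."""

    def __init__(self, delay: float):
        self.delay = delay
        self.pending = []  # list of (due time, modules) notifications

    def notify_smi_disabled(self, modules, now):
        # Record the SMI-disabled notification and when to answer it.
        self.pending.append((now + self.delay, set(modules)))

    def poll(self, now, host):
        # Send enable-SMI notifications whose delay has elapsed.
        due = [p for p in self.pending if p[0] <= now]
        self.pending = [p for p in self.pending if p[0] > now]
        for _, modules in due:
            host.on_enable_smi(modules)


class Host:
    """Hypothetical processor-side handler for correctable errors."""

    def __init__(self, bmc: Bmc):
        self.bmc = bmc
        self.smi_enabled = {}    # module -> whether its SMI is enabled
        self.errors = Counter()  # module -> collected correctable errors

    def record_correctable_error(self, module, now):
        if not self.smi_enabled.get(module, True):
            return  # SMI disabled: this error raises no interrupt
        self.errors[module] += 1
        if self.errors[module] >= STORM_THRESHOLD:
            # Storm detected: disable the SMI and tell the BMC.
            self.smi_enabled[module] = False
            self.errors[module] = 0
            self.bmc.notify_smi_disabled([module], now)

    def on_enable_smi(self, modules):
        # Re-enable the SMI according to the BMC notification.
        for module in modules:
            self.smi_enabled[module] = True


bmc = Bmc(delay=60.0)
host = Host(bmc)
for t in (0.0, 1.0, 2.0):
    host.record_correctable_error("dimm0", now=t)
disabled_during_storm = not host.smi_enabled["dimm0"]
bmc.poll(now=62.0, host=host)
enabled_after_delay = host.smi_enabled["dimm0"]
```

Keeping the timer on the BMC side, as the abstract describes, means the host does not need its own timer while its interrupt source is masked.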