Self-generated thermal stress evaluation

    Publication No.: US10303574B1

    Publication date: 2019-05-28

    Application No.: US14843937

    Filing date: 2015-09-02

    Abstract: Self-generated thermal stress evaluation concepts are described. In one embodiment, a system includes a computing device, a cooling system, such as fans, that draws heat away from the computing device, and a management controller. The management controller can sense a temperature in the computing device and compare it against a temperature profile. The temperature profile can specify one or more target temperatures in the computing device over time. Based on the comparison, the management controller can adjust a cooling capacity of the cooling system. The adjustment to the cooling capacity can be achieved by reducing the speed of the fans, for example, to raise the temperature in the computing device. Processing tasks can also be executed in the computing device and, in response to the detection of an error in the computing device, the management controller can record the error and a profile for the error for further evaluation.
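
The control loop in this abstract (sense a temperature, compare it against a time-based profile, then change the cooling capacity) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the step-shaped `PROFILE`, the 5% fan step, and the function names `target_at`, `adjust_fan`, and `record_error` are all assumptions introduced here.

```python
from bisect import bisect_right

# Hypothetical profile: (elapsed seconds, target temperature in deg C).
PROFILE = [(0, 40.0), (60, 55.0), (120, 70.0)]


def target_at(profile, elapsed_s):
    """Return the target temperature in effect at elapsed_s (step profile)."""
    times = [t for t, _ in profile]
    idx = max(bisect_right(times, elapsed_s) - 1, 0)
    return profile[idx][1]


def adjust_fan(fan_pct, sensed_c, target_c, step=5, lo=10, hi=100):
    """Slow the fans when below target (to heat up); speed them up when above."""
    if sensed_c < target_c:
        fan_pct -= step
    elif sensed_c > target_c:
        fan_pct += step
    return min(max(fan_pct, lo), hi)


def record_error(log, error, elapsed_s, sensed_c, fan_pct):
    """Store the error together with the conditions under which it occurred."""
    log.append(
        {"error": error, "elapsed_s": elapsed_s, "temp_c": sensed_c, "fan_pct": fan_pct}
    )
    return log
```

A caller would invoke `adjust_fan` once per sampling interval and call `record_error` whenever the running workload reports a fault, so that each error is stored alongside the thermal conditions that produced it.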

    Error generation using a computer add-in card

    Publication No.: US10261880B1

    Publication date: 2019-04-16

    Application No.: US15384026

    Filing date: 2016-12-19

    Abstract: A smart add-in card can be leveraged to perform testing on a host server computer. The add-in card can include an embedded processor and memory. Tests can be downloaded to the add-in card to test a communication bus between the host server computer (motherboard) and the add-in card. In a particular example, a PCIe communication bus couples the motherboard to the add-in card and the tests can inject errors on the PCIe communication bus. The tests can be developed to test errors that are typically difficult to test without the use of special hardware. However, the smart add-in card can be a simple Network Interface Card (NIC) that resides on the host server computer during normal operation and is used for communication other than error testing. By using the NIC as a testing device, repeatable and reliable testing can be obtained.

    Hardware device error origin identification

    Publication No.: US10915389B1

    Publication date: 2021-02-09

    Application No.: US15701204

    Filing date: 2017-09-11

    Abstract: Technologies are provided for determining an identity of a hardware device that transmitted an error message via a communication bus. A chipset of the communication bus can be configured to transmit an interrupt to an interrupt handler in response to receipt of the error message. The interrupt handler can be configured to determine an identity of the hardware device based on the contents of the error message. The interrupt handler can be configured to transmit a notification to an error remediation service, wherein the notification is associated with the identity of the hardware device. The remediation service can be configured to use the identity of the hardware device to perform one or more error remediation operations. In at least some embodiments, the interrupt handler is configured to store the identifier in a memory and the error remediation service is configured to retrieve the identifier from the memory.
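
As a rough illustration of how an interrupt handler might derive a device identity from an error message, the sketch below assumes the message carries a 16-bit PCIe requester ID (bus in bits 15:8, device in bits 7:3, function in bits 2:0). The abstract does not say which field is used, so that choice, the dictionary-shaped message, and the list standing in for shared memory are all assumptions.

```python
def decode_requester_id(requester_id):
    """Split a 16-bit PCIe requester ID into a 'bus:device.function' string."""
    if not 0 <= requester_id <= 0xFFFF:
        raise ValueError("requester ID must fit in 16 bits")
    bus = (requester_id >> 8) & 0xFF
    device = (requester_id >> 3) & 0x1F
    function = requester_id & 0x07
    return f"{bus:02x}:{device:02x}.{function:x}"


def handle_error_interrupt(message, store):
    """Hypothetical handler: identify the sender and save it for later retrieval."""
    identity = decode_requester_id(message["requester_id"])
    store.append(identity)  # 'store' stands in for the memory named in the abstract
    return identity


def pending_devices(store):
    """Hypothetical remediation-side read: drain the identities saved so far."""
    devices = list(store)
    store.clear()
    return devices
```

Keeping the handler limited to decoding and storing mirrors the split described in the abstract, where a separate remediation service later retrieves the identifier and acts on it.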

    Communication link testing

    Publication No.: US10678721B1

    Publication date: 2020-06-09

    Application No.: US15422793

    Filing date: 2017-02-02

    Abstract: A smart add-in card can be leveraged to perform testing on a host server computer. The add-in card can include an embedded processor and memory. Tests can be downloaded to the add-in card to test analog features of a communication bus between the host server computer (motherboard) and the add-in card. In a particular example, a PCIe communication bus couples the motherboard to the add-in card and the tests can test a connection or communication link negotiated between the add-in card and another device using the PCIe communication bus. The tests can be developed to test errors that are typically difficult to test without the use of special hardware. However, the smart add-in card can be a simple Network Interface Card (NIC) that resides on the host server computer during normal operation and is used for communication other than error testing.

    Network broadcast traffic filtering

    Publication No.: US09807013B1

    Publication date: 2017-10-31

    Application No.: US14662818

    Filing date: 2015-03-19

    Abstract: Techniques and solutions for automatically filtering network broadcast traffic are described. For example, network broadcast traffic can be automatically filtered by turning broadcast filtering on and off (e.g., as a continuous strobe pattern that alternates enabling and disabling of broadcast filtering). For example, a computing device (e.g., via a network interface or management controller of the computing device) can automatically enable network broadcast traffic filtering during a first time period (e.g., a four second time period) and disable network broadcast traffic filtering during a second time period (e.g., a one second time period). A computing device can also automatically enable and disable network broadcast traffic filtering according to an on-off pattern (e.g., based on various criteria, such as network queue size, broadcast traffic volume, etc.).
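
The strobe pattern in this abstract (filtering on for one period, off for the next, repeating) reduces to a modulo test on elapsed time. The sketch below uses the four-second and one-second periods from the abstract as defaults; the function name `filtering_enabled` and its parameters are assumptions made for illustration.

```python
def filtering_enabled(elapsed_s, on_s=4.0, off_s=1.0):
    """Return True while broadcast filtering should be on in a repeating cycle."""
    if on_s <= 0 or off_s < 0:
        raise ValueError("on_s must be positive and off_s non-negative")
    # Position within the current on/off cycle decides the state.
    return (elapsed_s % (on_s + off_s)) < on_s
```

A network interface or management controller could evaluate this on a timer tick and toggle its filter accordingly; an adaptive variant would recompute `on_s` and `off_s` from criteria such as queue size or broadcast volume.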

    Obtaining computer crash analysis data

    Publication No.: US11188407B1

    Publication date: 2021-11-30

    Application No.: US16413341

    Filing date: 2019-05-15

    Abstract: When a computer boots up, a Basic Input/Output System (BIOS) configures system memory to have a crash memory area within the system address map, which can be used by a processor to dump crash memory data. When an error event occurs, the processor can initiate a dump to the crash memory area. Any desired data can be placed into the crash memory area, but typical data can include a state of registers in the processor. The processor then sets a flag, such as an external pin, indicating that the crash memory data is ready to be read. The flag can be read by a secure processor, which then reads the crash memory area at normal memory access speeds using the system bus. For example, the secure processor can access the crash memory area using Direct Memory Access (DMA) reads over a PCIe system bus.

    Predictive failure of hardware components

    Publication No.: US10346239B1

    Publication date: 2019-07-09

    Application No.: US15194180

    Filing date: 2016-06-27

    Abstract: A system is described wherein power degradation can be used in conjunction with predictive failure analysis in order to accurately determine when a hardware component might fail. In one example, printed circuit boards (PCBs) can unexpectedly malfunction due to a variety of reasons including silicon power variation or air mover speed. Other hardware components can include silicon or an integrated circuit. In order to accurately monitor the hardware component, telemetry is used to automatically receive communications regarding measurements of data associated with the hardware component, such as power-related data or temperature data. The different temperature data can include junction temperature or ambient air temperature to determine an expected power usage. The actual power usage is then compared to the expected power usage to determine whether the hardware component can soon fail.
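
The comparison of actual against expected power can be sketched as below. The abstract does not give the model used to derive expected power from temperature; this example assumes the standard steady-state thermal-resistance relation, P = (T_junction - T_ambient) / theta_JA, and a hypothetical 15% tolerance. Both function names are illustrative only.

```python
def expected_power_w(junction_c, ambient_c, theta_ja_c_per_w):
    """Estimate power from the junction-to-ambient temperature rise (assumed model)."""
    if theta_ja_c_per_w <= 0:
        raise ValueError("thermal resistance must be positive")
    return (junction_c - ambient_c) / theta_ja_c_per_w


def power_deviates(actual_w, expected_w, tolerance=0.15):
    """Return True when actual power differs from expected by more than tolerance."""
    if expected_w <= 0:
        raise ValueError("expected power must be positive")
    return abs(actual_w - expected_w) / expected_w > tolerance
```

In a telemetry pipeline, a component whose readings trip `power_deviates` repeatedly over time would be a candidate for closer inspection before it fails.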
