Abstract:
One embodiment of the present invention provides a system that estimates the relative humidity inside a computer system. During operation, a set of performance parameters of the computer system and an external relative humidity outside of the computer system are monitored. Then, the relative humidity inside the computer system is estimated based on the set of performance parameters, the external relative humidity, and a relative humidity model, wherein training of the relative humidity model includes measuring an external training relative humidity outside of the computer system and a training relative humidity inside the computer system while monitoring the set of performance parameters of the computer system.
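The training and estimation steps described above can be sketched as a regression problem. The abstract does not state the form of the relative humidity model, so the linear least-squares model below is an assumption, as are the names `fit_linear_model` and `estimate_internal_rh` and the choice of a single temperature-like performance parameter in the usage example.

```python
from typing import List, Sequence


def fit_linear_model(features: Sequence[Sequence[float]],
                     targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations.

    Each feature row is [external_rh, param_1, param_2, ...] recorded during
    training; each target is the internal RH measured at the same time.
    Returns [intercept, w_external_rh, w_param_1, ...].
    """
    rows = [[1.0, *row] for row in features]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("training data is rank-deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_internal_rh(weights: Sequence[float],
                         external_rh: float,
                         performance_params: Sequence[float]) -> float:
    """Apply the trained model to values monitored during normal operation."""
    x = [1.0, external_rh, *performance_params]
    return sum(w * v for w, v in zip(weights, x))
```

In use, the model would be fitted once while a reference humidity sensor is temporarily installed inside the chassis, for example `weights = fit_linear_model(training_rows, measured_internal_rh)`. Afterwards `estimate_internal_rh(weights, external_rh, params)` needs only the external sensor and the monitored parameters.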
Abstract:
A system that detects multiple anomalies in a cluster of components is presented. During operation, the system monitors derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components. The system then determines whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives. If so, the system performs one or more remedial actions.
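A minimal sketch of derivative-based detection follows. The abstract does not say how the derivatives are evaluated, so the finite-difference approximation and the fixed rate-of-change threshold are assumptions. The names `finite_differences` and `detect_anomalous_components` are hypothetical.

```python
from typing import Dict, List, Sequence


def finite_differences(samples: Sequence[float], dt: float) -> List[float]:
    """Approximate the time derivative of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


def detect_anomalous_components(signals: Dict[str, Sequence[float]],
                                dt: float,
                                max_rate: float) -> List[str]:
    """Return the components whose signal changes faster than max_rate.

    signals maps a component name to its inferential variable over time.
    Several components can be flagged in one pass, which is how this sketch
    handles multiple anomalies in the same cluster.
    """
    flagged = []
    for name, samples in signals.items():
        if any(abs(d) > max_rate for d in finite_differences(samples, dt)):
            flagged.append(name)
    return flagged
```

A caller would then run whatever remedial action applies to each name in the returned list.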
Abstract:
The disclosed embodiments provide a system that analyzes telemetry data from a computer system. During operation, the system obtains the telemetry data as a set of telemetric signals using a set of sensors in the computer system. Next, for each component or component location from a set of components in the computer system, the system applies an inferential model to the telemetry data to determine an operating environment of the component or component location, and uses the operating environment to assess a reliability of the component. Finally, the system manages use of the component in the computer system based on the assessed reliability.
Abstract:
One embodiment of the present invention provides a system that performs a real-time root-cause-analysis for a degradation event associated with a component under test. During operation, the system monitors a telemetry signal collected from the component, and while doing so, attempts to detect an anomaly in the telemetry signal. If an anomaly is detected in the telemetry signal, the system performs a failure analysis on the telemetry signal in real-time while the telemetry signal is degrading. Next, the system identifies a failure mechanism for the component based on the failure analysis.
Abstract:
A system that detects the onset of degradation for interconnections in a component within a computer system is presented. During operation of the computer system, the system monitors inferential variables associated with the interconnections. Next, the system determines a present state of the component from the monitored inferential variables. The system then compares the present state of the component with an initial state of the component. If the comparison indicates that the interconnections in the component have reached or will reach a limited operating state (LOS), the system performs a remedial action.
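The comparison of present state against initial state can be illustrated with a single scalar variable. The abstract does not define how the LOS is tested, so everything here is an assumption: the relative-drift criterion, the linear extrapolation used for "will reach", and the name `los_status`.

```python
from typing import Sequence


def los_status(initial: float,
               history: Sequence[float],
               limit_fraction: float,
               horizon_steps: int) -> str:
    """Classify a monitored variable against a limited operating state (LOS).

    initial        -- value recorded when the component was first characterized
                      (assumed non-zero)
    history        -- recent values of the same variable, newest last
    limit_fraction -- relative drift from `initial` treated as the LOS (assumed)
    horizon_steps  -- how many sampling steps ahead to extrapolate

    Returns "reached", "projected", or "ok".
    """
    present = history[-1]
    if abs(present - initial) / abs(initial) >= limit_fraction:
        return "reached"
    if len(history) >= 2:
        # Extrapolate linearly from the two most recent samples.
        slope = history[-1] - history[-2]
        projected = present + slope * horizon_steps
        if abs(projected - initial) / abs(initial) >= limit_fraction:
            return "projected"
    return "ok"
```

Either "reached" or "projected" would trigger the remedial action in this sketch.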
Abstract:
One embodiment of the present invention provides a system that tests the quality and/or the reliability of a component. During operation, the system applies test conditions to a plurality of specimens of the component. While applying the test conditions, the system measures the same variable from each of the plurality of specimens. Next, the system computes a running average of the measured variable across the plurality of specimens. The system then computes residuals between the measured variable for each specimen and the running average. The system next determines from the residuals whether the associated specimens are degraded.
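The averaging and residual steps can be sketched as follows. This reading takes the running average to be the mean across all specimens at each time step, and it flags a specimen when any residual exceeds a fixed threshold. The abstract does not specify the decision rule, so the threshold test is an assumption. The function names are hypothetical.

```python
from typing import List, Sequence


def cross_specimen_residuals(
        measurements: Sequence[Sequence[float]]) -> List[List[float]]:
    """Residual of each specimen from the cross-specimen mean.

    measurements[s][t] is the measured variable for specimen s at time step t.
    All specimens are assumed to be sampled at the same time steps.
    """
    n_specimens = len(measurements)
    n_steps = len(measurements[0])
    averages = [
        sum(spec[t] for spec in measurements) / n_specimens
        for t in range(n_steps)
    ]
    return [[spec[t] - averages[t] for t in range(n_steps)]
            for spec in measurements]


def degraded_specimens(measurements: Sequence[Sequence[float]],
                       threshold: float) -> List[int]:
    """Indices of specimens whose residual magnitude ever exceeds threshold."""
    residuals = cross_specimen_residuals(measurements)
    return [i for i, res in enumerate(residuals)
            if any(abs(r) > threshold for r in res)]
```

Because every specimen sees the same test conditions, common-mode changes such as chamber temperature swings move the average and largely cancel out of the residuals. A degraded specimen also pulls the average toward itself, so the threshold has to leave margin for that effect when the specimen count is small.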
Abstract:
Some embodiments of the present invention provide a system for in-situ characterization of a solid-state light source. First, a voltage and a current of the solid-state light source are monitored. Then, the health of the solid-state light source is characterized based on an analysis of the monitored voltage and current.
Abstract:
Some embodiments of the present invention provide a system that determines the reliability of an interconnect. During operation, connectors in the interconnect are categorized into a set of predetermined groups. Next, the reliability for selected groups in the set of predetermined groups is determined. Then, a reliability model for the interconnect is generated based on the selected groups and the reliability of the selected groups to determine the overall reliability of the interconnect.
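One way to combine per-group reliabilities into an overall figure is sketched below. The abstract does not give the form of the reliability model. This sketch assumes a series system with independent connector failures, so the interconnect works only if every connector in the selected groups works. The name `interconnect_reliability` and the group labels in the usage example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def interconnect_reliability(connector_groups: Sequence[str],
                             group_reliability: Dict[str, float]) -> float:
    """Overall reliability under a series, independent-failure assumption.

    connector_groups  -- the predetermined group assigned to each connector
    group_reliability -- per-connector reliability for each selected group;
                         groups absent from this mapping are not modeled
    """
    counts = Counter(connector_groups)
    # A group with n connectors of reliability r contributes r ** n.
    return math.prod(r ** counts[g] for g, r in group_reliability.items())
```

For example, with two connectors in a "power" group at 0.99 each and three in a "signal" group at 0.999 each, the function returns 0.99² × 0.999³.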