Abstract:
A system that detects multiple anomalies in a cluster of components is presented. During operation, the system monitors derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components. The system then determines whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives. If so, the system performs one or more remedial actions.
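As a rough illustration of monitoring derivatives across a cluster, the sketch below approximates each sensor signal's derivative by a finite difference and flags every component whose rate of change exceeds a limit. The function name, the fixed sampling interval `dt`, and the `max_rate` threshold are assumptions made for this example; the abstract does not specify how derivatives are obtained or judged.

```python
def anomalous_components(readings, dt=1.0, max_rate=0.5):
    """Flag components whose signal changes faster than max_rate.

    readings: dict mapping a (hypothetical) component name to a list of
    samples taken at a fixed interval dt. Both dt and max_rate are
    illustrative assumptions, not values from the abstract.
    """
    flagged = []
    for name, samples in readings.items():
        # Finite-difference estimate of the derivative between samples.
        rates = [(b - a) / dt for a, b in zip(samples, samples[1:])]
        if any(abs(r) > max_rate for r in rates):
            flagged.append(name)
    # Several components can be flagged in the same pass.
    return sorted(flagged)
```

Because every component is checked independently, more than one anomaly can be reported at once; a remedial action would then be chosen per flagged component.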
Abstract:
One embodiment provides a system that analyzes an electrical connection in a computer system. During operation, the system monitors a reflection coefficient associated with the electrical connection and applies a sequential-analysis technique to the reflection coefficient to determine a statistical deviation of the reflection coefficient. Next, the system assesses the integrity of the electrical connection based on the statistical deviation of the reflection coefficient. Finally, the system uses the assessed integrity to maintain the electrical connection.
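One common sequential-analysis technique is Wald's sequential probability ratio test (SPRT); the abstract does not name the specific technique, so the following is only a sketch under that assumption. It tests whether the mean of a Gaussian-distributed reflection coefficient has shifted upward by a chosen amount. The parameter names (`mu0`, `sigma`, `shift`, `alpha`, `beta`) and the restart-on-acceptance behavior are illustrative choices.

```python
import math

def sprt_mean_shift(samples, mu0, sigma, shift, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean = mu0 versus H1: mean = mu0 + shift.

    Assumes independent Gaussian samples with known standard deviation
    sigma. alpha and beta are the target false-alarm and missed-alarm
    probabilities. Returns ("degraded", index) at the first sample where
    H1 is accepted, otherwise ("healthy", None).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for i, x in enumerate(samples):
        # Log-likelihood ratio increment for a Gaussian mean shift.
        llr += (shift / sigma**2) * (x - mu0 - shift / 2)
        if llr >= upper:
            return "degraded", i
        if llr <= lower:
            llr = 0.0  # H0 accepted; restart the test on later samples
    return "healthy", None
```

The appeal of a sequential test here is that it reaches a decision as soon as the accumulated evidence is strong enough, rather than waiting for a fixed-size batch of measurements.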
Abstract:
One embodiment of the present invention provides a system that determines the reliability of a component in a system. During operation, the system monitors inferential variables associated with a number of specimens of the component. The system then collects degradation data by first computing a likelihood value that indicates whether an inferential variable associated with a specimen of the component is behaving normally or abnormally. Next, the system determines whether the specimen of the component has degraded based on the likelihood value. If the specimen of the component is determined to have degraded, the system records the time when the specimen of the component was determined to have degraded. The system also uses the degradation data to determine the reliability of the component in the system.
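To make the last step concrete, here is a minimal sketch of turning recorded degradation times into a reliability estimate. It uses a simple empirical survival fraction, which is an assumption of this example; the abstract does not say which reliability estimator is used. It also assumes every specimen was observed at least until time `t`.

```python
def empirical_reliability(degradation_times, n_specimens, t):
    """Fraction of specimens not yet flagged as degraded at time t.

    degradation_times: times at which individual specimens were
    determined to have degraded (one entry per degraded specimen).
    n_specimens: total number of specimens monitored.
    Assumes all specimens were monitored at least through time t.
    """
    failed = sum(1 for d in degradation_times if d <= t)
    return (n_specimens - failed) / n_specimens
```

For instance, with ten specimens of which two have degraded by a given time, the estimate at that time is 0.8.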
Abstract:
Some embodiments of the present invention provide a system for in-situ characterization of a solid-state light source. First, a voltage and a current of the solid-state light source are monitored. Then, the health of the solid-state light source is characterized based on an analysis of the monitored voltage and current.
Abstract:
One embodiment of the present invention provides a system that performs a real-time root-cause-analysis for a degradation event associated with a component under test. During operation, the system monitors a telemetry signal collected from the component, and while doing so, attempts to detect an anomaly in the telemetry signal. If an anomaly is detected in the telemetry signal, the system performs a failure analysis on the telemetry signal in real-time while the telemetry signal is degrading. Next, the system identifies a failure mechanism for the component based on the failure analysis.
Abstract:
Some embodiments of the present invention provide a system that determines the reliability of an interconnect. During operation, connectors in the interconnect are categorized into a set of predetermined groups. Next, the reliability for selected groups in the set of predetermined groups is determined. Then, a reliability model for the interconnect is generated based on the selected groups and the reliability of the selected groups to determine the overall reliability of the interconnect.
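One simple way to combine group reliabilities into an overall figure is a series model, sketched below. This is an assumption made for illustration: it treats the interconnect as failing if any single connector fails and treats connectors as statistically independent. The abstract does not state which reliability model is generated, and the group names in the usage note are hypothetical.

```python
def interconnect_reliability(groups):
    """Series-system reliability of an interconnect.

    groups: dict mapping a group name to a (connector_count,
    per_connector_reliability) pair. Assumes independent connectors
    and that any single connector failure fails the interconnect.
    """
    overall = 1.0
    for count, reliability in groups.values():
        overall *= reliability ** count
    return overall
```

Calling `interconnect_reliability({"power": (2, 0.99), "signal": (3, 0.9)})` multiplies 0.99 squared by 0.9 cubed. Grouping connectors this way means reliability only has to be determined once per group rather than once per connector.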
Abstract:
A system is presented that detects the onset of degradation in interconnections in a component within a computer system. During operation of the computer system, the system monitors inferential variables associated with the interconnections. Next, the system determines a present state of the component from the monitored inferential variables. The system then compares the present state of the component with an initial state of the component. If the comparison indicates that the interconnections in the component have reached or will reach a limited operating state (LOS), the system performs a remedial action.
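The comparison of present state against initial state could look like the following sketch. Everything specific in it is an assumption for illustration: the state is reduced to a single scalar (for example, a resistance inferred from telemetry), the LOS is expressed as a maximum relative drift from the initial value, and "will reach" is judged by a straight-line extrapolation over a fixed horizon.

```python
def los_status(initial, history, los_limit, horizon):
    """Classify a component as "reached", "will_reach", or "ok".

    initial: initial (baseline) value of the monitored state variable.
    history: recent samples at unit time steps (at least two).
    los_limit: relative drift from initial that defines the limited
               operating state (hypothetical definition).
    horizon: number of future time steps to extrapolate over.
    """
    present = history[-1]
    if abs(present - initial) / abs(initial) >= los_limit:
        return "reached"
    # Average slope over the window, used for a linear projection.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    projected = present + slope * horizon
    if abs(projected - initial) / abs(initial) >= los_limit:
        return "will_reach"
    return "ok"
```

A result of "reached" or "will_reach" would be the trigger for a remedial action.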
Abstract:
One embodiment of the present invention provides a system that tests the quality and/or the reliability of a component. During operation, the system applies test conditions to a plurality of specimens of the component. While applying the test conditions, the system measures the same variable from each of the plurality of specimens. Next, the system computes a running average of the measured variable across the plurality of specimens. The system then computes residuals between the measured variable for each specimen and the running average. The system next determines from the residuals whether the associated specimens are degraded.
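The residual computation described above can be sketched as follows. This example interprets the running average as the mean across all specimens at each time step, and it flags a specimen when the magnitude of its residual exceeds a fixed threshold. Both the interpretation and the threshold rule are assumptions; the abstract does not state how the residuals are evaluated.

```python
from statistics import mean

def degraded_specimens(measurements, threshold):
    """Flag specimens whose residual from the cross-specimen mean is large.

    measurements: dict mapping a specimen id to a list of samples of the
    same variable, all lists the same length (one sample per time step).
    threshold: residual magnitude above which a specimen is flagged
               (illustrative decision rule).
    """
    n_steps = len(next(iter(measurements.values())))
    flagged = set()
    for t in range(n_steps):
        # Average of the measured variable across all specimens at step t.
        avg = mean(series[t] for series in measurements.values())
        for name, series in measurements.items():
            if abs(series[t] - avg) > threshold:
                flagged.add(name)
    return sorted(flagged)
```

Because all specimens experience the same test conditions, variations common to the whole population cancel in the residuals, leaving mostly specimen-specific deviations.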