Abstract:
One embodiment of the present invention provides a system that proactively monitors a disk drive to detect an impending disk drive failure. During operation, the system obtains a time-varying electrical signal which is generated by the spindle rotation during operation of the disk drive. Next, the system extracts one or more inferential parameters associated with the spindle rotation from the time-varying electrical signal using phase-sensitive detection. The system then performs proactive fault detection on the inferential parameters to detect an impending failure of the disk drive.
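Phase-sensitive (lock-in) detection can be illustrated with a short sketch. The abstract does not specify the signal format or which parameters are extracted, so the function name `lock_in`, the 120 Hz reference (the rotation rate of a 7200 RPM spindle), and the synthetic samples are assumptions chosen for illustration only.

```python
import math

def lock_in(samples, sample_rate_hz, ref_hz):
    """Estimate amplitude and phase of the component of `samples` at ref_hz.

    The signal is multiplied by in-phase and quadrature references and
    averaged. Components at other frequencies average out when the record
    spans a whole number of reference periods.
    """
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        angle = 2.0 * math.pi * ref_hz * k / sample_rate_hz
        i_sum += x * math.cos(angle)
        q_sum += x * math.sin(angle)
    i_avg = 2.0 * i_sum / n
    q_avg = 2.0 * q_sum / n
    # For x = A*cos(w*t + phi): i_avg = A*cos(phi), q_avg = -A*sin(phi)
    return math.hypot(i_avg, q_avg), math.atan2(-q_avg, i_avg)

# Synthetic signal (assumed): a 120 Hz spindle component with amplitude 0.5
# and phase 0.3 rad, plus a DC offset and a 300 Hz interferer.
RATE_HZ, REF_HZ, N = 12000.0, 120.0, 1200  # 0.1 s record = 12 reference periods
signal = [
    1.0
    + 0.5 * math.cos(2.0 * math.pi * REF_HZ * k / RATE_HZ + 0.3)
    + 0.2 * math.cos(2.0 * math.pi * 300.0 * k / RATE_HZ)
    for k in range(N)
]
amplitude, phase = lock_in(signal, RATE_HZ, REF_HZ)
```

The recovered amplitude and phase are the kind of quantities that could serve as inferential parameters for downstream fault detection, although the abstract does not name them.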
Abstract:
A system that facilitates reducing uncertainty in a quantized signal. During operation, the system measures a quantized output signal from a sensor. Next, the system obtains an initial value for an uncertainty interval for the quantized output signal. The system then margins the quantized output signal high by introducing a controlled increase in the mean of the quantized output signal to produce a high-margined quantized output signal. Next, the system measures the high-margined quantized output signal from the sensor. The system then uses information obtained from the high-margined quantized output signal to reduce the uncertainty interval for the quantized output signal.
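One way such margining could narrow the uncertainty interval is sketched below. The abstract does not say how the information from the high-margined signal is used, so this sketch assumes a round-to-nearest quantizer with a known step and a known margin, and it intersects the two intervals implied by the readings. All names (`quantize`, `uncertainty_interval`, `intersect`) and values are hypothetical.

```python
import math

def quantize(value, step):
    """Round `value` to the nearest multiple of `step` (assumed sensor behavior)."""
    return step * math.floor(value / step + 0.5)

def uncertainty_interval(reading, step, offset=0.0):
    """Interval that must contain the unshifted true value, given a reading
    taken while the signal mean was raised by a known `offset`."""
    return (reading - offset - step / 2.0, reading - offset + step / 2.0)

def intersect(a, b):
    """Overlap of two closed intervals given as (low, high) tuples."""
    return (max(a[0], b[0]), min(a[1], b[1]))

STEP = 1.0
true_value = 3.37  # hidden from the system; used only to simulate the sensor
margin = 0.25      # controlled increase in the mean (assumed known)

initial = uncertainty_interval(quantize(true_value, STEP), STEP)
high = uncertainty_interval(quantize(true_value + margin, STEP), STEP, offset=margin)
reduced = intersect(initial, high)
```

Here the margined reading crosses a quantization boundary, which shrinks the interval from width 1.0 to width 0.25. When the margin does not cause a crossing, the interval still shrinks, because the lack of a crossing bounds the true value from above.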
Abstract:
A system that generates a dynamic trace of power consumption in a computer system. The system periodically polls current sensors and associated voltage sensors within the computer system to generate dynamic traces of currents and associated voltages for individual components within the computer system. The system then generates a dynamic trace of total power consumption for the computer system based on the dynamic traces of the currents and the associated voltages for the constituent components.
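The total-power computation amounts to summing current times voltage over the components at each polling instant. The following is a minimal sketch. The component names and sample values are made up, and the traces are assumed to be aligned in time.

```python
def total_power_trace(currents, voltages):
    """Sum I*V across components at each polling instant.

    `currents` and `voltages` map a component name to a list of samples
    (amps and volts), one sample per polling interval.
    """
    names = list(currents)
    # Use only the instants for which every component has both readings.
    length = min(min(len(currents[n]), len(voltages[n])) for n in names)
    return [
        sum(currents[n][t] * voltages[n][t] for n in names)
        for t in range(length)
    ]

# Hypothetical traces for two components over two polling intervals.
currents = {"cpu": [10.0, 12.0], "memory": [2.0, 3.0]}
voltages = {"cpu": [1.0, 1.0], "memory": [1.5, 1.5]}
power_watts = total_power_trace(currents, voltages)
```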
Abstract:
One embodiment of the present invention provides a technique for detecting anomalies during operation of a test computer system. Initially, a golden system and the test system are equipped with the same hardware configuration, wherein the golden system has gone through extensive qualification testing and is presumed to be operating correctly. Next, a deterministic load is executed on the golden system, and values for performance parameters from the golden system are monitored while the deterministic load is executing. Similarly, the deterministic load is also executed on the test system, and values for performance parameters from the test system are monitored while the deterministic load is executing. Next, pairwise differences are computed between values for performance parameters received from the test system and values for performance parameters received from the golden system. Finally, change detection techniques are applied to the pairwise differences to detect anomalies during operation of the test system.
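The differencing and change-detection steps can be sketched as follows. The abstract does not name a specific change-detection technique, so this example substitutes a two-sided CUSUM detector. The `drift` and `threshold` values and the sample data are assumptions.

```python
def pairwise_differences(test_values, golden_values):
    """Difference between test and golden readings taken at the same point
    in the deterministic load."""
    return [t - g for t, g in zip(test_values, golden_values)]

def cusum_alarm(diffs, drift, threshold):
    """Two-sided CUSUM. Returns the index of the first sample at which either
    cumulative sum exceeds `threshold`, or None if no change is detected."""
    pos = 0.0
    neg = 0.0
    for i, d in enumerate(diffs):
        pos = max(0.0, pos + d - drift)
        neg = max(0.0, neg - d - drift)
        if pos > threshold or neg > threshold:
            return i
    return None

# Hypothetical parameter: the test system drifts 3 units high from sample 10.
golden = [50.0] * 20
test = [50.0] * 10 + [53.0] * 10
alarm_index = cusum_alarm(pairwise_differences(test, golden), drift=0.5, threshold=5.0)
```

Because both systems run the same deterministic load, load-induced variation cancels in the differences, so the detector responds only to deviations between the two machines.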
Abstract:
One embodiment provides a system that analyzes telemetry data from a monitored system. During operation, the system periodically obtains the telemetry data as a set of telemetry variables from the monitored system and updates a multidimensional real-time distribution of the telemetry data using the obtained telemetry variables. Next, the system analyzes a statistical deviation of the multidimensional real-time distribution from a multidimensional reference distribution for the monitored system using a multivariate sequential probability ratio test (SPRT) and assesses the integrity of the monitored system based on the statistical deviation of the multidimensional real-time distribution. If the assessed integrity falls below a threshold, the system determines a fault in the monitored system corresponding to a source of the statistical deviation.
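The abstract calls for a multivariate SPRT. As a simplified illustration only, the sketch below shows Wald's SPRT for a mean shift in a single Gaussian residual. The hypotheses, the labels "normal" and "degraded", and the default error rates are assumptions, not details from the source.

```python
import math

def sprt_mean_shift(residuals, shift, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean 0 versus H1: mean `shift`, with known `sigma`.

    `alpha` is the false-alarm probability and `beta` the missed-alarm
    probability. Returns (decision, samples_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    llr = 0.0
    for n, x in enumerate(residuals, start=1):
        # Log-likelihood ratio increment for one Gaussian sample.
        llr += (shift / sigma ** 2) * (x - shift / 2.0)
        if llr >= upper:
            return ("degraded", n)
        if llr <= lower:
            return ("normal", n)
    return ("undecided", len(residuals))

# Residuals sitting exactly on the shifted mean, and exactly on the nominal mean.
shifted_result = sprt_mean_shift([1.0] * 20, shift=1.0, sigma=1.0)
nominal_result = sprt_mean_shift([0.0] * 20, shift=1.0, sigma=1.0)
```

A multivariate version would replace the scalar increment with a likelihood ratio over the joint distribution of the telemetry variables.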
Abstract:
Embodiments of a method for determining locations of computers in a group of computers, which may be performed by a system, are described. During operation, the system receives a location of a first computer in the group of computers. Then, the system determines locations of one or more additional computers in the group of computers relative to the first computer based on vibration spectra associated with the first computer and the one or more additional computers.
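The abstract does not say how the vibration spectra are compared. As a rough sketch, assume that computers mounted closer to the first computer show more similar spectra, and use cosine similarity as a stand-in metric to rank candidates. The host names and spectra are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length magnitude spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_by_similarity(reference_spectrum, spectra):
    """Order computer names from most to least similar to the reference."""
    return sorted(
        spectra,
        key=lambda name: cosine_similarity(reference_spectrum, spectra[name]),
        reverse=True,
    )

# Hypothetical three-bin spectra; the first computer's location is known.
reference = [1.0, 0.0, 0.0]
spectra = {
    "host-b": [0.9, 0.1, 0.0],
    "host-c": [0.1, 0.9, 0.0],
    "host-d": [0.5, 0.5, 0.0],
}
ranking = rank_by_similarity(reference, spectra)
```

Under the stated assumption, the ranking is a proxy for relative distance from the first computer. Converting it to physical positions would require further information that the abstract does not give.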
Abstract:
Some embodiments of the present invention provide a system that stores telemetry data from a computer system. The system includes a first buffer, a second buffer, and a third buffer. During operation, the system periodically obtains the telemetry data from the computer system and stores the telemetry data in the first buffer, second buffer, and third buffer. The system also compresses the telemetry data in the first and second buffers. To compress the data, the system creates a first set of summary statistics from the telemetry data in the first buffer and the second buffer and stores the first set of summary statistics in the first buffer, which becomes a historical data buffer.
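The compression step can be sketched as replacing the raw samples of the first and second buffers with one summary record. The choice of statistics (count, minimum, maximum, mean) and the emptying of the second buffer are assumptions. The abstract does not list the statistics or say what happens to the second and third buffers afterward.

```python
def summarize(samples):
    """Summary statistics for a list of numeric telemetry samples."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
    }

def compress(first, second):
    """Return the new (first, second) buffers after compression.

    The first buffer becomes a historical buffer that holds only the summary.
    The second buffer is returned empty (an assumption).
    """
    summary = summarize(first + second)
    return [summary], []

# Hypothetical raw samples.
first_buffer = [1.0, 2.0]
second_buffer = [3.0, 6.0]
historical_buffer, second_buffer = compress(first_buffer, second_buffer)
```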
Abstract:
Embodiments of the present invention provide a system that estimates the value of a virtual sensor. The system first samples values for performance metrics using external sensors that are coupled to a system and internal sensors that are built into the system. Next, the system generates an inferential sensing model for the system from the sampled values. Then, during operation, the system samples values of performance metrics using the internal sensors and uses the inferential sensing model and the sampled values from the internal sensors to estimate the values of performance metrics for removed external sensors (i.e., virtual sensors).
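The train-then-estimate flow can be sketched with a deliberately simple model. The abstract does not specify the form of the inferential sensing model, so this example assumes a least-squares line that relates one internal sensor to one external sensor. The function names and readings are hypothetical.

```python
def fit_line(internal, external):
    """Least-squares fit of external = slope * internal + intercept."""
    n = len(internal)
    mean_x = sum(internal) / n
    mean_y = sum(external) / n
    sxx = sum((x - mean_x) ** 2 for x in internal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(internal, external))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_virtual_sensor(model, internal_value):
    """Estimate the removed external sensor's reading from an internal reading."""
    slope, intercept = model
    return slope * internal_value + intercept

# Training phase: both sensor types are attached and sampled together.
internal_samples = [1.0, 2.0, 3.0, 4.0]
external_samples = [3.0, 5.0, 7.0, 9.0]
model = fit_line(internal_samples, external_samples)

# Operation phase: the external sensor is removed; only the internal reading is available.
virtual_reading = estimate_virtual_sensor(model, 5.0)
```

A practical model would likely combine several internal sensors, but the two phases (fit with external sensors present, estimate after they are removed) follow the same pattern.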
Abstract:
A computer system that schedules loads across a set of processor cores is described. During operation, the computer system receives thermal measurements from sensors associated with the set of processor cores, and removes noise from the thermal measurements. Then, the computer system analyzes thermal properties of the set of processor cores based on the thermal measurements. Next, the computer system receives a process to be executed, and schedules the process to be executed by at least one of the processor cores based on the analysis. This scheduling is performed in a manner that reduces spatial and temporal thermal variations in the integrated circuit.
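A minimal sketch of the measure, denoise, and schedule sequence follows. The abstract names neither the noise-removal method nor the scheduling policy, so this example assumes a median filter over recent readings and a greedy policy that picks the coolest core. The core IDs and temperatures are invented.

```python
import statistics

def denoised_temperature(readings):
    """Median of recent readings, which suppresses isolated sensor glitches."""
    return statistics.median(readings)

def pick_core(recent_readings):
    """Choose the core whose denoised temperature is lowest.

    `recent_readings` maps a core ID to its most recent temperature samples
    in degrees Celsius.
    """
    return min(
        recent_readings,
        key=lambda core: denoised_temperature(recent_readings[core]),
    )

# Core 1's latest sample (40.0) is a glitch; its median is still 70.0.
recent = {
    0: [60.0, 62.0, 61.0],
    1: [70.0, 71.0, 40.0],
    2: [58.0, 59.0, 58.0],
}
chosen_core = pick_core(recent)
```

Sending new work to the coolest core is one simple heuristic for evening out temperatures across cores. The abstract's goal of reducing both spatial and temporal variation may call for a more elaborate policy.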
Abstract:
Embodiments of the present invention provide a system for detecting vibrations from a component. The system operates by coupling vibrations from a component to a membrane using a flexible connecting line. The membrane converts the vibrations into acoustic waves that are transmitted through a medium. Finally, a microphone detects the acoustic waves in the medium and converts the acoustic waves into electrical signals.