Abstract:
One embodiment of the present invention provides a system that estimates residual life of a software system under a software-based failure mechanism. During operation, the system first constructs a prognostic database for the software-based failure mechanism based on a plurality of software systems of the same type as the software system, wherein the prognostic database includes a set of prognostic readings associated with the software-based failure mechanism from the plurality of software systems. Note that a given prognostic reading in the prognostic database comprises: (1) a symptom index, which is a function of one or more variables associated with the software-based failure mechanism; and (2) a residual life, which is the remaining time to a failure under the software-based failure mechanism. Next, the system obtains a symptom index value from the software system which is being monitored. The system then estimates a residual life for the software system under the software-based failure mechanism by comparing the symptom index value with the prognostic readings in the prognostic database.
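The comparison step can be illustrated with a minimal sketch. The abstract does not say how the symptom index value is compared with the prognostic readings, so the nearest-neighbour averaging below, the function name `estimate_residual_life`, the parameter `k`, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean


def estimate_residual_life(symptom_value, prognostic_db, k=3):
    """Estimate residual life from the k readings closest in symptom index.

    prognostic_db is a list of (symptom_index, residual_life) pairs collected
    from other systems of the same type. Averaging the nearest neighbours is
    an assumed comparison rule, not one stated in the abstract.
    """
    if not prognostic_db:
        raise ValueError("prognostic database is empty")
    nearest = sorted(prognostic_db, key=lambda r: abs(r[0] - symptom_value))[:k]
    return mean(life for _, life in nearest)


# Hypothetical readings: (symptom index, residual life in hours).
db = [(0.10, 900.0), (0.25, 700.0), (0.40, 520.0),
      (0.55, 300.0), (0.70, 150.0), (0.90, 20.0)]

# The two closest readings to 0.5 are 0.55 and 0.40, so the estimate is
# the mean of 300 and 520 hours.
estimate = estimate_residual_life(0.5, db, k=2)
```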
Abstract:
A system that selects tests to exercise a given computer system is described. During operation, the system tests the given computer system using a set of tests, where a given test includes a given load and a given cycling time selected from a range of cycling times. Moreover, for the given test, the system monitors a stress metric in the given computer system. Additionally, the system selects at least one of the tests from the set of tests to exercise the given computer system based on the monitored stress metric.
Abstract:
A system that monitors telemetry from a host computer system to detect degradation in a remote storage device is described. During operation, the system monitors performance parameters from a host computer system which accesses the remote storage device, wherein the performance parameters relate to the interactions between the host computer system and the remote storage device. The system then determines whether the monitored performance parameters have deviated from predicted values for the performance parameters. If so, the system generates a signal indicating that the remote storage device has degraded.
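A minimal sketch of the deviation check follows. The abstract does not specify how predicted values are produced or how a deviation is judged, so the sketch assumes the predictions are supplied by some model and uses a hypothetical rule: the device is flagged once the residual exceeds a tolerance for several consecutive samples. The names `storage_degraded`, `tolerance`, and `min_consecutive` are illustrative.

```python
def storage_degraded(observed, predicted, tolerance, min_consecutive=3):
    """Return True if observed values stray from predictions for long enough.

    A sample deviates when |observed - predicted| > tolerance. Requiring
    min_consecutive deviating samples in a row is an assumed way to avoid
    flagging a single noisy reading.
    """
    run = 0
    for obs, pred in zip(observed, predicted):
        run = run + 1 if abs(obs - pred) > tolerance else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical I/O latencies in milliseconds as seen from the host.
predicted = [5.0] * 8
healthy = [5.1, 4.9, 5.2, 5.0, 4.8, 5.3, 5.0, 5.1]
degrading = [5.1, 4.9, 5.2, 9.0, 9.5, 10.0, 5.0, 5.0]

healthy_flag = storage_degraded(healthy, predicted, tolerance=2.0)
degraded_flag = storage_degraded(degrading, predicted, tolerance=2.0)
```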
Abstract:
A system for generating a power consumption model of at least one server includes one or more computers configured to obtain n time series telemetry signals indicative of operating parameters of the at least one server, obtain a time series power signal indicative of power consumed by the at least one server, and correlate each of the n time series telemetry signals with the time series power signal. The one or more computers are further configured to select a set of the n time series telemetry signals having an overall correlation with the time series power signal greater than a predetermined threshold, and generate a power consumption model of the at least one server based on at least the set of the n time series telemetry signals.
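The signal-selection step can be sketched as follows. This is a simplification under stated assumptions: it scores each telemetry signal individually with the Pearson correlation coefficient and keeps those whose absolute correlation exceeds the threshold, whereas the abstract speaks of an overall correlation for the selected set. The signal names and values are hypothetical, and fitting the power model itself is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (spread_x * spread_y)


def select_signals(telemetry, power, threshold=0.8):
    """Keep the telemetry signals whose |correlation| with power exceeds threshold."""
    return [name for name, series in telemetry.items()
            if abs(pearson(series, power)) > threshold]


# Hypothetical time series sampled at the same instants.
power = [100, 120, 140, 160, 180, 200]           # watts
telemetry = {
    "cpu_util": [10, 20, 30, 40, 50, 60],         # tracks power closely
    "fan_rpm": [3000, 3100, 3300, 3400, 3600, 3700],
    "disk_temp": [30, 29, 31, 30, 29, 31],        # nearly unrelated to power
}

selected = select_signals(telemetry, power, threshold=0.8)
```

With these sample values, `cpu_util` and `fan_rpm` pass the threshold and `disk_temp` does not; a model would then be fitted on the selected signals only.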
Abstract:
The disclosed embodiments provide a system that analyzes telemetry data from a computer system. During operation, the system obtains the telemetry data as a set of telemetric signals using a set of sensors in the computer system. Next, the system analyzes the telemetry data to estimate a value of a parameter associated with the computer system, wherein the parameter is at least one of a power utilization and a temperature. Finally, the system controls a subsequent value of the parameter by modulating a virtual duty cycle of a processor in the computer system based on the estimated value.
Abstract:
Some embodiments of the present invention provide a system that controls a cooling fan for a storage array. During operation, an input-output (I/O) metric of the storage array is monitored. Then, the cooling fan is controlled based on the I/O metric.
Abstract:
The disclosed embodiments provide a system that analyzes telemetry data from a computer system. During operation, the system obtains the telemetry data as a set of telemetric signals using a set of sensors in the computer system. Next, the system uses a regularization technique to calculate a temperature derivative with respect to time for a component in the computer system from the telemetric signals. Finally, the system controls a subsequent value of the temperature derivative with respect to time by modulating a fan speed in the computer system based on the calculated temperature derivative with respect to time and the telemetric signals.
Abstract:
Some embodiments of the present invention provide a system that accurately synchronizes signals related to the operation of a computer system. During operation, the system receives a first time-domain signal associated with a first system variable and a second time-domain signal associated with a second system variable from the computer system. The system then transforms the first and the second time-domain signals into a first frequency-domain signal and a second frequency-domain signal, respectively. Next, the system computes a cross-power-spectral-density (CPSD) between the first and second frequency-domain signals to obtain a phase angle versus frequency graph between the two frequency-domain signals. The system subsequently extracts the slope of the phase angle versus frequency graph, and uses the value of the slope to synchronize the first time-domain signal and the second time-domain signal.
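The phase-slope idea can be sketched in a few lines. If the second signal is the first one delayed by d samples, the cross-spectrum has phase -2πfd, so the slope of phase versus frequency yields the delay. The sketch below is an illustration under assumptions: it uses a plain single-segment DFT rather than an averaged CPSD estimate, fits the slope only over a caller-supplied list of frequency bins assumed to carry signal energy, and uses synthetic test signals. The names `dft` and `estimate_delay` are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Plain O(n^2) discrete Fourier transform, adequate for short signals."""
    n = len(signal)
    return [sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(signal))
            for k in range(n)]


def estimate_delay(x, y, bins):
    """Estimate how many samples y lags x from the cross-spectrum phase slope."""
    n = len(x)
    spec_x, spec_y = dft(x), dft(y)
    freqs = [k / n for k in bins]  # cycles per sample
    phases = [cmath.phase(spec_x[k].conjugate() * spec_y[k]) for k in bins]
    # Unwrap so that consecutive phases never jump by more than pi.
    for i in range(1, len(phases)):
        while phases[i] - phases[i - 1] > math.pi:
            phases[i] -= 2 * math.pi
        while phases[i] - phases[i - 1] < -math.pi:
            phases[i] += 2 * math.pi
    # Least-squares slope of phase versus frequency.
    mean_f = sum(freqs) / len(freqs)
    mean_p = sum(phases) / len(phases)
    slope = (sum((f - mean_f) * (p - mean_p) for f, p in zip(freqs, phases))
             / sum((f - mean_f) ** 2 for f in freqs))
    return -slope / (2 * math.pi)


# Synthetic test: five sinusoids, and a copy circularly delayed by 3 samples.
N = 64
x = [sum(math.sin(2 * math.pi * k * i / N + 0.3 * k) for k in range(1, 6))
     for i in range(N)]
y = [x[(i - 3) % N] for i in range(N)]

delay = estimate_delay(x, y, bins=range(1, 6))
```

Shifting the second signal back by the estimated delay then aligns the two time-domain signals.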
Abstract:
Some embodiments of the present invention provide a system that synchronizes signals related to the operation of a computer system. During operation, a set of correlation coefficients between a first signal and a second signal is generated, wherein each correlation coefficient is associated with a different phase shift between the first signal and the second signal. Then, a synchronizing phase shift associated with the highest correlation coefficient in the set of correlation coefficients is determined in order to synchronize the first signal and the second signal.
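The search over phase shifts can be sketched as below. The sketch assumes the phase shift is an integer number of samples and that the correlation coefficient is the Pearson coefficient computed over the overlapping part of the two signals; neither detail is given in the abstract. The names `best_shift` and `max_shift` and the test data are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def best_shift(first, second, max_shift):
    """Return (shift, coefficient) maximising corr(first[i], second[i + shift])."""
    best = (0, -2.0)  # any real coefficient is greater than -2
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(first[i], second[i + shift])
                 for i in range(len(first))
                 if 0 <= i + shift < len(second)]
        coeff = pearson([a for a, _ in pairs], [b for _, b in pairs])
        if coeff > best[1]:
            best = (shift, coeff)
    return best


# Synthetic test: `second` is `first` delayed by two samples.
rng = random.Random(0)
base = [rng.random() for _ in range(60)]
first = base[2:52]
second = base[0:50]

shift, coeff = best_shift(first, second, max_shift=5)
```

Here a positive `shift` means the second signal lags the first by that many samples, so advancing the second signal by `shift` samples synchronizes the pair.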
Abstract:
Some embodiments of the present invention provide a system that generates a load for a computer system in accordance with a predetermined load profile. During operation, the load for the computer system is generated by modulating the load using pulse-width modulation, wherein the load is periodically cycled between at least two different test load levels so that a moving window average of the modulated load follows the predetermined load profile.
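A minimal sketch of the modulation follows. It assumes exactly two load levels, a fixed cycling period measured in ticks, and one target value of the load profile per period; within each period the high level is held for a fraction of ticks equal to the duty cycle, so the average over that period matches the target. The function name `pwm_load` and the numbers are hypothetical.

```python
def pwm_load(profile, low, high, period):
    """Return per-tick load levels whose per-period average follows `profile`.

    profile: one target load per period, each between `low` and `high`.
    period:  number of ticks in one modulation cycle.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    ticks = []
    for target in profile:
        # Duty cycle: fraction of the period spent at the high level.
        duty = min(1.0, max(0.0, (target - low) / (high - low)))
        on_ticks = round(duty * period)
        ticks.extend([high] * on_ticks + [low] * (period - on_ticks))
    return ticks


# Ramp from 25% to 75% load using only 0% and 100% load levels.
load = pwm_load([25, 50, 75], low=0, high=100, period=4)
# load == [100, 0, 0, 0, 100, 100, 0, 0, 100, 100, 100, 0]

# Averaging over each period recovers the target profile.
averages = [sum(load[i:i + 4]) / 4 for i in range(0, len(load), 4)]
```

A longer period gives finer duty-cycle resolution, at the cost of a slower response to changes in the profile.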