Abstract:
A system that selects tests to exercise a given computer system is described. During operation, the system tests the given computer system using a set of tests, where a given test includes a given load and a given cycling time selected from a range of cycling times. Moreover, for the given test, the system monitors a stress metric in the given computer system. Additionally, the system selects at least one of the tests from the set of tests to exercise the given computer system based on the monitored stress metric.
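The selection step can be illustrated with a minimal Python sketch. The abstract does not say how the stress metric drives the selection, so this sketch assumes the tests that produce the highest monitored stress are the ones kept. The names `StressTest`, `select_tests`, and `measure_stress` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class StressTest:
    load: float            # hypothetical unit: fraction of full utilization
    cycling_time_s: float  # period of one load cycle, in seconds


def select_tests(
    tests: Sequence[StressTest],
    measure_stress: Callable[[StressTest], float],
    count: int = 1,
) -> List[StressTest]:
    """Run every test, record its stress metric, and keep the `count` highest.

    Assumption: "highest stress wins" is one possible selection rule; the
    source only states that selection is based on the monitored metric.
    """
    scored = [(measure_stress(test), test) for test in tests]
    # Sort on the metric only, so ties never compare the test objects.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [test for _, test in scored[:count]]
```

In a real harness, `measure_stress` would apply the load at the given cycling time and read a sensor; here it is simply a callable supplied by the caller.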
Abstract:
A system that identifies processes with a memory leak in a computer system is described. During operation, the system periodically samples memory usage for processes running on the computer system. The system then ranks the processes by memory usage and selects a specified number of processes with highest memory usage based on the ranking. For each selected process, the system computes a first-order difference of memory usage by taking a difference between the memory usage at a current sampling time and the memory usage at an immediately preceding sampling time. The system then generates a memory-leak index based on the first-order difference and a preceding memory-leak index computed at the immediately preceding sampling time.
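A minimal sketch of the ranking and index update follows. The abstract states only that the index depends on the first-order difference and on the preceding index; the exponentially weighted form used here, and the smoothing factor `alpha`, are assumptions chosen for illustration. The function names are hypothetical.

```python
from typing import Dict, List


def top_processes(usage: Dict[str, float], n: int) -> List[str]:
    """Rank processes by memory usage and return the n largest."""
    return sorted(usage, key=usage.get, reverse=True)[:n]


def update_leak_index(
    previous_index: float,
    current_usage: float,
    previous_usage: float,
    alpha: float = 0.5,
) -> float:
    """Combine the first-order difference with the preceding index.

    Assumption: an exponentially weighted average. A sustained positive
    difference drives the index up; a one-off spike decays over time.
    """
    first_order_difference = current_usage - previous_usage
    return alpha * first_order_difference + (1.0 - alpha) * previous_index
```

With `alpha = 0.5`, a process that grows by 10 units at each of two consecutive samples moves from an index of 0 to 5 and then to 7.5.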
Abstract:
Some embodiments of the present invention provide a system that controls a cooling fan for a storage array. During operation, an input-output (I/O) metric of the storage array is monitored. Then, the cooling fan is controlled based on the I/O metric.
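As a rough illustration, the control step could map the I/O metric to a fan duty cycle. The linear mapping, the choice of IOPS as the metric, and the parameters `max_iops`, `min_duty`, and `max_duty` are all assumptions; the abstract does not specify a control law.

```python
def fan_duty_from_io(
    iops: float,
    max_iops: float,
    min_duty: float = 0.3,
    max_duty: float = 1.0,
) -> float:
    """Return a fan duty cycle in [min_duty, max_duty] for a given I/O rate.

    Assumption: heat output rises roughly with I/O activity, so the fan
    speed is scaled linearly between an idle floor and full speed.
    """
    if max_iops <= 0:
        raise ValueError("max_iops must be positive")
    # Clamp the activity fraction so out-of-range readings stay safe.
    fraction = min(max(iops / max_iops, 0.0), 1.0)
    return min_duty + fraction * (max_duty - min_duty)
```

Driving the fan from the I/O metric rather than from a temperature reading lets the controller react before the temperature rises, which is one plausible motivation for the approach.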
Abstract:
One embodiment of the present invention provides a system that dynamically controls a temperature profile within a disk drive by generating disk drive activity. During operation, the system first receives a desired temperature profile. Next, the system generates a load profile based on the desired temperature profile, wherein the load profile specifies read/write operations on the disk drive. The system then applies the load profile to the disk drive to generate disk drive activity, wherein the disk drive activity causes the temperature in the disk drive to track the desired temperature profile.
Abstract:
Some embodiments of the present invention provide a system that synchronizes signals related to the operation of a computer system. During operation, a set of correlation coefficients between a first signal and a second signal is generated, wherein each correlation coefficient is associated with a different phase shift between the first signal and the second signal. Then, a synchronizing phase shift associated with the highest correlation coefficient in the set of correlation coefficients is determined in order to synchronize the first signal and the second signal.
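The procedure can be sketched in Python as a search over integer sample shifts. This sketch assumes the correlation coefficient is the Pearson coefficient and that phase shifts are whole numbers of samples; neither detail is stated in the abstract. The names `pearson`, `synchronizing_shift`, and `max_shift` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def synchronizing_shift(
    first: Sequence[float], second: Sequence[float], max_shift: int
) -> Tuple[int, float]:
    """Return (shift, correlation) for the shift with the highest correlation.

    A positive shift means `second` lags `first` by that many samples.
    """
    best_shift, best_corr = 0, -math.inf
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            a, b = first[: len(first) - shift], second[shift:]
        else:
            a, b = first[-shift:], second[: len(second) + shift]
        n = min(len(a), len(b))
        if n < 2:
            continue  # too little overlap to compute a correlation
        corr = pearson(a[:n], b[:n])
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return best_shift, best_corr
```

Once the synchronizing shift is known, the second signal can be advanced or delayed by that many samples so that the two signals line up.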
Abstract:
A system that determines whether components are not present in a computer system is presented. During operation, the system receives telemetry signals from sensors within the computer system. Next, the system dynamically generates a temperature map for the computer system based on the telemetry signals. The system then analyzes the temperature map to determine whether components are not present in the computer system.
Abstract:
Embodiments of the present invention provide a system that dynamically controls a temperature profile within a computer system by generating computer system activity. The system starts by receiving a desired temperature profile. The system then generates a load profile based on the desired temperature profile, wherein the load profile specifies operations to be performed by the computer system. The system next executes the load profile on the computer system to generate computer system activity, wherein the computer system activity produces the desired temperature profile in the computer system.
Abstract:
One embodiment of the present invention provides a system that estimates residual life of a software system under a software-based failure mechanism. During operation, the system first constructs a prognostic database for the software-based failure mechanism based on a plurality of software systems of the same type as the software system, wherein the prognostic database includes a set of prognostic readings associated with the software-based failure mechanism from the plurality of software systems. Note that a given prognostic reading in the prognostic database comprises: (1) a symptom index, which is a function of one or more variables associated with the software-based failure mechanism; and (2) a residual life, which is the remaining time to a failure under the software-based failure mechanism. Next, the system obtains a symptom index value from the software system which is being monitored. The system then estimates a residual life for the software system under the software-based failure mechanism by comparing the symptom index value with the prognostic readings in the prognostic database.
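One way to picture the comparison step is a nearest-neighbor lookup over the prognostic database. The abstract does not name a comparison method, so averaging the residual lives of the `k` closest readings is an assumption, as are the names `estimate_residual_life` and `k`.

```python
from typing import List, Tuple

# Each prognostic reading is (symptom_index, residual_life).
Reading = Tuple[float, float]


def estimate_residual_life(
    database: List[Reading], symptom_value: float, k: int = 3
) -> float:
    """Average the residual life of the k readings closest in symptom index.

    Assumption: readings with a similar symptom index have a similar
    remaining time to failure.
    """
    if not database:
        raise ValueError("prognostic database is empty")
    nearest = sorted(database, key=lambda r: abs(r[0] - symptom_value))[:k]
    return sum(life for _, life in nearest) / len(nearest)
```

For example, with readings `(0.1, 100.0)` and `(0.2, 80.0)` as the two closest matches to a symptom index of 0.15, the estimate with `k=2` is 90.0 in whatever time unit the database uses.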
Abstract:
The disclosed embodiments provide a system that analyzes telemetry data from a computer system. During operation, the system obtains the telemetry data as a set of telemetric signals using a set of sensors in the computer system. Next, the system analyzes the telemetry data to estimate a value of a parameter associated with the computer system, wherein the parameter is at least one of a power utilization and a temperature. Finally, the system controls a subsequent value of the parameter by modulating a virtual duty cycle of a processor in the computer system based on the estimated value.
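The control step might look like the following sketch, which treats the virtual duty cycle as the fraction of time the processor is allowed to run and adjusts it with a simple proportional rule. Both that interpretation and the proportional rule are assumptions; the parameters `gain`, `min_duty`, and `max_duty` are hypothetical.

```python
def next_duty_cycle(
    current_duty: float,
    estimated_power_w: float,
    target_power_w: float,
    gain: float = 0.002,
    min_duty: float = 0.1,
    max_duty: float = 1.0,
) -> float:
    """Nudge the duty cycle toward the value that meets the power target.

    Assumption: power rises with duty cycle, so an estimate above the
    target lowers the duty cycle and an estimate below it raises it.
    """
    error_w = target_power_w - estimated_power_w
    proposed = current_duty + gain * error_w
    # Keep the result inside the allowed range.
    return min(max(proposed, min_duty), max_duty)
```

The same shape applies if the controlled parameter is temperature rather than power; only the units of the estimate, the target, and the gain change.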
Abstract:
Some embodiments of the present invention provide a system that measures a power efficiency of a computer system. During operation, the system collects telemetry data from a set of sensors within the computer system. Next, the system determines a power consumption of the computer system from the telemetry data and determines a number of input/output operations per second (IOPS) for the computer system from the telemetry data. Finally, the system computes an IOPS per watt metric from the power consumption and the number of IOPS.
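The metric itself is a ratio, as the short sketch below shows. It assumes the telemetry yields a count of completed I/O operations over a known interval and a series of power samples in watts, which are averaged; the abstract does not say how either quantity is derived from the sensors.

```python
from typing import Sequence


def iops_per_watt(
    io_operations: int, interval_s: float, power_samples_w: Sequence[float]
) -> float:
    """Compute IOPS per watt from an I/O count and power samples.

    Assumption: mean power over the interval is a fair summary of the
    power consumption during the measurement.
    """
    if interval_s <= 0 or not power_samples_w:
        raise ValueError("need a positive interval and at least one sample")
    iops = io_operations / interval_s
    mean_power_w = sum(power_samples_w) / len(power_samples_w)
    return iops / mean_power_w
```

For instance, 6000 operations in 60 seconds is 100 IOPS; at a mean power of 200 W that gives 0.5 IOPS per watt.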