Abstract:
The disclosed embodiments provide a system that prevents oscillatory load behavior for a multi-node distributed system. During operation, the system uses a load-balancing policy to distribute requests to nodes of the distributed system. The system determines operational characteristics for the nodes as they process a set of requests, and then uses these operational characteristics to compute machine queuing models that describe the machine state of each node. The system then uses this machine state for the nodes to determine whether the load-balancing policy and the distributed system are susceptible to oscillatory load behavior.
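As a rough illustration of the modeling step, the sketch below treats each node as an M/M/1 queue and applies a simple sign-flip heuristic to a node's load history. Both choices are assumptions made for illustration: the abstract does not say which queuing model or oscillation test is used, and the names `mm1_state`, `direction_changes`, and `looks_oscillatory` are hypothetical.

```python
def mm1_state(arrival_rate, service_rate):
    """Steady-state M/M/1 metrics for one node (assumed queuing model)."""
    if service_rate <= 0 or arrival_rate < 0:
        raise ValueError("service_rate must be > 0 and arrival_rate >= 0")
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        # The queue grows without bound, so no steady state exists.
        return {"utilization": rho,
                "mean_in_system": float("inf"),
                "mean_response_time": float("inf")}
    return {"utilization": rho,
            "mean_in_system": rho / (1.0 - rho),
            "mean_response_time": 1.0 / (service_rate - arrival_rate)}


def direction_changes(series):
    """Count how often the series switches between rising and falling."""
    diffs = [b - a for a, b in zip(series, series[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))


def looks_oscillatory(series, min_flip_fraction=0.5):
    """Hypothetical heuristic: flag a load history that reverses direction often."""
    if len(series) < 3:
        return False
    return direction_changes(series) / (len(series) - 2) >= min_flip_fraction
```

In use, the per-node utilization computed by `mm1_state` at successive sampling intervals could be passed to `looks_oscillatory`; a node whose load repeatedly swings between high and low under a given policy would be flagged.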
Abstract:
Systems, methods, and other embodiments associated with transient detection for predictive health management are described. In one embodiment, a method includes receiving a health signal from a data outlet. The health signal corresponds to a derivative variable derived from a combination of data processing system metrics not exposed beyond the data outlet. A transient is detected in the health signal. In response to detecting the transient, the method includes performing a corrective action. The example method may also include detecting the transient in a health signal from a data processing system by collecting historical values of the health signal; selecting a first statistical model that best fits the historical values; receiving a present value of the health signal; and applying the first statistical model to the present value of the health signal to determine whether the present value of the health signal is a transient.
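The four detection steps (collect history, select the best-fitting model, receive the present value, apply the model) can be sketched as follows. This is a minimal sketch under stated assumptions: the two candidate models (constant mean and linear trend), the holdout-based selection rule, and the `k`-sigma threshold are all hypothetical choices, since the abstract does not name specific models or criteria.

```python
import statistics


def fit_constant(values):
    """Candidate model 1: the signal stays at its historical mean."""
    mean = statistics.fmean(values)
    return lambda t: mean


def fit_linear(values):
    """Candidate model 2: least-squares linear trend over the sample index."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return lambda t: v_mean + slope * (t - t_mean)


CANDIDATES = {"constant": fit_constant, "linear": fit_linear}


def select_model(history, holdout=5):
    """Pick the candidate with the lowest squared error on the last `holdout` points."""
    train, test = history[:-holdout], history[-holdout:]

    def holdout_error(fit):
        predict = fit(train)
        return sum((v - predict(len(train) + i)) ** 2 for i, v in enumerate(test))

    name = min(CANDIDATES, key=lambda key: holdout_error(CANDIDATES[key]))
    predict = CANDIDATES[name](history)  # refit the winner on the full history
    residuals = [v - predict(t) for t, v in enumerate(history)]
    return name, predict, statistics.pstdev(residuals)


def is_transient(history, present, k=4.0):
    """Flag the present value if it deviates more than k residual sigmas."""
    _, predict, sigma = select_model(history)
    deviation = abs(present - predict(len(history)))
    return deviation > k * max(sigma, 1e-9)
```

Selecting on held-out points rather than on in-sample fit avoids always preferring the more flexible model, which would otherwise fit the history at least as well as the simpler one by construction.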
Abstract:
Some embodiments of the present invention provide a system that schedules read operations for disk drives in a set of disk drives. During operation, the system monitors a write rate for write operations to a given disk drive in the set of disk drives, wherein vibrations generated by the read operations directed to disk drives in the set of disk drives are transmitted to the given disk drive. Then, the read operations for disk drives in the set of disk drives are scheduled based on the write rate for the given disk drive, thereby limiting interference between the write operations and the vibrations generated by the read operations.
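One way to picture write-rate-based read scheduling is a per-interval read budget that shrinks as the monitored drive's write rate rises. The sketch below assumes that policy; the abstract does not state the actual scheduling rule, and `read_budget`, `schedule_reads`, `high_write_rate`, and `max_reads` are hypothetical names and parameters.

```python
from collections import deque


def read_budget(write_rate, high_write_rate, max_reads):
    """Reads allowed this interval; falls linearly to 0 as writes get busier."""
    if high_write_rate <= 0:
        raise ValueError("high_write_rate must be positive")
    load = min(max(write_rate / high_write_rate, 0.0), 1.0)
    return int(max_reads * (1.0 - load))


def schedule_reads(pending, write_rates, high_write_rate, max_reads):
    """Issue queued reads interval by interval, honoring each interval's budget."""
    queue = deque(pending)
    issued = []
    for rate in write_rates:
        budget = read_budget(rate, high_write_rate, max_reads)
        batch = [queue.popleft() for _ in range(min(budget, len(queue)))]
        issued.append(batch)
    return issued, list(queue)
```

Here `write_rates` stands for the monitored write rate of the given drive in successive intervals. Reads that do not fit the budget stay queued, so they are deferred rather than dropped.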
Abstract:
One embodiment of the present invention provides a system that generates vibration-resistance signatures for hard disk drives (HDDs). In this system, a set of HDDs is mechanically affixed to a disk enclosure. The system additionally includes a vibration generator which is mechanically coupled to the disk enclosure and can apply a translational vibration profile to the disk enclosure. The system further includes a coupling mechanism between the set of HDDs and the disk enclosure which translates the translational vibration profile into both translational and rotational vibrations for the set of HDDs in multiple dimensions. The system additionally includes a monitoring mechanism which monitors an HDD performance metric from the set of HDDs while the HDDs are subject to the translational and rotational vibrations. The system also includes a signature-generation mechanism which uses the monitored HDD performance metric to generate vibration-resistance signatures for the set of HDDs.
Abstract:
Some embodiments of the present invention provide a system that analyzes sensor data from a computer system. During operation, the system obtains sensor data from a component in the computer system using a set of sensors. Next, the system transmits the sensor data to a microcontroller unit (MCU) coupled to the sensors and stores the sensor data in internal memory of the MCU. Finally, the system assesses the integrity of the component by analyzing the sensor data using a pattern-recognition apparatus in the MCU.
Abstract:
A method for generating a service action for a computer system is described. During the method, a longevity index value for a packaging technology (such as solder joints in a BGA) in the computer system is calculated using thermal and vibration telemetry data (which is collected in the computer system) and a longevity model. This longevity model may be based on accelerated failure testing of the packaging technology, field failures of the packaging technology in a group of computer systems (which includes the computer system) and/or thermal and vibration telemetry data for the group of computer systems. Furthermore, using the longevity index value, the service action for the computer system is determined. Based on the longevity index value, remedial action (such as repairs to the computer system) may be scheduled and performed.
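To make the flow from telemetry to service action concrete, the sketch below uses a linear cumulative-damage model (the Palmgren-Miner rule) as a stand-in longevity model. This is an assumption: the abstract does not specify the model's form, and the stress categories, cycles-to-failure figures, and action thresholds here are hypothetical placeholders rather than values from the source.

```python
def longevity_index(observed_cycles, cycles_to_failure):
    """Remaining-life fraction in [0, 1] under linear damage accumulation."""
    damage = 0.0
    for stress, count in observed_cycles.items():
        # Each stress type consumes life in proportion to cycles experienced.
        damage += count / cycles_to_failure[stress]
    return max(0.0, 1.0 - damage)


def service_action(index, repair_below=0.2, inspect_below=0.5):
    """Map the index to an action; thresholds are illustrative assumptions."""
    if index < repair_below:
        return "schedule_repair"
    if index < inspect_below:
        return "inspect"
    return "none"
```

In this picture, `cycles_to_failure` would come from accelerated failure testing or field-failure data, and `observed_cycles` would be counted from the thermal and vibration telemetry collected in the computer system.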
Abstract:
One embodiment of the present invention provides a system that performs parallel grouping decomposition to facilitate expedited training of a support vector machine (SVM). During operation, the system receives a training dataset comprised of data vectors. The system then determines whether any data vector in the dataset violates conditions associated with a current SVM. Next, the system divides the violating data vectors into a number of subsets, thereby allowing parallel SVM training for each subset. The system subsequently builds an independent SVM for each subset in parallel based on the current SVM. The system then constructs a new SVM to replace the current SVM based on the SVMs built for each subset of violating data vectors.
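The first two steps, finding the violating vectors and dividing them into subsets, can be sketched for a linear SVM as shown below. The sketch assumes that "violating" means a functional margin below 1 and that subsets are formed round-robin; the abstract specifies neither, and the function names are hypothetical. Per-subset training and the final merge are omitted.

```python
def functional_margin(w, b, x, y):
    """y * (w . x + b) for a linear SVM with weights w and bias b."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def find_violators(w, b, dataset, margin=1.0):
    """Vectors whose margin under the current SVM falls below the target."""
    return [(x, y) for x, y in dataset if functional_margin(w, b, x, y) < margin]


def split_into_subsets(items, k):
    """Round-robin split into k subsets, one per parallel training job."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [items[i::k] for i in range(k)]
```

Each subset returned by `split_into_subsets` could then be handed to a separate worker, for example through `concurrent.futures.ProcessPoolExecutor`, to build an independent SVM starting from the current one.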