Abstract:
Power consumption of computing devices is monitored with performance counters and used to generate a power model for each computing device. The power models are used to estimate the power consumption of each computing device based on the performance counters. Each computing device is assigned a power cap, and a software-based power control at each computing device monitors the performance counters, estimates the power consumption using the performance counters and the model, and compares the estimated power consumption with the power cap. Depending on whether the estimated power consumption violates the power cap, the power control may transition the computing device to a lower power state to prevent a violation of the power cap, or to a higher power state if the computing device is below the power cap.
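The control loop described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the linear form of the model, the counter names, the `headroom_watts` margin, and the integer power-state indices are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LinearPowerModel:
    """Hypothetical model: watts = idle_watts + sum(weight * counter value)."""
    idle_watts: float
    weights: dict

    def estimate(self, counters: dict) -> float:
        # Counters missing from the sample contribute nothing to the estimate.
        return self.idle_watts + sum(
            w * counters.get(name, 0.0) for name, w in self.weights.items()
        )


def next_power_state(state, estimated_watts, cap_watts, max_state,
                     headroom_watts=5.0):
    """Return the next power-state index (0 = lowest power, max_state = highest).

    headroom_watts is an assumed margin that keeps the control from
    oscillating when the estimate sits just under the cap.
    """
    if estimated_watts > cap_watts:
        return max(state - 1, 0)          # over the cap: step down
    if estimated_watts < cap_watts - headroom_watts:
        return min(state + 1, max_state)  # comfortably under: step up
    return state                          # within the margin: hold


# Example with made-up counter names and weights.
model = LinearPowerModel(idle_watts=60.0,
                         weights={"cpu_util": 0.5, "disk_io": 0.25})
estimate = model.estimate({"cpu_util": 50.0, "disk_io": 100.0})
state = next_power_state(state=2, estimated_watts=estimate,
                         cap_watts=100.0, max_state=3)
```

In practice the weights would be fitted from measured power, and the step would run once per sampling interval on each device.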
Abstract:
Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge into such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
Abstract:
The present invention extends to methods, systems, and computer program products for automatically generating and refining health models. Embodiments of the invention use machine learning tools to analyze historical telemetry data from a server deployment. The tools output fingerprints, for example, small groupings of specific metrics-plus-behavioral parameters, that uniquely identify and describe past problem events mined from the historical data. Embodiments automatically translate the fingerprints into health models that can be directly applied to monitoring the running system. Fully automated feedback loops for identifying past problems and giving advance notice as those problems emerge in the future are facilitated without any operator intervention. In some embodiments, a single portion of expert knowledge, for example, Key Performance Indicator (KPI) data, initiates health model generation. Once initiated, the feedback loop can be fully automated to access further telemetry and refine health models based on the further telemetry.
Abstract:
Methods are disclosed for automatically identifying and classifying a crisis state occurring in a system having a plurality of computer resources. Signals are received from a device that collects the signals from each computer resource in the system. For each epoch, an epoch fingerprint is generated. Upon detecting a performance crisis within the system, a crisis fingerprint is generated consisting of at least one epoch fingerprint. The technology is able to identify that a performance crisis has previously occurred within the system if a generated crisis fingerprint favorably matches any of the model crisis fingerprints stored in a database. The technology may also predict that a crisis is about to occur.
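The matching step above can be illustrated with a short sketch. The abstract does not say how epoch fingerprints are combined or what "favorably matches" means, so this example assumes element-wise averaging and a Euclidean distance threshold; the crisis labels and vectors are invented.

```python
import math


def crisis_fingerprint(epoch_fingerprints):
    """Combine epoch fingerprints by element-wise averaging (an assumption)."""
    n = len(epoch_fingerprints)
    return [sum(column) / n for column in zip(*epoch_fingerprints)]


def identify_crisis(fingerprint, known, threshold):
    """Return the label of the closest stored fingerprint within threshold.

    Returns None when no stored fingerprint is close enough, meaning the
    crisis is treated as previously unseen.
    """
    best_label, best_distance = None, threshold
    for label, stored in known.items():
        distance = math.dist(fingerprint, stored)
        if distance <= best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical database of model crisis fingerprints.
known_crises = {
    "db_overload": [1.0, 0.0, 0.0],
    "network_partition": [-1.0, 1.0, 0.0],
}

observed = crisis_fingerprint([[1, 0, -1], [1, 0, 1]])
match = identify_crisis(observed, known_crises, threshold=0.5)
no_match = identify_crisis([0.0, -1.0, 1.0], known_crises, threshold=0.5)
```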
Abstract:
An embodiment of a method of predicting response time for a storage request begins with a first step of a computing entity storing a training data set. The training data set comprises past performance observations for past storage requests of a storage array. Each past performance observation comprises an observed response time and a feature vector for a particular past storage request. The feature vector includes characteristics that are available external to the storage array. In a second step, the computing entity forms a response time forecaster from the training data set. In a third step, the computing entity applies the response time forecaster to a pending feature vector for a pending storage request to obtain a predicted response time for the pending storage request.
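The three steps above can be sketched as follows. The abstract does not name the forecasting technique, so this example substitutes a simple k-nearest-neighbor average; the feature layout (request size in KB, write flag) and the response times are hypothetical.

```python
import math


def form_forecaster(training_set, k=3):
    """Build a forecaster from (feature_vector, observed_response_time) pairs.

    The k-nearest-neighbor average used here is a stand-in technique, not
    the method of the abstract.
    """
    if not training_set:
        raise ValueError("training set must not be empty")

    def forecast(pending_features):
        # Rank past observations by distance to the pending request.
        nearest = sorted(
            training_set,
            key=lambda obs: math.dist(obs[0], pending_features),
        )[:k]
        return sum(response for _, response in nearest) / len(nearest)

    return forecast


# Step 1: store past observations (features visible outside the array).
training_set = [
    ((4.0, 0.0), 2.0),
    ((8.0, 0.0), 4.0),
    ((64.0, 1.0), 10.0),
    ((128.0, 1.0), 20.0),
]

# Step 2: form the forecaster. Step 3: apply it to a pending request.
forecaster = form_forecaster(training_set, k=2)
predicted = forecaster((6.0, 0.0))
```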
Abstract:
Systems, methods, and software are used in performing automated diagnosis and identification or forecasting of service level objective (SLO) states. Some embodiments include building classifier models based on collected metric data to detect and forecast SLO violations. Some such systems, methods, and software further include automated detecting and forecasting of SLO violations along with providing alarms, messages, or commands to administrators or system components. Some such messages include diagnostic information with regard to a cause of an SLO violation. Some embodiments further include storing data representative of system performance and detected and forecast system SLO states. This data can then be used to generate reports of system performance including representations of system SLO states.
Abstract:
A method of determining behavior of an information system application is provided. The information system application's behavior for user content requests and load conditions is determined, as are a user's quality of service objectives. The information system application's capacity allocation is then prioritized. Changes in the information system application's behavior are detected. The behavior of the information system application is then updated in response to detecting changes that affect the user's quality of service objectives.
Abstract:
A computer system includes a signature creation engine operable to determine signatures representing states of a computer resource from metrics for the computer resource. The computer system also includes a database operable to store the signatures along with an annotation for each signature including information relating to a state of the computer resource. The computer system is operable to determine a recurrent problem of the computer resource from stored signatures.
Abstract:
In a distributed storage system such as those in a data center or web based service, user characteristics and characteristics of the hardware such as storage size and storage throughput impact the capacity and performance of the system. In such systems, an allocation is a mapping from the user to the physical storage devices where data/information pertaining to the user will be stored. Policies regarding quality of service and reliability including replication of user data/information may be provided by the entity managing the system. A policy may define an objective function which quantifies the value of a given allocation. Maximizing the value of the allocation will optimize the objective function. This optimization may include the dynamics in terms of changes in patterns of user characteristics and the cost of moving data/information between the physical devices to satisfy a particular allocation.
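The idea of scoring allocations with an objective function that also accounts for data movement can be sketched as follows. The specific objective (an overload penalty plus a per-user move cost), the exhaustive search, and all names and numbers are assumptions for illustration; a real policy would define its own objective and would need a scalable optimizer.

```python
from itertools import product


def allocation_value(allocation, user_load, device_capacity,
                     previous=None, move_cost=1.0):
    """Score an allocation (user -> device); higher is better.

    Assumed objective: penalize load above each device's capacity and
    charge move_cost for every user whose device differs from `previous`.
    """
    load = {device: 0.0 for device in device_capacity}
    for user, device in allocation.items():
        load[device] += user_load[user]
    overload = sum(max(0.0, load[d] - device_capacity[d])
                   for d in device_capacity)
    moves = 0 if previous is None else sum(
        1 for user, device in allocation.items()
        if previous.get(user) != device
    )
    return -overload - move_cost * moves


def best_allocation(user_load, device_capacity, previous=None, move_cost=1.0):
    """Exhaustively search all allocations; only viable for tiny examples."""
    users = sorted(user_load)
    devices = sorted(device_capacity)
    candidates = (dict(zip(users, choice))
                  for choice in product(devices, repeat=len(users)))
    return max(candidates,
               key=lambda a: allocation_value(a, user_load, device_capacity,
                                              previous, move_cost))


user_load = {"alice": 7.0, "bob": 5.0, "carol": 2.0}
device_capacity = {"d1": 8.0, "d2": 8.0}
previous = {"alice": "d1", "bob": "d1", "carol": "d2"}

best = best_allocation(user_load, device_capacity, previous)
```

With a cheap move cost the search relocates one user to relieve the overloaded device; raising `move_cost` far enough makes staying with the previous allocation the better choice, which is the trade-off between allocation value and data movement that the abstract describes.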