Abstract:
Disclosed are system and method embodiments for determining the root causes of a performance objective violation, such as an end-to-end service level objective (SLO) violation, in a large-scale system with multi-tiered applications. This determination is made using a hybrid of component-level snapshots of the state of the system during a period in which an abnormal event occurred (i.e., black-box mapping) and of known events and their causes (i.e., white-box mapping). Specifically, in response to a query about a violation (e.g., why did the response time for application a1 increase from r1 to r2?), a processor will access and correlate the black-box and white-box mappings to determine a short list of probable causes for the violation.
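The correlation step described above can be illustrated with a minimal sketch. It assumes the black-box mapping has already been reduced to a set of metrics that looked abnormal during the violation window, and that the white-box mapping is a table from known causes to the symptoms they are expected to produce. The function name, the metric names, and the ranking rule (fraction of expected symptoms observed) are all hypothetical and are not taken from the abstract.

```python
def shortlist_causes(abnormal_metrics, known_causes):
    """Rank known causes by the fraction of their expected symptoms
    that appear among the abnormal metrics (hypothetical scoring rule)."""
    ranked = []
    for cause, symptoms in known_causes.items():
        matched = symptoms & abnormal_metrics
        if matched:  # keep only causes with at least one observed symptom
            ranked.append((len(matched) / len(symptoms), cause))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cause for _, cause in ranked]


# Illustrative data only: metric and cause names are made up.
abnormal = {"disk0.latency", "switch1.errors"}
known = {
    "failed_disk": {"disk0.latency"},
    "bad_cable": {"switch1.errors", "hba0.timeouts"},
    "cache_thrash": {"cache.hit_rate"},
}
shortlist = shortlist_causes(abnormal, known)
```

Here `cache_thrash` is dropped because none of its expected symptoms were observed, which is one simple way a white-box table could prune the candidates suggested by black-box snapshots.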
Abstract:
Disclosed is an autonomic abnormality detection device having a plurality of agents, a server with one or more processors, a data storage device, and a corrective actions engine. The device is adapted to detect and diagnose abnormalities in system components. In particular, the device uses agents to track performance/workload measurements of system components and dynamically compiles a history of those performance/workload measurements for each component. To detect abnormalities, a processor compares current performance/workload measurements for a component to the compiled histories for that component and for other components. The processor can further be adapted to determine possible causes of a detected abnormality and to report the abnormality, including the possible causes, to a corrective actions engine.
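The comparison against compiled histories might look something like the sketch below. The abstract does not say how the comparison is made; a z-score test with a fixed threshold is used here purely as an assumption, and the class name, method names, and component names are hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` sample standard
    deviations from the mean of `history` (assumed z-score rule)."""
    if len(history) < 2:
        return False  # too little history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


class ComponentHistory:
    """Per-component measurement history (hypothetical structure)."""

    def __init__(self):
        self.samples = {}

    def record(self, component, value):
        self.samples.setdefault(component, []).append(value)

    def check(self, component, value, threshold=3.0):
        """Compare `value` to the component's own history and to the pooled
        history of all other components; returns (vs_own, vs_peers)."""
        own = self.samples.get(component, [])
        peers = [v for name, vals in self.samples.items()
                 if name != component for v in vals]
        return (is_abnormal(own, value, threshold),
                is_abnormal(peers, value, threshold))


h = ComponentHistory()
for v in [10.0, 11.0, 9.0, 10.5, 9.5]:
    h.record("disk0", v)
for v in [10.2, 9.8, 10.1, 9.9]:
    h.record("disk1", v)
```

Calling `h.check("disk0", 50.0)` flags the measurement against both the component's own history and its peers, while a value such as `10.2` passes both checks.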
Abstract:
A computer and method for problem detection and determination for automated system management in a system, wherein the method comprises monitoring system state, workload, and performance parameters of the system; comparing the monitored parameters against normal system performance behavior of the system, wherein the normal system performance behavior is maintained as a mapping of a system state and workload-to-performance parameters; summarizing performance abnormalities at a specified layer in the system as computation and data-processing attributes, wherein the performance abnormalities comprise deviations from the normal system performance behavior; correlating the performance abnormalities across multiple layers in the system using an attribute-based framework; and communicating a root-cause of the performance abnormalities.
Abstract:
Embodiments herein present a method, system, computer program product, etc. for automated management using a hybrid of prediction models and feedback-based systems. The method begins by calculating confidence values of models. Next, the method selects a first model based on the confidence values and processes the first model through a constraint solver to produce first workload throttling values. Following this, workloads are repeatedly processed through a feedback-based execution engine, wherein the feedback-based execution engine is controlled by the first workload throttling values. The first workload throttling values are applied incrementally to the feedback-based execution engine, during repetitions of the processing of the workloads, with a step-size that is proportional to the confidence values. The processing of the workloads is repeated until an objective function is maximized, wherein the objective function specifies performance goals of the workloads.
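A minimal sketch of the confidence-proportional stepping described above follows. It assumes the constraint solver has already produced target throttling values (the solver itself is not modeled), and it treats the confidence value as a fraction in [0, 1] of the remaining distance to apply per iteration. All function names, the workload name `w1`, and the stopping rule (stop when the objective no longer improves) are assumptions made for illustration.

```python
def select_model(confidences):
    """Pick the model with the highest confidence value."""
    return max(confidences, key=confidences.get)


def step_toward(current, target, confidence):
    """Move each throttle value a fraction `confidence` of the way toward
    its target; confidence is clamped to [0, 1]."""
    c = min(max(confidence, 0.0), 1.0)
    return {w: current[w] + c * (target[w] - current[w]) for w in target}


def run_feedback_loop(current, target, confidence, objective, max_iters=50):
    """Apply throttling incrementally while the objective keeps improving."""
    best = objective(current)
    for _ in range(max_iters):
        candidate = step_toward(current, target, confidence)
        score = objective(candidate)
        if score <= best:
            break  # no further improvement: keep the last good setting
        current, best = candidate, score
    return current


confidences = {"model_a": 0.3, "model_b": 0.5}  # illustrative values
chosen = select_model(confidences)
# Assumed solver output: throttle workload w1 from 100 down to 60.
final = run_feedback_loop(
    current={"w1": 100.0},
    target={"w1": 60.0},
    confidence=confidences[chosen],
    objective=lambda t: -abs(t["w1"] - 60.0),  # toy objective
)
```

A low-confidence model therefore changes the system in small steps, giving the feedback loop more chances to stop before a poor prediction does damage.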
Abstract:
The embodiments of the invention provide a method, computer program product, etc. for risk-modulated proactive data migration for maximizing utility. More specifically, a method of planning data migration for maximizing utility of a storage infrastructure that is running and actively serving at least one application includes selecting a plurality of potential data items for migration and selecting a plurality of potential migration destinations to which the potential data items can be moved. Moreover, the method selects a plurality of potential migration speeds at which the potential data items can be moved and selects a plurality of potential migration times at which the potential data items can be moved to the potential data migration destinations. The selecting of the plurality of potential migration speeds selects a migration speed below a threshold speed, wherein the threshold speed defines a maximum system utility loss permitted.
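The selection of candidate migration plans can be sketched as an enumeration over the four choices named above (data item, destination, speed, time), keeping only speeds below the threshold. The function name, parameter names, and example values are hypothetical; how the threshold is derived from the permitted utility loss is not modeled here.

```python
from itertools import product


def candidate_plans(items, destinations, speeds, start_times, threshold_speed):
    """Enumerate (item, destination, speed, start_time) tuples, keeping only
    speeds strictly below `threshold_speed`."""
    allowed = [s for s in speeds if s < threshold_speed]
    return list(product(items, destinations, allowed, start_times))


# Illustrative values only (speeds in MB/s, start times as hour of day).
plans = candidate_plans(
    items=["vol1"],
    destinations=["poolA", "poolB"],
    speeds=[10, 50, 200],
    start_times=[2, 14],
    threshold_speed=100,
)
```

A planner could then score each remaining tuple by its expected utility and pick the best one; that scoring step is outside this sketch.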
Abstract:
The embodiments of the invention provide methods, computer program products, etc. for autonomic retention classes when retaining data within storage devices. More specifically, a method of determining whether to retain data within at least one storage device begins by storing data items in at least one storage device. Furthermore, the method maintains access statistics for each of the data items, an age of each of the data items, and an administrator-defined importance value of each of the data items. Following this, a retention value is calculated for each of the data items based on the access statistics for each of the data items, the age of each of the data items, and the administrator-defined importance value of each of the data items.
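The abstract names the three inputs to the retention value but gives no formula. The sketch below uses one made-up combination, importance scaled by a logarithmic access term and an exponential age decay, only to show how the inputs could be folded into a single score. The function name, the formula, and the `half_life_days` parameter are all assumptions.

```python
import math


def retention_value(access_count, age_days, importance, half_life_days=30.0):
    """Hypothetical retention score: grows with access count and
    administrator-defined importance, halves every `half_life_days` of age."""
    recency = 0.5 ** (age_days / half_life_days)
    return importance * (1.0 + math.log1p(access_count)) * recency


fresh = retention_value(access_count=0, age_days=0, importance=2.0)
aged = retention_value(access_count=0, age_days=30, importance=2.0)
busy = retention_value(access_count=100, age_days=30, importance=2.0)
```

Items could then be sorted by this score, with the lowest-scoring items considered first for deletion or demotion.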
Abstract:
The embodiments of the invention provide methods, computer program products, etc. for complaint-based service level objectives. More specifically, a method of deducing undefined service level objectives receives complaints regarding behavior of a system. The complaints could include a severity parameter, an entity parameter, a nature-of-complaint parameter, a timestamp parameter, and/or an identification parameter. Next, system details representing a current state of the system are recorded for each of the complaints. The method then automatically analyzes a history of the system details and the complaints to produce a historical compilation of the system details. The analyzing can include weighing each of the system details by a severity parameter value.
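One way to weigh recorded system details by complaint severity is a severity-weighted average per metric, sketched below. The record layout (a `severity` number and a `details` dictionary per complaint), the function name, and the metric name are hypothetical; the abstract does not specify how the weighting is computed.

```python
from collections import defaultdict


def severity_weighted_profile(complaints):
    """Average each recorded system detail across complaints, weighting
    every complaint by its severity parameter value."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for complaint in complaints:
        for metric, value in complaint["details"].items():
            totals[metric] += complaint["severity"] * value
            weights[metric] += complaint["severity"]
    return {m: totals[m] / weights[m] for m in totals if weights[m] > 0}


# Illustrative complaints only.
complaints = [
    {"severity": 1, "details": {"latency_ms": 100.0}},
    {"severity": 3, "details": {"latency_ms": 200.0}},
]
profile = severity_weighted_profile(complaints)
```

The resulting profile leans toward the system state seen during the more severe complaint, which is the kind of signal a deduced service level objective could be built from.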