Abstract:
Disclosed are system and method embodiments for determining the root causes of a performance objective violation, such as an end-to-end service level objective (SLO) violation, in a large-scale system with multi-tiered applications. This determination is made using a hybrid of component-level snapshots of the state of the system during a period in which an abnormal event occurred (i.e., black-box mapping) and of known events and their causes (i.e., white-box mapping). Specifically, in response to a query about a violation (e.g., why did the response time for application a1 increase from r1 to r2?), a processor will access and correlate the black-box and white-box mappings to determine a shortlist of probable causes for the violation.
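The correlation step described above could be sketched as follows. This is only an illustration under stated assumptions, not the disclosed method: the function name `shortlist_causes`, the per-component metric dictionaries, the `known_causes` table, and the 20% relative-change threshold are all hypothetical.

```python
def shortlist_causes(normal, abnormal, known_causes, min_change=0.2):
    """Return known causes whose affected components changed during the violation.

    normal / abnormal: hypothetical per-component metric snapshots (black-box data).
    known_causes: hypothetical map of cause -> components it affects (white-box data).
    """
    # Black-box side: components whose metric shifted by more than min_change
    # (relative) between the normal and abnormal snapshots.
    changed = {
        name
        for name, value in abnormal.items()
        if name in normal
        and normal[name]
        and abs(value - normal[name]) / abs(normal[name]) > min_change
    }

    # White-box side: keep only known causes that touch a changed component.
    ranked = []
    for cause, components in known_causes.items():
        overlap = changed & set(components)
        if overlap:
            ranked.append((cause, sorted(overlap)))

    # Causes explaining more of the changed components come first.
    ranked.sort(key=lambda item: len(item[1]), reverse=True)
    return ranked


normal = {"switch": 5.0, "volume": 10.0, "host": 2.0}
abnormal = {"switch": 5.1, "volume": 25.0, "host": 2.0}
known_causes = {
    "raid rebuild": ["volume"],
    "port failure": ["switch"],
    "volume overload": ["volume", "host"],
}
result = shortlist_causes(normal, abnormal, known_causes)
```

Here only the volume metric moves appreciably, so the sketch keeps the two causes that involve the volume and drops the port failure.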
Abstract:
Disclosed is an autonomic abnormality detection device having a plurality of agents, a server with one or more processors, a data storage device and a corrective actions engine. The device is adapted to detect and diagnose abnormalities in system components. In particular, the device uses agents to track performance/workload measurements of system components and dynamically compiles a history of those performance/workload measurements for each component. To detect abnormalities, a processor compares current performance/workload measurements for a component to the compiled histories for that component and for other components. The processor can further be adapted to determine possible causes of a detected abnormality and to report the abnormality, including the possible causes, to a corrective actions engine.
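A minimal sketch of the history comparison described above, assuming a simple z-score test stands in for whatever comparison the device actually uses. The names `is_abnormal` and `detect`, the threshold of 3 standard deviations, and the sample histories are hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(current, history, threshold=3.0):
    """Flag a measurement that lies far from the compiled history (z-score test)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def detect(component, current, histories, threshold=3.0):
    """Compare a measurement to the component's own history and to its peers'."""
    own = is_abnormal(current, histories[component], threshold)
    # Peers whose history the current measurement would fit; a hint for diagnosis.
    fits_peers = [
        name
        for name, history in histories.items()
        if name != component and not is_abnormal(current, history, threshold)
    ]
    return {"abnormal": own, "fits_peers": fits_peers}


histories = {
    "disk1": [10, 11, 9, 10, 10],
    "disk2": [50, 52, 49, 51, 50],
}
report = detect("disk1", 50, histories)
```

In this example a latency of 50 is abnormal for `disk1` yet typical for `disk2`, which is the kind of cross-component evidence a diagnosis step could pass on to a corrective actions engine.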
Abstract:
A computer and method for problem detection and determination for automated system management in a system, wherein the method comprises monitoring system state, workload, and performance parameters of the system; comparing the monitored parameters against normal system performance behavior of the system, wherein the normal system performance behavior is maintained as a mapping of a system state and workload-to-performance parameters; summarizing performance abnormalities at a specified layer in the system as computation and data-processing attributes, wherein the performance abnormalities comprise deviations from the normal system performance behavior; correlating the performance abnormalities across multiple layers in the system using an attribute-based framework; and communicating a root-cause of the performance abnormalities.
Abstract:
Embodiments herein present a method, system, computer program product, etc. for automated management using a hybrid of prediction models and feedback-based systems. The method begins by calculating confidence values of models. Next, the method selects a first model based on the confidence values and processes the first model through a constraint solver to produce first workload throttling values. Following this, workloads are repeatedly processed through a feedback-based execution engine, wherein the feedback-based execution engine is controlled by the first workload throttling values. The first workload throttling values are applied incrementally to the feedback-based execution engine, during repetitions of the processing of the workloads, with a step-size that is proportional to the confidence values. The processing of the workloads is repeated until an objective function is maximized, wherein the objective function specifies performance goals of the workloads.
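The loop described above could be sketched as follows, under several assumptions: the constraint solver is replaced by a precomputed dictionary of target throttling values, a single scalar confidence is used, and the step moves each value toward its target by a fraction equal to that confidence. The names `select_model`, `throttle_incrementally`, and the toy objective are hypothetical.

```python
def select_model(confidences):
    """Pick the model with the highest confidence value."""
    return max(confidences, key=confidences.get)


def throttle_incrementally(targets, confidence, objective, max_iters=50):
    """Apply target throttling values in steps proportional to the confidence.

    targets: hypothetical solver output, workload -> throttling value.
    objective: callable scoring a set of throttling values (higher is better).
    """
    current = {workload: 0.0 for workload in targets}
    best = objective(current)
    for _ in range(max_iters):
        # Step size is proportional to confidence: a trusted model moves fast,
        # a doubtful one leaves more room for feedback to correct it.
        candidate = {
            w: current[w] + confidence * (targets[w] - current[w]) for w in targets
        }
        score = objective(candidate)
        if score <= best:
            break  # no further improvement in the objective
        current, best = candidate, score
    return current


confidences = {"model_a": 0.4, "model_b": 0.5}
chosen = select_model(confidences)
targets = {"w1": 1.0, "w2": 0.25}


def objective(values):
    # Toy objective: best when throttling values reach the targets.
    return -sum((values[w] - targets[w]) ** 2 for w in targets)


final = throttle_incrementally(targets, confidences[chosen], objective)
```

In a real feedback-based engine the objective would be measured from the running workloads rather than computed from the targets; the toy objective only makes the sketch self-contained.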