Abstract:
Disclosed is an autonomic abnormality detection device having a plurality of agents, a server with one or more processors, a data storage device, and a corrective actions engine. The device is adapted to detect and diagnose abnormalities in system components. Particularly, the device uses agents to track performance/workload measurements of system components and dynamically compiles a history of those performance/workload measurements for each component. In order to detect abnormalities, a processor compares current performance/workload measurements for a component to the compiled histories for that component and for other components. The processor can further be adapted to determine possible causes of a detected abnormality and to report the abnormality, including the possible causes, to a corrective actions engine.
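A minimal sketch of the history-based comparison described in this abstract, assuming a rolling window per component and a z-score deviation test; the names (Component, detect_abnormality) and thresholds are illustrative assumptions, not taken from the disclosure.

```python
# Illustrative sketch: per-component history, deviation against own and peer histories.
from collections import deque
from statistics import mean, stdev


class Component:
    def __init__(self, name, history_len=100):
        self.name = name
        self.history = deque(maxlen=history_len)  # compiled performance/workload history

    def record(self, value):
        self.history.append(value)

    def deviates(self, value, z_threshold=3.0):
        """Return True if `value` lies outside z_threshold standard deviations of this history."""
        if len(self.history) < 2:
            return False
        mu, sigma = mean(self.history), stdev(self.history)
        return sigma > 0 and abs(value - mu) / sigma > z_threshold


def detect_abnormality(component, peers, current_value):
    """Flag an abnormality only if the measurement deviates from the component's
    own history and from the histories of comparable peer components."""
    own_deviation = component.deviates(current_value)
    peer_deviation = all(p.deviates(current_value) for p in peers) if peers else True
    return own_deviation and peer_deviation


# Example: one component compared against two peers with similar histories.
db, r1, r2 = Component("db0"), Component("db1"), Component("db2")
for c in (db, r1, r2):
    for v in range(90, 110):
        c.record(float(v))
print(detect_abnormality(db, [r1, r2], 250.0))  # True: far outside all histories
```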
Abstract:
A computer and method for problem detection and determination for automated system management in a system, wherein the method comprises monitoring system state, workload, and performance parameters of the system; comparing the monitored parameters against normal system performance behavior of the system, wherein the normal system performance behavior is maintained as a mapping of system state and workload to performance parameters; summarizing performance abnormalities at a specified layer in the system as computation and data-processing attributes, wherein the performance abnormalities comprise deviations from the normal system performance behavior; correlating the performance abnormalities across multiple layers in the system using an attribute-based framework; and communicating a root-cause of the performance abnormalities.
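An illustrative sketch of the (state, workload) → performance mapping and the attribute-based cross-layer correlation described above; the map contents, the Abnormality structure, and the attribute labels are assumptions for demonstration only.

```python
# Sketch: compare observed performance against a normal-behavior mapping, then
# correlate abnormalities across layers by their computation/data-processing attribute.
from dataclasses import dataclass


@dataclass
class Abnormality:
    layer: str
    attribute: str      # e.g. "computation" or "data-processing"
    deviation: float    # observed minus expected performance


# Normal behavior kept as a mapping of (system state, workload bucket) -> expected latency (ms).
normal_map = {
    ("steady", "low"): 20.0,
    ("steady", "high"): 45.0,
    ("degraded", "high"): 90.0,
}


def check_layer(layer, state, workload, observed_latency, attribute, tolerance=1.5):
    """Compare an observed measurement against the normal mapping; return an
    Abnormality summarized by its attribute, or None if within tolerance."""
    expected = normal_map.get((state, workload))
    if expected is None or observed_latency <= expected * tolerance:
        return None
    return Abnormality(layer, attribute, observed_latency - expected)


def correlate(abnormalities):
    """Group abnormalities across layers by shared attribute; the attribute with
    the largest total deviation is reported as the likely root-cause area."""
    totals = {}
    for a in abnormalities:
        totals[a.attribute] = totals.get(a.attribute, 0.0) + a.deviation
    return max(totals, key=totals.get) if totals else None


events = [
    check_layer("application", "steady", "high", 130.0, "computation"),
    check_layer("database", "steady", "high", 140.0, "data-processing"),
    check_layer("storage", "steady", "high", 300.0, "data-processing"),
]
print(correlate([e for e in events if e]))  # -> "data-processing"
```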
Abstract:
Disclosed are system and method embodiments for determining the root-causes of a performance objective violation, such as an end-to-end service level objective (SLO) violation, in a large-scale system with multi-tiered applications. This determination is made using a hybrid of component-level snapshots of the state of the system during a period in which an abnormal event occurred (i.e., black-box mapping) and of known events and their causes (i.e., white-box mapping). Specifically, in response to a query about a violation (e.g., why did the response time for application a1 increase from r1 to r2), a processor will access and correlate the black-box and white-box mappings to determine a short-list of probable causes for the violation.
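A rough sketch of correlating black-box snapshots with white-box event knowledge to short-list probable causes of an SLO violation; the snapshot fields, event catalog, and scoring rule are illustrative assumptions only.

```python
# Black-box mapping: per-component state snapshots captured around the violation window.
black_box = {
    "app_a1": {"cpu_util": 0.55, "queue_len": 4},
    "db_node": {"cpu_util": 0.97, "queue_len": 120},
    "storage": {"cpu_util": 0.40, "queue_len": 8},
}

# White-box mapping: known events and the causes they are associated with.
white_box = [
    {"event": "index rebuild", "component": "db_node", "cause": "background maintenance job"},
    {"event": "firmware update", "component": "storage", "cause": "scheduled upgrade"},
]


def probable_causes(cpu_threshold=0.9, queue_threshold=50):
    """Correlate the two mappings: keep components whose snapshot looks abnormal,
    then attach any known white-box cause recorded for those components."""
    suspects = [
        name for name, snap in black_box.items()
        if snap["cpu_util"] > cpu_threshold or snap["queue_len"] > queue_threshold
    ]
    short_list = []
    for name in suspects:
        known = [e["cause"] for e in white_box if e["component"] == name]
        short_list.append((name, known or ["unexplained saturation"]))
    return short_list


# Query: why did the response time for application a1 increase?
print(probable_causes())
# -> [('db_node', ['background maintenance job'])]
```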
Abstract:
Embodiments herein present a method, system, computer program product, etc. for automated management using a hybrid of prediction models and feedback-based systems. The method begins by calculating confidence values of models. Next, the method selects a first model based on the confidence values and processes the first model through a constraint solver to produce first workload throttling values. Following this, workloads are repeatedly processed through a feedback-based execution engine, wherein the feedback-based execution engine is controlled by the first workload throttling values. The first workload throttling values are applied incrementally to the feedback-based execution engine, during repetitions of the processing of the workloads, with a step-size that is proportional to the confidence values. The processing of the workloads is repeated until an objective function is maximized, wherein the objective function specifies performance goals of the workloads.
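A condensed sketch of the hybrid loop described above: pick the model with the highest confidence value, derive target throttling values, then move the running throttle toward those targets in confidence-proportional steps until the objective stops improving. The solver stand-in, objective function, and model list are assumptions, not the disclosed implementation.

```python
# Sketch: confidence-based model selection feeding a feedback-based execution loop.


def solve_constraints(model):
    """Stand-in for the constraint solver: returns target throttling values per workload."""
    return dict(model["predicted_share"])


def objective(throttle):
    """Illustrative objective: reward giving the high-priority workload more share."""
    return 2.0 * throttle["gold"] + 1.0 * throttle["bronze"]


models = [
    {"name": "regression", "confidence": 0.8, "predicted_share": {"gold": 0.7, "bronze": 0.3}},
    {"name": "queueing",   "confidence": 0.5, "predicted_share": {"gold": 0.6, "bronze": 0.4}},
]

best_model = max(models, key=lambda m: m["confidence"])   # selection by confidence value
targets = solve_constraints(best_model)                   # first workload throttling values

throttle = {"gold": 0.5, "bronze": 0.5}                   # current feedback-engine settings
step = 0.1 * best_model["confidence"]                     # step-size proportional to confidence

prev_score = objective(throttle)
for _ in range(50):                                       # repeated feedback iterations
    for w in throttle:                                    # apply throttling values incrementally
        delta = targets[w] - throttle[w]
        throttle[w] += max(-step, min(step, delta))
    score = objective(throttle)
    if score <= prev_score:                               # stop once the objective no longer improves
        break
    prev_score = score

print(best_model["name"], throttle)
```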
Abstract:
A system and method of conducting resource flow control sessions in a computer network comprises sending a resource request from a client computer to a server computer; assigning to the client computer a flow control window, wherein a size of the flow control window is based on resources available to the server computer and a level of activity of a corresponding client computer, wherein the server computer is in any of a busy and idle state of activity; determining whether to change the size of the flow control window upon receiving the resource request based on the level of activity of the corresponding client computer and a current utilization of resources during a particular session of use; tracking a number of active sessions of use of the resources in a predetermined time window; and maintaining the flow control window with a maximum queue size per number of sessions value.
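A simplified sketch of the server-side window sizing described above: the window grows for busy clients when resources are free, shrinks under high utilization, and is capped by a maximum queue size divided by the number of active sessions. Class and field names, thresholds, and the growth/shrink factors are illustrative assumptions.

```python
# Sketch: per-client flow control window resized on each resource request.
import time


class FlowControlServer:
    def __init__(self, max_queue_size=1000, session_window_s=60.0):
        self.max_queue_size = max_queue_size
        self.session_window_s = session_window_s
        self.session_times = {}            # client_id -> last-activity timestamp
        self.windows = {}                  # client_id -> assigned flow control window

    def _active_sessions(self, now):
        """Count sessions seen within the predetermined time window."""
        cutoff = now - self.session_window_s
        self.session_times = {c: t for c, t in self.session_times.items() if t >= cutoff}
        return max(1, len(self.session_times))

    def handle_request(self, client_id, client_busy, utilization):
        """Assign or resize the client's flow control window on each resource request."""
        now = time.time()
        self.session_times[client_id] = now
        window = self.windows.get(client_id, 8)

        if utilization > 0.8:                      # server resources scarce: shrink
            window = max(1, window // 2)
        elif client_busy and utilization < 0.5:    # busy client, idle server: grow
            window *= 2

        # Maintain the window at no more than max queue size per number of sessions.
        cap = self.max_queue_size // self._active_sessions(now)
        window = min(window, cap)
        self.windows[client_id] = window
        return window


srv = FlowControlServer()
print(srv.handle_request("client-1", client_busy=True, utilization=0.3))   # grows to 16
print(srv.handle_request("client-1", client_busy=True, utilization=0.95))  # shrinks to 8
```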
Abstract:
Embodiments of the present invention provide an approach for adapting an information extraction middleware for a clustered computing environment (e.g., a cloud environment) by creating and managing a set of statistical models generated from performance statistics of operating devices within the clustered computing environment. This approach takes into account the required accuracy in modeling, as well as the computation cost of modeling, to pick the best modeling solution at a given point in time. When higher accuracy is desired (e.g., nearing workload saturation), the approach adapts to use an appropriate modeling algorithm. Adapting statistical models to the data characteristics ensures optimal accuracy with minimal computation time and resources for modeling. This approach provides intelligent selective refinement of models using accuracy-based and operating probability-based triggers to optimize the clustered computing environment, i.e., maximize accuracy and minimize computation time.
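A minimal sketch of picking a modeling algorithm by trading off accuracy against computation cost, and refining only when an accuracy-based or operating-probability-based trigger fires; the candidate model list, cost figures, and trigger thresholds are assumptions.

```python
# Sketch: accuracy/cost-aware model selection plus refinement triggers.

candidates = [
    {"name": "linear",        "expected_error": 0.08, "cost": 1.0},
    {"name": "piecewise",     "expected_error": 0.04, "cost": 4.0},
    {"name": "gaussian_proc", "expected_error": 0.02, "cost": 20.0},
]


def pick_model(required_error, near_saturation):
    """Pick the cheapest model meeting the accuracy requirement; near workload
    saturation, tighten the requirement so a more accurate algorithm is chosen."""
    target = required_error / 2 if near_saturation else required_error
    feasible = [m for m in candidates if m["expected_error"] <= target]
    return min(feasible or candidates, key=lambda m: m["cost"])


def should_refine(observed_error, required_error, operating_probability, prob_threshold=0.2):
    """Accuracy-based trigger: error too high. Probability-based trigger: the
    system now operates in a region the current model has rarely seen."""
    return observed_error > required_error or operating_probability < prob_threshold


print(pick_model(required_error=0.10, near_saturation=False)["name"])  # linear
print(pick_model(required_error=0.10, near_saturation=True)["name"])   # piecewise
print(should_refine(observed_error=0.15, required_error=0.10, operating_probability=0.6))  # True
```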
Abstract:
The present invention proactively identifies hotspots in a cloud computing environment through cloud resource usage models that use workload parameters as inputs. In some embodiments, the cloud resource usage models are based upon performance data from cloud resources and time-series-based workload trend models. Hotspots may occur and can be detected at any layer of the cloud computing environment, including the server, storage, and network level. In a typical embodiment, parameters for a workload are identified in the cloud computing environment and inputted into a cloud resource usage model. The model is run with the inputted workload parameters to identify potential hotspots, and resources are then provisioned for the workload so as to avoid these hotspots.
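A toy sketch of the hotspot check described above: a usage model maps workload parameters to predicted utilization per layer, and placement avoids hosts whose predicted utilization crosses a hotspot threshold. The coefficients and thresholds are illustrative assumptions, not the patented models.

```python
# Sketch: predict per-layer utilization from workload parameters, then provision
# the workload on a host with no predicted hotspot at the server, storage, or network layer.

HOTSPOT_THRESHOLD = 0.85


def predict_usage(workload, host_load):
    """Cloud resource usage model: workload parameters in, per-layer utilization out."""
    return {
        "server":  host_load["server"]  + 0.002 * workload["requests_per_s"],
        "storage": host_load["storage"] + 0.010 * workload["io_gb_per_s"],
        "network": host_load["network"] + 0.008 * workload["egress_gb_per_s"],
    }


def provision(workload, hosts):
    """Place the workload on the first host with no predicted hotspot at any layer."""
    for name, load in hosts.items():
        usage = predict_usage(workload, load)
        hot_layers = [layer for layer, u in usage.items() if u > HOTSPOT_THRESHOLD]
        if not hot_layers:
            return name, usage
    return None, {}


workload = {"requests_per_s": 150, "io_gb_per_s": 2.0, "egress_gb_per_s": 1.0}
hosts = {
    "host-a": {"server": 0.70, "storage": 0.80, "network": 0.40},  # server layer would become a hotspot
    "host-b": {"server": 0.30, "storage": 0.40, "network": 0.30},
}
print(provision(workload, hosts)[0])  # -> host-b
```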