Abstract:
A computer system is managed by automatically adjusting two separate component thresholds (a component threshold pair) based on a statistical model. Specifically, a first component threshold is modeled to predict violations of a service level objective (SLO) based on a violation of the first component threshold, and a second, separate component threshold is modeled to predict a non-violation (compliance) of the SLO based on a non-violation of the second component threshold. Over time, the values of the component thresholds may change, so that one component threshold is greater than the other at one time and less than it at another. A component metric reading between the first and second component thresholds indicates that a prediction of SLO violation or compliance is less certain, and a warning may be issued rather than an alert.
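The three-way outcome described in this abstract can be illustrated with a minimal sketch. It assumes that "violating" a threshold means the metric reading exceeds it; the function name, the labels, and the sample values are hypothetical, and the statistical model that adjusts the thresholds is not shown.

```python
def classify_reading(value, violation_threshold, compliance_threshold):
    """Classify one component metric reading against a threshold pair.

    Assumption: a reading above a threshold "violates" that threshold.
    The two thresholds may appear in either order, since the abstract
    notes that their relative order can change over time.
    """
    predicts_violation = value > violation_threshold
    predicts_compliance = value <= compliance_threshold

    if predicts_violation and not predicts_compliance:
        return "alert"    # SLO violation predicted
    if predicts_compliance and not predicts_violation:
        return "ok"       # SLO compliance predicted
    return "warning"      # reading lies between the thresholds


# Hypothetical CPU utilisation readings, in percent.
print(classify_reading(95, violation_threshold=90, compliance_threshold=70))
print(classify_reading(80, violation_threshold=90, compliance_threshold=70))
print(classify_reading(60, violation_threshold=90, compliance_threshold=70))
```

Deriving the result from the two predictions, rather than from a fixed ordering of the thresholds, keeps the sketch valid when the thresholds cross.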
Abstract:
A method for maximizing the utility of a service contract by optimizing the target response time for a performance service level objective is provided. A set of criteria is provided to ensure that performance requirements for the service are met. The method comprises determining one or more usage windows for providing a service, wherein each usage window is associated with a performance requirement and a time period; extracting usage patterns for each usage window based on historical data obtained by monitoring requests for service in each usage window; extracting the response time per transaction associated with said requests based on historical data obtained by monitoring the responses provided to said requests in each usage window; and calculating an optimal probability of breach in each usage window (Pi) and determining the associated target response time, based on the usage pattern for each window and the response time per transaction.
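One way to picture the last step is to map a breach probability for a usage window to a target response time. The sketch below assumes the target is read off the historical response times as an empirical quantile (nearest-rank method). The abstract does not state how Pi is optimized or how the target is derived, so the function name, the quantile approach, and the sample data are illustrative assumptions only.

```python
import math


def target_response_time(samples, breach_probability):
    """Return a response time that the historical samples exceed with
    (at most) the given probability, using the nearest-rank quantile.

    Assumption: a "breach" means a response time above the target.
    """
    if not samples:
        raise ValueError("no historical response times for this window")
    if not 0.0 <= breach_probability < 1.0:
        raise ValueError("breach_probability must be in [0, 1)")
    ordered = sorted(samples)
    # Nearest-rank quantile: smallest value covering (1 - p) of the samples.
    rank = max(1, math.ceil((1.0 - breach_probability) * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-transaction response times (ms) for one usage window.
peak_window = [10, 20, 30, 40, 50, 60, 70, 80]
print(target_response_time(peak_window, breach_probability=0.25))  # 60
```

With these eight samples and a breach probability of 0.25, two of the eight observations (70 and 80) exceed the returned target of 60.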
Abstract:
In a typical computer network, at least some of the managed resources are monitored to determine whether those resources are meeting predetermined performance goals or service level objectives. To simplify the process of configuring a network monitor, information about the service level objectives is loaded into the resource itself. When the resource is detected, the service level objective information is extracted from the resource information and made available to a translating engine. The translating engine converts the extracted information to monitoring directions that are used to configure the network monitor. Embodiments in which new resources are detected either by a registration process or by a polling process are described.
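The flow from detection to monitor configuration might look roughly like the following sketch. The resource layout (a dict with `name` and `slo` keys), the direction format, and both function names are hypothetical; the abstract does not specify any data format.

```python
def translate_slo(resource_name, slo_info):
    """Convert SLO information carried by a resource into monitoring
    directions (hypothetical format: one dict per metric limit)."""
    return [
        {"resource": resource_name, "metric": metric, "alert_above": limit}
        for metric, limit in slo_info.items()
    ]


def on_resource_detected(resource, monitor_config):
    """Handle a newly detected resource, whether it was found through
    registration or polling: extract its SLO info and extend the config."""
    slo_info = resource.get("slo", {})  # SLO info embedded in the resource
    monitor_config.extend(translate_slo(resource["name"], slo_info))


# Hypothetical resource that carries its own SLO information.
config = []
on_resource_detected({"name": "db1", "slo": {"response_ms": 200}}, config)
print(config)
```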
Abstract:
Under the present invention, the performances of a plurality of similarly configured nodes are monitored and compared. If one of the nodes exhibits a performance that varies from the performances of the other nodes by more than a current tolerance, an operational risk is detected. If detected, an alert can be generated and one or more corrective actions implemented to address the operational risk.
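The comparison among similarly configured nodes can be sketched as below. The sketch assumes each node reports a single numeric performance value and that a node is compared against the median of its peers; the abstract does not say how the comparison is made, so the median, the function name, and the sample data are illustrative choices.

```python
from statistics import median


def find_operational_risks(performance_by_node, tolerance):
    """Return the names of nodes whose performance differs from the
    median of the other nodes by more than the current tolerance."""
    risky = []
    for node, value in performance_by_node.items():
        peers = [v for n, v in performance_by_node.items() if n != node]
        if peers and abs(value - median(peers)) > tolerance:
            risky.append(node)
    return risky


# Hypothetical response times (ms) from four similarly configured nodes.
readings = {"a": 100, "b": 102, "c": 98, "d": 150}
print(find_operational_risks(readings, tolerance=20))  # ['d']
```

Comparing each node against its peers, rather than against a group average that includes the node itself, keeps a single outlier from masking its own deviation.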
Abstract:
A system and method for monitoring the availability of an application in a distributed data processing environment are provided. The performance aspects of application availability are defined in terms of easily observed and computed characteristics of the application as it behaves in a deployed environment with the deployed configuration. The system and method observe the application processes, the structural resources they require, and the consumable resources they require from the running system itself. These observations are then used to derive minimum requirements for the resource aspects of availability and to derive criteria for normal behavioral conditions. These minimum requirements and normal behavioral conditions are then used to establish monitoring rules or conditions for monitoring the operation of the application, to determine whether availability of the application is degrading such that a notification needs to be sent to an administrator.
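A minimal sketch of deriving monitoring conditions from observations follows. It assumes a single consumable resource (memory), takes the peak observed usage as the minimum requirement, and treats the mean plus or minus `k` standard deviations as normal behavior. None of these choices come from the abstract; they only illustrate the idea of turning observations into rules.

```python
from statistics import mean, pstdev


def derive_rules(observed_usage, k=2.0):
    """Derive a minimum requirement and a normal range from observations.

    Assumptions: the minimum requirement is the peak observed usage, and
    normal behavior is mean +/- k population standard deviations.
    """
    centre = mean(observed_usage)
    spread = pstdev(observed_usage)
    return {
        "min_required": max(observed_usage),
        "normal_low": centre - k * spread,
        "normal_high": centre + k * spread,
    }


def needs_notification(rules, available, current_usage):
    """True when the resource falls below the minimum requirement or
    current usage leaves the normal range."""
    too_little = available < rules["min_required"]
    abnormal = not (rules["normal_low"] <= current_usage <= rules["normal_high"])
    return too_little or abnormal


# Hypothetical memory usage observations (MB) for one application process.
rules = derive_rules([100, 110, 90, 100])
print(needs_notification(rules, available=256, current_usage=105))  # False
print(needs_notification(rules, available=256, current_usage=130))  # True
```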
Abstract:
A context-sensitive pre-evaluation analysis of a set of rules is performed based on the circumstance or the current state of a rule clause directed to an infrequently changing condition. A group of multiple-clause rules is identified, each of which has a clause defining an infrequently changing condition for evaluating a state of a resource. The current state of the resource is monitored. If the identified group of multiple-clause rules cannot evaluate as TRUE in the context of the current state of the resource, the group is excluded from consideration by the rules engine. The rules engine will then encounter fewer rules to evaluate for a solution. The identified group of multiple-clause rules is further analyzed in the context of the infrequently changing condition for the current resource state. State metrics defined by clauses of those identified multiple-clause rules that cannot evaluate as TRUE are identified. Those metrics are then also excluded from consideration by the rules engine. Thus, the rules engine will encounter fewer rules and/or event states that cannot be evaluated to a solution. The context-sensitive pre-evaluation analysis of the rules is performed out-of-band as the rules engine traverses the rules.
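The pruning idea can be sketched as follows. The sketch assumes each rule has exactly one clause on an infrequently changing resource state, expressed as an equality test, plus a set of metrics used by its other clauses. The `Rule` class, its field names, and the sample rules are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    name: str
    required_state: str      # clause on the infrequently changing condition
    metrics: FrozenSet[str]  # state metrics used by the rule's other clauses


def pre_evaluate(rules, current_state):
    """Split rules into those that can still evaluate as TRUE and the
    metrics that only excluded rules refer to."""
    active = [r for r in rules if r.required_state == current_state]
    needed = set().union(*(r.metrics for r in active))
    all_metrics = set().union(*(r.metrics for r in rules))
    # Metrics used only by excluded rules need not reach the rules engine.
    return active, all_metrics - needed


rules = [
    Rule("high_load", "online", frozenset({"cpu", "latency"})),
    Rule("backup_stalled", "maintenance", frozenset({"backup_progress"})),
    Rule("cpu_spike", "online", frozenset({"cpu"})),
]
active, skipped_metrics = pre_evaluate(rules, current_state="online")
print([r.name for r in active])  # ['high_load', 'cpu_spike']
print(skipped_metrics)           # {'backup_progress'}
```

Because the resource state changes rarely, the split only needs recomputing when the state changes, which is what allows it to run outside the rules engine's evaluation path.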