Abstract:
Described herein are improvements for generating courses of action for an information technology (IT) environment. In one example, a method includes identifying a first course of action for responding to an incident type in an information technology environment and generating a simulated incident associated with the incident type. The method further includes initiating performance of the first course of action based on the generation of the simulated incident. The method also includes, upon reaching a particular step of the first course of action that prevents the performance of the first course of action from proceeding, providing a first simulated result that allows the performance of the first course of action to proceed.
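The flow described above can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the `Step` class, the `simulated` flag on the incident, and the step names are all hypothetical, and a step is assumed to signal that it cannot proceed by returning `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    # Returns a result, or None when the step cannot complete on its own
    # (assumed convention, e.g. it waits on an analyst or external system).
    run: Callable[[dict], Optional[str]]
    simulated_result: Optional[str] = None


def run_course_of_action(steps, incident):
    """Run each step in order; substitute a simulated result when a step blocks."""
    trace = []
    for step in steps:
        result = step.run(incident)
        if result is None:
            # Only a simulated incident may continue past a blocking step.
            if not incident.get("simulated") or step.simulated_result is None:
                trace.append((step.name, "blocked"))
                break
            result = step.simulated_result
            trace.append((step.name, f"simulated:{result}"))
        else:
            trace.append((step.name, result))
    return trace


# Hypothetical course of action for a phishing incident type.
steps = [
    Step("lookup_ip", lambda inc: "malicious"),
    Step("await_approval", lambda inc: None, simulated_result="approved"),
    Step("block_ip", lambda inc: "blocked_ip"),
]
simulated_incident = {"type": "phishing", "simulated": True}
trace = run_course_of_action(steps, simulated_incident)
```

With the simulated incident, the approval step receives the simulated result and the run continues to the final step. With a real incident, the same run would stop at the approval step.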
Abstract:
Various examples of the present invention provide a server for providing information about an electronic device. The server can comprise: a communication unit for receiving, from at least one first electronic device, at least one piece of information about the first electronic device; and a control unit for determining, from the received information, a current state among a plurality of states preset for the first electronic device, and for controlling the first electronic device such that state prediction information of the first electronic device is transmitted to a second electronic device if the determined current state satisfies a preset notification condition on a state diagram in which relationships among the plurality of states are set. Other examples beyond those described are also possible.
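One way to picture the state-diagram check is the sketch below. Everything concrete in it is an assumption made for illustration: the state names, the temperature thresholds, the `temperature_c` field, and the choice of "failing" as the notification condition do not come from the abstract.

```python
# Hypothetical state diagram: each preset state maps to the states it may lead to.
STATE_DIAGRAM = {
    "normal": ["degraded"],
    "degraded": ["normal", "failing"],
    "failing": ["failed"],
    "failed": [],
}

# Assumed notification condition: notify when the device enters one of these states.
NOTIFY_STATES = {"failing"}


def classify_state(info):
    """Map received device information to one of the preset states."""
    temp = info["temperature_c"]  # hypothetical field reported by the device
    if temp < 60:
        return "normal"
    if temp < 80:
        return "degraded"
    if temp < 95:
        return "failing"
    return "failed"


def state_prediction(info):
    """Return prediction info to forward to the second device, or None."""
    current = classify_state(info)
    if current not in NOTIFY_STATES:
        return None
    # The prediction here is simply the set of states reachable next.
    return {"current": current, "predicted": STATE_DIAGRAM[current]}
```

Here `state_prediction({"temperature_c": 85})` yields a prediction because the "failing" state meets the assumed notification condition, while a reading of 40 yields `None` and nothing is sent.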
Abstract:
A device may obtain first information related to network devices of a network. The device may obtain second information related to the network devices and/or to one or more historic network service incidents. The one or more historic network service incidents may be related to network services provided in association with the network devices. The one or more historic network service incidents may include outages and/or degradations of one or more network services. The device may perform an analysis of the first information and the second information. The device may train a predictive model based on the analysis of the first information and the second information. The predictive model may predict a probability of a future network service incident based on the first information and/or the second information. The device may cause third information, related to the network devices, to be monitored based on the predictive model.
Abstract:
Systems and methods are described herein for logging system events within an electronic machine using an event log structured as a collection of tree-like cause and effect graphs. An event to be logged may be received. A new event node may be created within the event log for the received event. One or more existing event nodes within the event log may be identified as having possibly caused the received event. One or more causal links may be created within the event log between the new event node and the one or more identified existing event nodes. The new event node may be stored as an unattached root node in response to not identifying an existing event node that may have caused the received event.
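The logging procedure above could look something like the following sketch. The `EventLog` and `EventNode` classes, the `may_cause` predicate, and the example event names are hypothetical. The abstract does not say how possible causes are identified, so a simple rule lookup stands in for that step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EventNode:
    name: str
    # Causal links to existing nodes that may have caused this event.
    causes: List["EventNode"] = field(default_factory=list)


class EventLog:
    def __init__(self, may_cause: Callable[[str, str], bool]):
        # Hypothetical predicate: could `cause` have produced `effect`?
        self.may_cause = may_cause
        self.nodes: List[EventNode] = []

    def log(self, name: str) -> EventNode:
        """Create a node for the event and link it to its possible causes."""
        node = EventNode(name)
        node.causes = [n for n in self.nodes if self.may_cause(n.name, name)]
        self.nodes.append(node)
        return node

    def roots(self) -> List[EventNode]:
        """Nodes with no identified cause are stored as unattached roots."""
        return [n for n in self.nodes if not n.causes]


# Assumed cause/effect rules for the example.
RULES = {("fan_failure", "overheat"), ("overheat", "shutdown")}
log = EventLog(lambda cause, effect: (cause, effect) in RULES)
for event_name in ["fan_failure", "overheat", "door_open", "shutdown"]:
    log.log(event_name)
```

In this run, `fan_failure` and `door_open` end up as roots, and `shutdown` links back to `overheat`, which in turn links to `fan_failure`.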
Abstract:
Methods for monitoring a networked computing environment and for consolidating multiple alarms under a single root cause are described. In some embodiments, in response to detecting an alert corresponding with a performance issue in a networked computing environment, a root cause identification tool may aggregate a plurality of alarms from a plurality of performance management tools monitoring the networked computing environment. The root cause identification tool may then generate a failure graph associated with the performance issue based on the plurality of alarms, determine a first set of leaf nodes of the failure graph, determine a first chain of failures based on the first set of leaf nodes, suppress (or hide) alarms that are not associated with the first chain of failures, and output a consolidated alarm associated with the first chain of failures.
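A rough sketch of the consolidation steps follows. It assumes the failure graph is acyclic and is stored as a mapping from each failing component to the failing components it depends on, so that leaf nodes are the candidate root causes. The component names, alarm fields, and function names are hypothetical.

```python
def leaf_nodes(graph):
    """Nodes with no failing dependency: candidate root causes."""
    return [node for node, deps in graph.items() if not deps]


def chain_to(graph, start, leaf):
    """Depth-first search for a path from `start` to `leaf` (acyclic graph assumed)."""
    if start == leaf:
        return [start]
    for dep in graph[start]:
        rest = chain_to(graph, dep, leaf)
        if rest:
            return [start] + rest
    return []


def consolidate(graph, alert_node, alarms):
    """Keep alarms on the first chain of failures and suppress the rest."""
    chain = []
    for leaf in leaf_nodes(graph):
        chain = chain_to(graph, alert_node, leaf)
        if chain:
            break
    kept = [a for a in alarms if a["node"] in chain]
    suppressed = [a for a in alarms if a["node"] not in chain]
    consolidated = {
        "root_cause": chain[-1] if chain else None,
        "chain": chain,
        "alarms": kept,
    }
    return consolidated, suppressed


# Hypothetical failure graph built from aggregated alarms.
graph = {"app": ["vm"], "vm": ["datastore"], "datastore": [], "cache": []}
alarms = [
    {"node": "app", "msg": "slow responses"},
    {"node": "vm", "msg": "high I/O wait"},
    {"node": "datastore", "msg": "disk latency"},
    {"node": "cache", "msg": "eviction rate"},
]
consolidated, suppressed = consolidate(graph, "app", alarms)
```

For the alert on `app`, the chain runs `app`, `vm`, `datastore`, so the `cache` alarm is suppressed and the consolidated alarm names `datastore` as the root cause.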
Abstract:
Techniques are described for managing network services deployed in a network using a rules engine with on-demand dependency insertion. A network service manager may use a rules engine to monitor a network service at network devices in order to detect a device-level event, and determine a service-level impact of the detected event based on network service rules and dependencies. The dependencies define links between the device-level event and actions triggered by the device-level event. According to the techniques, a rules engine is configured to detect a device-level event and, in response, insert only those dependencies associated with the detected device-level event into a working memory. Once the device-level event has been cleared, the dependencies related to the device-level event are removed from the working memory. The working memory, therefore, will include only the dependencies needed to determine service-level impacts of currently detected device-level events.
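On-demand dependency insertion can be illustrated with a small sketch. The `RulesEngine` class, the event identifiers, and the service names below are hypothetical, and the working memory is modeled as a plain dictionary rather than a real rules-engine fact store.

```python
# Hypothetical dependency store: device-level event -> services it affects.
DEPENDENCIES = {
    "link_down:port1": ["vpn-a", "vpn-b"],
    "bgp_down:peer7": ["vpn-b"],
    "fan_fail:chassis2": ["vpn-c"],
}


class RulesEngine:
    def __init__(self, dependencies):
        self.dependencies = dependencies
        # Holds only the dependencies of currently active events.
        self.working_memory = {}

    def on_event(self, event):
        """Insert only the dependencies tied to the detected event."""
        self.working_memory[event] = self.dependencies.get(event, [])
        return self.impacted_services()

    def on_clear(self, event):
        """Remove the event's dependencies once the event has cleared."""
        self.working_memory.pop(event, None)
        return self.impacted_services()

    def impacted_services(self):
        """Service-level impact derived from what is in working memory."""
        return sorted({s for deps in self.working_memory.values() for s in deps})


engine = RulesEngine(DEPENDENCIES)
after_link = engine.on_event("link_down:port1")
after_bgp = engine.on_event("bgp_down:peer7")
after_clear = engine.on_clear("link_down:port1")
```

The dependency for `fan_fail:chassis2` is never loaded because that event never fires, which is the point of inserting dependencies on demand instead of preloading all of them.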
Abstract:
Outage detection in a cloud based service is provided using synthetic measurements and anonymized usage data of the cloud based service. Synthetic measurements and usage data are processed through a shared aggregator to generate aggregated data. The synthetic measurements and the usage data are analyzed through a decision tree to correlate an outage based on the synthetic measurements and the usage data. A confidence value is assigned to the outage. An alert is generated that includes information associated with the outage and the confidence value.
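The pipeline above might be sketched as below. The thresholds, confidence values, and field names are placeholders chosen for illustration, and the decision tree is hand-written here rather than learned. The abstract does not specify any of these details.

```python
def aggregate(synthetic_ok, usage_now, usage_baseline):
    """Combine synthetic probe results and usage counts into one record."""
    failures = sum(1 for ok in synthetic_ok if not ok)
    return {
        "synthetic_failure_rate": failures / len(synthetic_ok),
        # Fraction by which current usage fell below the baseline.
        "usage_drop": max(0.0, 1 - usage_now / usage_baseline),
    }


def detect_outage(agg):
    """Hand-written decision tree; all thresholds and confidences are assumed."""
    if agg["synthetic_failure_rate"] >= 0.5:
        if agg["usage_drop"] >= 0.3:
            return {"outage": True, "confidence": 0.9}  # both signals agree
        return {"outage": True, "confidence": 0.6}      # probes fail, usage steady
    if agg["usage_drop"] >= 0.5:
        return {"outage": True, "confidence": 0.5}      # usage alone dropped
    return {"outage": False, "confidence": 0.0}


# Three of four probes failed and usage fell from 100 to 40 requests.
agg = aggregate([False, False, True, False], usage_now=40, usage_baseline=100)
alert = detect_outage(agg)
```

The alert carries both the outage decision and the confidence value, and confidence is highest when the synthetic measurements and the usage data point the same way.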
Abstract:
In one embodiment, a method of determining whether a metric is an anomaly includes receiving a data point and determining a metric in accordance with the data point and a center value. The method also includes determining whether the metric is below a lower threshold, between the lower threshold and an upper threshold, or above the upper threshold and determining that the data point is not the anomaly when the metric is below the lower threshold. Additionally, the method includes determining that the data point is the anomaly when the metric is above the upper threshold and determining that the data point might be the anomaly when the metric is between the lower threshold and the upper threshold.
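The three-way decision can be written compactly. The sketch below assumes the metric is the absolute distance of the data point from the center value, which is only one possible choice, and it treats values exactly on a threshold as falling between the thresholds. The function name and labels are hypothetical.

```python
def classify_point(data_point, center, lower, upper):
    """Classify a data point as not an anomaly, a possible anomaly, or an anomaly."""
    # Assumed metric: absolute distance from the center value.
    metric = abs(data_point - center)
    if metric < lower:
        return "not_anomaly"
    if metric > upper:
        return "anomaly"
    # Between the thresholds (inclusive): undecided, may need further analysis.
    return "possible_anomaly"
```

With `center=10`, `lower=1`, and `upper=3`, a data point of 10.5 is not an anomaly, 12 might be one, and 20 is one.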
Abstract:
An apparatus monitors a communication system including at least one communication device. The monitoring apparatus includes a memory and a processor. A second virtual system is generated by changing a first virtual system, which is determined according to a combination of an arrangement of a plurality of virtual machines in the at least one communication device and a communication path between the plurality of virtual machines. The memory stores system information that represents the arrangement and the communication path of the virtual machines used in the second virtual system. The processor receives fault information that reports an occurrence of a fault. The processor identifies the fault information as having been generated by a virtual machine within the first virtual system when it does not detect a specified fault, namely the fault that would be detected if the fault information had been transmitted from any of the virtual machines within the second virtual system represented by the system information.