Abstract:
An alarm analysis method includes: determining M alarm pairs in a first alarm set, where each alarm pair of the M alarm pairs includes a first alarm and a second alarm having an association; generating, according to an association rule, a first feature set of N alarm pairs, where the first alarm of each alarm pair of the N alarm pairs is an alarm pair root in the first feature set, and the first feature set includes a first probability that a first subsystem to which each first alarm belongs is a subsystem root and a first alarm object is an alarm object root, and a second probability that a second subsystem to which each second alarm belongs is a subsystem root and a second alarm object is an alarm object root; and determining root information of the first alarm set based on the first probability and the second probability.
Abstract:
A system and method of managing a network with assets are described. The method includes generating a directed graph with each of the assets represented as a node, determining an individual failure probability of each node, computing a downstream failure probability of each node according to an arrangement of the nodes in the directed graph, computing an upstream failure probability of each node according to the arrangement of the nodes in the directed graph, and computing a network failure probability for each node based on the corresponding individual failure probability, downstream failure probability, and upstream failure probability. Managing the network is based on the network failure probability of the assets.
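The abstract does not give formulas for the three probabilities, so the following Python sketch only illustrates one possible reading. It assumes that node failures are independent, that the graph is acyclic, that a node's upstream (downstream) failure probability is the chance that at least one ancestor (descendant) fails, and that the three values combine multiplicatively. The function names and the example assets are hypothetical.

```python
from math import prod
from typing import Dict, Iterable, List, Set, Tuple


def reachable(adjacency: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start`, excluding `start` itself."""
    seen: Set[str] = set()
    stack = list(adjacency.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency.get(node, []))
    seen.discard(start)
    return seen


def network_failure_probabilities(
    edges: Iterable[Tuple[str, str]], p_individual: Dict[str, float]
) -> Dict[str, float]:
    """Combine individual, downstream and upstream failure probabilities.

    Assumption (not from the abstract): failures are independent, so
    "at least one of these nodes fails" is 1 - product(1 - p).
    """
    children: Dict[str, List[str]] = {n: [] for n in p_individual}
    parents: Dict[str, List[str]] = {n: [] for n in p_individual}
    for src, dst in edges:
        children[src].append(dst)
        parents[dst].append(src)

    result: Dict[str, float] = {}
    for node, p in p_individual.items():
        downstream = 1 - prod(1 - p_individual[d] for d in reachable(children, node))
        upstream = 1 - prod(1 - p_individual[u] for u in reachable(parents, node))
        result[node] = 1 - (1 - p) * (1 - downstream) * (1 - upstream)
    return result


# Hypothetical assets: one feeder supplying two transformers.
probs = network_failure_probabilities(
    edges=[("feeder", "tx1"), ("feeder", "tx2")],
    p_individual={"feeder": 0.1, "tx1": 0.2, "tx2": 0.5},
)
```

Under these assumptions the feeder receives the highest network failure probability, because both transformers sit downstream of it.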
Abstract:
The present invention discloses a network burst load evacuation method for edge servers whose optimization goal is to minimize a time and average penalty function over all tasks performed by the edge system. The method takes into account fairness among all users in the system and ensures that the offloading tasks of all users can be completed in a relatively short time; it also proposes a new quantitative measure for improving user QoS response. In implementing the algorithm, a particle swarm algorithm is used to solve for the optimal target of the system. This algorithm executes quickly and efficiently and is especially suitable for edge computing network systems, so that when a sudden load occurs, the system can respond in a very short time and complete the evacuation of the load, which greatly improves the fault tolerance and stability of the edge network environment.
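The abstract names a particle swarm algorithm but gives neither the objective nor the parameters. The following is a minimal sketch of a generic particle swarm minimizer in Python. The names `pso_minimize` and `toy_cost`, the inertia and acceleration coefficients, and the target load split are all illustrative assumptions, and `toy_cost` is only a stand-in for the patent's time and average penalty function.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    lo: float,
    hi: float,
    n_particles: int = 30,
    n_iters: int = 200,
    w: float = 0.72,   # inertia weight (assumed value)
    c1: float = 1.49,  # pull toward each particle's own best (assumed value)
    c2: float = 1.49,  # pull toward the swarm's best (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over the box [lo, hi]^dim with a basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


# Stand-in objective: squared distance from a hypothetical ideal split of
# a burst load across three edge servers.
TARGET = (0.2, 0.5, 0.3)


def toy_cost(x: Sequence[float]) -> float:
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


best_split, best_cost = pso_minimize(toy_cost, dim=3, lo=0.0, hi=1.0)
```

A real objective would replace `toy_cost` with the system's time and penalty model; the swarm loop itself would not need to change.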
Abstract:
Systems and methods are described herein for logging system events within an electronic machine using an event log structured as a collection of tree-like cause and effect graphs. An event to be logged may be received. A new event node may be created within the event log for the received event. One or more existing event nodes within the event log may be identified as having possibly caused the received event. One or more causal links may be created within the event log between the new event node and the one or more identified existing event nodes. The new event node may be stored as an unattached root node in response to not identifying an existing event node that may have caused the received event.
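The logging steps above can be sketched as a small data structure. This is a minimal Python illustration, not the patented implementation: the class names, the `may_have_caused` predicate, and the example rule table are assumptions, since the abstract does not say how candidate causes are identified.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EventNode:
    """One logged event; `causes` holds causal links to earlier events."""
    event_id: int
    name: str
    causes: List["EventNode"] = field(default_factory=list)

    @property
    def is_unattached_root(self) -> bool:
        return not self.causes


class EventLog:
    def __init__(self, may_have_caused: Callable[[EventNode, str], bool]):
        # Hypothetical predicate: may_have_caused(existing_node, new_name).
        self._may_have_caused = may_have_caused
        self.nodes: List[EventNode] = []

    def log(self, name: str) -> EventNode:
        """Create a node for a received event and link it to possible causes."""
        node = EventNode(event_id=len(self.nodes), name=name)
        node.causes = [n for n in self.nodes if self._may_have_caused(n, name)]
        # With no identified cause, the node is stored as an unattached root.
        self.nodes.append(node)
        return node

    def roots(self) -> List[EventNode]:
        return [n for n in self.nodes if n.is_unattached_root]


# Illustrative rule table: which event names an existing event can cause.
RULES = {"fan_stopped": {"overheat"}, "overheat": {"shutdown"}}
log = EventLog(lambda existing, new: new in RULES.get(existing.name, set()))
for event_name in ["fan_stopped", "overheat", "shutdown", "door_opened"]:
    log.log(event_name)

root_names = [n.name for n in log.roots()]
shutdown_causes = [c.name for c in log.nodes[2].causes]
```

Here `fan_stopped` and `door_opened` end up as unattached roots, while `shutdown` links back to `overheat`, which in turn links to `fan_stopped`.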
Abstract:
An alarm processing method and an alarm processing apparatus are provided. In the alarm processing method, an alarm reported by a Virtualized Network Function (VNF) is received; VNF Forwarding Graph (VNF FG) information of the VNF and/or Network Forwarding Path (NFP) information of the VNF is acquired; and alarm analysis processing is performed on the received alarm according to the VNF FG information and/or the NFP information acquired.
Abstract:
The disclosed computer-implemented method for debugging network nodes may include (1) detecting a computing event that is indicative of a networking malfunction within a network node, (2) determining, based at least in part on the computing event, one or more potential causes of the networking malfunction, (3) identifying one or more debugging templates that each define debugging steps that, when performed by a computing system, enable the computing system to determine whether the networking malfunction resulted from any of the potential causes, (4) performing a set of debugging steps defined by one of the debugging templates that corresponds to one of the potential causes, and then (5) determining, based at least in part on the set of debugging steps defined by the debugging template, that the networking malfunction resulted from the potential cause. Various other methods, systems, and apparatuses are also disclosed.
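The five steps above can be sketched as a lookup from events to potential causes, and from causes to debugging templates. The Python below is a hedged illustration only: the event name, the causes, the state fields, and the rule that a cause is confirmed when every step in its template passes are all assumptions, not details from the abstract.

```python
from typing import Callable, Dict, List, Optional

# A debugging step inspects a snapshot of node state and reports whether
# its check is consistent with the cause being tested (assumed convention).
DebugStep = Callable[[dict], bool]

# Hypothetical mapping from a detected event to its potential causes.
CAUSES_BY_EVENT: Dict[str, List[str]] = {
    "packet_loss": ["link_down", "mtu_mismatch"],
}

# Hypothetical debugging templates: one ordered list of steps per cause.
TEMPLATES: Dict[str, List[DebugStep]] = {
    "link_down": [lambda s: not s["carrier"]],
    "mtu_mismatch": [lambda s: s["iface_mtu"] != s["peer_mtu"]],
}


def diagnose(event: str, state: dict) -> Optional[str]:
    """Return the first potential cause whose template steps all pass."""
    for cause in CAUSES_BY_EVENT.get(event, []):
        steps = TEMPLATES.get(cause, [])
        if steps and all(step(state) for step in steps):
            return cause
    return None


node_state = {"carrier": True, "iface_mtu": 1500, "peer_mtu": 9000}
cause = diagnose("packet_loss", node_state)
```

With this state the link is up but the MTUs differ, so the sketch settles on `mtu_mismatch`.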
Abstract:
Embodiments of the present disclosure relate to a method and apparatus for managing a failure of a device. The method comprises detecting whether a failure occurs in a device, and generating a failure report for the failure in response to the failure occurring in the device. The method further comprises querying a device object repository with the failure report, where the device object repository stores historical failure information associated with the device and a fix solution corresponding to the historical failure information. The method further comprises obtaining the fix solution from the device object repository based on a comparison between the failure report and the historical failure information. Embodiments of the present disclosure can manage the failure of the device more effectively.
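A minimal Python sketch of the query step follows. The abstract does not say how the failure report is compared with the historical failure information, so exact matching on an error code and a component is an assumption here, as are all class names, field names, and example values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FailureReport:
    device_id: str
    error_code: str
    component: str


@dataclass(frozen=True)
class HistoricalFailure:
    error_code: str
    component: str
    fix_solution: str


class DeviceObjectRepository:
    """Stores historical failures and their fix solutions per device."""

    def __init__(self) -> None:
        self._history: Dict[str, List[HistoricalFailure]] = {}

    def record(self, device_id: str, failure: HistoricalFailure) -> None:
        self._history.setdefault(device_id, []).append(failure)

    def query(self, report: FailureReport) -> Optional[str]:
        """Return the fix whose historical failure matches the report."""
        for past in self._history.get(report.device_id, []):
            # Assumed comparison rule: exact match on code and component.
            if (past.error_code, past.component) == (report.error_code, report.component):
                return past.fix_solution
        return None


repo = DeviceObjectRepository()
repo.record("array-01", HistoricalFailure("E42", "fan", "replace fan module"))

fix = repo.query(FailureReport("array-01", "E42", "fan"))
no_fix = repo.query(FailureReport("array-01", "E99", "psu"))
```

A fuzzier comparison (for example, scoring partial matches) could replace the equality check without changing the repository interface.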
Abstract:
Systems, methods, architectures and/or apparatus providing a visualization tool wherein an initial or simplified correlation tree includes a path between two hierarchically related objects, namely a root cause object and an object representing an entity associated with an event of interest caused by the root cause entity, and wherein the correlation tree may be incrementally increased in size and complexity in response to user input, such as via a graphical user interface, such that the user's attention is focused on the specific entities and their relationships, thereby enabling the user to quickly understand the various failure relationships.
Abstract:
Handling alerts in a system to reduce the number of non-actionable alerts that are provided to an alert handling portion of the system. A method includes receiving an alert. The alert is an unstructured data alert. The method further includes comparing the alert to a plurality of known non-actionable alerts to determine a similarity of the alert to one or more of the known non-actionable alerts. The method further includes dispatching the alert based on the similarity of the alert to one or more of the known non-actionable alerts.
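The compare-then-dispatch flow can be sketched in Python as follows. The abstract does not name a similarity measure, so the character-level ratio from the standard library's `difflib.SequenceMatcher` is used here purely as a stand-in. The example alerts, the 0.8 threshold, and the `suppress` and `forward` outcomes are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import Sequence

# Hypothetical examples of known non-actionable alerts (unstructured text).
KNOWN_NON_ACTIONABLE = [
    "disk usage above 80% on backup volume during nightly job",
    "heartbeat missed once for host in maintenance window",
]


def max_similarity(alert: str, known: Sequence[str]) -> float:
    """Highest similarity ratio between the alert and any known alert."""
    return max(
        (SequenceMatcher(None, alert.lower(), k.lower()).ratio() for k in known),
        default=0.0,
    )


def dispatch(
    alert: str,
    known: Sequence[str] = KNOWN_NON_ACTIONABLE,
    threshold: float = 0.8,  # assumed cut-off
) -> str:
    """Suppress alerts that closely resemble a known non-actionable one."""
    return "suppress" if max_similarity(alert, known) >= threshold else "forward"


repeat_outcome = dispatch("Disk usage above 80% on backup volume during nightly job")
novel_outcome = dispatch("database primary unreachable: connection refused")
```

The first alert differs from a known entry only in letter case and is suppressed, while the second bears little resemblance to either known entry and is forwarded to the alert handling portion.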
Abstract:
Systems and methods which provide an adaptive unified performance management (AUPM) framework for interacting with disparate network elements using techniques adaptive to operational conditions to provide network performance adaptive root cause analysis (ARCA) are shown. An AUPM framework of embodiments of the invention implements a proxy-based architecture in which a plurality of proxies are utilized to connect to and perform data communication with the disparate network elements. Centralized performance management is in communication with the proxies to obtain and unify network element data for performance monitoring, alarm reporting, and/or root cause analysis. The performance monitoring, alarm reporting, and root cause analysis provided by centralized performance management of embodiments herein implement adaptive cluster-based analysis to provide robust operation adapted to accommodate various operational scenarios, which may include time-varying conditions and learning-based configuration.