-
Publication No.: US20220391279A1
Publication date: 2022-12-08
Application No.: US17342423
Filing date: 2021-06-08
Applicant: VMware, Inc.
Inventor: Naira Movses Grigoryan, Ashot Nshan Harutyunyan, Amak Poghosyan, Nicholas Kushmerick, Janislav Jankov
Abstract: Methods and systems are directed to discovering problem incidents in a distributed computing system. Events corresponding to historical problem incidents for the distributed computing system are retrieved from a database. Sets of representative events of the various historical problem incidents for the distributed computing system are determined. A runtime problem incident in the distributed computing system is characterized by runtime events. The runtime problem incident is classified as corresponding to a historical problem incident of the historical problem incidents based on the runtime events and the sets of representative events. Remedial measures used to correct the historical problem incident may be used to correct the runtime problem.
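Illustration (not part of the patent record): the abstract does not say how runtime events are compared with the sets of representative events. The sketch below assumes one plausible choice, Jaccard similarity between sets of event types, with a hypothetical cutoff below which the incident is treated as unmatched. The names `jaccard`, `classify_incident`, `threshold`, and the sample event labels are all invented for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of event types."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_incident(runtime_events, representative_sets, threshold=0.5):
    """Return (label, score) of the best-matching historical incident.

    representative_sets maps a historical incident label to its set of
    representative event types. The threshold is an assumed cutoff; if no
    historical incident reaches it, the label is None.
    """
    best_label, best_score = None, 0.0
    for label, rep_events in representative_sets.items():
        score = jaccard(runtime_events, rep_events)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical historical incidents and a runtime incident.
history = {
    "disk_full": {"disk_warn", "write_fail", "io_timeout"},
    "net_partition": {"link_down", "heartbeat_miss"},
}
runtime = ["write_fail", "io_timeout", "disk_warn", "cpu_spike"]
label, score = classify_incident(runtime, history)
print(label, score)
```

Once a historical incident is matched, the remedial measures recorded for it could be looked up by the returned label.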
-
Publication No.: US11815989B2
Publication date: 2023-11-14
Application No.: US17580143
Filing date: 2022-01-20
Applicant: VMware, Inc.
Inventor: Ashot Nshan Harutyunyan, Amak Poghosyan, Naira Movses Grigoryan
CPC classification number: G06F11/079, G06F11/0709, G06F11/0754
Abstract: Automated methods and systems for identifying problems associated with objects of a data center are described. Automated methods and systems are performed by an operations management server. For each object, the server determines a baseline distribution from historical events that are associated with a normal operational state of the object. The server determines a runtime distribution of runtime events that are associated with the object and detected in a runtime window of the object. The management server monitors runtime performance of the object while the object is running in the data center. When a performance problem is detected, the management server determines a root cause of the performance problem based on the baseline distribution and the runtime distribution and displays an alert in a graphical user interface of a display.
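Illustration (not part of the patent record): the abstract does not specify how the baseline and runtime distributions are compared. As a minimal sketch, assume each distribution is the relative frequency of event types and that candidate root causes are the event types whose frequency shifts most between the two. The function names and the sample event labels are hypothetical.

```python
from collections import Counter


def distribution(events):
    """Relative frequency of each event type in a list of events."""
    counts = Counter(events)
    total = sum(counts.values())
    return {etype: n / total for etype, n in counts.items()} if total else {}


def rank_deviations(baseline_events, runtime_events):
    """Rank event types by absolute change in relative frequency.

    Returns (event_type, runtime_minus_baseline) pairs, largest shift first.
    This simple difference is an assumed stand-in for whatever comparison
    the patent actually claims.
    """
    base = distribution(baseline_events)
    run = distribution(runtime_events)
    diffs = {t: run.get(t, 0.0) - base.get(t, 0.0) for t in set(base) | set(run)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical events: normal operation vs. a runtime window with a problem.
baseline = ["info"] * 8 + ["warn"] * 2
runtime_window = ["info"] * 4 + ["warn"] * 1 + ["oom"] * 5
for event_type, shift in rank_deviations(baseline, runtime_window):
    print(f"{event_type}: {shift:+.2f}")
```

Event types at the top of the ranking are the ones an alert would most plausibly highlight.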
-
Publication No.: US20230229548A1
Publication date: 2023-07-20
Application No.: US17580143
Filing date: 2022-01-20
Applicant: VMware, Inc.
Inventor: Ashot Nshan Harutyunyan, Amak Poghosyan, Naira Movses Grigoryan
IPC: G06F11/07
CPC classification number: G06F11/079, G06F11/0709, G06F11/0754
Abstract: Automated methods and systems for identifying problems associated with objects of a data center are described. Automated methods and systems are performed by an operations management server. For each object, the server determines a baseline distribution from historical events that are associated with a normal operational state of the object. The server determines a runtime distribution of runtime events that are associated with the object and detected in a runtime window of the object. The management server monitors runtime performance of the object while the object is running in the data center. When a performance problem is detected, the management server determines a root cause of the performance problem based on the baseline distribution and the runtime distribution and displays an alert in a graphical user interface of a display.
-
Publication No.: US20220058073A1
Publication date: 2022-02-24
Application No.: US17492099
Filing date: 2021-10-01
Applicant: VMware, Inc.
Inventor: Amak Poghosyan, Ashot Nshan Harutyunyan, Naira Movses Grigoryan, Clement Pang, George Oganesyan, Davit Baghdasaryan
Abstract: The current document is directed to methods and systems that employ call traces collected by one or more call-trace services to generate call-trace-classification rules to facilitate root-cause analysis of distributed-application operational problems and failures. In a described implementation, a set of automatically labeled call traces is partitioned by the generated call-trace-classification rules. Call-trace-classification-rule generation is constrained to produce relatively simple rules with greater-than-threshold confidences and coverages. The call-trace-classification rules may point to particular services and service failures, which provides useful information to distributed-application and distributed-computer-system managers and administrators attempting to diagnose operational problems and failures that arise during execution of distributed applications within distributed computer systems. A first dataset is collected during normal distributed-application operation and a second dataset is collected during problem-associated or failure-associated operation of the distributed application. The first and second datasets are used to generate noise-subtracted call-trace-classification rules and/or diagnostic suggestions.
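Illustration (not part of the patent record): the abstract says rules must exceed threshold confidences and coverages but does not define either measure. The sketch below assumes the usual rule-learning definitions: confidence is the share of rule-matched traces that carry the target label, and coverage is the share of target-labeled traces that the rule matches. The trace layout (a dict with `services` and `label`) and the name `rule_metrics` are hypothetical.

```python
def rule_metrics(traces, predicate, target_label):
    """Return (confidence, coverage) of a rule over labeled call traces.

    traces: list of dicts with a "label" key (assumed layout).
    predicate: function(trace) -> bool, the rule's condition.
    """
    matched = [t for t in traces if predicate(t)]
    target = [t for t in traces if t["label"] == target_label]
    hits = [t for t in matched if t["label"] == target_label]
    confidence = len(hits) / len(matched) if matched else 0.0
    coverage = len(hits) / len(target) if target else 0.0
    return confidence, coverage


# Hypothetical labeled traces; each lists the services the trace touched.
traces = [
    {"services": {"auth", "db"}, "label": "faulty"},
    {"services": {"auth", "cache"}, "label": "normal"},
    {"services": {"db", "cart"}, "label": "faulty"},
    {"services": {"cart"}, "label": "faulty"},
]

# Candidate rule: "the trace includes a call to the db service".
confidence, coverage = rule_metrics(traces, lambda t: "db" in t["services"], "faulty")
print(confidence, coverage)
```

A rule generator along these lines would keep only simple predicates whose confidence and coverage both clear chosen thresholds; a rule that names a single service, as here, points an administrator at that service.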