METHODS AND SYSTEMS FOR IDENTIFYING AND RESOLVING ROOT CAUSES OF PERFORMANCE PROBLEMS IN DATA CENTER OBJECT

    Publication number: US20230281070A1

    Publication date: 2023-09-07

    Application number: US17683601

    Filing date: 2022-03-01

    Applicant: VMware, Inc.

    CPC classification number: G06F11/079 G06F11/3006 G06N20/00 G06K9/6256

    Abstract: Automated methods and systems for identifying and resolving performance problems of objects of a data center are described. The automated methods and systems construct a model for identifying objects of the data center that are experiencing performance problems based on baseline distributions of events of the objects in a historical time period and event distributions of events of the objects in a time window located outside the historical time period. A root causes and recommendations database is constructed for resolving performance problems based on remedial measures previously performed for resolving performance problems. The model is used to monitor the objects of the data center for runtime performance problems. When a performance problem with an object is detected, the root causes and recommendations database is used to identify a root cause of the performance problem and generate a recommendation for resolving the performance problem in near real time.
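
The abstract compares each object's baseline event distribution with its event distribution in a later time window but does not say how the two distributions are compared. The minimal sketch below assumes Jensen-Shannon divergence and a fixed threshold; both choices, and the names `flag_objects`, `vm-1`, `cpu_ok` and `mem_warn`, are hypothetical illustrations, not the patented method. The root causes and recommendations lookup is not modeled.

```python
import math
from collections import Counter

def event_distribution(events, event_types):
    """Relative frequency of each event type; a tiny epsilon keeps every probability > 0."""
    counts = Counter(events)
    eps = 1e-9
    total = sum(counts[t] for t in event_types) + eps * len(event_types)
    return [(counts[t] + eps) / total for t in event_types]

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 for identical distributions, at most 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def flag_objects(baseline_events, window_events, threshold=0.1):
    """Return ids of objects whose event distribution in the time window
    diverges from their baseline distribution by more than `threshold` (assumed value)."""
    flagged = []
    for obj_id, base in baseline_events.items():
        current = window_events.get(obj_id, [])
        types = sorted(set(base) | set(current))
        divergence = jensen_shannon(
            event_distribution(base, types), event_distribution(current, types)
        )
        if divergence > threshold:
            flagged.append(obj_id)
    return flagged

# Hypothetical objects: vm-1 shifts toward memory warnings, vm-2 is unchanged.
baseline = {"vm-1": ["cpu_ok"] * 9 + ["mem_warn"], "vm-2": ["cpu_ok"] * 10}
window = {"vm-1": ["cpu_ok"] * 2 + ["mem_warn"] * 8, "vm-2": ["cpu_ok"] * 10}
print(flag_objects(baseline, window))  # ['vm-1']
```

Jensen-Shannon divergence is symmetric and bounded, which makes a single threshold easier to choose across objects than an unbounded measure would be; any other distribution distance could be substituted.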

    Automated Methods and Systems for Managing Problem Instances of Applications in a Distributed Computing Facility

    Publication number: US20220027257A1

    Publication date: 2022-01-27

    Application number: US17073381

    Filing date: 2020-10-18

    Applicant: VMware, Inc.

    Abstract: Methods and systems described herein automate troubleshooting a problem in execution of an application in a distributed computing facility. Methods and systems learn interesting patterns in problem instances over time. The problem instances are displayed in a graphical user interface (“GUI”) that enables a user to assign a problem type label to each historical problem instance. A machine learning model is trained to predict problem types in executing the application based on the historical problem instances and associated problem types. In response to detecting a run-time problem instance in the execution of the application, the machine learning model is used to determine one or more problem types associated with the run-time problem instance. The one or more problem types are rank-ordered and a recommendation may be generated to correct the run-time problem instance based on the highest ranked problem type.
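
The abstract does not name the machine learning model. As an illustration only, the sketch below substitutes a nearest-centroid classifier trained on user-labeled historical problem instances, rank-orders problem types by distance to a run-time instance, and returns the recommendation for the top-ranked type. The two-element feature vectors (imagined as normalized CPU and memory usage), the labels, and the `remedies` table are all hypothetical.

```python
import math
from collections import defaultdict

def train_centroids(instances):
    """instances: list of (feature_vector, problem_type_label). Returns label -> centroid."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in instances:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def rank_problem_types(centroids, runtime_vec):
    """Rank problem types by Euclidean distance to the run-time instance, closest first."""
    return sorted(centroids, key=lambda label: math.dist(centroids[label], runtime_vec))

def recommend(centroids, runtime_vec, remedies):
    """Return the ranked problem types and the remedy for the highest-ranked one."""
    ranked = rank_problem_types(centroids, runtime_vec)
    return ranked, remedies.get(ranked[0], "no recommendation on record")

# Hypothetical labeled history (features: normalized CPU use, normalized memory use).
history = [
    ([0.9, 0.1], "cpu_saturation"),
    ([0.8, 0.2], "cpu_saturation"),
    ([0.1, 0.9], "memory_leak"),
    ([0.2, 0.8], "memory_leak"),
]
remedies = {"cpu_saturation": "add vCPUs or migrate the VM to a less loaded host"}
centroids = train_centroids(history)
ranked, advice = recommend(centroids, [0.7, 0.3], remedies)
print(ranked, advice)
```

A probabilistic classifier would let the ranking carry confidence scores; nearest-centroid is used here only because it keeps the rank-then-recommend step easy to see.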

    Methods and systems that identify dimensions related to anomalies in system components of distributed computer systems using traces, metrics, and component-associated attribute values

    Publication number: US11113174B1

    Publication date: 2021-09-07

    Application number: US16833102

    Filing date: 2020-03-27

    Applicant: VMware, Inc.

    Abstract: The current document is directed to methods and systems that employ distributed-computer-system metrics collected by one or more distributed-computer-system metrics-collection services, call traces collected by one or more call-trace services, and attribute values for distributed-computer-system components to identify attribute dimensions related to anomalous behavior of distributed-computer-system components. In a described implementation, nodes correspond to particular types of system components and node instances are individual components of the component type corresponding to a node. Node instances are associated with attribute values and nodes are associated with attribute-value spaces defined by attribute dimensions. Using attribute values and call traces, attribute dimensions that are likely related to particular anomalous behaviors of distributed-computer-system components are determined by decision-tree-related analyses and are reported to one or more computational entities to facilitate resolution of the anomalous behaviors.
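
One common decision-tree building block for this kind of analysis is information gain: the attribute dimension whose values best separate anomalous from normal node instances is the one a tree would split on first. The sketch below ranks dimensions that way. It is an assumption about what "decision-tree-related analyses" could involve, it ignores call traces, and the attribute names `version` and `zone` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of anomaly flags."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(instances, dimension):
    """instances: list of (attribute dict, is_anomalous). Gain from splitting on `dimension`."""
    labels = [anomalous for _, anomalous in instances]
    groups = defaultdict(list)
    for attrs, anomalous in instances:
        groups[attrs.get(dimension)].append(anomalous)
    n = len(instances)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_dimensions(instances):
    """Attribute dimensions ordered from most to least related to the anomaly."""
    dims = sorted({d for attrs, _ in instances for d in attrs})
    return sorted(dims, key=lambda d: information_gain(instances, d), reverse=True)

# Hypothetical node instances: only the instances running version 2.1 are anomalous.
nodes = [
    ({"version": "2.0", "zone": "a"}, False),
    ({"version": "2.0", "zone": "b"}, False),
    ({"version": "2.1", "zone": "a"}, True),
    ({"version": "2.1", "zone": "b"}, True),
]
print(rank_dimensions(nodes))  # ['version', 'zone']
```

Here `version` separates the anomalous instances perfectly (gain of 1 bit) while `zone` carries no information (gain of 0), so `version` would be the dimension reported.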

    Methods and systems that efficiently store metric data

    Publication number: US10901869B2

    Publication date: 2021-01-26

    Application number: US15805424

    Filing date: 2017-11-07

    Applicant: VMware, Inc.

    Abstract: The current document is directed to methods and systems that collect metric data within computing facilities, including large data centers and cloud-computing facilities. In a described implementation, lower and higher metric-data-value thresholds are used to partition collected metric data into outlying metric data and inlying metric data. The inlying metric data is quantized to compress the inlying metric data and adjacent data points having the same quantized metric-data values are eliminated, to further compress the inlying metric data. The resulting compressed data includes original metric-data representations for outlier data points and compressed metric-data representations for inlier data points, providing accurate restored metric-data values for significant data points when compressed metric data is decompressed.
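
The three compression steps in the abstract can be sketched as follows: values outside the lower and higher thresholds are kept exactly, inlying values are quantized, and adjacent inliers with the same quantized value collapse to one retained point. The quantization step size, the `(index, value, is_outlier)` record layout, and the function names are assumptions made for illustration; the abstract does not specify an encoding.

```python
def compress(values, low, high, step):
    """Return (index, value, is_outlier) records for a metric series.

    Outliers (outside [low, high]) keep their original value. Inliers are quantized
    to the nearest multiple of `step` (round() rounds halves to even), and a run of
    adjacent inliers sharing a quantized value keeps only its first point."""
    records = []
    last_q = None
    for i, v in enumerate(values):
        if v < low or v > high:
            records.append((i, v, True))
            last_q = None  # an outlier ends any run of equal inliers
        else:
            q = round(v / step) * step
            if q != last_q:
                records.append((i, q, False))
                last_q = q
    return records

def decompress(records, length):
    """Rebuild a series of `length` points; eliminated inliers repeat the retained value."""
    result = []
    ends = [r[0] for r in records[1:]] + [length]
    for (start, value, _), end in zip(records, ends):
        result.extend([value] * (end - start))
    return result

series = [10.2, 10.3, 10.1, 55.0, 10.4, 12.6, 12.4]
records = compress(series, low=0.0, high=50.0, step=1.0)
print(records)                            # 5 records instead of 7 points
print(decompress(records, len(series)))   # outlier 55.0 restored exactly
```

Because an outlier resets the run, the point that follows an outlier is always stored, so fill-forward during decompression never copies an outlier value into inlier positions.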

    METHODS AND SYSTEMS THAT EFFICIENTLY STORE METRIC DATA

    Publication number: US20190138419A1

    Publication date: 2019-05-09

    Application number: US15805424

    Filing date: 2017-11-07

    Applicant: VMware, Inc.

    Abstract: The current document is directed to methods and systems that collect metric data within computing facilities, including large data centers and cloud-computing facilities. In a described implementation, lower and higher metric-data-value thresholds are used to partition collected metric data into outlying metric data and inlying metric data. The inlying metric data is quantized to compress the inlying metric data and adjacent data points having the same quantized metric-data values are eliminated, to further compress the inlying metric data. The resulting compressed data includes original metric-data representations for outlier data points and compressed metric-data representations for inlier data points, providing accurate restored metric-data values for significant data points when compressed metric data is decompressed.

    METHODS AND SYSTEMS TO ANALYZE EVENT SOURCES WITH EXTRACTED PROPERTIES, DETECT ANOMALIES, AND GENERATE RECOMMENDATIONS TO CORRECT ANOMALIES

    Publication number: US20190026459A1

    Publication date: 2019-01-24

    Application number: US15653269

    Filing date: 2017-07-18

    Applicant: VMware, Inc.

    Abstract: Methods and systems are directed to automatically analyzing the behavior of event sources, detecting anomalies in the behavior of event sources, and generating recommendations to correct the detected anomalies. An event source can be an application program, an operating system, a virtual machine, a container, or any other source of event messages in a computer system. Methods quantify the event messages generated over time to form property time series data, which is metadata regarding the event messages generated by the event source. Methods compute a threshold from the property time series data. Methods detect abnormal states of the event source when property data points of the property time series data violate the threshold. A systems administrator may be notified by a property digression alert displayed on a system console. Methods also generate a recommendation to correct the anomalous behavior and optimize performance of the event source.
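
The quantify, threshold and detect steps can be sketched as below. The abstract does not say which property is measured or how the threshold is computed, so this sketch assumes the property is the number of event messages per fixed time interval and that the threshold is mean + k·(population standard deviation) of a historical series; both are hypothetical choices, and recommendation generation is not modeled.

```python
import statistics
from collections import Counter

def property_time_series(timestamps, interval):
    """Count event messages per fixed-width time interval (one possible 'property')."""
    if not timestamps:
        return []
    start = min(timestamps)
    buckets = Counter(int((t - start) // interval) for t in timestamps)
    return [buckets.get(i, 0) for i in range(max(buckets) + 1)]

def threshold_from_history(series, k=3.0):
    """Assumed threshold rule: mean plus k population standard deviations."""
    return statistics.mean(series) + k * statistics.pstdev(series)

def detect_violations(series, threshold):
    """Indices of property data points that exceed the threshold."""
    return [i for i, x in enumerate(series) if x > threshold]

history = [10, 12, 11, 9, 10, 11, 12, 10]  # hypothetical event counts per interval
limit = threshold_from_history(history, k=3.0)
print(detect_violations([11, 10, 40, 12], limit))  # [2]
```

A spike of 40 messages in one interval exceeds the threshold of roughly 13.6 derived from the history, which is the kind of point that would trigger a property digression alert.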
