Methods and systems to enhance identifying service-level-objective degradation in a data center

    Publication No.: US10181983B2

    Publication date: 2019-01-15

    Application No.: US15174017

    Filing date: 2016-06-06

    Applicant: VMware, Inc.

    Abstract: Methods recommend to data center customers those attributes of a data center infrastructure and application program that are associated with service-level objective (“SLO”) metric degradation and may be recorded in problem definitions. In other words, a data center customer is offered the chance to “codify” problems, primarily as atomic abnormality conditions on indicated attributes that decrease the SLO by a degree the customer would like to be aware of. As a result, the data center customer is warned of a potentially significant SLO decline and can act to prevent unwanted loss and active anomalies. Methods also generate patterns of attributes that constitute core structures highly associated with degradation of the SLO metric.

    METHODS AND SYSTEMS TO PRIORITIZE ALERTS WITH QUANTIFICATION OF ALERT IMPACTS

    Publication No.: US20180341566A1

    Publication date: 2018-11-29

    Application No.: US15604460

    Filing date: 2017-05-24

    Applicant: VMware, Inc.

    Abstract: Methods and systems are directed to quantifying and prioritizing the impact of problems or changes in a computer system. Resources of a computer system are monitored by management tools. When a change occurs at a resource of a computer system or in log data generated by event sources of the computer system, one or more of the management tools generates an alert. The alert may be an alert that indicates a problem with the computer system resource or the alert may be an alert trigger identified in an event message of the log data. Methods described herein compute an impact factor that serves as a measure of the difference between event messages generated before the alert and event messages generated after the alert. The value of the impact factor associated with an alert may be used to quantitatively prioritize the alert and generate appropriate recommendations for responding to the alert.
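
The abstract describes the impact factor only as a measure of the difference between event messages before and after an alert. A minimal sketch of one way to compute such a measure follows. It assumes each event message has already been reduced to an event-type label, and it uses the Jensen-Shannon divergence between the two event-type distributions; that divergence and the function names are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter


def event_type_distribution(event_types, vocabulary):
    """Relative frequency of each event type over a fixed vocabulary."""
    counts = Counter(event_types)
    total = len(event_types)
    return [counts[t] / total if total else 0.0 for t in vocabulary]


def jensen_shannon_divergence(p, q):
    """Base-2 JS divergence: 0 for identical distributions, 1 for disjoint ones."""
    def kl(a, b):
        # Terms with a zero numerator contribute nothing; the mixture b is
        # always positive wherever a is positive, so no division by zero.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def impact_factor(before, after):
    """Hypothetical impact factor: divergence of event types after vs. before an alert."""
    vocabulary = sorted(set(before) | set(after))
    p = event_type_distribution(before, vocabulary)
    q = event_type_distribution(after, vocabulary)
    return jensen_shannon_divergence(p, q)
```

Under this assumption, alerts could be ranked by sorting them on `impact_factor` in descending order, so that alerts followed by the largest shift in event types are handled first.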

    AUTOMATED METHODS AND SYSTEMS FOR CALCULATING HARD THRESHOLDS

    Publication No.: US20150379110A1

    Publication date: 2015-12-31

    Application No.: US14314490

    Filing date: 2014-06-25

    Applicant: VMware, Inc.

    Abstract: This disclosure is directed to automated methods and systems for calculating hard thresholds used to monitor time-series data generated by a data-generating entity. The methods are based on determining a cumulative distribution that characterizes the probability that data values of time-series data generated by the data-generating entity violate a hard threshold. The hard threshold is calculated as an inverse of the cumulative distribution based on a user-defined risk confidence level. The hard threshold may then be used to generate alerts when time-series data generated later by the data-generating entity violate the hard threshold.
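
The idea of inverting a cumulative distribution at a chosen confidence level can be sketched as follows. This example substitutes the empirical distribution of the historical data for whatever distribution the patent actually fits, which is an assumption made to keep the sketch short; `hard_threshold` and `violations` are hypothetical names.

```python
import math


def hard_threshold(values, confidence):
    """Smallest observed value v such that at least `confidence` of the data is <= v.

    This is the inverse of the empirical cumulative distribution, used here
    as a stand-in for a fitted distribution (an assumption of this sketch).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(values)
    # Smallest zero-based index k with (k + 1) / n >= confidence.
    k = math.ceil(confidence * len(ordered)) - 1
    return ordered[max(k, 0)]


def violations(values, threshold):
    """Indices and values of later data points that exceed the threshold."""
    return [(i, v) for i, v in enumerate(values) if v > threshold]
```

A monitoring loop would call `hard_threshold` once on historical data and then pass newly arriving values to `violations` to decide when to raise an alert.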

    METHODS AND SYSTEMS FOR PROACTIVE PROBLEM TROUBLESHOOTING AND RESOLUTION IN A CLOUD INFRASTRUCTURE

    Publication No.: US20250111251A1

    Publication date: 2025-04-03

    Application No.: US18376378

    Filing date: 2023-10-03

    Applicant: VMware, Inc.

    Abstract: Automated computer-implemented methods and systems for troubleshooting and resolving problems with objects of a cloud infrastructure are described herein. In response to detecting abnormal behavior of an object running in the cloud infrastructure based on a key performance indicator (“KPI”) of the object, a graphical user interface (“GUI”) is displayed to enable a user to select KPIs of components of the object. For each of the components, a separate rule learning engine is deployed to generate rules for detecting a problem with the component based on the KPI of the object and the KPIs of the component. The rules are subsequently used to detect a runtime problem with the object and display in the GUI remedial measures for resolving the problem. Remedial measures are automatically executed to resolve the problem with the object via the GUI.

    Automated methods and systems for troubleshooting and optimizing performance of applications running in a distributed computing system

    Publication No.: US11803440B2

    Publication date: 2023-10-31

    Application No.: US17490340

    Filing date: 2021-09-30

    Applicant: VMware, Inc.

    CPC classification number: G06F11/079 G06F11/3447 G06F11/3612 G06N5/04

    Abstract: Automated processes and systems troubleshoot and optimize performance of applications running in distributed computing systems. Automated computer-implemented processes train an inference model for an application based on metrics associated with the application and a key performance indicator (“KPI”) of the application. When a run-time performance problem is detected in run-time values of the KPI, the trained inference model is applied to run-time metrics and run-time KPI values to identify relevant run-time metrics that can be used to identify the root cause of the performance problem. The root cause of the performance problem can be used to generate a recommendation for correcting the performance problem. An alert identifying the root cause of the performance problem and the recommendation for correcting the performance problem are displayed on an interface of a display, thereby enabling correction of the performance problem and optimization of the application.

    Processes and systems that detect abnormal behavior of objects of a distributed computing system

    Publication No.: US11481300B2

    Publication date: 2022-10-25

    Application No.: US16391668

    Filing date: 2019-04-23

    Applicant: VMware, Inc.

    Abstract: Automated processes and systems for detecting abnormally behaving objects of a distributed computing system are described. Processes and systems obtain metrics that are generated in a historical time window and are associated with an object of the distributed computing system. Processes and systems use the metrics to compute a time-dependent system indicator over the historical time window. Each value of the system indicator corresponds to a point in time of the historical time window when the object was in a normal or an abnormal state. Processes and systems use the normal and abnormal states of the system indicator in the historical time window to train a state classifier that is used to detect run-time abnormal behavior of the object. When the state classifier identifies abnormal behavior of the object, an alert is generated, indicating the abnormal behavior of the object.

    METHODS AND SYSTEMS FOR INTELLIGENT SAMPLING OF NORMAL AND ERRONEOUS APPLICATION TRACES

    Publication No.: US20220291982A1

    Publication date: 2022-09-15

    Application No.: US17374682

    Filing date: 2021-07-13

    Applicant: VMware, Inc.

    Abstract: Computer-implemented methods and systems described herein perform intelligent sampling of application traces generated by an application. Computer-implemented methods and systems determine different sampling rates based on frequency of occurrence of normal traces and erroneous traces of the application. The sampling rates for low-frequency normal and erroneous traces are larger than the sampling rates for high-frequency normal and erroneous traces. The relatively larger sampling rates for low-frequency traces ensure that low-frequency traces are sampled in sufficient numbers and are not passed over during sampling of the application traces. The sampled normal and erroneous traces are stored in a data storage device.
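
Frequency-dependent sampling of this kind can be illustrated with a short sketch. It assumes each trace is a `(trace_type, payload)` tuple and gives every trace type a rate of `min(1, target_per_type / count)`, so rare types are kept almost entirely while common types are thinned. The rate formula, the tuple layout, and the function names are assumptions for illustration; the abstract does not state how the rates are derived.

```python
import random
from collections import Counter


def sampling_rates(traces, target_per_type):
    """Per-type sampling rate: rare trace types get rates close to 1."""
    counts = Counter(trace_type for trace_type, _ in traces)
    return {t: min(1.0, target_per_type / c) for t, c in counts.items()}


def sample_traces(traces, target_per_type, seed=0):
    """Keep each trace with the probability assigned to its type."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    rates = sampling_rates(traces, target_per_type)
    return [trace for trace in traces if rng.random() < rates[trace[0]]]
```

With 1,000 traces of a common normal type, 5 traces of a rare erroneous type, and `target_per_type=10`, the common type is sampled at a rate of 0.01 while every rare trace is retained.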

    Automated methods and systems to classify and troubleshoot problems in information technology systems and services

    Publication No.: US11294758B2

    Publication date: 2022-04-05

    Application No.: US15828133

    Filing date: 2017-11-30

    Applicant: VMware, Inc.

    Abstract: Automated computational methods and systems to classify and troubleshoot problems in information technology (“IT”) systems or services provided by a distributed computing system are described. Each IT system of the distributed computing system or IT service provided by the distributed computing system has an associated key performance indicator (“KPI”) used to monitor performance of the IT system or service. When real-time KPI data violates a KPI threshold, a real-time event-type distribution is computed from event messages generated by event sources associated with the IT system or service following the threshold violation. The real-time event-type distribution is compared with historical event-type distributions recorded for the KPI data in order to identify the problem and execute remedial action to resolve the problem.
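
The comparison step can be sketched as a nearest-match lookup. The example below assumes historical distributions are stored as dictionaries keyed by a problem label, and it uses total variation distance to pick the closest one; the distance measure, the labels, and the function names are hypothetical choices, since the abstract does not name them.

```python
from collections import Counter


def event_type_distribution(event_types):
    """Relative frequency of each event type in a batch of event messages."""
    if not event_types:
        raise ValueError("event_types must be non-empty")
    counts = Counter(event_types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify_problem(runtime_events, historical):
    """Return the label of the closest historical distribution and its distance."""
    runtime = event_type_distribution(runtime_events)
    label = min(historical, key=lambda name: total_variation(runtime, historical[name]))
    return label, total_variation(runtime, historical[label])
```

In a fuller system the returned label would presumably be mapped to a remedial action, and a large distance to every stored distribution could be treated as a previously unseen problem.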
