Processes and systems that determine efficient sampling rates of metrics generated in a distributed computing system

    Publication No.: US10977151B2

    Publication date: 2021-04-13

    Application No.: US16408149

    Filing date: 2019-05-09

    Applicant: VMware, Inc.

    Abstract: Processes and systems described herein are directed to determining efficient sampling rates for metrics generated by various different metric sources of a distributed computing system. In one aspect, processes and systems retrieve the metrics from metric data storage and determine non-constant metrics of the metrics generated by the various metric sources. Processes and systems separately determine an efficient sampling rate for each non-constant metric by constructing a plurality of corresponding reduced metrics, each reduced metric comprising a different subsequence of the corresponding metric. Information loss is computed for each reduced metric. An efficient sampling rate is determined for each metric based on the information losses created by constructing the reduced metrics. The efficient sampling rates are applied to corresponding streams of run-time metric values and may also be used to resample the corresponding metric already stored in metric data storage, reducing storage space for the metrics.
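The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented procedure: it assumes that each reduced metric keeps every `stride`-th value, that information loss is the mean absolute error of a linear-interpolation reconstruction normalized by the metric's range, and that the efficient rate is the largest candidate stride whose loss stays within a tolerance. The function names, the loss definition, and the `max_loss` parameter are all hypothetical.

```python
def reduce_metric(values, stride):
    """Build a reduced metric: the subsequence holding every `stride`-th value."""
    return values[::stride]


def information_loss(values, stride):
    """Assumed loss measure: mean absolute reconstruction error over the range.

    The full metric is rebuilt from the reduced one by linear interpolation
    between kept samples; values after the last kept sample hold that sample.
    """
    n = len(values)
    span = max(values) - min(values)
    if span == 0:
        return 0.0  # constant metric: dropping samples loses nothing
    total = 0.0
    for i in range(n):
        lo = (i // stride) * stride   # nearest kept index at or before i
        hi = lo + stride              # next kept index, if it exists
        if hi >= n:
            estimate = values[lo]
        else:
            frac = (i - lo) / stride
            estimate = values[lo] + frac * (values[hi] - values[lo])
        total += abs(values[i] - estimate)
    return total / (n * span)


def efficient_stride(values, strides=(1, 2, 4, 8), max_loss=0.05):
    """Return the largest candidate stride whose loss is within `max_loss`."""
    best = 1
    for stride in sorted(strides):
        if information_loss(values, stride) <= max_loss:
            best = stride
    return best
```

Under these assumptions a smooth ramp tolerates the coarsest stride, while a rapidly alternating metric keeps its original rate.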

    PROCESSES AND SYSTEMS THAT DETERMINE ABNORMAL STATES OF SYSTEMS OF A DISTRIBUTED COMPUTING SYSTEM

    Publication No.: US20200341833A1

    Publication date: 2020-10-29

    Application No.: US16391746

    Filing date: 2019-04-23

    Applicant: VMware, Inc.

    Abstract: Automated processes and systems that detect abnormal performance of a complex computational system of a distributed computing system are described. The processes and systems determine time stamps of previous abnormal behavior of the complex computational system and determine uncorrelated metrics associated with the complex computational system. Rules are determined based on the uncorrelated metrics and the time stamps of previous abnormal behavior of the complex computational system. Each rule may be applied to run-time metric values of the uncorrelated metrics to detect abnormal behavior of the complex computational system and generate a corresponding alert in approximate real time. Each rule may include displaying a recommendation for addressing the abnormality based on remedial measures used to correct the same abnormality in the past. Each rule may also automatically trigger remedial action that automatically corrects the abnormality.

    PROCESSES AND SYSTEMS THAT DETECT OBJECT ABNORMALITIES IN A DISTRIBUTED COMPUTING SYSTEM

    Publication No.: US20200264965A1

    Publication date: 2020-08-20

    Application No.: US16279043

    Filing date: 2019-02-19

    Applicant: VMware, Inc.

    Abstract: Computational processes and systems are directed to detecting abnormally behaving objects of a distributed computing system. An object can be a physical or a virtual object, such as a server computer, application, VM, virtual network device, or container. Processes and systems identify a set of metrics associated with an object and compute an indicator metric from the set of metrics. The indicator metric is used to label time stamps that correspond to outlier metric values of the set of metrics. The metrics and outlier time stamps are used to compute rules by machine learning. Each rule corresponds to a subset or combination of metrics and represents specific threshold conditions for metric values. The rules are applied to run-time metric data of the metrics to detect run-time abnormal behavior of the object.

    Methods and systems to prioritize alerts with quantification of alert impacts

    Publication No.: US10481966B2

    Publication date: 2019-11-19

    Application No.: US15604460

    Filing date: 2017-05-24

    Applicant: VMware, Inc.

    Abstract: Methods and systems are directed to quantifying and prioritizing the impact of problems or changes in a computer system. Resources of a computer system are monitored by management tools. When a change occurs at a resource of a computer system or in log data generated by event sources of the computer system, one or more of the management tools generates an alert. The alert may be an alert that indicates a problem with the computer system resource or the alert may be an alert trigger identified in an event message of the log data. Methods described herein compute an impact factor that serves as a measure of the difference between event messages generated before the alert and event messages generated after the alert. The value of the impact factor associated with an alert may be used to quantitatively prioritize the alert and generate appropriate recommendations for responding to the alert.
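As a rough illustration of the impact factor, the sketch below compares the event-type distribution before an alert with the one after it. The abstract does not state which difference measure is used; total variation distance is an assumption made here, and the function names are hypothetical. Both message lists are assumed to be non-empty.

```python
from collections import Counter


def event_type_distribution(event_types):
    """Relative frequency of each event type in a non-empty list of messages."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return {event_type: count / total for event_type, count in counts.items()}


def impact_factor(before, after):
    """Assumed measure: total variation distance between the two distributions.

    0.0 means the mix of event types did not change across the alert;
    1.0 means the two windows share no event types at all.
    """
    p = event_type_distribution(before)
    q = event_type_distribution(after)
    event_types = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in event_types)
```

Alerts could then be ranked by sorting on this value in descending order, so that the alert with the largest change in event messages is handled first.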

    AUTOMATED METHODS AND SYSTEMS TO CLASSIFY AND TROUBLESHOOT PROBLEMS IN INFORMATION TECHNOLOGY SYSTEMS AND SERVICES

    Publication No.: US20190163550A1

    Publication date: 2019-05-30

    Application No.: US15828133

    Filing date: 2017-11-30

    Applicant: VMware, Inc.

    Abstract: Automated computational methods and systems to classify and troubleshoot problems in information technology (“IT”) systems or services provided by a distributed computing system are described. Each IT system of the distributed computing system or IT service provided by the distributed computing system has an associated key performance indicator (“KPI”) used to monitor performance of the IT system or service. When real-time KPI data violates a KPI threshold, a real-time event-type distribution is computed from event messages generated by event sources associated with the IT system or service following the threshold violation. The real-time event-type distribution is compared with historical event-type distributions recorded for the KPI data in order to identify the problem and execute remedial action to resolve the problem.

    Data-agnostic anomaly detection
    Granted patent

    Publication No.: US10241887B2

    Publication date: 2019-03-26

    Application No.: US13853321

    Filing date: 2013-03-29

    Applicant: VMware, Inc.

    Abstract: This disclosure presents computational systems and methods for detecting anomalies in data output from any type of monitoring tool. The data is aggregated and sent to an alerting system for abnormality detection via comparison with normalcy bounds. The anomaly detection methods are performed by construction of normalcy bounds of the data based on the past behavior of the data output from the monitoring tool. The methods use data quality assurance and data categorization processes that allow choosing a correct procedure for determination of the normalcy bounds. The methods are completely data agnostic, and as a result, can also be used to detect abnormalities in time series data associated with any complex system.
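A minimal sketch of comparing data against normalcy bounds follows. The abstract says the procedure for determining the bounds is chosen per data category; this example shows only one simple, assumed procedure (mean plus or minus `k` population standard deviations of past values). The function names and the default `k=3.0` are hypothetical.

```python
import statistics


def normalcy_bounds(history, k=3.0):
    """Assumed procedure: bounds at mean +/- k standard deviations of past data."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return mean - k * std, mean + k * std


def is_abnormal(value, bounds):
    """A value is flagged when it falls outside the normalcy bounds."""
    lower, upper = bounds
    return value < lower or value > upper
```

For example, with past values `[10, 12, 11, 9, 10, 11, 12, 10]` a new value of 30 lies well outside the bounds, while 11 lies inside them.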

    METHODS AND SYSTEMS TO IDENTIFY ANOMALOUS BEHAVING COMPONENTS OF A DISTRIBUTED COMPUTING SYSTEM

    Publication No.: US20180165142A1

    Publication date: 2018-06-14

    Application No.: US15375386

    Filing date: 2016-12-12

    Applicant: VMware, Inc.

    Abstract: Methods and systems described herein are directed to identifying anomalously behaving components of a distributed computing system. Methods and systems collect log messages generated by a set of event log sources running in the distributed computing system within an observation time window. Frequencies of various types of event messages generated within the observation time window are determined for each of the log sources. A similarity value is calculated for each pair of event sources. The similarity values are used to identify similar clusters of event sources of the distributed computing system for various management purposes. Components of the distributed computing system that are used to host the event source outliers may be identified as potentially having problems or as indicating future problems.
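The pairwise comparison can be sketched as follows. The abstract does not name the similarity measure or the outlier criterion; cosine similarity over event-type frequency vectors and a mean-similarity cutoff are assumptions made for illustration, and the function names and the `min_mean_similarity` parameter are hypothetical.

```python
import math


def cosine_similarity(freq_a, freq_b):
    """Assumed similarity: cosine of the angle between two frequency vectors.

    Each argument maps an event type to its count within the time window.
    """
    event_types = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(t, 0) * freq_b.get(t, 0) for t in event_types)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a source with no events is similar to nothing
    return dot / (norm_a * norm_b)


def outlier_sources(frequencies, min_mean_similarity=0.5):
    """Flag sources whose mean similarity to all other sources is low."""
    names = list(frequencies)
    outliers = []
    for name in names:
        others = [other for other in names if other != name]
        if not others:
            continue
        mean_similarity = sum(
            cosine_similarity(frequencies[name], frequencies[other])
            for other in others
        ) / len(others)
        if mean_similarity < min_mean_similarity:
            outliers.append(name)
    return outliers
```

A source that emits mostly error events among peers that emit mostly informational events would be flagged, and the component hosting it could then be inspected.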

    DATA-AGNOSTIC ADJUSTMENT OF HARD THRESHOLDS BASED ON USER FEEDBACK
    Patent application; legal status: in force

    Publication No.: US20150370682A1

    Publication date: 2015-12-24

    Application No.: US14312815

    Filing date: 2014-06-24

    Applicant: VMware, Inc.

    Abstract: This disclosure is directed to data-agnostic computational methods and systems for adjusting hard thresholds based on user feedback. Hard thresholds are used to monitor time-series data generated by a data-generating entity. The time-series data may be metric data that represents usage of the data-generating entity over time. The data is compared with a hard threshold associated with usage of the resource or process and when the data violates the threshold, an alert is typically generated and presented to a user. Methods and systems collect user feedback after a number of alerts to determine the quality and significance of the alerts. Based on the user feedback, methods and systems automatically adjust the hard thresholds to better represent how the user perceives the alerts.
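One possible reading of the feedback loop is sketched below. The adjustment rule is an assumption, not the method claimed: for an upper hard threshold, if fewer than a `tolerance` fraction of recent alerts were rated significant by the user, the threshold is raised by a relative `step`; otherwise it is left unchanged. All names and default values are hypothetical.

```python
def adjust_threshold(threshold, feedback, step=0.05, tolerance=0.5):
    """Adjust an upper hard threshold from user ratings of recent alerts.

    `feedback` is a list of booleans, True when the user judged the
    corresponding alert to be significant.
    """
    if not feedback:
        return threshold  # no feedback collected yet: keep the threshold
    significant_fraction = sum(feedback) / len(feedback)
    if significant_fraction < tolerance:
        # Mostly noise according to the user: make the threshold less sensitive.
        return threshold * (1 + step)
    return threshold
```

A symmetric rule that lowers the threshold when users report missed problems would follow the same pattern.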


    METHODS AND SYSTEMS FOR DETECTING AND CORRECTING TRENDING PROBLEMS WITH APPLICATIONS USING LANGUAGE MODELS

    Publication No.: US20250053496A1

    Publication date: 2025-02-13

    Application No.: US18232743

    Filing date: 2023-08-10

    Applicant: VMware, Inc.

    Abstract: This disclosure is directed to automated computer-implemented methods and systems for detecting and correcting a trending problem with an application executing in a data center. The methods receive a new support request entered via a graphical user interface. The methods perform trend discovery of the new support request over recent time windows using a pre-trained and fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. In response to detecting a trending problem described in the new support request, the methods discover recommended remedial measures for the new support request based on similar support requests previously recorded in a support request data store or on similar knowledge base articles previously recorded in a knowledge base data store. The recommended remedial measures for correcting the trending problem are executed using an operations manager of the data center.
