Abstract:
Architecture for aggregating health alerts from a number of related components into a single aggregated health state that can be analyzed to isolate the component responsible for the fault condition. In a hierarchy of related components within various component groups in a computer system, several health indicators can raise alerts in one or more of the related components even though the fault condition occurs in only the one component upon which the other components depend. The health indicators of related components are aggregated into an aggregated health state for each component group. These aggregated health states are analyzed to identify the related component associated with the root cause of the alert condition for an affected component group.
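The aggregation and root-cause isolation described above could be sketched roughly as follows. This is an illustrative sketch only, not the claimed architecture: the `Component` class, the three health states and their ordering, and the rule "an alerting component with no alerting dependency is a root-cause candidate" are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical severity ordering; the abstract does not define health states.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}


@dataclass
class Component:
    name: str
    health: str = "healthy"  # this component's own health indicator
    depends_on: list = field(default_factory=list)  # names it relies on


def aggregate_health(components):
    """Roll a group's indicators up to the worst state seen in the group."""
    return max(
        (c.health for c in components),
        key=SEVERITY.__getitem__,
        default="healthy",
    )


def root_cause_candidates(components):
    """Return alerting components none of whose dependencies are alerting."""
    by_name = {c.name: c for c in components}
    alerting = {c.name for c in components if c.health != "healthy"}
    return sorted(
        name
        for name in alerting
        if not any(dep in alerting for dep in by_name[name].depends_on)
    )


# Example group: the web tier and API alert only because the database failed.
group = [
    Component("database", "critical"),
    Component("api", "critical", ["database"]),
    Component("web", "warning", ["api"]),
]
print(aggregate_health(group))       # critical
print(root_cause_candidates(group))  # ['database']
```

Taking the worst state as the group state is just one possible aggregation policy; a weighted or quorum-based roll-up would fit the same description.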
Abstract:
A multi-level monitoring system is provided for monitoring multiple performance aspects of a cloud service concurrently in order to generate a full and reliable performance analysis of the cloud service. The multi-level monitoring system may include a set of components for carrying out the performance analysis of the cloud service which may be deployed together to operate externally, internally, or concurrently with the cloud service. The component framework of the multi-level monitoring system may include a main component, a plug-in associated with the main component, a definition database, a log database, and an output database. The main components of an example multi-level monitoring framework may include a probe runner component for probing the cloud service, a monitor component for generating alerts based on probe results, and a responder component for processing the alerts and taking appropriate actions to improve the cloud service performance.
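The probe runner, monitor, and responder stages named above might fit together along these lines. This is a minimal sketch under stated assumptions: the `ProbeResult` and `Alert` types, the latency threshold, and the action names are hypothetical, and the definition, log, and output databases are left out.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe: str
    ok: bool
    latency_ms: float


@dataclass
class Alert:
    probe: str
    reason: str


def run_probes(probes):
    """Probe runner: call each probe function and collect its result."""
    return [probe() for probe in probes]


def monitor(results, max_latency_ms=500.0):
    """Monitor: turn failed or slow probe results into alerts."""
    alerts = []
    for r in results:
        if not r.ok:
            alerts.append(Alert(r.probe, "probe failed"))
        elif r.latency_ms > max_latency_ms:
            alerts.append(Alert(r.probe, "latency above threshold"))
    return alerts


def respond(alerts, actions):
    """Responder: pick an action per alert reason, escalating by default."""
    return [(a.probe, actions.get(a.reason, "escalate")) for a in alerts]


# Hypothetical probes standing in for real checks against a cloud service.
probes = [
    lambda: ProbeResult("login", ok=True, latency_ms=120.0),
    lambda: ProbeResult("search", ok=True, latency_ms=900.0),
    lambda: ProbeResult("upload", ok=False, latency_ms=0.0),
]
actions = {"probe failed": "restart service"}
print(respond(monitor(run_probes(probes)), actions))
# [('search', 'escalate'), ('upload', 'restart service')]
```

Keeping the three stages as separate functions mirrors the abstract's split into main components, so each stage could be replaced or extended by a plug-in without touching the others.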
Abstract:
A hosting provider operates a server system that provides a service to one or more tenants. The server system receives configuration data from the tenants. As part of providing the service to a given tenant, the server system attempts to access an external service because the configuration data received from that tenant identifies the external service. Service access errors can occur during such attempts. In response to determining that an error has occurred when attempting to access the external service, the server system sends a service access alert to a recipient associated with the given tenant. The service access alert notifies the recipient that the error has occurred.
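The alerting flow above can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the `TenantConfig` fields, the `ServiceAccessError` exception, and the `call_external` and `send_alert` callables stand in for whatever transport and notification mechanisms a real hosting provider would use.

```python
from dataclasses import dataclass


@dataclass
class TenantConfig:
    tenant_id: str
    external_service_url: str  # supplied by the tenant's configuration data
    alert_recipient: str       # recipient associated with the tenant


class ServiceAccessError(Exception):
    """Hypothetical error raised when the external service is unreachable."""


def provide_service(config, call_external, send_alert):
    """Call the tenant-configured external service; alert the tenant on error."""
    try:
        return call_external(config.external_service_url)
    except ServiceAccessError as err:
        message = (
            f"Tenant {config.tenant_id}: error accessing "
            f"{config.external_service_url}: {err}"
        )
        send_alert(config.alert_recipient, message)
        return None


# Usage with stand-in callables; no network access is performed.
def failing_call(url):
    raise ServiceAccessError("connection timed out")


sent = []
config = TenantConfig("t1", "https://example.com/api", "admin@example.com")
provide_service(config, failing_call, lambda to, msg: sent.append((to, msg)))
print(sent[0][0])  # admin@example.com
```

Routing the alert to the tenant's own recipient, rather than to the provider's operators, reflects the point that the failing endpoint comes from the tenant's configuration and is usually the tenant's to fix.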