Abstract:
Described are methods, systems, and computer program products for correlating operational information with subsequent action information. Operational information is received from a computing system performing one or more services. Pattern matching is performed for the operational information with data within a knowledge base. Based on the results of the pattern matching, a subsequent action associated with the operational information is determined.
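A minimal Python sketch of the correlation step, assuming the knowledge base is a list of regular-expression patterns, each paired with a subsequent action. The names `KNOWLEDGE_BASE` and `determine_action`, and the sample patterns and actions, are hypothetical illustrations and are not taken from the abstract.

```python
import re
from typing import Optional

# Hypothetical knowledge base: each entry pairs a regular-expression pattern
# with the subsequent action to take when operational information matches it.
KNOWLEDGE_BASE: list[tuple[str, str]] = [
    (r"disk (usage|space) .*9\d%", "expand_volume"),
    (r"connection (refused|timed out)", "restart_service"),
    (r"out of memory", "scale_up_memory"),
]


def determine_action(operational_info: str,
                     knowledge_base: list[tuple[str, str]]) -> Optional[str]:
    """Return the action of the first knowledge-base pattern that matches."""
    for pattern, action in knowledge_base:
        # Case-insensitive search, so "Disk" and "disk" both match.
        if re.search(pattern, operational_info, flags=re.IGNORECASE):
            return action
    return None  # no correlated subsequent action was found
```

For example, `determine_action("ERROR: connection refused by db", KNOWLEDGE_BASE)` selects `"restart_service"`. A first-match policy is the simplest choice; a real knowledge base might instead score several candidate matches.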
Abstract:
Systems and methods disclosed herein are directed to creating a service directory of dependencies for services running on a system, wherein instances of a first service are dependent upon instances of a second service. The directory of dependencies comprises metadata associated with connections between the services. The system injects faults targeting all levels of the dependencies. The system is monitored to detect failures created by the faults. The injected faults are selected from transport layer faults, memory pressure, processor pressure, storage pressure, virtual machine restart, and virtual machine shut down. A domain name service is monitored to identify names that are resolved for the services. The service directory is then updated continuously with additional dependencies using information about the resolved names. The faults may be injected in a guided manner, wherein the scope of the faults is increased in steps over time to identify a failure point in the system.
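The dependency walk and the guided, step-wise fault injection can be sketched in Python as follows. This sketch assumes the service directory is a mapping from a service name to the names it depends on, and that fault "scope" is a hypothetical integer level, such as the share of instances affected. It also assumes that `inject(target, scope)` returns `False` when monitoring detects a failure. The names `all_dependencies`, `record_resolution`, and `guided_injection` are illustrative only.

```python
from collections import deque
from typing import Callable, Optional


def all_dependencies(directory: dict[str, list[str]], service: str) -> list[str]:
    """Breadth-first walk returning every direct and transitive dependency."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(directory.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        order.append(dep)
        queue.extend(directory.get(dep, []))
    return order


def record_resolution(directory: dict[str, list[str]], service: str,
                      resolved_name: str) -> None:
    """Add a dependency learned from a DNS name resolved on behalf of `service`."""
    deps = directory.setdefault(service, [])
    if resolved_name not in deps:
        deps.append(resolved_name)


def guided_injection(inject: Callable[[str, int], bool],
                     targets: list[str],
                     max_scope: int) -> Optional[tuple[str, int]]:
    """Raise the fault scope one step at a time; return the first failing (target, scope)."""
    for scope in range(1, max_scope + 1):
        for target in targets:
            # Hypothetical contract: inject() returns False when a failure is detected.
            if not inject(target, scope):
                return target, scope
    return None  # no failure point found up to max_scope
```

With `{"web": ["api"], "api": ["db", "cache"]}`, the walk from `"web"` yields `["api", "db", "cache"]`. Injecting faults into every level of that list, with the scope widened step by step, locates the smallest scope at which the system fails.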
Abstract:
Technologies for identification of a potential root cause of a use-after-free memory corruption bug of a program include a computing device to replay execution of the program based on an execution log of the program. The execution log comprises an ordered set of executed instructions of the program that resulted in the use-after-free memory corruption bug. In response to detecting a use-after-free memory address access of the program, the computing device compares that access to a memory address associated with an occurrence of the use-after-free memory corruption bug. In response to detecting a match between the two, the computing device records the use-after-free memory address access to a candidate list as a candidate for a root cause of the use-after-free memory corruption bug.
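A simplified Python sketch of the replay-and-compare loop follows. It assumes a hypothetical execution log in which each instruction is reduced to an operation kind (`"alloc"`, `"free"`, `"load"`, `"store"`) and a single address, and it tracks freed memory at single-address granularity rather than as byte ranges. The `Instr` type and the `find_uaf_candidates` function are illustrative names, not part of the described technology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    index: int    # position in the ordered execution log
    op: str       # hypothetical op kinds: "alloc", "free", "load", "store"
    address: int  # memory address touched by the instruction


def find_uaf_candidates(log: list[Instr], bug_address: int) -> list[Instr]:
    """Replay the log in order and collect use-after-free accesses to bug_address."""
    freed: set[int] = set()
    candidates: list[Instr] = []
    for instr in log:
        if instr.op == "alloc":
            freed.discard(instr.address)  # address is live again after reallocation
        elif instr.op == "free":
            freed.add(instr.address)
        elif instr.op in ("load", "store") and instr.address in freed:
            # Use-after-free access detected; keep it only if it matches the bug's address.
            if instr.address == bug_address:
                candidates.append(instr)
    return candidates
```

Because the log is replayed in recorded order, the candidate list preserves the order in which suspicious accesses occurred. The earliest entry is often a natural starting point for root-cause inspection.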
Abstract:
Various systems and methods are provided that detect faults in data-based systems using techniques drawn from the fields of spectral analysis and artificial intelligence. For example, a data-based system can include one or more sensors associated with a subsystem that measure time-series data. A set of indicator functions can be established that define anomalous behavior within a subsystem. For each sensor, the systems and methods disclosed herein can analyze the time-series data measured by that sensor in conjunction with one or more indicator functions to identify anomalous behavior associated with that sensor of the subsystem. A spectral analysis can then be performed on the results of this analysis to generate spectral responses. Clustering techniques can be used to bin the spectral response values, and the binned values can be compared with fault signatures to identify faults. Identified faults can then be displayed in a user interface.
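The following standard-library Python sketch shows the pipeline from indicator function to spectral response to binning to signature matching. It makes several assumptions. The indicator function is a simple threshold test. The spectral analysis is a naive discrete Fourier transform. The clustering step is replaced by fixed-edge quantization, a deliberately simpler stand-in. Signature matching counts differing bins (Hamming distance). All function names and the example signature are hypothetical.

```python
import cmath
import math
from typing import Callable, Sequence


def indicator_series(samples: Sequence[float],
                     indicator: Callable[[float], bool]) -> list[int]:
    """Map each sample to 1 where the indicator flags anomalous behavior, else 0."""
    return [1 if indicator(x) else 0 for x in samples]


def spectral_response(series: Sequence[int]) -> list[float]:
    """Normalized magnitudes of a naive DFT at non-negative frequencies 0..n/2."""
    n = len(series)
    return [
        abs(sum(series[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def bin_values(values: Sequence[float], edges: Sequence[float]) -> tuple[int, ...]:
    """Quantize each value to the number of bin edges it meets or exceeds."""
    return tuple(sum(v >= e for e in edges) for v in values)


def match_faults(binned: tuple[int, ...],
                 signatures: dict[str, tuple[int, ...]],
                 max_distance: int = 0) -> list[str]:
    """Return signature names within max_distance differing bins of the binned response."""
    return [
        name for name, sig in signatures.items()
        if len(sig) == len(binned)
        and sum(a != b for a, b in zip(binned, sig)) <= max_distance
    ]
```

A sensor that is anomalous on every other sample produces energy only at the zero and highest frequencies. With edges `[0.1, 0.4]`, this bins to `(2, 0, 0, 0, 2)` for eight samples, which a hypothetical "oscillating" signature would match.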
Abstract:
A system is adapted to generate a configuration for a service provider system to provide a highly available (HA) service. The system first identifies type stacks that provide the HA service and one or more component types in each type stack. Each type stack is a combination of prototypes that describe features and capabilities of available software providing the HA service. The system estimates, for each component type in the type stacks, a mean-time-to-recover (MTTR) of the HA service based on time for completing an actual recovery action in response to a component failure. The system further estimates service availability provided by each type stack based on the MTTR and a mean-time-to-failure (MTTF) of each component type in the type stack. The system then eliminates one or more of the type stacks that do not satisfy a requested service availability before proceeding to subsequent steps of configuration generation.
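The elimination step can be sketched in Python with the standard steady-state availability formula, A = MTTF / (MTTF + MTTR). The abstract does not say how the per-component availabilities are combined. This sketch assumes a series model, in which every component type in a stack must be up, so the stack availability is the product of the component availabilities. The function names and the `(mttf, mttr)` tuple layout are hypothetical.

```python
from math import prod


def component_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def stack_availability(components: list[tuple[float, float]]) -> float:
    """Assumed series model: a stack is available only if all its component types are."""
    return prod(component_availability(mttf, mttr) for mttf, mttr in components)


def eliminate_stacks(stacks: dict[str, list[tuple[float, float]]],
                     requested: float) -> dict[str, float]:
    """Keep only the type stacks whose estimated availability meets the request."""
    kept: dict[str, float] = {}
    for name, components in stacks.items():
        availability = stack_availability(components)
        if availability >= requested:
            kept[name] = availability
    return kept
```

For instance, a single component with an MTTF of 999 h and an MTTR of 1 h gives 0.999. Two components at 0.99 each give about 0.9801, so that stack would be eliminated against a requested availability of 0.99. A redundancy-aware model would raise the estimate for replicated components.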
Abstract:
A method includes searching a single file, which includes a plurality of device log files for one or more devices, based on a template file that at least indicates a sub-set of log data of interest; generating reference data for each of the log data of interest that is located in the single file; and storing the reference data in a data structure. A computing system (102) includes a memory (114) that stores one or more instructions (120), including a log file processing module (126), and a processor (116) that executes the one or more instructions, which causes the processor to: filter log data based on a template file that indicates streams of data of interest; dynamically change the amount of data of interest based on a debug level; store the streams of data of interest; and display, in response to an input signal, a sub-set of the stored virtual stream of data of interest.
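The following Python sketch covers template-driven indexing of a combined log file. It assumes the template maps each stream name to a regular expression plus a hypothetical minimum debug level at which that stream is collected, so raising the debug level increases the amount of data of interest. It also assumes the reference data is a line number, a character offset, and a length, rather than a copy of the text. `Reference`, `index_log`, and `read_stream` are illustrative names only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    stream: str   # template stream name
    line_no: int  # 1-based line number in the combined file
    offset: int   # character offset of the line start
    length: int   # number of characters in the line, including its newline


def index_log(text: str, template: dict[str, tuple[str, int]],
              debug_level: int) -> dict[str, list[Reference]]:
    """Build reference data for each template stream enabled at this debug level."""
    index: dict[str, list[Reference]] = {
        name: [] for name, (_, min_level) in template.items() if min_level <= debug_level
    }
    offset = 0
    for line_no, line in enumerate(text.splitlines(keepends=True), start=1):
        for name, refs in index.items():
            pattern, _ = template[name]
            if re.search(pattern, line):
                refs.append(Reference(name, line_no, offset, len(line)))
        offset += len(line)
    return index


def read_stream(text: str, refs: list[Reference]) -> list[str]:
    """Resolve stored references back to log lines for display."""
    return [text[r.offset:r.offset + r.length].rstrip("\n") for r in refs]
```

Storing offsets instead of copied lines keeps the data structure small and lets a viewer materialize only the sub-set a user asks for.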