Abstract:
A system and method for the creation of locality sensitive hash signatures using weighted feature sets is disclosed. The disclosed methodology takes advantage of discretization mechanisms commonly used in computer systems to model the influence of the feature weights on the calculated hash signature. Pseudo-random numbers required for the signature calculation are created in ascending order, which enables the signature generation mechanism to identify and avoid the unnecessary creation of pseudo-random numbers and thereby improve the performance of the signature calculation process. Further, hierarchical, tree-search-like algorithms are used during the processing of signature weights to further decrease the number of required random numbers. Features of the Poisson Process model, like its ability to provide random numbers in ascending order and the ability to split and merge Poisson Processes, are used to further improve the performance of the signature calculation process.
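For illustration, the ascending-order idea can be sketched in a few lines of Python. This is a rough approximation of the described scheme, not the disclosed implementation: each feature drives a Poisson process whose rate equals the feature weight, the process points arrive in ascending order, and generation stops once a point can no longer lower any signature slot. All names and the slot-assignment structure are assumptions.

    import random

    def weighted_signature(features, m=64):
        # features: dict mapping feature (str) -> positive weight.
        # Illustrative sketch only.
        sig = [float("inf")] * m
        for feature, weight in features.items():
            if weight <= 0:
                continue
            rng = random.Random(feature)  # deterministic stream per feature
            t = 0.0
            while True:
                # Points of a rate-`weight` Poisson process in ascending
                # order: inter-arrival gaps are Exponential(weight), so a
                # larger weight yields earlier (smaller) candidate values.
                t += rng.expovariate(weight)
                # Early termination: once t exceeds every current signature
                # value, no later point can change the signature, so further
                # random number creation is avoided.
                if t >= max(sig):
                    break
                slot = rng.randrange(m)
                if t < sig[slot]:
                    sig[slot] = t
        return sig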
Abstract:
A system and method for the distributed analysis of high-frequency transaction trace data is disclosed, which continuously categorizes incoming transaction data, identifies relevant transaction categories, creates per-category statistical reference and current data, and performs statistical tests to identify transaction categories showing statistically relevant overall performance anomalies. The detection of relevant transaction categories considers both the relative transaction frequency of a category compared to the overall transaction frequency and the temporal stability of a transaction category over an observation duration. The statistical data generated for the anomaly tests contains, in addition to data describing the overall performance of transactions of a category, data describing the transaction execution context, like the number of concurrently executed transactions or the transaction load during an observation period. Anomaly tests consider current and reference execution context data in addition to statistical performance data to determine whether detected statistical performance anomalies should be reported.
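As a rough illustration of such a context-aware anomaly test, the following Python sketch compares a category's current response times against its reference data, but only reports an anomaly when the execution context (reduced here to transaction load) is comparable. All names, the z-test, and the thresholds are assumptions, not taken from the disclosure.

    import statistics

    def report_anomaly(current, reference, current_load, reference_load,
                       z_threshold=3.0, load_tolerance=0.25):
        # current/reference: per-category response time samples (>= 2 each).
        ref_mean = statistics.mean(reference)
        ref_std = statistics.stdev(reference)
        cur_mean = statistics.mean(current)
        # Statistical test on the category's performance data.
        z = (cur_mean - ref_mean) / (ref_std / len(current) ** 0.5)
        # Context check: a deviation under substantially changed load may
        # reflect changed conditions rather than a real anomaly.
        load_shift = abs(current_load - reference_load) / reference_load
        return z > z_threshold and load_shift <= load_tolerance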
Abstract:
A method is disclosed that estimates causal relationships between events based on heterogeneous monitoring data. The monitoring data consists of transaction tracing data describing the execution performance of individual transactions, resource utilization measurements of infrastructure entities like processes or operating systems, and network utilization measurement data. A topology model of the monitored environment, describing its entities and the communication activities of those entities, is incrementally created. The location of observed events in the topology model is determined. The topology model is used in conjunction with a domain-specific causality propagation knowledge base to calculate the probability of causal relationships between events. Different causality determination mechanisms, based on the type of the involved events, are used to create graphs of causally related events. For each identified event graph, a set of root cause events is calculated, representing those events with the greatest global impact on all other events in the graph.
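A minimal sketch of how a propagation knowledge base and topology distances might combine into causality weights, and how root cause candidates could be selected, is given below. The knowledge base structure, the damping factor, and the impact scoring are illustrative assumptions only.

    # Hypothetical propagation knowledge base: how strongly a condition on
    # one entity type may cause a condition on a connected entity type.
    PROPAGATION = {
        ("host", "process"): 0.9,
        ("process", "service"): 0.8,
        ("service", "service"): 0.6,
    }

    def causality_weight(cause, effect, topology_distance):
        # Weight of "cause caused effect", damped by topological distance.
        base = PROPAGATION.get(
            (cause["entity_type"], effect["entity_type"]), 0.0)
        return base / (1 + topology_distance)

    def root_causes(events, distance):
        # Root cause candidates: events with the greatest summed causal
        # impact on all other events of the graph. `distance` is a
        # caller-supplied topology distance function.
        impact = {i: sum(causality_weight(e, o, distance(e, o))
                         for j, o in enumerate(events) if j != i)
                  for i, e in enumerate(events)}
        best = max(impact.values())
        return [events[i] for i, w in impact.items() if w == best]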
Abstract:
A system and method for the analysis of log data is presented. The system uses SuperMinHash-based locality sensitive hash signatures to describe the similarity between log lines. Signatures are created for incoming log lines and stored in signature indexes. Later similarity queries use those indexes to improve query performance. The SuperMinHash algorithm uses a two-stage approach to determine signature values: one stage uses a first random number to calculate the index of the signature value that is to be updated. The two-stage approach improves the accuracy of the produced similarity estimation data for small signature sizes. The two-stage approach may further be used to produce random numbers that are related, e.g., each created random number may be larger than its predecessors. This relation is used to optimize the algorithm by detecting when further created random numbers can have no influence on the signature and terminating the calculation early.
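A compact Python sketch of the two-stage idea, following the published SuperMinHash algorithm but simplified (with a conservative early-exit condition), might look as follows; the names and the stand-in seeding are assumptions.

    import random

    def super_minhash(elements, m=64):
        h = [float("inf")] * m
        for d in elements:
            rng = random.Random(d)
            p = list(range(m))        # permutation for slot selection
            for j in range(m):
                # All later values are >= j, so once j exceeds every
                # current signature value no update is possible: stop.
                if j >= max(h):
                    break
                v = j + rng.random()  # stage 1: ascending value in [j, j+1)
                k = rng.randrange(j, m)
                p[j], p[k] = p[k], p[j]
                slot = p[j]           # stage 2: random not-yet-used slot
                if v < h[slot]:
                    h[slot] = v
        return h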
Abstract:
Technologies are disclosed for the automated, rule-based generation of models from arbitrary, semi-structured observation data. Context data of received observation data, like data describing the location at which a phenomenon was observed, is used to identify related observations, to generate entities in a model describing the observed data, and to assign observations to elements of the model. Mapping rules may be used for the on-demand generation of models, and different sets of mapping rules may be used to generate different models from the same observation data for different purposes. Further, observation time data may be used to observe the temporal evolution of the generated model. Possible use cases of the models generated this way include the interpretation of observation data that describes unexpected operating conditions in view of the generated model, or the determination of how a monitored system reacts to changing conditions, like increased load.
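One possible shape of such mapping rules is sketched below in Python; the rule fields and the model representation are illustrative assumptions, not the disclosed format.

    def build_model(observations, rules):
        # Each rule decides whether it applies to an observation, derives
        # an entity identifier from the observation's context data, and
        # names the entity type (rule shape is an assumption).
        model = {}                                # entity id -> entity record
        for obs in observations:
            for rule in rules:
                if rule["applies"](obs):
                    entity_id = rule["entity_id"](obs["context"])
                    entity = model.setdefault(
                        entity_id,
                        {"type": rule["entity_type"], "observations": []})
                    entity["observations"].append(obs)
        return model

    # Different rule sets yield different models from the same observation
    # data, e.g. a host-centric view:
    host_rules = [{
        "applies": lambda obs: "host" in obs["context"],
        "entity_id": lambda ctx: ctx["host"],
        "entity_type": "host",
    }]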
Abstract:
A system and method is proposed for estimating the contribution of components of a distributed computing environment to the generation of economically relevant values, like revenue numbers. Agents are deployed to the computing environment that trace executed transactions and monitor the components used to execute those transactions. The transaction trace data also contains data about the origin/user of transactions, which may be used to group transactions corresponding to particular interactions of individual users with the monitored application into visit data. Data describing economically relevant activities of transactions, like the purchase of goods, is also observed by agents and reported in trace data. Functional dependencies described in transaction trace data and resource-related dependencies derived from component monitoring data are used to identify functionality and components that contributed to the generation of business value. The generated business value is assigned to contributing components to incrementally create data describing the economic value of those components. The data generated this way can be used for various business-related analyses.
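The incremental attribution step can be illustrated with a small Python sketch; the trace shape and the even split across contributing components are assumptions, not the disclosed attribution scheme.

    from collections import defaultdict

    def attribute_business_value(traces, component_value=None):
        # Split each transaction's observed revenue across the components
        # its trace identifies as contributors, accumulating per-component
        # economic value over time.
        if component_value is None:
            component_value = defaultdict(float)
        for trace in traces:
            if trace["revenue"] <= 0 or not trace["components"]:
                continue
            share = trace["revenue"] / len(trace["components"])
            for component in trace["components"]:
                component_value[component] += share
        return component_value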
Abstract:
A system and method for the estimation of the cardinality of large sets of transaction trace data is disclosed. The estimation is based on HyperLogLog data sketches that are capable of storing cardinality-relevant data of large sets with low and fixed memory requirements. The disclosure contains improvements to known analysis methods for HyperLogLog data sketches that provide improved relative error behavior by eliminating a cardinality-range-dependent bias of the relative error. A new analysis method for HyperLogLog data structures is shown that uses maximum likelihood analysis on a Poisson-based approximated probability model. In addition, a variant of the new analysis method is disclosed that uses multiple HyperLogLog data structures to provide estimation results for set operations, like intersections or relative complements, directly from the HyperLogLog input data.
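To make the two ingredients concrete, the following Python sketch builds HyperLogLog registers and estimates the cardinality by numerically maximizing the Poisson-approximated likelihood; the register layout, the stand-in hash, and the simple ternary search are illustrative assumptions.

    import math, random

    def hll_registers(items, b=12):
        m = 1 << b                       # m = 2**b registers
        regs = [0] * m
        for item in items:
            x = random.Random(item).getrandbits(64)  # stand-in for a hash
            idx = x & (m - 1)                        # low b bits: register
            w = x >> b                               # remaining 64-b bits
            rank = (64 - b) - w.bit_length() + 1     # leading zeros + 1
            regs[idx] = max(regs[idx], rank)
        return regs

    def ml_estimate(regs):
        # Poisson approximation: with lam = n/m items per register,
        # P(K <= k) = exp(-lam * 2**-k); maximize the log-likelihood in lam.
        m = len(regs)
        def loglik(lam):
            s = 0.0
            for k in regs:
                hi = math.exp(-lam * 2.0 ** -k)
                lo = math.exp(-lam * 2.0 ** -(k - 1)) if k > 0 else 0.0
                s += math.log(max(hi - lo, 1e-300))
            return s
        lo, hi = 1e-9, 2.0 ** (max(regs) + 2)
        for _ in range(100):             # ternary search on the unimodal loglik
            a, c = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if loglik(a) < loglik(c):
                lo = a
            else:
                hi = c
        return m * (lo + hi) / 2         # n estimate = m * lam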
Abstract:
A technology is presented for the efficient matching of received data elements against medium to large sets of fluctuating matching rules. To cope with high volumes of data elements, a distributed architecture is used that leverages optimized multi-rule evaluation approaches, like Intel's Hyperscan, to achieve sublinear computational complexity for the rule matching process. Processing of rule updates and the generation of corresponding optimized evaluation instructions are performed on a central management node, which distributes the generated optimized matching code to multiple worker nodes that perform the actual matching. Further, optimized multi-rule evaluation is combined with the application of individual match rules to support the fast application of matching rule changes when required. Compilation strategies are applied to eventually transform individually applied rules into a corresponding optimized multi-rule evaluation form.
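The combination of an optimized multi-rule set with individually applied recent rules can be sketched as follows; Python's re module stands in for an engine like Hyperscan, and all class and method names are assumptions.

    import re

    class RuleMatcher:
        def __init__(self, rules):
            self.compiled_rules = list(rules)  # rules in the multi-rule set
            self.pending = []                  # recent, individually applied
            self._recompile()

        def _recompile(self):
            # Single-pass evaluation of all rules at once; `re` stands in
            # for an optimized multi-rule engine such as Hyperscan.
            pattern = "|".join(f"(?:{r})" for r in self.compiled_rules)
            self.multi = re.compile(pattern) if pattern else None

        def add_rule(self, rule):
            # Fast path for rule updates: apply the rule individually and
            # defer the expensive multi-rule compilation.
            self.pending.append(re.compile(rule))

        def flush(self):
            # Compilation strategy: eventually fold individually applied
            # rules into the optimized multi-rule form.
            self.compiled_rules += [p.pattern for p in self.pending]
            self.pending = []
            self._recompile()

        def matches(self, data):
            if self.multi is not None and self.multi.search(data):
                return True
            return any(p.search(data) for p in self.pending)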