摘要:
Techniques and systems for cross-trace scalable issue detection and clustering that scale-up trace analysis for issue detection and root-cause clustering using a machine learning based approach are described herein. These techniques enable a scalable performance analysis framework for computing devices addressing issue detection, which is designed as a multiple scale feature for learning based on issue detection, and root cause clustering. In various embodiments the techniques employ a cross-trace similarity model, which is defined to hierarchically cluster problems detected in the learning based issue detection via butterflies of trigram stacks. The performance analysis framework is scalable to manage millions of traces, which include high problem complexity.
摘要:
Techniques and systems for cross-trace scalable issue detection and clustering that scale-up trace analysis for issue detection and root-cause clustering using a machine learning based approach are described herein. These techniques enable a scalable performance analysis framework for computing devices addressing issue detection, which is designed as a multiple scale feature for learning based issue detection, and root cause clustering. In various embodiments the techniques employ a cross-trace similarity model, which is defined to hierarchically cluster problems detected in the learning based issue detection via butterflies of trigram stacks. The performance analysis framework is scalable to manage millions of traces, which include high problem complexity.
摘要:
A call pattern database is mined to identify frequently occurring call patterns related to program execution instances. An SVM classifier is iteratively trained based at least in part on classifications provided by human analysts; at each iteration, the SVM classifier identifies boundary cases, and requests human analysis of these cases. The trained SVM classifier is then applied to call pattern pairs to produce similarity measures between respective call patterns of each pair, and the call patterns are clustered based on the similarity measures.
摘要:
Execution traces are collected from multiple execution instances that exhibit performance issues such as slow execution. Call stacks are extracted from the execution traces, and the call stacks are mined to identify frequently occurring function call patterns. The call patterns are then clustered, and used to identify groups of execution instances whose performance issues may be caused by common problematic program execution patterns.
摘要:
Execution traces are collected from multiple execution instances that exhibit performance issues such as slow execution. Call stacks are extracted from the execution traces, and the call stacks are mined to identify frequently occurring function call patterns. The call patterns are then clustered, and used to identify groups of execution instances whose performance issues may be caused by common problematic program execution patterns.
摘要:
A system for frequent pattern mining uses two layers of processing: a plurality of computing nodes, and a plurality of processors within each computing node. Within each computing node, the data set against which the frequent pattern mining is to be performed is stored in shared memory, accessible concurrently by each of the processors. The search space is partitioned among the computing nodes, and sub-partitioned among the processors of each computing node. If a processor completes its sub-partition, it requests another sub-partition. The partitioning and sub-partitioning may be performed dynamically, and adjusted in real time.
摘要:
A system for frequent pattern mining uses two layers of processing: a plurality of computing nodes, and a plurality of processors within each computing node. Within each computing node, the data set against which the frequent pattern mining is to be performed is stored in shared memory, accessible concurrently by each of the processors. The search space is partitioned among the computing nodes, and sub-partitioned among the processors of each computing node. If a processor completes its sub-partition, it requests another sub-partition. The partitioning and sub-partitioning may be performed dynamically, and adjusted in real time.
摘要:
Described is a technology by which online recognition of handwritten input data is combined with offline recognition and processing to obtain a combined recognition result. In general, the combination improves overall recognition accuracy. In one aspect, online and offline recognition is separately performed to obtain online and offline character-level recognition scores for candidates (hypotheses). A statistical analysis-based combination algorithm, an AdaBoost algorithm, and/or a neural network-based combination may determine a combination function to combine the scores to produce a result set of one or more results. Online and offline radical-level recognition may be performed. For example, a HMM recognizer may generate online radical scores used to build a radical graph, which is then rescored using the offline radical recognition scores. Paths in the rescored graph are then searched to provide the combined recognition result, e.g., corresponding to the path with the highest score.
摘要:
Exemplary techniques are described for selecting radical sets for use in probabilistic East Asian character recognition algorithms. An exemplary technique includes applying a decomposition rule to each East Asian character of the set to generate a progressive splitting graph where the progressive splitting graph comprises radicals as nodes, formulating an optimization problem to find an optimal set of radicals to represent the set of East Asian characters using maximum likelihood and minimum description length and solving the optimization problem for the optimal set of radicals. Another exemplary technique includes selecting an optimal set of radicals by using a general function that characterizes a radical with respect to other East Asian characters and a complex function that characterizes complexity of a radical.
摘要:
An exemplary method for online character recognition of characters includes acquiring time sequential, online ink data for a handwritten character, conditioning the ink data to produce conditioned ink data where the conditioned ink data includes information as to writing sequence of the handwritten character and extracting features from the conditioned ink data where the features include a tangent feature, a curvature feature, a local length feature, a connection point feature and an imaginary stroke feature. Such a method may determine neighborhoods for ink data and extract features for each neighborhood. An exemplary character recognition system may use various exemplary methods for training and character recognition.