摘要:
A system and method for automatically classifying documents using an annotated topic tree is provided. A set of topics may be extracted from a document corpus such that each document in the document corpus is associated with a topic model. A sample set of documents may be selected from the document corpus during a current sampling round. The topic models associated with the sample set of documents may be annotated by human reviewers with coding information. Each coded document may be coded as ‘responsive’, ‘non-responsive’, ‘arguably responsive’, ‘null’, and/or for other codes or issues, which are related to the topic model associated with that document. An annotated topic tree may be formed based on the annotated topic model. One or more machine learning algorithms may be used to project the information in the annotated topic tree to the rest of the document corpus. A voting algorithm which may comprise a plurality of machine learning algorithms may also be used to project the sampling judgments to the rest of the document corpus. To continuously enhance the performance of automatic classification of documents, the projection results may be analyzed after each sampling round.
摘要:
The present disclosure describes a continuous anomaly detection method and system based on multi-dimensional behavior modeling and heterogeneous information analysis. A method includes collecting data, processing and categorizing a plurality of events, continuously clustering the plurality of events, continuously model building for behavior and information analysis, analyzing behavior and information based on a holistic model, detecting anomalies in the data, displaying an animated and interactive visualization of a behavioral model, and displaying an animated and interactive visualization of the detected anomalies.