摘要:
Systems and methods are presented for content extraction from markup language text. The content extraction process may parse markup language text into a hierarchical data model and then apply one or more filters. Output filters may be used to make the process more versatile. The operation of the content extraction process and the one or more filters may be controlled by one or more settings set by a user, or automatically by a classifier. The classifier may automatically enter settings by classifying markup language text and entering settings based on this classification. Automatic classification may be performed by clustering unclassified markup language texts with previously classified markup language texts.
摘要:
A method for detecting intrusions in the operation of a computer system is disclosed which comprises gathering features from records of normal processes that access the files system of the computer, such as the Windows registry, and generating a probabilistic model of normal computer system usage based on occurrences of said features. The features of a record of a process that accesses the Windows registry are analyzed to determine whether said access to the Windows registry is an anomaly. A system is disclosed, comprising a registry auditing module configured to gather records regarding processes that access the Windows registry; a model generator configured to generate a probabilistic model of normal computer system usage based on records of a plurality of processes that access the Windows registry and that are indicative of normal computer system usage; and a model comparator configured to determine whether the access of the Windows registry is an anomaly.
摘要:
A method for detecting intrusions in the operation of a computer system is disclosed which comprises gathering features from records of normal processes that access the files system of the computer, such as the Windows registry, and generating a probabilistic model of normal computer system usage based on occurrences of said features. The features of a record of a process that accesses the Windows registry are analyzed to determine whether said access to the Windows registry is an anomaly. A system is disclosed, comprising a registry auditing module configured to gather records regarding processes that access the Windows registry; a model generator configured to generate a probabilistic model of normal computer system usage based on records of a plurality of processes that access the Windows registry and that are indicative of normal computer system usage; and a model comparator configured to determine whether the access of the Windows registry is an anomaly.
摘要:
In a method of generating an anomaly detection model for classifying activities of a computer system, using a training set of data corresponding to activity on the computer system, the training set comprising a plurality of instances of data having features, and wherein each feature in said plurality of features has a plurality of values. For a selected feature and a selected value of the selected feature, a quantity is determined which corresponds to the relative sparsity of such value. The quantity may correspond to the difference between the number occurrences of the selected value and the number of occurrences of the most frequently occurring value. These instances are classified as anomaly and added to the training set of normal data to generate a rule set or other detection model.
摘要:
The present invention relates to the use of computerized intelligent agents to facilitate the integration of networked performance of financial transactions with computerized methods of financial accounting. Incorporated into this combined financial transaction/financial accounting system are intelligent agents that automatically analyze the system information to provide users with financial advice. This invention permits the automated performance on-line of a wide variety of financial transactions and integrates these transactions with computerized financial accounting. All of this information is collated and analyzed automatically by intelligent agents, which generate user-specific financial reports, profiles, and advice, and under appropriate conditions take action.
摘要:
The semantic integration problem for merging multiple databases of very large size, the merge/purge problem, can be solved by multiple runs of the sorted neighborhood method or the clustering method with small windows followed by the computation of the transitive closure over the results of each run. The sorted neighborhood method works well under this scheme but is computationally expensive due to the sorting phase. An alternative method based on data clustering that reduces the complexity to linear time making multiple runs followed by transitive closure feasible and efficient. A method is provided for identifying duplicate records in a database, each record having at least one field and a plurality of keys, including the steps of sorting the records according to a criteria applied to a first key; comparing a number of consecutive sorted records to each other, wherein the number is less than a number of records in said database and identifying a first group of duplicate records; storing the identity of the first group; sorting the records according to a criteria applied to a second key; comparing a number of consecutive sorted records to each other, wherein the number is less than a number of records in said database and identifying a second group of duplicate records; storing the identity of the second group; and subjecting the union of the first and second groups to transitive closure.
摘要:
A method for processing an image, consisting of a foreground and a background, to produce a highly compressed and accurate representation of the image, including the steps of scanning the image to create a digital image of the image, comparing the digital image against a codebook of stored digital images; matching the digital image with one of the stored digital images of the codebook; producing an index code identifying the background of the stored digital image as having matched the digital image; subtracting the stored digital image from the digital image to produce a second digital image representing the foreground of the stored digital image; and storing the second digital image with the index code. Techniques are also provided to enable merge/purge of the database(s) thereby created.
摘要:
A system that generates decoy emails and documents by automatically detecting concepts such as dates, times, people, and locations in e-mails and documents, and shifting those concepts. The system may also generate an email or document reciting a URL associated with a fake website and purported login credentials for the fake website. The system may send an alert to a user of the system when someone seeks to access the fake website.
摘要:
A system that generates decoy emails and documents by automatically detecting concepts such as dates, times, people, and locations in e-mails and documents, and shifting those concepts. The system may also generate an email or document reciting a URL associated with a fake website and purported login credentials for the fake website. The system may send an alert to a user of the system when someone seeks to access the fake website.
摘要:
A method, apparatus, and medium are provided for tracing the origin of network transmissions. Connection records are maintained at computer system for storing source and destination addresses. The connection records also maintain a statistical distribution of data corresponding to the data payload being transmitted. The statistical distribution can be compared to that of the connection records in order to identify the sender. The location of the sender can subsequently be determined from the source address stored in the connection record. The process can be repeated multiple times until the location of the original sender has been traced.