Abstract:
A learning module of an information retrieval system is configured to automatically learn the distinctive characteristics that different web sites use when presenting data variables of interest. The learned information can then be used to identify data variables of interest on arbitrary web pages of those web sites. In one embodiment, the learning process is guided by feeds, provided by the web sites, that list values for the data variables of interest, and by web pages from the same web sites. The feed values enable the learning module to identify candidate portions of the web pages that may represent a data variable of interest. Weights are computed for the different values of various properties of the candidate portions, aggregated over all analyzed pages, and used to identify one of the candidate portions as the best candidate.
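The learning procedure can be illustrated with a small Python sketch. It assumes that a page is a list of hypothetical `Element` objects, each holding its text and a dictionary of properties (for example a tag name or CSS class); that a candidate portion is any element whose text contains the feed value for that page; and that each page spreads one unit of weight evenly over its candidates, so ambiguous pages count for less. This weighting scheme and all names are illustrative assumptions, not the patented method.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical model of one portion of a web page."""
    text: str
    props: dict = field(default_factory=dict)  # e.g. {"tag": "span", "class": "price"}


def find_candidates(page, feed_value):
    """Candidate portions: elements whose text contains the feed value (assumption)."""
    return [el for el in page if feed_value in el.text]


def learn_weights(pages_with_feed_values):
    """Aggregate a weight per (property, value) pair over all analyzed pages.

    Each page spreads one unit of weight evenly over its candidates
    (hypothetical scheme), so pages with many matches count for less.
    """
    weights = defaultdict(float)
    for page, feed_value in pages_with_feed_values:
        candidates = find_candidates(page, feed_value)
        if not candidates:
            continue  # the feed value does not appear on this page
        share = 1.0 / len(candidates)
        for el in candidates:
            for key, value in el.props.items():
                weights[(key, value)] += share
    return dict(weights)


def score(el, weights):
    """Sum of the learned weights of an element's property values."""
    return sum(weights.get((key, value), 0.0) for key, value in el.props.items())


def best_candidate(candidates, weights):
    """Return the candidate whose property values carry the most weight."""
    return max(candidates, key=lambda el: score(el, weights))
```

On a page that has no feed value, every element can be passed to `best_candidate`, which returns the element whose property values accumulated the most weight during learning.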
Abstract:
This specification describes technologies relating to detecting anomalous user accounts. A computer-implemented method is disclosed that evaluates a user account of unknown status. The method compares features associated with a plurality of known anomalous user accounts stored in a database to the features present in the unknown account. A correlation value, corresponding to the probability of a specific feature occurring in a particular anomalous user account, is calculated, and a dependence value, corresponding to the degree of dependence between the given feature and at least one other feature, is also calculated. A subset of the unknown account's features is then generated, comprising those features that have a correlation value less than a threshold value and a dependence value below a maximum correlation value. A risk score for the unknown account is calculated by selecting the features from the subset that maximize the correlation value. If the risk score exceeds a threshold value, the unknown account is reviewed by an account reviewer.
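A Python sketch of the scoring pipeline follows, under several assumptions the abstract does not state: an account is a set of feature strings; the correlation value of a feature is the fraction of known anomalous accounts that contain it; the dependence between two features is the Jaccard overlap of the anomalous accounts containing each (a hypothetical choice); features are kept greedily from most to least correlated, skipping any that is too dependent on one already kept; and the risk score combines the top `k` kept features with a noisy-OR, treating them as independent. The correlation filter follows the abstract's wording literally, as an upper cap. All function and parameter names are hypothetical.

```python
from math import prod


def correlation(feature, anomalous):
    """Fraction of known anomalous accounts that contain the feature."""
    if not anomalous:
        return 0.0
    return sum(1 for acct in anomalous if feature in acct) / len(anomalous)


def dependence(f, g, anomalous):
    """Jaccard overlap of the anomalous accounts containing f and g (hypothetical)."""
    with_f = {i for i, acct in enumerate(anomalous) if f in acct}
    with_g = {i for i, acct in enumerate(anomalous) if g in acct}
    union = with_f | with_g
    return len(with_f & with_g) / len(union) if union else 0.0


def select_features(account, anomalous, corr_cap, max_dependence):
    """Greedily build the feature subset used for scoring."""
    corr = {f: correlation(f, anomalous) for f in account}
    kept = []
    # Most correlated first (ties broken by name), so that of two strongly
    # dependent features the more correlated one is retained.
    for f in sorted(account, key=lambda x: (-corr[x], x)):
        if corr[f] >= corr_cap:
            continue  # abstract: keep only correlation values below a threshold
        if all(dependence(f, g, anomalous) < max_dependence for g in kept):
            kept.append(f)
    return kept, corr


def risk_score(account, anomalous, corr_cap=0.9, max_dependence=0.6, k=3):
    """Noisy-OR of the k most correlated kept features (assumes independence)."""
    kept, corr = select_features(account, anomalous, corr_cap, max_dependence)
    top = kept[:k]  # kept is already ordered by descending correlation
    return 1.0 - prod(1.0 - corr[f] for f in top)


def needs_review(account, anomalous, review_threshold=0.5, **params):
    """Refer the account to a reviewer when its risk score exceeds the threshold."""
    return risk_score(account, anomalous, **params) > review_threshold
```

The greedy ordering means that, of two strongly dependent features, only the more correlated one contributes, which keeps the noisy-OR from counting the same evidence twice.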