Learning characteristics for extraction of information from web pages

    Publication No.: US09443250B1

    Publication date: 2016-09-13

    Application No.: US13859530

    Filing date: 2013-04-09

    Applicant: Google Inc.

    Abstract: A learning module of an information retrieval system is configured to automatically learn the distinctive characteristics that different web sites use when presenting data variables of interest. The learned information can then be used to identify data variables of interest on arbitrary web pages of those web sites. In one embodiment, the learning process is guided by feeds provided by the web sites that list values for data variables of interest, and by web pages also provided by the web sites. The values in the feeds enable the learning module to identify candidate portions of the web pages that may represent a data variable of interest. Weights are computed for different values of various properties of the candidate portions, aggregated over all the analyzed pages, and used to identify one of the candidate portions as the best candidate.
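The weighting step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patented method: page elements are modelled as plain dictionaries with hypothetical `text` and `props` keys, a candidate is any element whose text equals the feed value, and each page splits one unit of weight evenly among its candidates.

```python
from collections import defaultdict


def learn_weights(pages, feed_values):
    """Aggregate a weight for each (property, value) pair over all pages.

    pages: one list of element dicts per page (hypothetical structure).
    feed_values: the feed's value for the variable of interest, per page.
    """
    weights = defaultdict(float)
    for elements, value in zip(pages, feed_values):
        # Candidate portions: elements whose text matches the feed value.
        candidates = [e for e in elements if e["text"].strip() == value]
        if not candidates:
            continue
        # Assumed scheme: ambiguous pages contribute less per candidate.
        share = 1.0 / len(candidates)
        for cand in candidates:
            for prop, prop_value in cand["props"].items():
                weights[(prop, prop_value)] += share
    return dict(weights)


def best_candidate(elements, weights):
    """Return the element whose property values carry the most learned weight."""
    def score(element):
        return sum(weights.get(item, 0.0) for item in element["props"].items())
    return max(elements, key=score)


# Two training pages from one site, plus the price each feed entry lists.
training_pages = [
    [
        {"text": "19.99", "props": {"tag": "span", "class": "price"}},
        {"text": "19.99", "props": {"tag": "td", "class": "related"}},
        {"text": "Widget", "props": {"tag": "h1", "class": "title"}},
    ],
    [
        {"text": "5.00", "props": {"tag": "span", "class": "price"}},
        {"text": "Gadget", "props": {"tag": "h1", "class": "title"}},
    ],
]
feed_prices = ["19.99", "5.00"]
learned = learn_weights(training_pages, feed_prices)

# A page with no feed entry: the learned weights pick the likely price element.
new_page = [
    {"text": "7.25", "props": {"tag": "td", "class": "related"}},
    {"text": "7.25", "props": {"tag": "span", "class": "price"}},
    {"text": "Thing", "props": {"tag": "h1", "class": "title"}},
]
picked = best_candidate(new_page, learned)
```

In this toy data the `span`/`price` combination matches the feed on both training pages, so it accumulates more weight than the `td`/`related` element that only coincidentally shows the same price once.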

Detection of anomalous instances through dynamic feature selection analysis

    Type: Granted invention patent (status: in force)

    Publication No.: US09258314B1

    Publication date: 2016-02-09

    Application No.: US13842511

    Filing date: 2013-03-15

    Applicant: Google Inc.

    CPC classification number: H04L63/14 H04L63/1433

    Abstract: This specification describes technologies relating to detecting anomalous user accounts. A computer-implemented method is disclosed that evaluates a user account of unknown status. The method compares features associated with a plurality of known anomalous user accounts stored in a database to features present in the unknown account. A correlation value, corresponding to the probability of a specific feature occurring in a particular anomalous user account, is calculated, along with a dependence value corresponding to the degree of dependence between the given feature and at least one other feature. A subset of the features in the unknown account is generated, comprising those features that possess a correlation value less than a threshold value and a dependence value below a maximum correlation value. A risk score for the unknown account is calculated by selecting those features from the subset that maximize the correlation value. The unknown account is then reviewed by an account reviewer if the risk score exceeds a threshold value.
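The selection procedure in the abstract can be sketched as follows. The abstract does not define how the values are computed, so the choices here are assumptions for illustration only: the correlation value is the fraction of known anomalous accounts that have the feature, the dependence value is the highest Jaccard overlap with any other feature, and the risk score is the sum of the `top_k` highest correlation values in the subset. The comparison directions follow the abstract's wording literally; all names and thresholds are hypothetical.

```python
def feature_holders(known_accounts):
    """Map each feature to the indices of the known anomalous accounts having it."""
    holders = {}
    for idx, features in enumerate(known_accounts):
        for feature in features:
            holders.setdefault(feature, set()).add(idx)
    return holders


def jaccard(a, b):
    """Assumed dependence measure: overlap between two sets of accounts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def risk_score(unknown_features, known_accounts,
               corr_threshold, max_dependence, top_k):
    holders = feature_holders(known_accounts)
    total = len(known_accounts)
    subset = []
    for feature in unknown_features:
        own = holders.get(feature, set())
        # Correlation value: share of known anomalous accounts with the feature.
        correlation = len(own) / total if total else 0.0
        # Dependence value: strongest overlap with any other known feature.
        dependence = max(
            (jaccard(own, ids) for other, ids in holders.items() if other != feature),
            default=0.0,
        )
        # Filter as worded in the abstract: correlation below the threshold
        # and dependence below the maximum.
        if correlation < corr_threshold and dependence < max_dependence:
            subset.append((correlation, feature))
    # Keep the features in the subset that maximize the correlation value.
    chosen = sorted(subset, reverse=True)[:top_k]
    return sum(c for c, _ in chosen), [f for _, f in chosen]


known_anomalous = [
    {"disposable_email", "burst_signup", "vpn"},
    {"disposable_email", "burst_signup"},
    {"disposable_email", "vpn"},
    {"burst_signup", "shared_card"},
]
unknown = {"disposable_email", "burst_signup", "shared_card"}

score, selected = risk_score(unknown, known_anomalous,
                             corr_threshold=0.8, max_dependence=0.6, top_k=1)
needs_review = score > 0.5  # hypothetical review threshold
```

Here `disposable_email` is filtered out because it overlaps heavily with `vpn`, which shows the intent of the dependence test: features that largely restate one another should not both drive the score.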

