Systems and methods for identifying and categorizing electronic documents through machine learning
    Granted invention patent; legal status: in force

    Publication number: US09514414B1

    Publication date: 2016-12-06

    Application number: US15088481

    Filing date: 2016-04-01

    Abstract: Computer-implemented systems and methods are disclosed for identifying and categorizing electronic documents through machine learning. In accordance with some embodiments, a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm. The trained document categorizer may categorize electronic documents in a large corpus of electronic documents. Performance metrics for the trained document categorizer may be tracked, and additional seed sets of categorized electronic documents may be used to improve the performance of the document categorizer by retraining it on subsequent seed sets. Additional seed sets and categorizations may be iterated through until a desired document categorization performance is reached.
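    The train, measure, and retrain loop described in this abstract can be illustrated with a minimal Python sketch. It assumes a multinomial Naive Bayes classifier as the machine learning algorithm, accuracy on a small hand-labelled validation set as the tracked performance metric, and cumulative retraining on all seed sets seen so far; the abstract specifies none of these, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: lower-cased whitespace tokens.
    return text.lower().split()


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one smoothing.

    A stand-in only: the abstract does not name the machine learning algorithm.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:  # ties keep the first label seen
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    # One possible performance metric: fraction of validation documents categorized correctly.
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(docs)


def train_until_target(seed_sets, validation, target=0.9):
    """Retrain on each successive (accumulated) seed set until the metric reaches target."""
    train_docs, train_labels, history = [], [], []
    model = None
    for docs, labels in seed_sets:
        train_docs += docs
        train_labels += labels
        model = NaiveBayesCategorizer().fit(train_docs, train_labels)
        history.append(accuracy(model, *validation))
        if history[-1] >= target:
            break
    return model, history


seed_sets = [
    (["merger agreement draft", "lunch menu friday"], ["relevant", "irrelevant"]),
    (["merger due diligence memo", "holiday party invite"], ["relevant", "irrelevant"]),
]
validation = (["merger memo", "holiday invite", "party menu"], ["relevant", "irrelevant", "irrelevant"])
model, history = train_until_target(seed_sets, validation)
print(history)  # [0.6666666666666666, 1.0]
```

    In this toy run the first seed set leaves one validation document miscategorized, so the loop retrains on the second seed set before the metric reaches the (arbitrary) 0.9 target.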

    Automated assistance for generating relevant and valuable search results for an entity of interest

    Publication number: US11210350B2

    Publication date: 2021-12-28

    Application number: US16261250

    Filing date: 2019-01-29

    IPC classification: G06F16/951, G06F16/38

    Abstract: Systems and methods are provided for identifying relevant information for an entity, referred to as a seed entity. A plurality of search queries can be generated, each comprising a property of the seed entity or of one of the entities associated with the seed entity (seed-linked entities). Preferably, the collection of search queries includes queries representing different properties of the seed entity and properties of different seed-linked entities. Optionally, the collection of search queries is optimized to reduce the search burden. Searches can then be conducted with the search queries in one or more data sources to obtain a plurality of search results, wherein each search result comprises a hit entity and one or more entities associated with the hit entity (hit-linked entities). For each search result, a score can be determined taking as input (a) the likelihood of a match between the seed entity and the hit entity, or between a seed-linked entity and a hit-linked entity; (b) the presence in the search result of a new entity not present in the search queries, or a difference between the new entity and an entity present in the search queries; and (c) a characteristic of the new entity in the search result. Based on the scores, high-priority search results can be presented to a user for further analysis.
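    Query generation and the optional burden reduction can be sketched in Python as follows. The abstract does not say how the query collection is optimized, so this sketch assumes two hypothetical tactics: skipping a property value that was already queried for another entity, and interleaving queries round-robin across entities before applying a cap, so that a reduced collection still covers the seed entity and different seed-linked entities. The `Entity` and `Query` types, `generate_queries`, and the sample data are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import zip_longest
from typing import NamedTuple


class Query(NamedTuple):
    entity: str  # name of the entity whose property the query targets
    prop: str
    value: str


@dataclass
class Entity:
    name: str
    properties: dict[str, str] = field(default_factory=dict)
    linked: list[Entity] = field(default_factory=list)  # seed-linked entities


def generate_queries(seed: Entity, max_queries: int | None = None) -> list[Query]:
    """One query per property of the seed entity and of each seed-linked entity."""
    seen: set[tuple[str, str]] = set()
    per_entity: list[list[Query]] = []
    for entity in [seed, *seed.linked]:
        group = []
        for prop, value in entity.properties.items():
            key = (prop, value.strip().lower())
            if key in seen:  # assumed burden reduction: drop repeated property values
                continue
            seen.add(key)
            group.append(Query(entity.name, prop, value))
        per_entity.append(group)
    # Round-robin across entities so a capped collection still spans
    # different properties of the seed and of different seed-linked entities.
    interleaved = [q for row in zip_longest(*per_entity) for q in row if q is not None]
    return interleaved if max_queries is None else interleaved[:max_queries]


seed = Entity(
    "Acme Corp",
    {"address": "1 Main St", "phone": "555-0100"},
    [
        Entity("J. Doe", {"phone": "555-0100", "email": "jdoe@example.com"}),
        Entity("Acme Holdings", {"address": "9 Elm Rd"}),
    ],
)
for q in generate_queries(seed, max_queries=3):
    print(q)
```

    Without the round-robin step, a cap of three would keep only the seed entity's own queries plus one linked-entity query; interleaving is one simple way to honour the stated preference for diverse properties across entities.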

    Automated assistance for generating relevant and valuable search results for an entity of interest

    Publication number: US10235461B2

    Publication date: 2019-03-19

    Application number: US15584423

    Filing date: 2017-05-02

    IPC classification: G06F17/30

    Abstract: Systems and methods are provided for identifying relevant information for an entity, referred to as a seed entity. A plurality of search queries can be generated, each comprising a property of the seed entity or of one of the entities associated with the seed entity (seed-linked entities). Preferably, the collection of search queries includes queries representing different properties of the seed entity and properties of different seed-linked entities. Optionally, the collection of search queries is optimized to reduce the search burden. Searches can then be conducted with the search queries in one or more data sources to obtain a plurality of search results, wherein each search result comprises a hit entity and one or more entities associated with the hit entity (hit-linked entities). For each search result, a score can be determined taking as input (a) the likelihood of a match between the seed entity and the hit entity, or between a seed-linked entity and a hit-linked entity; (b) the presence in the search result of a new entity not present in the search queries, or a difference between the new entity and an entity present in the search queries; and (c) a characteristic of the new entity in the search result. Based on the scores, high-priority search results can be presented to a user for further analysis.
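    The three-input scoring and prioritization step can be illustrated with a minimal Python sketch. The abstract names inputs (a), (b), and (c) but not how they are computed or combined, so everything here is an assumption: token-overlap (Jaccard) similarity stands in for match likelihood, (b) is simplified to the fraction of result entities absent from the search queries (the "difference" variant is not modelled), (c) is a hypothetical per-entity trait score, and the three are combined as a weighted sum with arbitrary weights. `SearchResult`, `score_result`, `prioritize`, and the sample data are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchResult:
    hit: str                                             # hit entity
    hit_linked: list[str] = field(default_factory=list)  # hit-linked entities
    # Hypothetical per-entity characteristic score in [0, 1] (e.g. a risk flag).
    traits: dict[str, float] = field(default_factory=dict)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here as a stand-in for match likelihood."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def score_result(r, seed, seed_linked, query_entities, weights=(0.5, 0.3, 0.2)):
    # (a) best match: seed vs hit, or any seed-linked vs any hit-linked entity.
    match = max([jaccard(seed, r.hit)] + [jaccard(s, h) for s in seed_linked for h in r.hit_linked])
    # (b) new entities: those in the result that no search query mentioned.
    known = {e.lower() for e in query_entities}
    new = [e for e in [r.hit, *r.hit_linked] if e.lower() not in known]
    novelty = len(new) / (1 + len(r.hit_linked))
    # (c) characteristic of the new entities: the highest assumed trait score.
    trait = max((r.traits.get(e, 0.0) for e in new), default=0.0)
    wa, wb, wc = weights
    return wa * match + wb * novelty + wc * trait


def prioritize(results, seed, seed_linked, query_entities, top_k=2):
    """Return the top_k results by score, highest first."""
    return sorted(
        results,
        key=lambda r: score_result(r, seed, seed_linked, query_entities),
        reverse=True,
    )[:top_k]


results = [
    SearchResult("Acme Corp", ["J. Doe"]),
    SearchResult("Acme Corp", ["J. Doe", "Shell Co X"], {"Shell Co X": 1.0}),
    SearchResult("Zeta Ltd", ["Q. Public"]),
]
top = prioritize(results, "Acme Corp", ["J. Doe"], ["Acme Corp", "J. Doe"])
for r in top:
    print(r.hit, r.hit_linked)
```

    With these assumed weights, the result that pairs a strong match with a new, flagged entity ranks first, ahead of an exact but uninformative rediscovery of the seed entity; the unrelated hit falls below the `top_k` cut.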