Systems and methods for identifying and categorizing electronic documents through machine learning
    Granted invention patent
    Legal status: In force

    Publication number: US09514414B1

    Publication date: 2016-12-06

    Application number: US15088481

    Filing date: 2016-04-01

    Abstract: Computer-implemented systems and methods are disclosed for identifying and categorizing electronic documents through machine learning. In accordance with some embodiments, a seed set of categorized electronic documents may be used to train a document categorizer based on a machine learning algorithm. The trained document categorizer may categorize electronic documents in a large corpus of electronic documents. Performance metrics associated with the performance of the trained document categorizer may be tracked, and additional seed sets of categorized electronic documents may be used to improve the performance of the document categorizer by retraining it on subsequent seed sets. Additional seed sets and categorizations may be iterated through until a desired document categorization performance is reached.
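The loop described in the abstract (train on a seed set, measure performance, retrain on further seed sets until a target is met) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not name a learning algorithm or a metric, so the sketch substitutes a tiny multinomial Naive Bayes model and plain accuracy on a held-out labeled set. The names `NaiveBayesCategorizer`, `accuracy`, and `train_until_target`, and the sample documents, are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCategorizer:
    """Tiny multinomial Naive Bayes; an assumed stand-in for the
    unspecified machine learning algorithm in the abstract."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> documents seen
        self.vocab = set()

    def train(self, seed_set):
        # Each call adds one seed set of (text, category) pairs to the model.
        for text, category in seed_set:
            words = text.lower().split()
            self.word_counts[category].update(words)
            self.doc_counts[category] += 1
            self.vocab.update(words)

    def categorize(self, text):
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                # Laplace smoothing so unseen words do not zero out a category.
                count = self.word_counts[category][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


def accuracy(categorizer, labeled_docs):
    """Assumed performance metric: fraction of documents categorized correctly."""
    correct = sum(categorizer.categorize(t) == c for t, c in labeled_docs)
    return correct / len(labeled_docs)


def train_until_target(seed_sets, validation_set, target):
    """Retrain on successive seed sets until the tracked metric reaches target."""
    categorizer = NaiveBayesCategorizer()
    history = []  # tracked performance after each seed set
    for seed_set in seed_sets:
        categorizer.train(seed_set)
        history.append(accuracy(categorizer, validation_set))
        if history[-1] >= target:
            break
    return categorizer, history


# Hypothetical sample data, invented for this sketch.
seed_sets = [
    [("invoice payment due", "finance"), ("meeting agenda notes", "admin")],
    [("quarterly budget report", "finance"), ("office schedule update", "admin")],
]
validation_set = [
    ("payment invoice", "finance"),
    ("budget report", "finance"),
    ("schedule update", "admin"),
    ("agenda notes", "admin"),
]

categorizer, history = train_until_target(seed_sets, validation_set, target=0.9)
print(history)
```

In this toy run the first seed set is not enough to reach the 0.9 target, so the loop consumes the second seed set before stopping. A real system would apply the trained categorizer to the large corpus afterward and would likely track richer metrics than accuracy.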

