Data classification using machine learning techniques
    1.
    Granted patent
    Data classification using machine learning techniques (in force)

    Publication No.: US08719197B2

    Publication date: 2014-05-06

    Application No.: US13090216

    Filing date: 2011-04-19

    Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., those associated with legal discovery, are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.

    Data classification using machine learning techniques
    4.
    Granted patent
    Data classification using machine learning techniques (in force)

    Publication No.: US08239335B2

    Publication date: 2012-08-07

    Application No.: US13033536

    Filing date: 2011-02-23

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A system and article of manufacture enabling adapting to a shift in document content according to one embodiment of the present invention includes instructions for: receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Systems and articles of manufacture for separating documents are also presented. Systems and articles of manufacture for document searching are also presented.
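
The sequence of steps in this abstract (seed documents, unlabeled documents, a cost factor, confidence-thresholded classification, reclassification, output of identifiers) can be illustrated with a small Python sketch. This is not the patented method: the abstract does not specify the transductive classifier, so a simple nearest-centroid self-training loop stands in for it. All function names, the bag-of-words features, the reading of the cost factor as a weight on unlabeled documents, and the confidence formula are assumptions made for illustration only.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (an assumed, deliberately simple feature choice)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(seeds, unlabeled, guesses, cost_factor):
    """Build one centroid per category: labeled seeds count with weight 1.0,
    currently categorized unlabeled documents with weight cost_factor."""
    centroids = {}
    for text, label in seeds:
        centroid = centroids.setdefault(label, Counter())
        for term, n in vectorize(text).items():
            centroid[term] += n
    for doc_id, label in guesses.items():
        centroid = centroids.setdefault(label, Counter())
        for term, n in vectorize(unlabeled[doc_id]).items():
            centroid[term] += cost_factor * n
    return centroids

def classify(centroids, text):
    """Return (label, confidence), where confidence is the winning
    category's share of the total similarity (an assumed definition)."""
    vec = vectorize(text)
    scores = {label: cosine(vec, c) for label, c in centroids.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def categorize(seeds, unlabeled, cost_factor=0.5, threshold=0.6, rounds=3):
    """Retrain and reclassify every document each round; return
    {document identifier: category} for documents above the threshold."""
    guesses = {}
    for _ in range(rounds):
        centroids = train(seeds, unlabeled, guesses, cost_factor)
        new_guesses = {}
        for doc_id, text in unlabeled.items():
            label, confidence = classify(centroids, text)
            if label is not None and confidence > threshold:
                new_guesses[doc_id] = label
        if new_guesses == guesses:  # assignments are stable, stop early
            break
        guesses = new_guesses
    return guesses

# Hypothetical example data, not taken from the patent.
seeds = [("invoice payment amount due", "finance"),
         ("patient diagnosis treatment", "medical")]
unlabeled = {"d1": "payment due for invoice",
             "d2": "treatment plan for patient",
             "d3": "weather today"}
result = categorize(seeds, unlabeled)
```

In this sketch a document that shares no terms with any category (such as `d3`) never clears the threshold and is left uncategorized, while categorized documents feed back into the next round's centroids at the reduced weight given by `cost_factor`.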

    Data classification methods using machine learning techniques
    5.
    Granted patent
    Data classification methods using machine learning techniques (in force)

    Publication No.: US07937345B2

    Publication date: 2011-05-03

    Application No.: US11752719

    Filing date: 2007-05-23

    IPC classification: G06F15/18

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A method for adapting to a shift in document content according to one embodiment of the present invention includes receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Methods for separating documents are also presented. Methods for document searching are also presented.

    DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES
    6.
    Patent application
    DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES (under examination, published)

    Publication No.: US20080086432A1

    Publication date: 2008-04-10

    Application No.: US11752691

    Filing date: 2007-05-23

    IPC classification: G06F15/18

    CPC classification: G06N20/00 G06F16/353

    Abstract: Methods for analyzing prior art are presented. One method includes training a classifier based on a search query; accessing a plurality of prior art documents; performing a document classification technique on at least some of the prior art documents using the classifier; and outputting identifiers of at least some of the prior art documents based on the classification thereof. Methods for adapting a patent classification to a shift in document content are also presented. Methods for matching documents to claims are presented. Methods for classifying a patent or patent application are also presented.
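
As a rough illustration of the prior-art method in this abstract, the following Python sketch "trains" a classifier from a search query, applies it to a set of documents, and outputs the identifiers of those classified as relevant. The abstract does not say what kind of classifier is used, so a term-count profile with a cosine-similarity cutoff is assumed here. The function names, the threshold value, and the document identifiers are all hypothetical.

```python
import math
from collections import Counter

def term_counts(text):
    """Lowercased whitespace tokens and their counts (assumed features)."""
    return Counter(text.lower().split())

def train_from_query(query):
    """'Train' a one-class profile; in this sketch it is just the query's term counts."""
    return term_counts(query)

def relevance(profile, text):
    """Cosine similarity between the query profile and a document."""
    vec = term_counts(text)
    dot = sum(n * vec[term] for term, n in profile.items())
    norm = (math.sqrt(sum(n * n for n in profile.values()))
            * math.sqrt(sum(n * n for n in vec.values())))
    return dot / norm if norm else 0.0

def relevant_ids(query, documents, threshold=0.3):
    """Classify each document as relevant or not; return sorted identifiers of relevant ones."""
    profile = train_from_query(query)
    return sorted(doc_id for doc_id, text in documents.items()
                  if relevance(profile, text) >= threshold)

# Hypothetical documents standing in for accessed prior art.
documents = {"DOC-A": "a transductive classifier for document sorting",
             "DOC-B": "engine cooling fan assembly"}
hits = relevant_ids("transductive document classifier", documents)
```

A real system would presumably use a learned model and richer features; the point of the sketch is only the flow from query to classifier to output identifiers.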

    DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES
    7.
    Patent application
    DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES (in force)

    Publication No.: US20110196870A1

    Publication date: 2011-08-11

    Application No.: US13090216

    Filing date: 2011-04-19

    IPC classification: G06F15/18 G06F17/30

    Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., those associated with legal discovery, are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.

    DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES
    9.
    Patent application
    DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES (in force)

    Publication No.: US20080086433A1

    Publication date: 2008-04-10

    Application No.: US11752719

    Filing date: 2007-05-23

    IPC classification: G06F15/18 G06F17/30

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A method for adapting to a shift in document content according to one embodiment of the present invention includes receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Methods for separating documents are also presented. Methods for document searching are also presented.
