DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES
    1.
    Patent application
    Legal status: in force

    Publication No.: US20080086433A1

    Publication date: 2008-04-10

    Application No.: US11752719

    Filing date: 2007-05-23

    IPC classification: G06F15/18 G06F17/30

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A method for adapting to a shift in document content according to one embodiment of the present invention includes receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Methods for separating documents are also presented. Methods for document searching are also presented.
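The steps listed in this abstract (seed documents, unlabeled documents, a cost factor, a confidence threshold, and repeated reclassification) can be illustrated with a minimal Python sketch. It is not the patented classifier: it swaps in a simple nearest-centroid self-training loop over bag-of-words vectors. All names (`classify_with_drift`, `cost_factor`, `threshold`, `rounds`) are hypothetical, and treating the cost factor as the weight given to pseudo-labeled documents is an assumption made for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(seeds, pseudo, cost_factor):
    """Sum term weights per category; pseudo-labeled docs are scaled by cost_factor."""
    cents = {}
    for weight, docs in ((1.0, seeds), (cost_factor, pseudo)):
        for label, vec in docs:
            cent = cents.setdefault(label, Counter())
            for term, count in vec.items():
                cent[term] += weight * count
    return cents


def predict(cents, vec):
    """Return (label, confidence); confidence is the margin between the two best scores."""
    scores = sorted(((cosine(vec, c), label) for label, c in cents.items()), reverse=True)
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_label, best_score - runner_up


def classify_with_drift(seed_docs, unlabeled_docs, cost_factor=0.5, threshold=0.1, rounds=3):
    """seed_docs: list of (label, text); unlabeled_docs: dict of doc_id -> text.

    Returns a dict of doc_id -> label for documents classified above the threshold.
    """
    seeds = [(label, vectorize(text)) for label, text in seed_docs]
    vecs = {doc_id: vectorize(text) for doc_id, text in unlabeled_docs.items()}
    assigned = {}
    for _ in range(rounds):
        # Retrain on the seeds plus the documents categorized so far.
        pseudo = [(label, vecs[doc_id]) for doc_id, label in assigned.items()]
        cents = centroids(seeds, pseudo, cost_factor)
        # Reclassify every document, including those categorized earlier.
        new_assigned = {}
        for doc_id, vec in vecs.items():
            label, confidence = predict(cents, vec)
            if confidence > threshold:
                new_assigned[doc_id] = label
        if new_assigned == assigned:
            break
        assigned = new_assigned
    return assigned
```

The returned mapping plays the role of the "identifiers of the categorized documents"; documents whose confidence margin stays below the threshold are left out.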

    Data classification using machine learning techniques
    4.
    Granted patent
    Legal status: in force

    Publication No.: US08719197B2

    Publication date: 2014-05-06

    Application No.: US13090216

    Filing date: 2011-04-19

    Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., those associated with legal discovery, are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.

    Data classification using machine learning techniques
    5.
    Granted patent
    Legal status: in force

    Publication No.: US08239335B2

    Publication date: 2012-08-07

    Application No.: US13033536

    Filing date: 2011-02-23

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A system and article of manufacture enabling adapting to a shift in document content according to one embodiment of the present invention includes instructions for: receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Systems and articles of manufacture for separating documents are also presented. Systems and articles of manufacture for document searching are also presented.

    Data classification methods using machine learning techniques
    6.
    Granted patent
    Legal status: in force

    Publication No.: US07937345B2

    Publication date: 2011-05-03

    Application No.: US11752719

    Filing date: 2007-05-23

    IPC classification: G06F15/18

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A method for adapting to a shift in document content according to one embodiment of the present invention includes receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Methods for separating documents are also presented. Methods for document searching are also presented.

    Systems and methods of accessing random access cache for rescanning

    Publication No.: US20060215230A1

    Publication date: 2006-09-28

    Application No.: US11329753

    Filing date: 2006-01-11

    IPC classification: G06F15/00

    CPC classification: H04N1/40

    Abstract: An efficient method and system to enhance digital acquisition devices for analog data is presented. The enhancements offered by the method and system are available to the user in local as well as in remote deployments, yielding efficiency gains for a large variety of business processes. The quality enhancements of the acquired digital data are achieved efficiently by employing virtual reacquisition. The method of virtual reacquisition renders unnecessary the physical reacquisition of the analog data in case the digital data obtained by the acquisition device are of insufficient quality. The method and system allow multiple users to access the same acquisition device for analog data. In some embodiments, one or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to his personal preferences and/or requirements. The preferred processing settings and attributes are determined interactively in real time or in non-real time, automatically, or by a combination thereof.

    Virtual rescanning: a method for interactive document image quality enhancement
    8.
    Granted patent
    Legal status: in force

    Publication No.: US06370277B1

    Publication date: 2002-04-09

    Application No.: US09206753

    Filing date: 1998-12-07

    IPC classification: G06T1140

    Abstract: According to the present invention, an image processing system is provided for processing scanned images using user-predefined parameters and acceptable tolerances. When a scanned image falls outside the scope of the predefined parameter tolerances, the system invokes a real-time user interactive process which involves a continuous looping back process comprising prompting the user for image setting data, loading the image data from a fast memory device such as a cache, generating processed image data and displaying the processed image data on a display terminal, and prompting the user for acceptance of the processed image data. This process loops back to the beginning until the user accepts the processed image data. This invention provides the user with an efficient and time-saving method of scanning documents by precluding the user having to physically reload the document into the scanner should the scanned quality be unacceptable, and an interactive method to set the attributes of a paper scanner.
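The loop described in this abstract (check tolerances, prompt for settings, reload the raw image from a cache, reprocess, ask for acceptance, repeat) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented system: the image is a plain list of grayscale rows, the only setting is a brightness offset, the tolerance check compares mean brightness against a target, and user interaction is modeled by two caller-supplied callbacks. The names `virtual_rescan`, `ask_settings`, `ask_accept`, and the `max_rounds` safety cap are hypothetical.

```python
def apply_settings(raw, settings):
    """Reprocess raw pixels: add a brightness offset, then clamp to 0..255."""
    offset = settings.get("brightness", 0)
    return [[max(0, min(255, p + offset)) for p in row] for row in raw]


def within_tolerance(image, target_mean, tolerance):
    """True if the mean pixel value is within `tolerance` of `target_mean`."""
    pixels = [p for row in image for p in row]
    return abs(sum(pixels) / len(pixels) - target_mean) <= tolerance


def virtual_rescan(cache, scan_id, defaults, target_mean, tolerance,
                   ask_settings, ask_accept, max_rounds=10):
    """Return processed image data without physically rescanning the page.

    cache maps scan_id to the raw scan; it is only read, never modified.
    ask_settings() returns a settings dict; ask_accept(image) returns a bool.
    """
    processed = apply_settings(cache[scan_id], defaults)
    if within_tolerance(processed, target_mean, tolerance):
        return processed  # predefined parameters were good enough
    for _ in range(max_rounds):  # safety cap, an assumption of this sketch
        settings = ask_settings()                             # prompt for image settings
        processed = apply_settings(cache[scan_id], settings)  # reload raw data from cache
        if ask_accept(processed):                             # display and ask for acceptance
            break
    return processed
```

In an interactive tool the two callbacks would wrap a settings dialog and a preview window; passing them in as functions keeps the loop itself independent of any particular user interface.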

    DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES
    9.
    Patent application
    Legal status: in force

    Publication No.: US20110196870A1

    Publication date: 2011-08-11

    Application No.: US13090216

    Filing date: 2011-04-19

    IPC classification: G06F15/18 G06F17/30

    Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., those associated with legal discovery, are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.

    DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES
    10.
    Patent application
    Legal status: in force

    Publication No.: US20110145178A1

    Publication date: 2011-06-16

    Application No.: US13033536

    Filing date: 2011-02-23

    IPC classification: G06F15/18

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A system and article of manufacture enabling adapting to a shift in document content according to one embodiment of the present invention includes instructions for: receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Systems and articles of manufacture for separating documents are also presented. Systems and articles of manufacture for document searching are also presented.