SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR DETERMINING DOCUMENT VALIDITY
    2.
    Patent application (legal status: in force)

    Publication No.: US20100202698A1

    Publication date: 2010-08-12

    Application No.: US12368685

    Filing date: 2009-02-10

    IPC classification: G06K9/46 G06K9/20

    Abstract: A method according to one embodiment includes extracting an identifier from an electronic first document, and identifying a complementary document associated with the first document using the identifier. The validity of the first document is determined by simultaneously considering: textual information from the first document; textual information from the complementary document; and predefined business rules. An indication of the determined validity is output. Systems and computer program products for providing, performing, and/or enabling the methodology presented above are also presented.
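
    The validity check described in the abstract above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the first document is an invoice, that the identifier is a purchase-order number matching a hypothetical `PO-<digits>` pattern, that complementary documents are held in an in-memory dictionary, and that each business rule is a function of both documents. None of the names used here (`Document`, `determine_validity`, `totals_match`) come from the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Document:
    doc_id: str
    text: str
    # Hypothetical extracted text fields, e.g. {"total": "250.00"}
    fields: Dict[str, str] = field(default_factory=dict)


# A business rule inspects both documents and returns an error message, or None if satisfied.
Rule = Callable[[Document, Document], Optional[str]]

# Assumed identifier format: "PO-1001", "PO 1001" or "PO1001".
ID_PATTERN = re.compile(r"PO[-\s]?(\d+)")


def extract_identifier(doc: Document) -> Optional[str]:
    match = ID_PATTERN.search(doc.text)
    return match.group(1) if match else None


def totals_match(first: Document, complementary: Document) -> Optional[str]:
    # Example rule: the invoice total must equal the purchase-order total.
    if first.fields.get("total") != complementary.fields.get("total"):
        return "total does not match complementary document"
    return None


def determine_validity(
    doc: Document, repository: Dict[str, Document], rules: List[Rule]
) -> Tuple[bool, List[str]]:
    identifier = extract_identifier(doc)
    if identifier is None:
        return False, ["no identifier found"]
    complementary = repository.get(identifier)
    if complementary is None:
        return False, [f"no complementary document for {identifier}"]
    # Evaluate every rule against both documents together.
    results = (rule(doc, complementary) for rule in rules)
    errors = [msg for msg in results if msg is not None]
    return not errors, errors


if __name__ == "__main__":
    repo = {"1001": Document("PO-1001", "Purchase order 1001", {"total": "250.00"})}
    invoice = Document("INV-7", "Invoice referencing PO-1001", {"total": "250.00"})
    print(determine_validity(invoice, repo, [totals_match]))
```

    Returning the list of failed rules alongside the boolean is one simple way to produce the "indication of the determined validity" that the abstract mentions.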

    SYSTEMS AND METHODS OF ACCESSING RANDOM ACCESS CACHE FOR RESCANNING
    3.
    Patent application (legal status: in force)

    Publication No.: US20090214112A1

    Publication date: 2009-08-27

    Application No.: US12435277

    Filing date: 2009-05-04

    IPC classification: G06K9/00 G06K9/60

    CPC classification: H04N1/40

    Abstract: An efficient method and system to enhance digital acquisition devices for analog data is presented. The enhancements offered by the method and system are available to the user in local as well as in remote deployments, yielding efficiency gains for a large variety of business processes. The quality enhancements of the acquired digital data are achieved efficiently by employing virtual reacquisition. Virtual reacquisition renders physical reacquisition of the analog data unnecessary when the digital data obtained by the acquisition device are of insufficient quality. The method and system allow multiple users to access the same acquisition device for analog data. In some embodiments, one or more users can virtually reacquire data provided by multiple analog or digital sources. The acquired raw data can be processed by each user according to his or her personal preferences and/or requirements. The preferred processing settings and attributes are determined interactively (in real time or non-real time), automatically, or by a combination thereof.
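
    A minimal Python sketch of the virtual-reacquisition idea described above: the raw capture of each page is cached once, and each user can re-render it with his or her own processing settings without a second physical scan. The `RawScanCache` and `Settings` names, the list-of-lists grayscale image, and the two example operations (brightness shift and binarization) are assumptions for illustration, not the patented processing chain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # grayscale pixels in the range 0..255


@dataclass(frozen=True)
class Settings:
    brightness: int = 0               # added to every pixel, then clamped
    threshold: Optional[int] = None   # if set, binarize at this level


def process(raw: Image, settings: Settings) -> Image:
    # Build a new image; the cached raw data is never modified.
    out: Image = []
    for row in raw:
        new_row = []
        for pixel in row:
            value = min(255, max(0, pixel + settings.brightness))
            if settings.threshold is not None:
                value = 255 if value >= settings.threshold else 0
            new_row.append(value)
        out.append(new_row)
    return out


class RawScanCache:
    """Stores raw captures so pages can be 'virtually reacquired'."""

    def __init__(self, scanner: Callable[[str], Image]):
        self._scanner = scanner
        self._raw: Dict[str, Image] = {}
        self.physical_scans = 0

    def reacquire(self, page_id: str, settings: Settings) -> Image:
        # Only the first request for a page touches the physical device.
        if page_id not in self._raw:
            self._raw[page_id] = self._scanner(page_id)
            self.physical_scans += 1
        return process(self._raw[page_id], settings)


if __name__ == "__main__":
    cache = RawScanCache(lambda page_id: [[100, 200], [50, 150]])
    print(cache.reacquire("p1", Settings(brightness=20)))  # user A's settings
    print(cache.reacquire("p1", Settings(threshold=128)))  # user B's settings
    print(cache.physical_scans)
```

    Because `process` always starts from the untouched raw capture, two users with different settings share a single physical scan, which is the efficiency gain the abstract attributes to virtual reacquisition.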

    METHODS AND SYSTEMS FOR TRANSDUCTIVE DATA CLASSIFICATION
    4.
    Patent application (legal status: in force)

    Publication No.: US20100169250A1

    Publication date: 2010-07-01

    Application No.: US12721393

    Filing date: 2010-03-10

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: A system, method, data processing apparatus, and article of manufacture are provided for classifying data. Labeled data points are received, each labeled data point having at least one label indicating whether it is a training example for data points to be included in a designated category or a training example for data points to be excluded from a designated category. The method further includes receiving unlabeled data points; receiving at least one predetermined cost factor of the labeled data points and unlabeled data points; training a transductive classifier using maximum entropy discrimination (MED) through iterative calculation, using the at least one cost factor and the labeled and unlabeled data points as training examples; applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points; and outputting a classification of the classified data points, or a derivative thereof.
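
    MED training itself is beyond a short example, so the following Python sketch swaps in a much simpler transductive scheme to show the overall data flow of the abstract above: a nearest-centroid classifier is fit on the labeled points, the unlabeled points are iteratively given provisional labels, and the centroids are refit with those points down-weighted by a cost factor until the provisional labels stop changing. The function names, the +1/-1 label convention, and the use of a single cost factor as a weight are assumptions; this is not the MED method itself.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def _centroid(points: Sequence[Point], weights: Sequence[float]) -> Point:
    total = sum(weights)
    dim = len(points[0])
    return tuple(sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(dim))


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_transductive(
    labeled: List[Tuple[Point, int]],  # label +1 = include in category, -1 = exclude
    unlabeled: List[Point],
    cost_factor: float = 0.5,          # weight of unlabeled points relative to labeled ones
    max_iter: int = 20,
) -> Callable[[Point], int]:
    pos = [p for p, y in labeled if y > 0]
    neg = [p for p, y in labeled if y < 0]
    c_pos = _centroid(pos, [1.0] * len(pos))
    c_neg = _centroid(neg, [1.0] * len(neg))
    guesses: List[int] = []
    for _ in range(max_iter):
        # Provisionally label each unlabeled point with the nearer centroid.
        new_guesses = [1 if _sq_dist(u, c_pos) <= _sq_dist(u, c_neg) else -1 for u in unlabeled]
        if new_guesses == guesses:
            break  # converged: provisional labels are stable
        guesses = new_guesses
        # Refit centroids: labeled points weigh 1.0, unlabeled points weigh cost_factor.
        u_pos = [u for u, g in zip(unlabeled, guesses) if g > 0]
        u_neg = [u for u, g in zip(unlabeled, guesses) if g < 0]
        c_pos = _centroid(pos + u_pos, [1.0] * len(pos) + [cost_factor] * len(u_pos))
        c_neg = _centroid(neg + u_neg, [1.0] * len(neg) + [cost_factor] * len(u_neg))

    def classify(x: Point) -> int:
        return 1 if _sq_dist(x, c_pos) <= _sq_dist(x, c_neg) else -1

    return classify
```

    The sketch requires at least one labeled example on each side. Lowering `cost_factor` makes the unlabeled points pull the decision boundary less, which mirrors the role a cost factor plays in weighting unlabeled examples during transductive training.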

    EFFECTIVE MULTI-CLASS SUPPORT VECTOR MACHINE CLASSIFICATION
    5.
    Patent application (legal status: in force)

    Publication No.: US20080183646A1

    Publication date: 2008-07-31

    Application No.: US12050096

    Filing date: 2008-03-17

    IPC classification: G06F15/18

    CPC classification: G06K9/6269

    Abstract: An improved method of classifying examples into multiple categories using a binary support vector machine (SVM) algorithm. In one preferred embodiment, the method includes the following steps: storing a plurality of user-defined categories in a memory of a computer; analyzing a plurality of training examples for each category so as to identify one or more features associated with each category; calculating at least one feature vector for each of the examples; transforming each of the at least one feature vectors so as to reflect information about all of the training examples; and building an SVM classifier for each one of the plurality of categories, wherein the process of building an SVM classifier further includes: assigning each of the examples in a first category to a first class and all other examples belonging to other categories to a second class, wherein if any one of the examples belongs to another category as well as the first category, such examples are assigned to the first class only; optimizing at least one tunable parameter of an SVM classifier for the first category, wherein the SVM classifier is trained using the first and second classes; and optimizing a function that converts the output of the binary SVM classifier into a probability of category membership.
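
    Two steps in the abstract above are easy to isolate: building the two-class labels for one category's binary classifier, where an example that also belongs to other categories still goes to the first class only, and mapping the binary SVM output to a probability of membership. The Python sketch below assumes the SVM decision values come from any binary SVM (training one is out of scope) and fits a sigmoid to them by plain gradient descent, in the spirit of Platt scaling; the function names and hyperparameters are hypothetical.

```python
import math
from typing import Callable, List, Set


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_vs_rest_labels(example_categories: List[Set[str]], first_category: str) -> List[int]:
    # +1 (first class) if the example belongs to first_category, even when it
    # also belongs to other categories; -1 (second class) otherwise.
    return [1 if first_category in cats else -1 for cats in example_categories]


def fit_sigmoid(
    scores: List[float], labels: List[int], lr: float = 0.1, steps: int = 2000
) -> Callable[[float], float]:
    # Fit p(member | score) = sigmoid(a * score + b) by minimizing the log loss.
    a, b = 0.0, 0.0
    targets = [1.0 if y > 0 else 0.0 for y in labels]
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, targets):
            p = _sigmoid(a * s + b)
            grad_a += (p - t) * s
            grad_b += p - t
        a -= lr * grad_a / n
        b -= lr * grad_b / n

    def probability(score: float) -> float:
        return _sigmoid(a * score + b)

    return probability
```

    In practice the sigmoid would be fitted on held-out decision values rather than on the SVM's own training scores, since the latter tend to be overconfident.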

    SYSTEMS AND METHODS FOR ORGANIZING DATA SETS
    6.
    Patent application (legal status: in force)

    Publication No.: US20100262571A1

    Publication date: 2010-10-14

    Application No.: US12826536

    Filing date: 2010-06-29

    IPC classification: G06F15/18

    Abstract: A method is provided for organizing data sets. In use, an automatic decision system is created or updated for determining whether data elements fit a predefined organization or not, where the decision system is based on a set of preorganized data elements. A plurality of data elements is organized using the decision system. At least one organized data element is selected for output to a user based on a score or confidence from the decision system for the at least one organized data element. Additionally, at least a portion of the at least one organized data element is output to the user. A response is received from the user comprising at least one of a confirmation, modification, and a negation of the organization of the at least one organized data element. The automatic decision system is recreated or updated based on the user response. Other embodiments are also presented.
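
    The feedback loop in the abstract above (decide, pick the least certain results for review, then update from the user's confirmation, modification, or negation) can be sketched in Python as follows. The keyword-overlap decision system, the confidence measure, and the `ask_user` callback are hypothetical stand-ins chosen to keep the example short; any classifier that returns a category and a confidence could take their place.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


class KeywordDecisionSystem:
    """Hypothetical decision system built from pre-organized (labeled) texts."""

    def __init__(self) -> None:
        self.vocab: Dict[str, Counter] = {}

    def update(self, text: str, category: str) -> None:
        self.vocab.setdefault(category, Counter()).update(text.lower().split())

    def decide(self, text: str) -> Tuple[str, float]:
        # Score = number of tokens already seen in each category's examples.
        tokens = text.lower().split()
        scores = {cat: sum(1 for t in tokens if t in counts) for cat, counts in self.vocab.items()}
        best = max(scores, key=lambda cat: scores[cat])
        total = sum(scores.values())
        # Confidence = share of all matches that went to the winning category.
        return best, (scores[best] / total if total else 0.0)


def review_round(
    system: KeywordDecisionSystem,
    items: List[str],
    ask_user: Callable[[str, str], Optional[str]],
    budget: int = 1,
) -> List[Tuple[str, str, float]]:
    decisions = [(item, *system.decide(item)) for item in items]
    # Least confident decisions first: these benefit most from user review.
    decisions.sort(key=lambda d: d[2])
    for text, proposed, _confidence in decisions[:budget]:
        # ask_user returns the confirmed or corrected category, or None for a negation.
        answer = ask_user(text, proposed)
        if answer is not None:
            system.update(text, answer)  # update the decision system from the response
    return decisions
```

    Ranking by lowest confidence is only one possible selection policy; the abstract says only that selection is based on a score or confidence, so a system could equally surface its most confident decisions for quick confirmation.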

    SYSTEMS AND METHODS FOR ORGANIZING DATA SETS
    7.
    Patent application (legal status: in force)

    Publication No.: US20090228499A1

    Publication date: 2009-09-10

    Application No.: US12042774

    Filing date: 2008-03-05

    IPC classification: G06F17/00

    Abstract: A method is provided for organizing data sets. In use, an automatic decision system is created or updated for determining whether data elements fit a predefined organization or not, where the decision system is based on a set of preorganized data elements. A plurality of data elements is organized using the decision system. At least one organized data element is selected for output to a user based on a score or confidence from the decision system for the at least one organized data element. Additionally, at least a portion of the at least one organized data element is output to the user. A response is received from the user comprising at least one of a confirmation, modification, and a negation of the organization of the at least one organized data element. The automatic decision system is recreated or updated based on the user response. Other embodiments are also presented.

    DATA CLASSIFICATION METHODS USING MACHINE LEARNING TECHNIQUES
    8.
    Patent application (legal status: in force)

    Publication No.: US20080086433A1

    Publication date: 2008-04-10

    Application No.: US11752719

    Filing date: 2007-05-23

    IPC classification: G06F15/18 G06F17/30

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A method for adapting to a shift in document content according to one embodiment of the present invention includes receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Methods for separating documents are also presented. Methods for document searching are also presented.
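
    The classify-above-threshold and reclassify steps from the abstract above can be sketched as follows in Python. Documents are assumed to be an identifier-to-text mapping and the classifier is any callable returning a category and a confidence; the helper names are hypothetical, and retraining the transductive classifier between the two passes is left to the caller.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[str], Tuple[str, float]]  # text -> (category, confidence)


def categorize_confident(
    docs: Dict[str, str], classify: Classifier, threshold: float
) -> Tuple[Dict[str, List[str]], List[str]]:
    categorized: Dict[str, List[str]] = {}
    uncertain: List[str] = []
    for doc_id, text in docs.items():
        category, confidence = classify(text)
        if confidence > threshold:
            categorized.setdefault(category, []).append(doc_id)
        else:
            uncertain.append(doc_id)  # left unlabeled for a later pass
    return categorized, uncertain


def reclassify(
    categorized: Dict[str, List[str]], docs: Dict[str, str], classify: Classifier
) -> Dict[str, List[str]]:
    # Run the (possibly retrained) classifier again over the categorized documents
    # and return the document identifiers grouped by their new category.
    result: Dict[str, List[str]] = {}
    for doc_ids in categorized.values():
        for doc_id in doc_ids:
            category, _ = classify(docs[doc_id])
            result.setdefault(category, []).append(doc_id)
    return result
```

    Returning identifiers rather than full texts matches the abstract's final step of outputting identifiers of the categorized documents to a user, another system, or another process.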

    DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES
    9.
    Patent application (legal status: in force)

    Publication No.: US20110196870A1

    Publication date: 2011-08-11

    Application No.: US13090216

    Filing date: 2011-04-19

    IPC classification: G06F15/18 G06F17/30

    Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., documents associated with legal discovery, are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are also presented. Systems, methods and computer program products for face recognition are also presented.

    DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES
    10.
    Patent application (legal status: in force)

    Publication No.: US20110145178A1

    Publication date: 2011-06-16

    Application No.: US13033536

    Filing date: 2011-02-23

    IPC classification: G06F15/18

    CPC classification: G06F17/30707 G06N99/005

    Abstract: A system and article of manufacture enabling adaptation to a shift in document content according to one embodiment of the present invention include instructions for: receiving at least one labeled seed document; receiving unlabeled documents; receiving at least one predetermined cost factor; training a transductive classifier using the at least one predetermined cost factor, the at least one seed document, and the unlabeled documents; classifying the unlabeled documents having a confidence level above a predefined threshold into a plurality of categories using the classifier; reclassifying at least some of the categorized documents into the categories using the classifier; and outputting identifiers of the categorized documents to at least one of a user, another system, and another process. Systems and articles of manufacture for separating documents are also presented. Systems and articles of manufacture for document searching are also presented.