System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
    11.
    Granted invention patent
    Legal status: in force

    Publication No.: US08527523B1

    Publication date: 2013-09-03

    Application No.: US12559173

    Filing date: 2009-09-14

    Applicant: Yiftach Ravid

    Inventor: Yiftach Ravid

    IPC classification: G06F17/30

    Abstract: An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.
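
The workflow in this abstract (train on a labeled training subset, check quality on a control subset, then rank all N documents) can be illustrated with a minimal sketch. The patent does not name a classifier; the naive Bayes model, the whitespace tokenizer, and all function names below are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real text features.
    return text.lower().split()


def train(training_docs):
    """training_docs: list of (text, is_relevant) pairs from the training subset."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in training_docs:
        counts[label].update(tokenize(text))
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def score(model, text):
    """Log-odds that a document is relevant to the issue (higher = more relevant)."""
    counts, n_docs, vocab = model
    total = {label: sum(counts[label].values()) for label in (True, False)}
    # Smoothed class prior, then one smoothed likelihood ratio per known token.
    log_odds = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # tokens never seen in training carry no evidence
        p_rel = (counts[True][tok] + 1) / (total[True] + len(vocab))
        p_non = (counts[False][tok] + 1) / (total[False] + len(vocab))
        log_odds += math.log(p_rel / p_non)
    return log_odds


def rank(model, docs):
    """Order documents from most to least relevant."""
    return sorted(docs, key=lambda d: score(model, d), reverse=True)


def control_accuracy(model, control_docs):
    """Fraction of control-subset documents whose predicted label matches the expert's."""
    hits = sum((score(model, text) > 0) == label for text, label in control_docs)
    return hits / len(control_docs)


training = [
    ("merger price fixing agreement", True),
    ("price fixing cartel meeting", True),
    ("lunch menu friday", False),
    ("office party lunch", False),
]
control = [("price fixing memo", True), ("friday lunch", False)]
model = train(training)
ranking = rank(model, ["lunch friday party", "cartel price agreement"])
```

Keeping the control subset out of training is what makes `control_accuracy` a usable quality estimate; accuracy is only one possible measure and is used here for brevity.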

    Determining near duplicate “noisy” data objects
    12.
    Granted invention patent
    Legal status: in force

    Publication No.: US08391614B2

    Publication date: 2013-03-05

    Application No.: US12161775

    Filing date: 2007-01-25

    IPC classification: G06K9/68 G06K9/40

    CPC classification: G06F17/2211 G06K9/03

    Abstract: A system configured to find near duplicate documents. For each two (or more) documents that are similar to each other, the system is configured to identify which of the differences is likely to be generated by an Optical Character Recognition software or otherwise due to difference between the original documents. As a result, the process of identifying similarity between documents is improved by identifying documents that were originally exact duplicates but are different one with respect to the other only due to OCR errors, or correct the similarity level between the documents by correcting errors introduced by the OCR tool.
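
One way to picture the idea in this abstract is to diff two documents token by token and test whether each difference disappears once typical OCR confusions are normalized. The sketch below is not the patented method: the confusion table, the function names, and the use of Python's `difflib` are all assumptions made for illustration.

```python
import difflib

# Hypothetical table of OCR confusions, mapping a noisy form to a canonical one.
OCR_CANONICAL = [("rn", "m"), ("vv", "w"), ("0", "o"), ("1", "l"), ("5", "s"), ("|", "l")]


def canonical(token):
    """Collapse characters that OCR tools often confuse (a crude heuristic:
    genuine words such as "modern" are collapsed too)."""
    for noisy, clean in OCR_CANONICAL:
        token = token.replace(noisy, clean)
    return token


def classify_differences(doc_a, doc_b):
    """Split token-level differences into OCR-like ones and genuine ones."""
    a, b = doc_a.split(), doc_b.split()
    ocr_like, genuine = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        left, right = a[i1:i2], b[j1:j2]
        # A difference is OCR-like if it vanishes after normalization.
        if [canonical(t) for t in left] == [canonical(t) for t in right]:
            ocr_like.append((left, right))
        else:
            genuine.append((left, right))
    return ocr_like, genuine


def similarity(doc_a, doc_b, correct_ocr=False):
    """Token-level similarity ratio, optionally after normalizing OCR confusions."""
    a, b = doc_a.split(), doc_b.split()
    if correct_ocr:
        a = [canonical(t) for t in a]
        b = [canonical(t) for t in b]
    return difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()


original = "payment of the claim is overdue"
scanned = "payrnent of the c1aim is now overdue"
ocr_like, genuine = classify_differences(original, scanned)
```

In this example the two misread tokens are classified as OCR-like, the inserted word "now" is classified as a genuine difference, and `similarity(original, scanned, correct_ocr=True)` is higher than the uncorrected ratio.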

    Computerized system for enhancing expert-based processes and methods useful in conjunction therewith
    13.
    Granted invention patent
    Legal status: in force

    Publication No.: US08346685B1

    Publication date: 2013-01-01

    Application No.: US12428100

    Filing date: 2009-04-22

    Applicant: Yiftach Ravid

    Inventor: Yiftach Ravid

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: A computerized system for enhancing expert-based processes, the system comprising a computerized expert based data analyzer receiving input from a plurality of experts by operating a corresponding plurality of expert-based processes on a body of data, the input including a discrepancy set including at least one point of discrepancy regarding which less than all of the plurality of experts agree and an agreement set including at least one point of agreement regarding which all of the plurality of experts agree; and an oracle from which oracle input is received resolving at least the point of discrepancy and not resolving any point of agreement in the agreement set; wherein the computerized analyzer is operative to select and to subsequently actuate for purposes of receiving input regarding the body of data, a subset of better experts from among the plurality of experts based on the oracle input.
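
The selection step described in this abstract can be sketched as follows: split the experts' outputs into an agreement set and a discrepancy set, consult the oracle only on the discrepancies, and keep the experts who match the oracle most often. The data layout, the scoring rule (fraction of discrepancy points matching the oracle), and the function names are assumptions for illustration, not details taken from the patent.

```python
def split_agreement(expert_labels):
    """expert_labels: {expert: {item: label}}, every expert labeling the same items.
    Returns (agreement_set, discrepancy_set)."""
    experts = list(expert_labels)
    agreement, discrepancy = set(), set()
    for item in expert_labels[experts[0]]:
        votes = {expert_labels[e][item] for e in experts}
        if len(votes) == 1:
            agreement.add(item)      # all experts agree
        else:
            discrepancy.add(item)    # fewer than all experts agree
    return agreement, discrepancy


def select_better_experts(expert_labels, oracle, k):
    """Return the k experts that agree most often with the oracle.
    The oracle is consulted only on points of discrepancy."""
    _, discrepancy = split_agreement(expert_labels)
    resolved = {item: oracle(item) for item in discrepancy}

    def accuracy(expert):
        if not resolved:
            return 1.0  # no discrepancies: nothing distinguishes the experts
        hits = sum(expert_labels[expert][i] == v for i, v in resolved.items())
        return hits / len(resolved)

    return sorted(expert_labels, key=accuracy, reverse=True)[:k]


labels = {
    "A": {1: True, 2: True, 3: False},
    "B": {1: True, 2: False, 3: False},
    "C": {1: True, 2: False, 3: True},
}
oracle_answers = {2: True, 3: False}
asked = []


def oracle(item):
    asked.append(item)  # record which points the oracle was asked to resolve
    return oracle_answers[item]


better = select_better_experts(labels, oracle, k=2)
```

Here item 1 is a point of agreement, so the oracle is never asked about it; experts A and B are selected because they match the oracle on more of the disputed items than C does.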
