Efficient and phased method of processing large collections of electronic data known as “best match first”™ for electronic discovery and other related applications
    Type: Granted invention patent
    Legal status: In force

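    Publication (announcement) number: US08819021B1

    Publication (announcement) date: 2014-08-26

    Application number: US12021259

    Filing date: 2008-01-28

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30657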

    Abstract: A method is disclosed for more efficient, phased, iterative processing of very large collections of electronic data for electronic discovery and related applications. At a minimum, the processing includes text extraction and the creation of a keyword search index, but it may include many additional stages as well. The method further includes defining an initial set of characteristics that correspond to “interesting” data, then iteratively completing the processing of this data based on a combination of user feedback on the overall relevance of the documents being processed and the system's assessment of whether the data it has recently promoted in the processing completion queue has the desired quality and quantity of relevant data. The process continues until all identified data has either been fully processed or been discarded at some intermediate stage of processing as likely irrelevant. As a result, processing is effectively finished much earlier, because the documents later in the processing queue are increasingly irrelevant.
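
The iterative loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `Item`, `score`, `best_match_first`, and `review`, the additive feature weights, the batch size, and the discard threshold are all assumptions made for illustration, and the sketch models only user feedback, not the system's own assessment of recently promoted data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A hypothetical unit of collected data with descriptive tags."""
    name: str
    features: frozenset  # e.g. custodian or file-type tags (illustrative)


def score(item, weights):
    """Sum the current weight of every feature the item carries."""
    return sum(weights.get(f, 0.0) for f in item.features)


def best_match_first(items, weights, review, batch_size=2,
                     discard_below=-0.75, step=0.5):
    """Process the highest-scoring items first and re-rank after each batch.

    `weights` holds the initial "interesting" characteristics.
    `review(item)` stands in for user relevance feedback and returns a bool.
    Returns (names fully processed in order, names discarded).
    """
    weights = dict(weights)          # do not mutate the caller's mapping
    pending = list(items)
    processed, discarded = [], []
    while pending:
        # Drop items that the current weights rate as likely irrelevant.
        discarded += [it.name for it in pending
                      if score(it, weights) < discard_below]
        pending = [it for it in pending
                   if score(it, weights) >= discard_below]
        # Promote the best matches to the front (the sort is stable).
        pending.sort(key=lambda it: score(it, weights), reverse=True)
        batch, pending = pending[:batch_size], pending[batch_size:]
        for it in batch:
            # Stand-in for completing text extraction and indexing.
            processed.append(it.name)
            # Feedback raises or lowers the weight of the item's features.
            delta = step if review(it) else -step
            for f in it.features:
                weights[f] = weights.get(f, 0.0) + delta
    return processed, discarded


if __name__ == "__main__":
    docs = [
        Item("c", frozenset({"custodian:intern", "type:log"})),
        Item("a", frozenset({"custodian:ceo", "type:email"})),
        Item("e", frozenset({"type:log"})),
        Item("b", frozenset({"custodian:ceo", "type:doc"})),
        Item("d", frozenset({"custodian:intern", "type:log"})),
    ]
    done, dropped = best_match_first(
        docs, {"custodian:ceo": 1.0},
        review=lambda it: "custodian:ceo" in it.features)
    print(done, dropped)
```

In this toy run, negative feedback on two log files pushes the remaining log file below the threshold, so it is discarded before full processing. That is the sense in which processing finishes earlier.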
