METHOD FOR ORGANIZING LARGE NUMBERS OF DOCUMENTS
    1.
    Patent application
    Status: In force

    Publication No.: US20150098660A1

    Publication date: 2015-04-09

    Application No.: US14567460

    Filing date: 2014-12-11

    Applicant: EQUIVIO LTD.

    IPC classification: G06K9/00 G06F17/30 G06F17/22

    Abstract: A computer product including a data structure for organizing a plurality of documents, capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text. All documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
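The node structure described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the normalization rule (lowercasing and collapsing whitespace), the use of substring containment as the "inclusive of" test, and the names `normalize` and `build_nodes` are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalize(body: str) -> str:
    """Assumed normalization: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", body).strip().lower()


def build_nodes(docs):
    """Group documents into nodes and link nodes by text inclusion.

    docs: iterable of (doc_id, header, body) tuples.
    Returns (nodes, parent), where nodes maps normalized body text to the
    list of document ids sharing it, and parent maps each node's text to
    the longest other node text it contains (or None for a root).
    """
    nodes = defaultdict(list)
    for doc_id, _header, body in docs:
        # Documents with identical normalized body text share one node.
        nodes[normalize(body)].append(doc_id)

    parent = {}
    for text in nodes:
        # Assumption: "inclusive of" is modeled as substring containment.
        contained = [t for t in nodes if t != text and t in text]
        parent[text] = max(contained, key=len) if contained else None
    return dict(nodes), parent


docs = [
    ("a", "From: x", "Hello  there"),
    ("b", "From: y", "hello there"),
    ("c", "From: z", "Thanks!\n> hello there"),
]
nodes, parent = build_nodes(docs)
```

Here documents `a` and `b` land on the same node because their normalized bodies match, and the node for `c` is linked to it as a descendant because its text contains theirs.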

    SYSTEM AND METHOD FOR COMPUTERIZED BATCHING OF HUGE POPULATIONS OF ELECTRONIC DOCUMENTS
    2.
    Patent application
    Status: In force

    Publication No.: US20160034556A1

    Publication date: 2016-02-04

    Application No.: US14633906

    Filing date: 2015-02-27

    Applicant: EQUIVIO LTD.

    Inventor: Yiftach RAVID

    IPC classification: G06F17/30

    Abstract: A method for computerized batching of huge populations of electronic documents, including computerized assignment of electronic documents into at least one sequence of electronic document batches such that each document is assigned to a batch in the sequence of batches and such that there is no conflict between batching requirements, the following batching requirements being maintained by a suitably programmed processor: a. pre-defined subsets of documents are always kept together in the same batch, b. batches are equal in size, c. the population is partitioned into clusters, and all documents in any given batch belong to a single cluster rather than to two or more clusters.
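The three batching requirements can be made concrete with a small Python sketch. It is a greedy illustration rather than the method claimed in the application: it assumes every keep-together subset already lies inside one cluster and fits in a batch, and it only caps batches at `batch_size`, so sizes are approximately rather than strictly equal. The function name `make_batches` and its input shape are hypothetical.

```python
from collections import defaultdict


def make_batches(groups, batch_size):
    """Pack keep-together groups into single-cluster batches.

    groups: list of (cluster_id, [doc_ids]); each list is a pre-defined
            subset that must stay in one batch.
    Returns a list of (cluster_id, [doc_ids]) batches.
    """
    by_cluster = defaultdict(list)
    for cluster_id, members in groups:
        if len(members) > batch_size:
            raise ValueError("keep-together group exceeds batch size")
        by_cluster[cluster_id].append(members)

    batches = []
    for cluster_id, cluster_groups in by_cluster.items():
        current = []
        # Place larger groups first; a batch never mixes clusters.
        for members in sorted(cluster_groups, key=len, reverse=True):
            if len(current) + len(members) > batch_size:
                batches.append((cluster_id, current))
                current = []
            current = current + members
        if current:
            batches.append((cluster_id, current))
    return batches


groups = [
    ("c1", ["a", "b"]),
    ("c1", ["c"]),
    ("c1", ["d", "e"]),
    ("c2", ["f"]),
]
batches = make_batches(groups, batch_size=3)
```

A stricter equal-size guarantee would need a different packing strategy; the sketch only shows how the keep-together and single-cluster constraints interact.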

    System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
    3.
    Granted patent
    Status: In force

    Publication No.: US08914376B2

    Publication date: 2014-12-16

    Application No.: US13933560

    Filing date: 2013-07-02

    Applicant: Equivio Ltd.

    Inventor: Yiftach Ravid

    IPC classification: G06F17/30 G06N99/00

    Abstract: An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.
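The train, rank and evaluate flow in this abstract can be sketched in a few lines of Python. The classifier below is a deliberately simple word log-odds scorer standing in for whatever text classifier the patent contemplates; the function names, the smoothing, and the zero decision threshold are assumptions for illustration only.

```python
import math
from collections import Counter


def train(labeled):
    """Learn per-word log-odds from (text, is_relevant) training pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in labeled:
        counts[is_relevant].update(text.lower().split())
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing so unseen words do not produce log(0).
    total = {k: sum(c.values()) + len(vocab) for k, c in counts.items()}
    return {
        w: math.log((counts[True][w] + 1) / total[True])
        - math.log((counts[False][w] + 1) / total[False])
        for w in vocab
    }


def score(weights, text):
    """Sum the log-odds of known words; unknown words contribute 0."""
    return sum(weights.get(w, 0.0) for w in text.lower().split())


def rank(weights, docs):
    """Order documents from most to least relevant to the issue."""
    return sorted(docs, key=lambda d: score(weights, d), reverse=True)


def control_accuracy(weights, control):
    """Fraction of control documents whose predicted label matches."""
    hits = sum((score(weights, t) > 0) == label for t, label in control)
    return hits / len(control)


training = [
    ("merger price agreement", True),
    ("merger contract terms", True),
    ("lunch menu friday", False),
    ("holiday party lunch", False),
]
control = [("merger agreement", True), ("party menu", False)]
weights = train(training)
ranking = rank(weights, ["lunch friday", "merger terms", "unrelated words"])
quality = control_accuracy(weights, control)
```

The training subset builds the model, the whole population is ranked by score, and the held-out control subset is used only to estimate quality, mirroring the separation the abstract describes.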

    SYSTEM AND METHOD FOR COMPUTERIZED SEMANTIC PROCESSING OF ELECTRONIC DOCUMENTS INCLUDING THEMES
    4.
    Patent application
    Status: Under examination (published)

    Publication No.: US20140207782A1

    Publication date: 2014-07-24

    Application No.: US14161159

    Filing date: 2014-01-22

    Applicant: Equivio Ltd.

    Inventor: Yiftach RAVID

    IPC classification: G06F17/30

    Abstract: System and method for computerized identification of themes in a large data set, the system comprising reducing the number of data set members in a large data set, using at least one computerized data set member pruning technique other than random selection; and using a computerized theme identification technique for identifying a plurality of themes in the reduced data set.
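As a rough illustration of the two-stage idea (prune deterministically, then identify themes on what remains), the Python sketch below drops exact duplicates and very short documents and then lists the most frequent terms. Both choices are assumptions: the abstract does not say which pruning technique is used, and frequent-term counting is only a placeholder for a real theme identification technique. `prune`, `top_terms` and `STOPWORDS` are hypothetical names.

```python
from collections import Counter

# Tiny illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def prune(docs, min_words=3):
    """Non-random pruning: drop exact duplicates and very short documents."""
    seen, kept = set(), []
    for doc in docs:
        key = " ".join(doc.lower().split())
        if key in seen or len(key.split()) < min_words:
            continue
        seen.add(key)
        kept.append(doc)
    return kept


def top_terms(docs, k=3):
    """Placeholder theme step: the k most frequent non-stopword terms."""
    counts = Counter(
        w for d in docs for w in d.lower().split() if w not in STOPWORDS
    )
    return [w for w, _ in counts.most_common(k)]


docs = [
    "Budget forecast for the quarter",
    "budget  forecast for the quarter",
    "ok",
    "Quarter budget review meeting",
]
kept = prune(docs)
themes = top_terms(kept, k=2)
```

Running the theme step on `kept` rather than on `docs` is the point of the sketch: the expensive analysis sees a smaller, de-duplicated set.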

    SYSTEM AND METHODS FOR COMPUTERIZED INFORMATION GOVERNANCE OF ELECTRONIC DOCUMENTS
    5.
    Patent application
    Status: Under examination (published)

    Publication No.: US20140207786A1

    Publication date: 2014-07-24

    Application No.: US14062233

    Filing date: 2013-10-24

    Applicant: EQUIVIO LTD.

    IPC classification: G06F17/30

    Abstract: An information governance system comprising a plurality of classifiers which employ cutoffs for classifying at least a portion of a population of incoming documents as documents to be retained and documents to be discarded in accordance with a corresponding plurality of pre-defined retention schedules; training apparatus for training said classifiers based on relevance inputs provided by a human information governance expert regarding a training set of documents within a universe of documents to be governed; and apparatus operative to automatically cause any classified document to be retained and subsequently discarded in accordance with its pre-defined retention schedule including discarding only documents that (a) have been classified as documents to be discarded and (b) have not been classified as documents to be retained, and to automatically cause any document which could not be classified, to be retained as gray area data until further notice.
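The retain, discard and gray-area rule at the end of this abstract can be written as a short decision function. The sketch below assumes each retention schedule has one classifier that emits a score between 0 and 1 and a pair of cutoffs; the score scale, the cutoff pairs and the name `decide` are illustrative assumptions, not details from the application.

```python
def decide(scores, cutoffs):
    """Combine per-schedule classifier scores into one governance decision.

    scores:  {schedule: score}, assumed to lie in [0, 1].
    cutoffs: {schedule: (discard_cutoff, retain_cutoff)}.
    Returns (decision, schedules) where decision is "retain", "discard"
    or "gray".
    """
    retain = [s for s, v in scores.items() if v >= cutoffs[s][1]]
    discard = [s for s, v in scores.items() if v <= cutoffs[s][0]]
    if retain:
        # Any retain classification wins over a discard classification.
        return ("retain", retain)
    if discard:
        # Discard only if classified for discard and never for retention.
        return ("discard", discard)
    # No classifier was confident either way: keep as gray area data.
    return ("gray", [])


cutoffs = {"tax": (0.2, 0.8), "hr": (0.3, 0.7)}
kept = decide({"tax": 0.9, "hr": 0.1}, cutoffs)
dropped = decide({"tax": 0.1, "hr": 0.5}, cutoffs)
unsure = decide({"tax": 0.5, "hr": 0.5}, cutoffs)
```

Checking retention before discard encodes conditions (a) and (b) together: a document is dropped only when at least one classifier says discard and none says retain.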

    SYSTEM FOR ENHANCING EXPERT-BASED COMPUTERIZED ANALYSIS OF A SET OF DIGITAL DOCUMENTS AND METHODS USEFUL IN CONJUNCTION THEREWITH
    6.

    Publication No.: US20130297612A1

    Publication date: 2013-11-07

    Application No.: US13933560

    Filing date: 2013-07-02

    Applicant: EQUIVIO LTD.

    Inventor: Yiftach RAVID

    IPC classification: G06F17/30

    Abstract: An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.

    SYSTEM FOR ENHANCING EXPERT-BASED COMPUTERIZED ANALYSIS OF A SET OF DIGITAL DOCUMENTS AND METHODS USEFUL IN CONJUNCTION THEREWITH
    7.
    Patent application
    Status: In force

    Publication No.: US20150066938A1

    Publication date: 2015-03-05

    Application No.: US14536041

    Filing date: 2014-11-07

    Applicant: EQUIVIO LTD.

    Inventor: Yiftach RAVID

    IPC classification: G06F17/30

    Abstract: An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.

    SYSTEM AND METHOD FOR COMPUTERIZED IDENTIFICATION AND EFFECTIVE PRESENTATION OF SEMANTIC THEMES OCCURRING IN A SET OF ELECTRONIC DOCUMENTS
    8.
    Patent application
    Status: In force

    Publication No.: US20140207783A1

    Publication date: 2014-07-24

    Application No.: US14161221

    Filing date: 2014-01-22

    Applicant: Equivio Ltd.

    Inventor: Yiftach RAVID

    IPC classification: G06F17/30

    Abstract: System and method for computerized identification and presentation of semantic themes occurring in a set of electronic documents, comprising performing topic modeling on the set of documents thereby to yield a set of topics and for each topic, a topic-modeling output list of words; and using a processor performing a matching algorithm to match only a subset of each topic-modeling output list of words, to the output list's corresponding topic, such that each word appears in no more than a predetermined number of subsets from among said subsets.
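The constraint that each word may represent only a limited number of topics can be illustrated with a greedy Python sketch. The abstract does not specify the matching algorithm, so the greedy highest-weight-first strategy, the per-topic limit `words_per_topic`, and the function name `select_theme_words` are all assumptions made for illustration.

```python
def select_theme_words(topic_words, max_topics_per_word=1, words_per_topic=3):
    """Pick a subset of each topic's word list under a per-word usage cap.

    topic_words: {topic: [(word, weight), ...]} as produced by some
                 topic-modeling step (not shown here).
    Returns {topic: [chosen words]}.
    """
    # Consider every (topic, word) pair, strongest association first.
    triples = sorted(
        (
            (weight, topic, word)
            for topic, pairs in topic_words.items()
            for word, weight in pairs
        ),
        key=lambda t: t[0],
        reverse=True,
    )
    usage = {}
    chosen = {topic: [] for topic in topic_words}
    for _weight, topic, word in triples:
        if usage.get(word, 0) >= max_topics_per_word:
            continue  # word already represents enough topics
        if len(chosen[topic]) >= words_per_topic:
            continue  # this topic's subset is already full
        chosen[topic].append(word)
        usage[word] = usage.get(word, 0) + 1
    return chosen


topics = {
    "t1": [("contract", 0.9), ("price", 0.5), ("meeting", 0.4)],
    "t2": [("meeting", 0.8), ("contract", 0.3), ("lunch", 0.2)],
}
themes = select_theme_words(topics, max_topics_per_word=1, words_per_topic=2)
```

With a cap of one topic per word, "contract" stays with `t1` and "meeting" with `t2`, so the displayed word lists for the two themes do not overlap.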