Managing objects and sharing information among communities
    1.
    Invention application
    Managing objects and sharing information among communities (status: under examination, published)

    Publication No.: US20050022132A1

    Publication date: 2005-01-27

    Application No.: US10833303

    Filing date: 2004-04-28

    Abstract: A method for managing objects for users, including providing a set of attributes and a set of containers, each container having attributes from the set. The method further provides a user interface for dynamically assigning attributes to the objects, and for selectively displaying, through a user interface, the containers and the objects in the containers. An object is displayed in a container if a condition is met; the condition is applied to the attributes of the container and the attributes of the object.
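One way to picture the display condition is as a predicate over two attribute sets. The Python sketch below is illustrative only: the class names, the `subset_condition` predicate, and the sample attributes are assumptions, since the abstract does not say which condition is applied.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedObject:
    name: str
    # Attributes can be added or removed at any time (dynamic assignment).
    attributes: set = field(default_factory=set)


@dataclass
class Container:
    name: str
    attributes: frozenset = frozenset()


def subset_condition(container_attrs, object_attrs):
    # Assumed condition: the object carries every attribute of the container.
    return set(container_attrs) <= set(object_attrs)


def visible_objects(container, objects, condition=subset_condition):
    # An object is shown in the container only if the condition holds.
    return [o.name for o in objects if condition(container.attributes, o.attributes)]


report = ManagedObject("q1-report", {"finance", "2004"})
memo = ManagedObject("memo", {"finance"})
finance_2004 = Container("Finance 2004", frozenset({"finance", "2004"}))

before = visible_objects(finance_2004, [report, memo])
memo.attributes.add("2004")  # dynamically assign a new attribute
after = visible_objects(finance_2004, [report, memo])
```

Passing a different `condition` callable (for example, any shared attribute instead of all attributes) changes which objects appear without touching the objects themselves.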

    METHOD FOR ORGANIZING LARGE NUMBERS OF DOCUMENTS

    Publication No.: US20100198864A1

    Publication date: 2010-08-05

    Application No.: US12667664

    Filing date: 2008-07-02

    IPC classification: G06F17/30

    Abstract: A computer product including a data structure for organizing a plurality of documents, capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text, and all documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
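The node structure described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `normalize` (lowercasing and whitespace collapsing), the `Node` class, and the choice of the longest contained body as the parent are all hypothetical, and "inclusive of" is read here as plain substring containment.

```python
from dataclasses import dataclass, field
from typing import Optional


def normalize(body: str) -> str:
    # Assumed normalization: case-fold and collapse whitespace.
    return " ".join(body.lower().split())


@dataclass(eq=False)
class Node:
    body: str                                   # shared normalized body text
    documents: list = field(default_factory=list)
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)


def build_nodes(docs):
    """docs: iterable of (header, body) pairs. Returns {normalized body: Node}."""
    nodes = {}
    for header, body in docs:
        key = normalize(body)
        if key not in nodes:
            nodes[key] = Node(key)
        nodes[key].documents.append(header)     # identical bodies share a node
    for node in nodes.values():
        # A node descends from any other node whose body it contains;
        # link it to the longest such body (its closest ancestor).
        contained = [n for n in nodes.values() if n is not node and n.body in node.body]
        if contained:
            node.parent = max(contained, key=lambda n: len(n.body))
            node.parent.children.append(node)
    return nodes


docs = [
    ("From: a", "Hello team"),
    ("From: b", "hello   TEAM"),
    ("From: c", "Sounds good.\n> Hello team"),
]
nodes = build_nodes(docs)
```

In this example the first two documents land on one node, and the reply that quotes them becomes a child of that node.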

    METHOD FOR DETERMINING NEAR DUPLICATE DATA OBJECTS
    3.
    Invention application
    METHOD FOR DETERMINING NEAR DUPLICATE DATA OBJECTS (status: in force)

    Publication No.: US20090028441A1

    Publication date: 2009-01-29

    Application No.: US11572441

    Filing date: 2005-07-07

    IPC classification: G06K9/68

    CPC classification: G06F17/30705 G06F17/2211

    Abstract: A system for determining that a document B is a candidate for near duplicate to a document A with a given similarity level th. The system includes a storage for providing two different functions on the documents, each function having a numeric function value. The system further includes a processor associated with the storage and configured to determine that the document B is a candidate for near duplicate to the document A if a condition is met. The condition includes: for any function ƒi from among the two functions, ƒi(A) − ƒi(B) ≤ δi(ƒ, A, th).
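The candidate test in this abstract can be illustrated with a short Python sketch. Everything concrete here is an assumption: the two functions (`word_count`, `distinct_word_count`), the tolerance `(1 - th) * f(A)` standing in for δi, and the use of the absolute difference (the abstract as printed shows the plain difference).

```python
def word_count(doc: str) -> float:
    return float(len(doc.split()))


def distinct_word_count(doc: str) -> float:
    return float(len(set(doc.lower().split())))


def tolerance(f, a: str, th: float) -> float:
    # Hypothetical delta_i(f, A, th): a looser similarity level allows a larger gap.
    return (1.0 - th) * f(a)


def is_candidate(a: str, b: str, th: float,
                 functions=(word_count, distinct_word_count)) -> bool:
    # B is a near-duplicate candidate for A only if every function value
    # of B lies within the tolerance around the value for A.
    return all(abs(f(a) - f(b)) <= tolerance(f, a, th) for f in functions)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
doc_c = "fox"

close = is_candidate(doc_a, doc_b, th=0.9)
far = is_candidate(doc_a, doc_c, th=0.9)
```

A cheap numeric filter of this kind only produces candidates; a full comparison would still be needed to confirm that two documents are near duplicates.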

    Method for organizing large numbers of documents
    4.
    Granted invention patent
    Method for organizing large numbers of documents (status: in force)

    Publication No.: US08938461B2

    Publication date: 2015-01-20

    Application No.: US12839976

    Filing date: 2010-07-20

    IPC classification: G06F17/30 G06F17/22 H04L12/58

    Abstract: A computer product including a data structure for organizing a plurality of documents, capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text, and all documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.

    DETERMINING NEAR DUPLICATE "NOISY" DATA OBJECTS
    5.
    Invention application
    DETERMINING NEAR DUPLICATE "NOISY" DATA OBJECTS (status: in force)

    Publication No.: US20100150453A1

    Publication date: 2010-06-17

    Application No.: US12161775

    Filing date: 2007-01-25

    IPC classification: G06K9/68 G06K9/40

    CPC classification: G06F17/2211 G06K9/03

    Abstract: A system configured to find near duplicate documents. For each two (or more) documents that are similar to each other, the system is configured to identify which of the differences are likely to have been generated by Optical Character Recognition (OCR) software and which are due to differences between the original documents. As a result, the process of identifying similarity between documents is improved, either by identifying documents that were originally exact duplicates but differ from one another only due to OCR errors, or by correcting the similarity level between the documents after correcting errors introduced by the OCR tool.
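A rough Python sketch of the idea follows: differences that vanish once typical OCR confusions are canonicalized are attributed to OCR noise, and the similarity level is recomputed on the corrected text. The `CONFUSIONS` table, the word-level comparison, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, not the method described in the application.

```python
import difflib

# Hypothetical table of OCR confusions, mapped to one canonical form.
CONFUSIONS = (("rn", "m"), ("0", "o"), ("1", "l"), ("5", "s"))


def canonical(word: str) -> str:
    for noisy, clean in CONFUSIONS:
        word = word.replace(noisy, clean)
    return word


def similarity(words_a, words_b) -> float:
    # Word-level similarity in [0, 1].
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()


def compare(doc_a: str, doc_b: str):
    a, b = doc_a.lower().split(), doc_b.lower().split()
    raw = similarity(a, b)
    corrected = similarity([canonical(w) for w in a], [canonical(w) for w in b])
    return raw, corrected


raw, corrected = compare(
    "modern claims total 100 dollars",
    "modem claims total l00 dollars",
)
```

Here the raw similarity is well below 1, while the corrected similarity reaches 1.0, which suggests the two texts were exact duplicates before scanning. A canonical form this crude also merges genuinely different words (such as "modern" and "modem"), so a real system would need a more careful noise model.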

    Method for organizing large numbers of documents
    6.
    Granted invention patent
    Method for organizing large numbers of documents (status: in force)

    Publication No.: US08825673B2

    Publication date: 2014-09-02

    Application No.: US12667664

    Filing date: 2008-07-02

    IPC classification: G06F17/30 G06F17/22 H04L12/58

    Abstract: A computer product including a data structure for organizing a plurality of documents, capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All documents associated with a given node have identical normalized body text, and all documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendant of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.

    System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
    7.
    Granted invention patent
    System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith (status: in force)

    Publication No.: US08533194B1

    Publication date: 2013-09-10

    Application No.: US13161087

    Filing date: 2011-06-15

    IPC classification: G06F7/00

    CPC classification: G06N99/005

    Abstract: An electronic document analysis method using a processor for analyzing N electronic documents, the method comprising providing a set of control electronic documents from among the N electronic documents; and using the set of control electronic documents and a processor to evaluate at least one aspect of a computerized text-classifier based electronic document categorization process performed on the N documents, including computation of at least one statistic; wherein providing includes providing an initial set of control electronic documents; computing, using a processor, an estimated validation level of the at least one statistic assuming the initial set is used; comparing the estimated validation level to a desired validation level, using a processor; and enlarging the initial set of control electronic documents if the estimated validation level falls below the desired validation level.
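The enlarge-until-valid loop in this abstract can be sketched as follows. The abstract does not define the "validation level", so this Python example substitutes a stand-in: one minus the normal-approximation margin of error of a proportion estimated from a control set of size n. That proxy, the worst-case proportion `p=0.5`, the `step` size, and all function names are assumptions.

```python
import math


def estimated_validation_level(n: int, p: float = 0.5, z: float = 1.96) -> float:
    # Assumed proxy: 1 - margin of error of a proportion estimated from n
    # control documents (normal approximation, worst case p = 0.5).
    return 1.0 - z * math.sqrt(p * (1.0 - p) / n)


def size_control_set(initial_size: int, desired_level: float,
                     step: int = 50, max_size: int = 100_000) -> int:
    n = initial_size
    # Enlarge the control set while the estimated level is below the target.
    while estimated_validation_level(n) < desired_level and n < max_size:
        n += step
    return n


needed = size_control_set(initial_size=100, desired_level=0.95)
unchanged = size_control_set(initial_size=500, desired_level=0.95)
```

With these assumptions an initial set of 100 documents grows to 400 before the target is met, while a set of 500 is already large enough and is left as is.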

    Method for determining near duplicate data objects
    8.
    Granted invention patent
    Method for determining near duplicate data objects (status: in force)

    Publication No.: US08015124B2

    Publication date: 2011-09-06

    Application No.: US11572441

    Filing date: 2005-07-07

    IPC classification: G06F15/18

    CPC classification: G06F17/30705 G06F17/2211

    Abstract: A system for determining that a document B is a candidate for near duplicate to a document A with a given similarity level th. The system includes a storage for providing two different functions on the documents, each function having a numeric function value. The system further includes a processor associated with the storage and configured to determine that the document B is a candidate for near duplicate to the document A if a condition is met. The condition includes: for any function ƒi from among the two functions, ƒi(A) − ƒi(B) ≤ δi(ƒ, A, th).

    System and method for computerized batching of huge populations of electronic documents
    9.
    Granted invention patent
    System and method for computerized batching of huge populations of electronic documents (status: in force)

    Publication No.: US09002842B2

    Publication date: 2015-04-07

    Application No.: US13569752

    Filing date: 2012-08-08

    Applicant: Yiftach Ravid

    Inventor: Yiftach Ravid

    IPC classification: G06F17/30

    Abstract: A method for computerized batching of huge populations of electronic documents, including computerized assignment of electronic documents into at least one sequence of electronic document batches such that each document is assigned to a batch in the sequence of batches and such that there is no conflict between batching requirements, the following batching requirements being maintained by a suitably programmed processor: (a) pre-defined subsets of documents are always kept together in the same batch; (b) batches are equal in size; (c) the population is partitioned into clusters, and all documents in any given batch belong to a single cluster rather than to two or more clusters.
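The three batching requirements can pull against each other: a keep-together subset may be larger than the batch size, or a cluster may not divide evenly. The Python sketch below shows one greedy way to reconcile them. It is an assumption-laden illustration: `batch_size` is treated as a target cap rather than an exact size, an oversized subset gets a batch of its own, and the tuple layout and function name are hypothetical.

```python
from collections import defaultdict


def make_batches(documents, batch_size):
    """documents: iterable of (doc_id, cluster, group) tuples.

    Returns a list of (cluster, [doc_id, ...]) batches.
    """
    groups = defaultdict(list)            # (cluster, group) -> doc ids
    for doc_id, cluster, group in documents:
        groups[(cluster, group)].append(doc_id)

    by_cluster = defaultdict(list)        # cluster -> list of keep-together groups
    for (cluster, _group), ids in groups.items():
        by_cluster[cluster].append(ids)

    batches = []
    for cluster, group_list in by_cluster.items():
        current = []
        for ids in group_list:
            # Close the batch if adding this whole group would exceed the cap.
            if current and len(current) + len(ids) > batch_size:
                batches.append((cluster, current))
                current = []
            current = current + ids       # groups are never split
        if current:
            batches.append((cluster, current))  # batches never span clusters
    return batches


docs = [
    (1, "A", "g1"), (2, "A", "g1"), (3, "A", "g2"), (4, "A", "g3"),
    (5, "B", "g4"), (6, "B", "g4"), (7, "B", "g4"),
]
batches = make_batches(docs, batch_size=2)
```

In this run cluster A yields two batches of two documents each, and the three-document group in cluster B stays together in a single oversized batch.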

    System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
    10.
    Granted invention patent
    System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith (status: in force)

    Publication No.: US08706742B1

    Publication date: 2014-04-22

    Application No.: US13342770

    Filing date: 2012-01-03

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06N5/04

    Abstract: A system including an electronic repository having a multiplicity of accesses to a respective multiplicity of electronic documents and metadata; a document rater using a processor to run a first computer algorithm on the multiplicity of electronic documents which yields a score which rates each of the multiplicity of electronic documents to an issue; and a metadata-based document discriminator to run a second computer algorithm on at least some of the metadata which yields leads, each lead having at least one metadata value for at least one metadata parameter, whose value correlates with the score of the electronic documents to the issue, typically used in combination with an electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues.
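As a loose illustration of the metadata-based discriminator, the Python sketch below reports a (parameter, value) pair as a lead when documents carrying it score noticeably higher on the issue than the collection average. The lift threshold, the minimum document count, and the function name `find_leads` are assumptions; the abstract only says that a lead's value correlates with the score.

```python
from collections import defaultdict
from statistics import mean


def find_leads(documents, min_lift=0.2, min_docs=2):
    """documents: list of (score, metadata_dict) pairs.

    Returns sorted (parameter, value) pairs whose documents score at least
    min_lift above the overall mean score.
    """
    overall = mean(score for score, _ in documents)
    buckets = defaultdict(list)           # (parameter, value) -> scores
    for score, metadata in documents:
        for param, value in metadata.items():
            buckets[(param, value)].append(score)
    return sorted(
        key for key, scores in buckets.items()
        if len(scores) >= min_docs and mean(scores) - overall >= min_lift
    )


scored = [
    (0.9, {"custodian": "alice", "type": "email"}),
    (0.8, {"custodian": "alice", "type": "memo"}),
    (0.1, {"custodian": "bob", "type": "email"}),
    (0.2, {"custodian": "bob", "type": "memo"}),
]
leads = find_leads(scored)
```

In this toy data the only lead is the custodian value "alice"; document type carries no signal because emails and memos score the same on average.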
