摘要:
A method for managing objects for users including providing a set of attributes and a set of containers each having attributes from the set. The method further provides a user interface for dynamically assigning attributes to the objects. The method further provides for selectively displaying, through a user interface, containers and objects in the containers. An object is displayed in a container if a condition is met. The condition is applied to the attributes of the container and the attributes of the object.
摘要:
A computer product including a data structure for organizing of a plurality of documents, and capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All the documents are associated with a given node and have identical normalized body text. All documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendent of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
摘要:
A system for determining that a document B is a candidate for near duplicate to a document A with a given similarity level th. The system includes a storage for providing two different functions on the documents, each function having a numeric function value. The system further includes a processor associated with the storage and configured to determine that the document B is a candidate for near duplicate to the document A, if a condition is met. The condition includes: for any function ƒi from among the two functions, ƒi(A)-ƒi(B)≦δi(ƒ,A,th).
摘要:
A computer product including a data structure for organizing of a plurality of documents, and capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All the documents are associated with a given node and have identical normalized body text. All documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendent of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
摘要:
A system configured to find near duplicate documents. For each two (or more) documents that are similar to each other, the system is configured to identify which of the differences is likely to be generated by an Optical Character Recognition software or otherwise due to difference between the original documents. As a result, the process of identifying similarity between documents is improved by identifying documents that were originally exact duplicates but are different one with respect to the other only due to OCR errors, or correct the similarity level between the documents by correcting errors introduced by the OCR tool.
摘要:
A computer product including a data structure for organizing of a plurality of documents, and capable of being utilized by a processor for manipulating data of the data structure and capable of displaying selected data on a display unit. The data structure includes a plurality of directionally interlinked nodes, each node being associated with one or more documents having a header and body text. All the documents are associated with a given node and have identical normalized body text. All documents that have identical normalized body text are associated with the same node. One or more of the nodes is associated with more than one document. For any node that is a descendent of another node, the normalized body text of each document associated with the node is inclusive of the normalized body text of a document that is associated with the other node.
摘要:
An electronic document analysis method using a processor for analyzing N electronic documents, the method comprising providing a set of control electronic documents from among the electronic N documents; and using the set of control electronic documents and a processor to evaluate at least one aspect of a computerized text-classifier based electronic document categorization process performed on the N documents including computation of at least one statistic; wherein providing includes providing an initial set of control electronic documents; computing, using a processor, an estimated validation level of the at least one statistic assuming the initial set is used, and comparing the estimated validation level to a desired validation level, using a processor, and enlarging the initial set of control electronic documents if the estimated validation level falls below the desired validation level.
摘要:
A system for determining that a document B is a candidate for near duplicate to a document A with a given similarity level th. The system includes a storage for providing two different functions on the documents, each function having a numeric function value. The system further includes a processor associated with the storage and configured to determine that the document B is a candidate for near duplicate to the document A, if a condition is met. The condition includes: for any function ƒi from among the two functions, ƒi(A)−ƒi(B)≦δi(ƒ,A,th).
摘要:
A method for computerized batching of huge populations of electronic documents, including computerized assignment of electronic documents into at least one sequence of electronic document batches such that each document is assigned to a batch in the sequence of batches and such that there is no conflict between batching requirements, the following batching requirements being maintained by a suitably programmed processor: a. pre-defined subsets of documents are always kept together in the same batch, b. batches are equal in size, c. the population is partitioned into clusters, and all documents in any given batch belong to a single cluster rather than to two or more clusters.
摘要:
A system including an electronic repository having a multiplicity of accesses to a respective multiplicity of electronic documents and metadata; a document rater using a processor to run a first computer algorithm on the multiplicity of electronic documents which yields a score which rates each of the multiplicity of electronic documents to an issue; and a metadata-based document discriminator to run a second computer algorithm on at least some of the metadata which yields leads, each lead having at least one metadata value for at least one metadata parameter, whose value correlates with the score of the electronic documents to the issue, typically used in combination with an electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues.