摘要:
A method for indexing a plurality of documents, that includes a plurality of duplicate documents, first identifies one or more duplicate groups of documents from among the plurality of documents. Then, one index of content for the duplicate group is created instead of indexing the content from every document within the duplicate group. However, in contrast to the content index, an index of metadata for each of the documents in the duplicate group is created. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content was indexed for each document of the group.
摘要:
A method for indexing a plurality of documents, that includes a plurality of duplicate documents, first identifies one or more duplicate groups of documents from among the plurality of documents. Then, one index of content for the duplicate group is created instead of indexing the content from every document within the duplicate group. However, in contrast to the content index, an index of metadata for each of the documents in the duplicate group is created. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content was indexed for each document of the group.
摘要:
A method is carried out by storing information describing configurations of discussion threads formed of respective series of EMTs that are exchanged among at least two individuals. The discussion threads have a root EMT, zero or more reply EMTs, and a last offspring EMT. The method is further carried out by compacting the EMT discussion threads, and indexing the compacted EMT discussion threads.
摘要:
A method includes describing the thread configurations of a volume of well-ordered electronic message transmissions (EMT) and utilizing the thread configuration data to conduct selective searches of the EMT volume. An apparatus includes a thread processor and a query manager. The thread processor analyzes the EMT threads and records the thread configuration data. The query manager utilizes the thread configuration data to conduct selective searches of the EMT volume.
摘要:
A method includes describing the thread configurations of a volume of well-ordered electronic message transmissions (EMT) and utilizing the thread configuration data to conduct selective searches of the EMT volume. An apparatus includes a thread processor and a query manager. The thread processor analyzes the EMT threads and records the thread configuration data. The query manager utilizes the thread configuration data to conduct selective searches of the EMT volume.
摘要:
A method includes describing the thread configurations of a volume of well-ordered electronic message transmissions (EMT) and utilizing the thread configuration data to conduct selective searches of the EMT volume. An apparatus includes a thread processor and a query manager. The thread processor analyzes the EMT threads and records the thread configuration data. The query manager utilizes the thread configuration data to conduct selective searches of the EMT volume.
摘要:
A method is carried out by storing information describing configurations of discussion threads formed of respective series of EMTs that are exchanged among at least two individuals. The discussion threads have a root EMT, zero or more reply EMTs, and a last offspring EMT. The method is further carried out by compacting the EMT discussion threads, and indexing the compacted EMT discussion threads.
摘要:
Disclosed is a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators and various adapters. The searching technique makes use of a two-level searching technique. A search query includes a search operator containing of a plurality of search sub-expressions each having an associated weight value. The search engine returns a document or documents having a weight value sum that exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).
摘要:
Disclosed is a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators and various adapters. The searching technique makes use of a two-level searching technique. A search query includes a search operator containing of a plurality of search sub-expressions each having an associated weight value. The search engine returns a document or documents having a weight value sum that exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).
摘要:
Disclosed is a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators and various adapters. The searching technique makes use of a two-level searching technique. A search query includes a search operator containing of a plurality of search sub-expressions each having an associated weight value. The search engine returns a document or documents having a weight value sum that exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).