A Generic Architecture for Indexing Document Groups in an Inverted Text Index
    2.
    Patent application (status: in force)

    Publication No.: US20060155739A1

    Publication date: 2006-07-13

    Application No.: US10905604

    Filing date: 2005-01-12

    IPC classification: G06F17/00

    CPC classification: G06F17/30622

    Abstract: A method for indexing a plurality of documents that includes a plurality of duplicate documents first identifies one or more duplicate groups of documents from among the plurality of documents. Then, one index of content is created for each duplicate group instead of indexing the content of every document within the group. In contrast to the content index, an index of metadata is created for each of the documents in the duplicate group. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content were indexed for each document of the group.
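The indexing scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how duplicate groups are identified, so the sketch assumes exact duplicates detected by a SHA-1 fingerprint of the content, and the names `build_index`, `search_content`, and the `"content"`/`"meta"` record layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_index(docs):
    """Index content once per duplicate group, metadata once per document.

    docs maps doc_id -> {"content": str, "meta": dict}. Grouping by an exact
    content hash is an assumption made for this sketch only.
    """
    groups = defaultdict(list)  # content fingerprint -> member doc ids
    for doc_id, doc in docs.items():
        fingerprint = hashlib.sha1(doc["content"].encode("utf-8")).hexdigest()
        groups[fingerprint].append(doc_id)

    content_index = defaultdict(set)  # term -> group fingerprints
    meta_index = defaultdict(set)     # (field, value) -> doc ids
    for fingerprint, members in groups.items():
        # Content is tokenized and indexed once, using one representative.
        representative = docs[members[0]]
        for term in representative["content"].lower().split():
            content_index[term].add(fingerprint)
        # Metadata is indexed separately for every member of the group.
        for doc_id in members:
            for field, value in docs[doc_id]["meta"].items():
                meta_index[(field, value)].add(doc_id)
    return groups, content_index, meta_index


def search_content(term, groups, content_index):
    """Expand matching groups back to their members, as if each were indexed."""
    hits = []
    for fingerprint in content_index.get(term.lower(), ()):
        hits.extend(groups[fingerprint])
    return sorted(hits)
```

In this sketch a content query touches one posting per group and then expands the group to its members, so duplicates still appear in the results even though their text was indexed only once.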


    System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND)
    8.
    Granted patent (status: in force)

    Publication No.: US07512602B2

    Publication date: 2009-03-31

    Application No.: US11607080

    Filing date: 2006-11-30

    IPC classification: G06F17/30

    Abstract: Disclosed are a system architecture, components, and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators, and various adapters. The searching technique is a two-level technique. A search query includes a search operator containing a plurality of search sub-expressions, each having an associated weight value. The search engine returns a document or documents having a weight value sum that exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).
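The weighted predicate described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `wand` and `filter_docs` are hypothetical, sub-expressions are reduced to single-term matches, and the threshold comparison is taken to be inclusive (the abstract says "exceeds"; switching to a strict comparison is a one-character change). The two-level evaluation strategy itself is not modeled here.

```python
def wand(matches, weights, threshold):
    """Boolean predicate: True when the weights of the satisfied
    sub-expressions sum to at least the threshold (inclusive comparison
    is an assumption of this sketch)."""
    if len(matches) != len(weights):
        raise ValueError("matches and weights must have the same length")
    total = sum(weight for matched, weight in zip(matches, weights) if matched)
    return total >= threshold


def filter_docs(docs, terms, weights, threshold):
    """Return ids of documents whose matched-term weights satisfy wand().

    docs maps doc_id -> text; each term stands in for one sub-expression.
    """
    results = []
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        matches = [term in tokens for term in terms]
        if wand(matches, weights, threshold):
            results.append(doc_id)
    return results
```

With unit weights, a threshold equal to the number of sub-expressions makes the predicate behave like a plain AND, and a threshold of 1 makes it behave like an OR; intermediate thresholds give the "weighted" behavior in between.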


    System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND)
    9.
    Granted patent (status: in force)

    Publication No.: US08280903B2

    Publication date: 2012-10-02

    Application No.: US12138857

    Filing date: 2008-06-13

    IPC classification: G06F17/30

    Abstract: Disclosed are a system architecture, components, and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators, and various adapters. The searching technique is a two-level technique. A search query includes a search operator containing a plurality of search sub-expressions, each having an associated weight value. The search engine returns a document or documents having a weight value sum that exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).


    System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND)
    10.
    Granted patent (status: in force)

    Publication No.: US07146361B2

    Publication date: 2006-12-05

    Application No.: US10449265

    Filing date: 2003-05-30

    IPC classification: G06F17/30

    Abstract: Disclosed are a system architecture, components, and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators, and various adapters. The searching technique is a two-level technique. A search query includes a search operator containing a plurality of search sub-expressions, each having an associated weight value. The search engine returns a document or documents having a weight value sum that exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).
