A Generic Architecture for Indexing Document Groups in an Inverted Text Index
    4.
    Invention application (legal status: in force)

    Publication No.: US20060155739A1

    Publication date: 2006-07-13

    Application No.: US10905604

    Filing date: 2005-01-12

    IPC classification: G06F17/00

    CPC classification: G06F17/30622

    Abstract: A method for indexing a plurality of documents that includes a plurality of duplicate documents first identifies one or more duplicate groups of documents from among the plurality of documents. Then a single content index is created for each duplicate group, instead of indexing the content of every document within the group. In contrast to the content index, metadata is indexed for each of the documents in the duplicate group. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content were indexed for each document of the group.

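    The indexing scheme in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes duplicates are exact copies detected with a SHA-1 content fingerprint, uses whitespace tokenization, and the names `build_indexes`, `search`, and the `content`/`meta` document fields are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(content):
    # Assumption: duplicates are byte-identical, so a content hash groups them.
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def build_indexes(docs):
    """docs maps doc_id -> {"content": str, "meta": dict} (hypothetical layout)."""
    groups = defaultdict(list)  # group id (fingerprint) -> member doc ids
    for doc_id in sorted(docs):
        groups[fingerprint(docs[doc_id]["content"])].append(doc_id)

    content_index = defaultdict(set)   # term -> group ids
    metadata_index = defaultdict(set)  # (field, value) -> doc ids
    for group_id, members in groups.items():
        # Content is tokenized and indexed once per group, via one member.
        for term in docs[members[0]]["content"].lower().split():
            content_index[term].add(group_id)
        # Metadata is indexed separately for every document in the group.
        for doc_id in members:
            for field, value in docs[doc_id]["meta"].items():
                metadata_index[(field, value)].add(doc_id)
    return groups, content_index, metadata_index


def search(term, groups, content_index, meta_filter=None, metadata_index=None):
    # Expand matching groups back to their members, so results look as if
    # the content had been indexed for each document individually.
    hits = set()
    for group_id in content_index.get(term.lower(), ()):
        hits.update(groups[group_id])
    if meta_filter is not None:
        hits &= metadata_index.get(meta_filter, set())
    return sorted(hits)
```

    In this sketch a content query touches one posting per duplicate group, and the optional metadata filter narrows the expanded result to individual documents.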

    Method, system, and program for handling redirects in a search engine
    5.
    Invention application (legal status: in force)

    Publication No.: US20050165800A1

    Publication date: 2005-07-28

    Application No.: US10764771

    Filing date: 2004-01-26

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30882 G06F17/30864

    Abstract: Disclosed is a method, system, and program for handling redirects in documents. At least one equivalence class is formed that includes documents that are connected through a redirect. Cycles for each equivalence class are detected, wherein documents in a cycle are marked so that they are not indexed. Incomplete chains for each equivalence class are detected, wherein documents in an incomplete chain are marked so that they are not indexed. A representative for each equivalence class is selected.

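    The redirect handling described in this abstract might be sketched as follows. This is an illustration under stated assumptions, not the patented method: redirects are given as a hypothetical `source -> target` mapping, a chain counts as incomplete when it ends at a URL that was never fetched, and the representative is taken to be the final target of a chain (or the smallest URL of a cycle). The function names are hypothetical.

```python
from collections import defaultdict


def classify(url, redirects, fetched):
    """Follow the redirect chain from url and report how it ends."""
    seen = []
    current = url
    while current in redirects:
        if current in seen:
            # The chain loops back on itself: a redirect cycle.
            cycle = seen[seen.index(current):]
            return ("cycle", min(cycle))
        seen.append(current)
        current = redirects[current]
    if current in fetched:
        return ("complete", current)
    # The chain ends at a document that was never retrieved.
    return ("incomplete", current)


def build_classes(redirects, fetched):
    """Group URLs into equivalence classes keyed by how their chain ends."""
    urls = set(redirects) | set(redirects.values()) | set(fetched)
    classes = defaultdict(set)
    for url in urls:
        classes[classify(url, redirects, fetched)].add(url)
    result = []
    for (status, representative), members in classes.items():
        result.append({
            "members": sorted(members),
            "status": status,
            "representative": representative,
            # Documents in cycles and incomplete chains are not indexed.
            "indexed": status == "complete",
        })
    return sorted(result, key=lambda c: c["members"])
```

    Only classes with status `complete` are marked for indexing here; cycles and incomplete chains keep a representative but carry `indexed = False`.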

    Method, system, and program for searching documents for ranges of numeric values
    6.
    Invention application (legal status: in force)

    Publication No.: US20060074962A1

    Publication date: 2006-04-06

    Application No.: US10949473

    Filing date: 2004-09-24

    IPC classification: G06F17/30

    Abstract: Provided are a method, system, and program for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents having values within the range of consecutive values associated with the posting list. Each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored.

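    A minimal sketch of range-partitioned posting lists, as described in this abstract. The fixed bucket width, the integer values, and the names `build_range_postings` and `range_query` are assumptions made for illustration; the abstract does not say how the ranges are chosen.

```python
from collections import defaultdict


def build_range_postings(doc_values, bucket_width):
    """doc_values maps doc_id -> iterable of integer values.

    Returns bucket_start -> sorted list of (doc_id, value) postings, where
    each list covers the consecutive values
    bucket_start .. bucket_start + bucket_width - 1.
    """
    postings = defaultdict(list)
    for doc_id, values in doc_values.items():
        for value in values:
            start = (value // bucket_width) * bucket_width
            # Each posting pairs the document identifier with one value.
            postings[start].append((doc_id, value))
    return {start: sorted(entries) for start, entries in postings.items()}


def range_query(postings, bucket_width, low, high):
    """Return ids of documents holding at least one value in [low, high]."""
    result = set()
    start = (low // bucket_width) * bucket_width
    while start <= high:
        for doc_id, value in postings.get(start, ()):
            # The stored value lets boundary buckets be filtered exactly.
            if low <= value <= high:
                result.add(doc_id)
        start += bucket_width
    return sorted(result)
```

    A query then reads only the posting lists whose ranges overlap the requested interval, rather than one list per distinct value.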

    Virtual cursors for XML joins
    7.
    Invention application (legal status: in force)

    Publication No.: US20070112813A1

    Publication date: 2007-05-17

    Application No.: US11270784

    Filing date: 2005-11-08

    IPC classification: G06F7/00

    CPC classification: G06F17/30935

    Abstract: A system, method, and computer program product to improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features, path indices and ancestor information, are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.

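    The idea of deriving a cursor position instead of reading it from an index can be sketched as below. This is an assumption-laden illustration, not the patented algorithm: it assumes the "ancestor information" is a Dewey-style identifier (each prefix identifies an ancestor) and that the path index yields the root-to-node tag sequence of each posting. The names `ancestor_positions` and `structural_join` are hypothetical.

```python
def ancestor_positions(dewey, path, tag):
    """Derive the ids of ancestors with the given tag.

    dewey is a tuple such as (1, 2, 1); path is the root-to-node tag
    sequence, so path[i] is the tag of the element with id dewey[:i + 1].
    No inverted list for `tag` is read: positions come from the
    descendant's own posting.
    """
    return [dewey[: i + 1] for i, t in enumerate(path[:-1]) if t == tag]


def structural_join(descendant_postings, ancestor_tag):
    """Evaluate //ancestor_tag//descendant using one physical cursor.

    descendant_postings is the physical inverted list of the descendant
    tag, as (dewey, path) pairs in document order. The ancestor side is a
    virtual cursor whose positions are computed, not fetched.
    """
    pairs = []
    for dewey, path in descendant_postings:
        for ancestor in ancestor_positions(dewey, path, ancestor_tag):
            pairs.append((ancestor, dewey))
    return pairs
```

    For a query such as `//section//title`, only the `title` postings are scanned in this sketch; every matching `section` is recovered from the Dewey prefix and path of each `title` element.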