A Generic Architecture for Indexing Document Groups in an Inverted Text Index
    2.
    Patent application
    Legal status: in force

    Publication No.: US20060155739A1

    Publication date: 2006-07-13

    Application No.: US10905604

    Filing date: 2005-01-12

    IPC classification: G06F17/00

    CPC classification: G06F17/30622

    Abstract: A method for indexing a plurality of documents that includes a plurality of duplicate documents first identifies one or more duplicate groups of documents from among the plurality of documents. Then one index of content is created for the duplicate group, instead of indexing the content of every document within it. In contrast to the content index, however, an index of metadata is created for each document in the duplicate group. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content were indexed for each document of the group.
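
The split between a per-group content index and a per-document metadata index can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes exact duplicates are detected by hashing the content, and the names `build_indexes` and `search` and the metadata field `site` are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_indexes(docs):
    """docs maps doc_id -> (content, metadata dict).

    Assumption: documents are duplicates when their content hashes match.
    """
    group_of = {}                      # doc_id -> group_id
    members = defaultdict(list)        # group_id -> [doc_id, ...]
    content_index = defaultdict(set)   # term -> {group_id}
    metadata_index = defaultdict(set)  # (field, value) -> {doc_id}
    for doc_id, (content, metadata) in docs.items():
        gid = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if gid not in members:
            # Content is tokenized and indexed once per duplicate group.
            for term in content.lower().split():
                content_index[term].add(gid)
        members[gid].append(doc_id)
        group_of[doc_id] = gid
        # Metadata is indexed for every document, duplicates included.
        for field, value in metadata.items():
            metadata_index[(field, value)].add(doc_id)
    return group_of, members, content_index, metadata_index


def search(indexes, term, field=None, value=None):
    """Answer a content query as if every duplicate had been indexed."""
    _, members, content_index, metadata_index = indexes
    hits = {d for gid in content_index.get(term.lower(), ()) for d in members[gid]}
    if field is not None:
        hits &= metadata_index.get((field, value), set())
    return sorted(hits)


docs = {
    "a": ("inverted index basics", {"site": "x.example"}),
    "b": ("inverted index basics", {"site": "y.example"}),
    "c": ("query processing", {"site": "x.example"}),
}
indexes = build_indexes(docs)
print(search(indexes, "index"))                       # ['a', 'b']
print(search(indexes, "index", "site", "y.example"))  # ['b']
```

The content posting for "index" holds a single group identifier, yet both duplicates are returned, and a metadata filter can still tell them apart.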

    Method, system, and program for handling redirects in a search engine
    3.
    Patent application
    Legal status: in force

    Publication No.: US20050165800A1

    Publication date: 2005-07-28

    Application No.: US10764771

    Filing date: 2004-01-26

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30882 G06F17/30864

    Abstract: Disclosed is a method, system, and program for handling redirects in documents. At least one equivalence class is determined, each including documents that are connected through a redirect. Cycles are detected for each equivalence class, and documents in a cycle are marked so that they are not indexed. Incomplete chains are detected for each equivalence class, and documents in an incomplete chain are marked so that they are not indexed. A representative is selected for each equivalence class.
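
The steps in the abstract can be sketched as below. This is an illustration under stated assumptions, not the method claimed in the application: each document has at most one outgoing redirect, an "incomplete chain" is taken to mean a chain whose final target was never fetched, and the representative is taken to be the final target. The abstract does not specify either choice, and the function name `classify_redirects` is hypothetical.

```python
def classify_redirects(redirects, fetched):
    """redirects maps source_url -> target_url (one redirect per source).
    fetched is the set of URLs for which real content was retrieved.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Documents connected through a redirect fall into one equivalence class.
    for src, dst in redirects.items():
        parent[find(src)] = find(dst)

    classes = {}
    for url in list(parent):
        classes.setdefault(find(url), set()).add(url)

    result = []
    for members in classes.values():
        # A class without a non-redirecting member must contain a cycle.
        terminals = [u for u in members if u not in redirects]
        if not terminals:
            status, representative = "cycle", None
        elif terminals[0] not in fetched:
            # Assumption: the chain ends at a page that was never fetched.
            status, representative = "incomplete", None
        else:
            # Assumption: the final target represents the class.
            status, representative = "ok", terminals[0]
        result.append({
            "members": sorted(members),
            "status": status,
            "representative": representative,
        })
    return sorted(result, key=lambda r: r["members"])


redirects = {"a": "b", "b": "c", "x": "y", "y": "x", "p": "q"}
for entry in classify_redirects(redirects, fetched={"c"}):
    print(entry)
```

Classes whose status is "cycle" or "incomplete" would be excluded from indexing; only the "ok" class contributes a representative.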

    Virtual cursors for XML joins
    7.
    Patent application
    Legal status: in force

    Publication No.: US20070112813A1

    Publication date: 2007-05-17

    Application No.: US11270784

    Filing date: 2005-11-08

    IPC classification: G06F7/00

    CPC classification: G06F17/30935

    Abstract: Disclosed are a system, method, and computer program product that improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features, path indices and ancestor information, are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.
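
The idea of deriving an ancestor cursor's position from a physical cursor can be sketched as follows. This is a simplified illustration only: it assumes each posting stores its root-to-element path and a Dewey-style identifier whose prefixes identify its ancestors, an encoding the abstract does not specify. The classes `Posting`, `PhysicalCursor`, and `VirtualCursor` are hypothetical, and a tag that repeats along a path is resolved to its first occurrence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    path: tuple   # root-to-element tags, e.g. ("book", "chapter", "title")
    dewey: tuple  # Dewey-style ID; every prefix is an ancestor's ID (assumption)


class PhysicalCursor:
    """Iterates over an actual inverted list in document order."""

    def __init__(self, postings):
        self._postings = sorted(postings, key=lambda p: p.dewey)
        self._i = 0

    def current(self):
        return self._postings[self._i] if self._i < len(self._postings) else None

    def advance(self):
        self._i += 1


class VirtualCursor:
    """Stands in for the inverted list of an ancestor tag without reading it."""

    def __init__(self, physical, tag):
        self._physical = physical
        self._tag = tag

    def current(self):
        posting = self._physical.current()
        if posting is None or self._tag not in posting.path[:-1]:
            return None
        # The ancestor's position is a prefix of the descendant's Dewey ID.
        depth = posting.path.index(self._tag) + 1
        return posting.dewey[:depth]


def chapter_title_pairs(title_postings):
    """Evaluate //chapter//title while reading only the 'title' list."""
    titles = PhysicalCursor(title_postings)
    chapters = VirtualCursor(titles, "chapter")
    pairs = []
    while titles.current() is not None:
        ancestor = chapters.current()
        if ancestor is not None:
            pairs.append((ancestor, titles.current().dewey))
        titles.advance()
    return pairs


postings = [
    Posting(("book", "chapter", "title"), (1, 2, 1)),
    Posting(("book", "title"), (1, 1)),
    Posting(("book", "chapter", "section", "title"), (1, 3, 1, 1)),
]
print(chapter_title_pairs(postings))
# [((1, 2), (1, 2, 1)), ((1, 3), (1, 3, 1, 1))]
```

Only the `title` list is scanned; the `chapter` positions are computed, which is where the saving in index I/O described in the abstract would come from.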
