Architecture for an indexer
    4.
    Granted invention patent
    Architecture for an indexer (lapsed)

    Publication No.: US07743060B2

    Publication date: 2010-06-22

    Application No.: US11834556

    Filing date: 2007-08-06

    IPC classification: G06F7/00 G06F17/30

    Abstract: Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier that indicates whether a section of a document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context section have the same document identifier; it is determined whether a data field associated with the token is a fixed width; when the data field is a fixed width, the token is designated as one for which a fixed width sort is to be performed; and, when the data field is a variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
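
The flow in this abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `Entry` layout, the section codes, the 4-byte `FIXED_WIDTH`, and the use of Python's built-in sort as a stand-in for both the fixed width and the variable width sort are all assumptions.

```python
from heapq import merge
from itertools import groupby
from typing import NamedTuple

CONTEXT, ANCHOR = 0, 1   # hypothetical section-type codes
FIXED_WIDTH = 4          # hypothetical size in bytes of a fixed-width data field


class Entry(NamedTuple):
    doc_id: int     # shared by the anchor text and context sections of a document
    section: int    # CONTEXT or ANCHOR
    token: str
    data: bytes     # data field associated with the token


def sort_key(e: Entry) -> tuple:
    # Document identifier first, then the section flag, so that both sections
    # of one document become adjacent once the entries are sorted.
    return (e.doc_id, e.section)


def designate(e: Entry) -> str:
    # Decide which sort an entry is routed to, based on its data field width.
    return "fixed" if len(e.data) == FIXED_WIDTH else "variable"


def index(entries):
    # Stand-ins for the two sorts: each group is sorted separately.
    fixed = sorted((e for e in entries if designate(e) == "fixed"), key=sort_key)
    variable = sorted((e for e in entries if designate(e) == "variable"), key=sort_key)
    # Merging the two sorted runs by sort key brings the sections together.
    merged = merge(fixed, variable, key=sort_key)
    return {doc_id: list(group)
            for doc_id, group in groupby(merged, key=lambda e: e.doc_id)}
```

Calling `index(entries)` returns a dictionary keyed by document identifier, with each document's context entries followed by its anchor text entries.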

    Method, system, and program for handling redirects in a search engine
    5.
    Granted invention patent
    Method, system, and program for handling redirects in a search engine (in force)

    Publication No.: US08296304B2

    Publication date: 2012-10-23

    Application No.: US10764771

    Filing date: 2004-01-26

    IPC classification: G06F17/30

    CPC classification: G06F17/30882 G06F17/30864

    Abstract: Disclosed is a method, system, and program for handling redirects in documents. At least one equivalence class is formed that includes documents that are connected through a redirect. Cycles for each equivalence class are detected, wherein documents in a cycle are marked so that they are not indexed. Incomplete chains for each equivalence class are detected, wherein documents in an incomplete chain are marked so that they are not indexed. A representative for each equivalence class is selected.
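
A small sketch of these steps, under stated assumptions: each document redirects to at most one target, an incomplete chain is one that ends at a document that was never fetched, the representative is the final target of the chain, and documents that merely lead into a cycle are skipped along with the cycle itself. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
def resolve(doc, redirects, fetched):
    """Follow the redirect chain starting at doc and classify its outcome."""
    seen = set()
    current = doc
    while True:
        if current in seen:
            return "cycle", None          # the chain loops back on itself
        if current not in fetched:
            return "incomplete", None     # the chain ends at an unfetched document
        seen.add(current)
        if current not in redirects:
            return "ok", current          # final target: assumed representative
        current = redirects[current]


def handle_redirects(fetched, redirects):
    classes = {}   # representative -> members of its equivalence class
    skipped = {}   # document -> reason it is marked as not to be indexed
    for doc in sorted(fetched):
        status, representative = resolve(doc, redirects, fetched)
        if status == "ok":
            classes.setdefault(representative, []).append(doc)
        else:
            skipped[doc] = status
    return classes, skipped
```

With `redirects = {"a": "b", "b": "c", "x": "y", "y": "x", "p": "q"}` and `q` never fetched, documents `a`, `b`, and `c` form one class represented by `c`, while `x`, `y`, and `p` are skipped.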

    Optimizing cursor movement in holistic twig joins
    6.
    Invention patent application
    Optimizing cursor movement in holistic twig joins (in force)

    Publication No.: US20080010302A1

    Publication date: 2008-01-10

    Application No.: US11475807

    Filing date: 2006-06-27

    IPC classification: G06F7/00

    Abstract: A holistic twig join method with optimal cursor movement is disclosed. The method in one aspect minimizes the number of cursor moves by looking more globally at the query's state to determine which cursor to move next and making virtual moves where a physical move is not needed. The method in another aspect reduces the number of cursor moves by skipping over nodes that do not need to be output.
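
To illustrate only the general idea of skipping (this is a generic ancestor-descendant join, not the method claimed in the application), the sketch below counts cursor moves when each cursor can jump ahead instead of stepping one node at a time. It assumes nodes carry `(start, end)` region encodings from a well-formed document and that ancestor regions do not nest; the `Cursor` class and its method names are hypothetical.

```python
from bisect import bisect_left


class Cursor:
    """Cursor over (start, end) regions sorted by start; counts its moves."""

    def __init__(self, regions):
        self.regions = sorted(regions)
        self.starts = [s for s, _ in self.regions]
        self.ends = [e for _, e in self.regions]   # sorted only if regions do not nest
        self.pos = 0
        self.moves = 0

    @property
    def done(self):
        return self.pos >= len(self.regions)

    @property
    def current(self):
        return self.regions[self.pos]

    def advance(self):
        self.pos += 1
        self.moves += 1

    def skip_to_start(self, value):
        # Jump to the first region whose start is >= value, in a single move.
        self.pos = bisect_left(self.starts, value, self.pos)
        self.moves += 1

    def skip_to_end(self, value):
        # Jump to the first region whose end is >= value, in a single move.
        self.pos = bisect_left(self.ends, value, self.pos)
        self.moves += 1


def join(ancestors, descendants):
    a, d = Cursor(ancestors), Cursor(descendants)
    pairs = []
    while not a.done and not d.done:
        a_start, a_end = a.current
        d_start, _ = d.current
        if d_start < a_start:
            d.skip_to_start(a_start)    # descendants before this ancestor cannot match
        elif d_start > a_end:
            a.skip_to_end(d_start)      # ancestors ending before d cannot contain it
        else:
            pairs.append((a.current, d.current))
            d.advance()
    return pairs, a.moves + d.moves
```

On three ancestor regions and five descendant regions of which only two match, the join below finishes in five cursor moves because the three non-matching descendants are passed in a single skip.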

    Virtual cursors for XML joins
    8.
    Granted invention patent
    Virtual cursors for XML joins (in force)

    Publication No.: US07685138B2

    Publication date: 2010-03-23

    Application No.: US11270784

    Filing date: 2005-11-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30935

    Abstract: A system, method, and computer program product to improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features (path indices and ancestor information) are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.
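
One way to picture how a virtual cursor's position can be derived from a physical cursor is sketched below. It assumes, purely for illustration, that the ancestor information is a Dewey-style identifier (one component per level) and that the path information is the root-to-node tag path; the abstract does not specify either encoding, and `virtual_position` is a hypothetical helper.

```python
def virtual_position(path, dewey, tag):
    """Derive the position of the nearest proper ancestor named `tag`.

    path  -- root-to-node tag path of the physical cursor's node, e.g. "/book/chapter/title"
    dewey -- Dewey-style identifier of that node, one component per path step
    Returns the ancestor's Dewey identifier, or None if no such ancestor exists.
    """
    steps = path.strip("/").split("/")
    # Walk upward from the parent of the current node toward the root.
    for depth in range(len(steps) - 1, 0, -1):
        if steps[depth - 1] == tag:
            return dewey[:depth]   # a prefix of the node's ID identifies the ancestor
    return None
```

For a physical cursor positioned on a `title` node with path `/book/chapter/title` and identifier `(1, 3, 2)`, the `chapter` cursor position `(1, 3)` is computed without reading the inverted list for `chapter`.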

    Index partition maintenance over monotonically addressed document sequences
    10.
    Granted invention patent
    Index partition maintenance over monotonically addressed document sequences (in force)

    Publication No.: US08738673B2

    Publication date: 2014-05-27

    Application No.: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document into a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-ID, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-ID and the field value.
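
The epoch mechanism can be sketched as follows. This is a simplified illustration under assumptions not stated in the abstract: the placement functions are a modulo over the assigned-doc-ID or over a CRC32 hash of a field value, the cut-off is triggered manually, and the class, method, and partition names are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional
from zlib import crc32


class Epoch(NamedTuple):
    first_doc_id: int        # first assigned-doc-ID that belongs to this epoch
    partitions: List[str]    # physical partitions of this epoch (hypothetical names)


class VirtualIndex:
    def __init__(self, partitions: List[str]):
        self.epochs = [Epoch(0, list(partitions))]
        self.next_doc_id = 0   # doc IDs are assigned monotonically

    def cut_off(self, new_partitions: List[str]) -> None:
        # Close the current epoch: documents from here on go to a new partition set.
        self.epochs.append(Epoch(self.next_doc_id, list(new_partitions)))

    def insert(self, field_value: Optional[str] = None):
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        epoch = self.epochs[-1]
        # Placement function: by doc ID, or by a stable hash of a field value.
        slot = doc_id if field_value is None else crc32(field_value.encode("utf-8"))
        return doc_id, epoch.partitions[slot % len(epoch.partitions)]

    def epoch_of(self, doc_id: int) -> Epoch:
        # Monotonic doc IDs let a binary search over epoch boundaries find the epoch.
        starts = [e.first_doc_id for e in self.epochs]
        return self.epochs[bisect_right(starts, doc_id) - 1]
```

Because identifiers only grow, a lookup for an existing document needs just the list of epoch boundaries to find the partition set that was active when the document was inserted.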
