Architecture for an indexer
    2.
    Granted patent
    Status: Lapsed

    Publication No.: US07743060B2

    Publication date: 2010-06-22

    Application No.: US11834556

    Filing date: 2007-08-06

    IPC classification: G06F7/00 G06F17/30

    Abstract: Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier that indicates whether a section of a document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context text section have a same document identifier; it is determined whether a data field associated with the token is a fixed width; when the data field is a fixed width, the token is designated as one for which fixed width sort is to be performed; and, when the data field is a variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
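
The indexing flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Token` class, the `ANCHOR`/`CONTEXT` flags, the `fixed_width` parameter, and the rule "a data field is fixed width when its length equals `fixed_width`" are all assumptions made for the example, and Python's built-in sort stands in for both the fixed-width and the variable-width sort.

```python
from dataclasses import dataclass
from heapq import merge
from itertools import groupby

ANCHOR, CONTEXT = 0, 1  # hypothetical section flags


@dataclass(frozen=True)
class Token:
    term: str
    doc_id: int
    section: int   # ANCHOR or CONTEXT
    data: bytes    # data field associated with the token

    @property
    def sort_key(self):
        # Document id first, so both sections of a document sort next to each other.
        return (self.doc_id, self.section)


def index_tokens(tokens, fixed_width=4):
    # Designate each token for the fixed-width or the variable-width sort
    # (assumed rule: the data field has exactly `fixed_width` bytes).
    fixed = [t for t in tokens if len(t.data) == fixed_width]
    variable = [t for t in tokens if len(t.data) != fixed_width]

    # Perform both sorts; list.sort is only a stand-in for the two sort routines.
    fixed.sort(key=lambda t: t.sort_key)
    variable.sort(key=lambda t: t.sort_key)

    # Merge the two sorted runs, then group by document id so the anchor text
    # and context sections of each document end up together.
    merged = merge(fixed, variable, key=lambda t: t.sort_key)
    return {doc_id: list(group)
            for doc_id, group in groupby(merged, key=lambda t: t.doc_id)}


tokens = [
    Token("ibm", 2, CONTEXT, b"abcd"),
    Token("home", 1, ANCHOR, b"xy"),
    Token("ibm", 1, CONTEXT, b"abcd"),
    Token("link", 2, ANCHOR, b"longer-payload"),
]
by_doc = index_tokens(tokens)
```

In this sketch `by_doc` maps each document identifier to its tokens, with the anchor text section ahead of the context section because the section flag is the second component of the sort key.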

    SYSTEM AND ARTICLE OF MANUFACTURE FOR SEARCHING DOCUMENTS FOR RANGES OF NUMERIC VALUES
    3.
    Patent application
    Status: Lapsed

    Publication No.: US20080294634A1

    Publication date: 2008-11-27

    Application No.: US12187344

    Filing date: 2008-08-06

    IPC classification: G06F7/06 G06F17/30

    Abstract: Provided are a system and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents include at least one value that is a member of a set of values. A number of posting lists is generated, wherein each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values. A query on a query range of values within the set of values is received and a determination is made of a minimum number of posting lists associated with consecutive values that together include the query range of values. The determined posting lists are merged to form a merged posting list including document identifiers of documents including values within the query range. The document identifiers in the merged posting list are returned.
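
One way to picture the range-query scheme in this abstract is the sketch below. It assumes, purely for illustration, that the value set is the integers `0..num_values-1` with `num_values` a power of two, and that the stored ranges are aligned power-of-two blocks; the abstract does not say how the ranges are chosen. The function names are hypothetical, and the sketch keeps only document identifiers in each list rather than the (identifier, value) association the abstract mentions.

```python
from collections import defaultdict


def build_posting_lists(doc_values, num_values):
    """Map each aligned power-of-two range (start, size) to a sorted list of doc ids."""
    lists = defaultdict(set)
    for doc_id, values in doc_values.items():
        for v in values:
            size = 1
            while size <= num_values:
                start = v - v % size          # aligned block of this size holding v
                lists[(start, size)].add(doc_id)
                size *= 2
    return {rng: sorted(ids) for rng, ids in lists.items()}


def cover(lo, hi):
    """Cover the inclusive range [lo, hi] with as few aligned blocks as possible."""
    ranges = []
    while lo <= hi:
        size = 1
        # Grow the block while it stays aligned at lo and does not pass hi.
        while lo % (size * 2) == 0 and lo + size * 2 - 1 <= hi:
            size *= 2
        ranges.append((lo, size))
        lo += size
    return ranges


def range_query(posting_lists, lo, hi):
    """Merge the posting lists selected by cover() and return the doc ids."""
    merged = set()
    for rng in cover(lo, hi):
        merged.update(posting_lists.get(rng, ()))
    return sorted(merged)


doc_values = {1: [3], 2: [5, 12], 3: [8], 4: [15]}
posting_lists = build_posting_lists(doc_values, num_values=16)
hits = range_query(posting_lists, 3, 12)
```

Here the query range 3..12 is answered from four stored lists, for the ranges 3, 4..7, 8..11 and 12, instead of ten single-value lists. The trade-off in this layout is extra index space, since every value appears in one list per block size.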

    Searching documents for ranges of numeric values
    4.
    Granted patent
    Status: In force

    Publication No.: US08271498B2

    Publication date: 2012-09-18

    Application No.: US12190495

    Filing date: 2008-08-12

    IPC classification: G06F7/00

    Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.

    METHOD, SYSTEM AND ARTICLE OF MANUFACTURE FOR SEARCHING DOCUMENTS FOR RANGES OF NUMERIC VALUES
    5.
    Patent application
    Status: In force

    Publication No.: US20080301130A1

    Publication date: 2008-12-04

    Application No.: US12190495

    Filing date: 2008-08-12

    IPC classification: G06F17/30

    Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.

    Method for searching documents for ranges of numeric values
    6.
    Granted patent
    Status: In force

    Publication No.: US07461064B2

    Publication date: 2008-12-02

    Application No.: US10949473

    Filing date: 2004-09-24

    IPC classification: G06F7/00 G06F17/30

    Abstract: Provided are a method, system, and program for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents having values within the range of consecutive values associated with the posting list. Each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored.

    Searching documents for ranges of numeric values
    7.
    Granted patent
    Status: Lapsed

    Publication No.: US08346759B2

    Publication date: 2013-01-01

    Application No.: US12187344

    Filing date: 2008-08-06

    IPC classification: G06F7/00 G06F17/30

    Abstract: Provided are a system and article of manufacture for searching documents for ranges of numeric values. A number of posting lists is generated, wherein each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored. A query on a query range of values within the set of values is received and a determination is made of a minimum number of posting lists associated with consecutive values that together include the query range of values. The determined posting lists are merged.

    SEARCHING DOCUMENTS FOR RANGES OF NUMERIC VALUES
    8.
    Patent application
    Status: In force

    Publication No.: US20120096016A1

    Publication date: 2012-04-19

    Application No.: US13335634

    Filing date: 2011-12-22

    IPC classification: G06F17/30

    Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.

    Virtual cursors for XML joins
    9.
    Granted patent
    Status: In force

    Publication No.: US07685138B2

    Publication date: 2010-03-23

    Application No.: US11270784

    Filing date: 2005-11-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30935

    Abstract: A system, method, and computer program product to improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features, path indices and ancestor information, are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.
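
The core idea, deriving an ancestor's position from a descendant's posting instead of reading the ancestor's own inverted list, might look like the sketch below. The `Posting` layout (a root-to-element tag path plus one stored position per ancestor) and the function names are assumptions made for illustration; the abstract does not specify the encoding, and this is not the patented algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    pos: int          # position of the element itself
    path: tuple       # tag names from the root down to this element
    ancestors: tuple  # position of each ancestor, aligned with path[:-1]


def virtual_cursor(physical, tag):
    """Yield distinct positions of `tag` ancestors, derived only from the
    path and ancestor information carried by the physical cursor's postings."""
    seen = set()
    for p in physical:
        for name, anc_pos in zip(p.path[:-1], p.ancestors):
            if name == tag and anc_pos not in seen:
                seen.add(anc_pos)
                yield anc_pos


def structural_join(physical, ancestor_tag):
    """Return (ancestor_pos, descendant_pos) pairs for ancestor_tag//descendant
    without opening the inverted list of ancestor_tag."""
    pairs = []
    for p in physical:
        for name, anc_pos in zip(p.path[:-1], p.ancestors):
            if name == ancestor_tag:
                pairs.append((anc_pos, p.pos))
    return pairs


# Physical inverted list for <title> elements (made-up sample data).
titles = [
    Posting(3, ("book", "chapter", "title"), (1, 2)),
    Posting(7, ("book", "chapter", "title"), (1, 6)),
    Posting(9, ("book", "title"), (1,)),
]
chapter_title_pairs = structural_join(titles, "chapter")
book_positions = list(virtual_cursor(titles, "book"))
```

Only the `title` list is read in this sketch: the `chapter` and `book` positions come from the ancestor information, which corresponds to the index I/O the abstract says virtual cursors eliminate.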
