Efficient multifaceted search in information retrieval systems
    3.
    Granted patent
    Status: Lapsed

    Publication No.: US07496568B2

    Publication date: 2009-02-24

    Application No.: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
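
The indexing and query steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `/` path separator, the function names (`path_prefixes`, `build_index`, `query`), and the sample documents are all hypothetical. Each facet annotation is indexed under its full path and under every path prefix, and a query intersects the posting lists of its constraint tokens.

```python
from collections import defaultdict

def path_prefixes(facet_path):
    """Return every prefix of a facet path, ending with the full path itself."""
    parts = facet_path.strip("/").split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]

def build_index(docs):
    """Map each facet token and each of its path prefixes to a sorted posting list."""
    postings = defaultdict(set)
    for doc_id, facet_paths in docs.items():
        for facet_path in facet_paths:
            for token in path_prefixes(facet_path):
                postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

def query(index, constraints):
    """Intersect the posting lists of all constraint tokens."""
    lists = [set(index.get(token, ())) for token in constraints]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

# Hypothetical documents annotated with facet paths (assumed sample data).
docs = {
    1: ["brand/acme/widgets", "color/red"],
    2: ["brand/acme/gadgets", "color/blue"],
    3: ["brand/zenith/widgets", "color/red"],
}
index = build_index(docs)
result = query(index, ["brand/acme", "color/red"])
```

Because prefixes are indexed at build time, a constraint on an interior category such as `brand/acme` resolves to a single posting-list lookup instead of a walk over the category's subtree.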

    EFFICIENT MULTIFACETED SEARCH IN INFORMATION RETRIEVAL SYSTEMS
    4.
    Patent application
    Status: Lapsed

    Publication No.: US20080133473A1

    Publication date: 2008-06-05

    Application No.: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/06

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.

    A Generic Architecture for Indexing Document Groups in an Inverted Text Index
    6.
    Patent application
    Status: In force

    Publication No.: US20060155739A1

    Publication date: 2006-07-13

    Application No.: US10905604

    Filing date: 2005-01-12

    IPC classification: G06F17/00

    CPC classification: G06F17/30622

    Abstract: A method for indexing a plurality of documents, that includes a plurality of duplicate documents, first identifies one or more duplicate groups of documents from among the plurality of documents. Then, one index of content for the duplicate group is created instead of indexing the content from every document within the duplicate group. However, in contrast to the content index, an index of metadata for each of the documents in the duplicate group is created. Thus the content of each duplicate group is indexed only once, while a search engine using such indexing techniques retains the capability to answer queries as if the duplicated content was indexed for each document of the group.
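
A minimal Python sketch of the idea follows. It assumes that duplicates are detected by an exact SHA-256 hash of the content; the abstract does not say how duplicate groups are identified, so that choice, the function names, and the sample documents are all hypothetical. Content terms are indexed once per duplicate group, while metadata is indexed per document.

```python
import hashlib
from collections import defaultdict

def build_indexes(docs):
    """docs maps doc_id -> (content, metadata dict).

    Returns a content index keyed by term (values are group ids), a metadata
    index keyed by (field, value) (values are doc ids), and group membership.
    """
    members = defaultdict(list)        # group id -> doc ids in that group
    content_index = defaultdict(set)   # term -> group ids
    metadata_index = defaultdict(set)  # (field, value) -> doc ids
    for doc_id, (content, metadata) in docs.items():
        # Assumption: identical content hash means the same duplicate group.
        group = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if group not in members:
            # Content is indexed only for the first document of a group.
            for term in content.lower().split():
                content_index[term].add(group)
        members[group].append(doc_id)
        # Metadata is indexed for every document, duplicate or not.
        for field, value in metadata.items():
            metadata_index[(field, value)].add(doc_id)
    return content_index, metadata_index, members

def search(content_index, metadata_index, members, term, field=None, value=None):
    """Expand matching groups to their member documents, then filter on metadata."""
    hits = {d for g in content_index.get(term.lower(), ()) for d in members[g]}
    if field is not None:
        hits &= metadata_index.get((field, value), set())
    return sorted(hits)

# Hypothetical corpus: "a" and "b" share content but differ in metadata.
docs = {
    "a": ("inverted index basics", {"site": "example.org"}),
    "b": ("inverted index basics", {"site": "mirror.example.net"}),
    "c": ("range queries", {"site": "example.org"}),
}
content_index, metadata_index, members = build_indexes(docs)
all_hits = search(content_index, metadata_index, members, "index")
site_hits = search(content_index, metadata_index, members, "index", "site", "example.org")
```

Expanding a group hit to its members at query time is what lets the engine answer as if every duplicate had been indexed in full, while the content postings stay at one entry per group.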

    SYSTEM AND ARTICLE OF MANUFACTURE FOR SEARCHING DOCUMENTS FOR RANGES OF NUMERIC VALUES
    7.
    Patent application
    Status: Lapsed

    Publication No.: US20080294634A1

    Publication date: 2008-11-27

    Application No.: US12187344

    Filing date: 2008-08-06

    IPC classification: G06F7/06 G06F17/30

    Abstract: Provided are a system and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents include at least one value that is a member of a set of values. A number of posting lists is generated, wherein each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values. A query on a query range of values within the set of values is received and a determination is made of a minimum number of posting lists associated with consecutive values that together include the query range of values. The determined posting lists are merged to form a merged posting list including document identifiers of documents including values within the query range. The document identifiers in the merged posting list are returned.
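
One way to picture range-keyed posting lists is the Python sketch below. It assumes integer values in `[0, 2**bits)` and posting lists keyed by aligned power-of-two blocks. The abstract does not specify how the ranges are chosen, so this layout and the names `build_range_index`, `cover`, and `range_query` are hypothetical. A query range is tiled greedily with the largest aligned blocks that fit, and the matching posting lists are merged.

```python
from collections import defaultdict

def build_range_index(doc_values, bits):
    """Build posting lists keyed by (level, block).

    Key (level, block) covers the consecutive values
    block << level through ((block + 1) << level) - 1.
    """
    index = defaultdict(set)
    for doc_id, values in doc_values.items():
        for v in values:
            for level in range(bits + 1):
                index[(level, v >> level)].add(doc_id)
    return index

def cover(lo, hi, bits):
    """Tile [lo, hi] with aligned power-of-two blocks, largest fitting block first."""
    blocks = []
    while lo <= hi:
        level = 0
        # Grow the block while it stays aligned at lo and inside [lo, hi].
        while (level < bits
               and lo % (1 << (level + 1)) == 0
               and lo + (1 << (level + 1)) - 1 <= hi):
            level += 1
        blocks.append((level, lo >> level))
        lo += 1 << level
    return blocks

def range_query(index, lo, hi, bits):
    """Merge the posting lists of the blocks that tile [lo, hi]."""
    result = set()
    for key in cover(lo, hi, bits):
        result |= index.get(key, set())
    return sorted(result)

# Hypothetical documents with numeric values in [0, 16), so bits = 4.
doc_values = {1: [3], 2: [7, 12], 3: [9], 4: [15]}
index = build_range_index(doc_values, bits=4)
blocks = cover(4, 11, bits=4)
hits = range_query(index, 4, 11, bits=4)
```

In this sample the range 4 to 11 is answered from two posting lists (values 4 to 7 and 8 to 11) rather than eight single-value lists, which is the kind of saving the abstract's "minimum number of posting lists" step aims for.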

    Searching documents for ranges of numeric values
    8.
    Granted patent
    Status: In force

    Publication No.: US08271498B2

    Publication date: 2012-09-18

    Application No.: US12190495

    Filing date: 2008-08-12

    IPC classification: G06F7/00

    Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.

    METHOD, SYSTEM AND ARTICLE OF MANUFACTURE FOR SEARCHING DOCUMENTS FOR RANGES OF NUMERIC VALUES
    9.
    Patent application
    Status: In force

    Publication No.: US20080301130A1

    Publication date: 2008-12-04

    Application No.: US12190495

    Filing date: 2008-08-12

    IPC classification: G06F17/30

    Abstract: Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.

    Method for searching documents for ranges of numeric values
    10.
    Granted patent
    Status: In force

    Publication No.: US07461064B2

    Publication date: 2008-12-02

    Application No.: US10949473

    Filing date: 2004-09-24

    IPC classification: G06F7/00 G06F17/30

    Abstract: Provided are a method, system, and program for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents having values within the range of consecutive values associated with the posting list. Each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored.
