Efficient multifaceted search in information retrieval systems
    5.
    Patent grant
    Legal status: lapsed

    Publication number: US07496568B2

    Publication date: 2009-02-24

    Application number: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.
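
The abstract describes indexing every path prefix of a facet annotation, so that a constraint at any level of the facet tree maps to one posting list, and answering a query by intersecting posting lists. The Python sketch below illustrates that idea under stated assumptions: facet paths are "/"-separated strings, a trailing "$" marks a hypothetical full-path token (documents annotated exactly at that node), and posting lists are sorted lists of integer document IDs. All function names are illustrative, not taken from the patent.

```python
from collections import defaultdict

FULL_PATH_MARK = "$"  # hypothetical marker for full-path tokens


def path_prefixes(facet_path):
    """Yield every path prefix of a facet path, e.g. 'Location/US/CA'."""
    parts = facet_path.split("/")
    for i in range(1, len(parts) + 1):
        yield "/".join(parts[:i])


def build_index(docs):
    """docs maps doc_id -> list of facet annotations (full tree paths)."""
    postings = defaultdict(set)
    for doc_id, annotations in docs.items():
        for facet_path in annotations:
            # One token per path prefix: matches any document in that subtree.
            for prefix in path_prefixes(facet_path):
                postings[prefix].add(doc_id)
            # Full-path token: matches documents annotated exactly at this node.
            postings[facet_path + FULL_PATH_MARK].add(doc_id)
    # Store each posting list sorted by document ID.
    return {token: sorted(ids) for token, ids in postings.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def query(index, constraints):
    """AND together constraints, each of which is an indexed token."""
    lists = sorted((index.get(c, []) for c in constraints), key=len)
    if not lists:
        return []
    result = lists[0]  # start from the shortest posting list
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


if __name__ == "__main__":
    docs = {
        1: ["Location/US/CA", "Topic/Search"],
        2: ["Location/US/NY", "Topic/Search"],
        3: ["Location/US", "Topic/Databases"],
    }
    index = build_index(docs)
    print(query(index, ["Location/US", "Topic/Search"]))  # [1, 2]
    print(query(index, ["Location/US$"]))                 # [3]
```

Because every prefix is its own token, a constraint on an inner node such as "Location/US" costs a single posting-list lookup rather than a union over all of its descendants, at the price of a larger index.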

    Method, system, and program for handling redirects in a search engine
    6.
    Patent grant
    Legal status: in force

    Publication number: US08296304B2

    Publication date: 2012-10-23

    Application number: US10764771

    Filing date: 2004-01-26

    IPC classification: G06F17/30

    CPC classification: G06F17/30882 G06F17/30864

    Abstract: Disclosed is a method, system, and program for handling redirects in documents. At least one equivalence class is formed that includes documents connected through a redirect. Cycles in each equivalence class are detected, and documents in a cycle are marked so that they are not indexed. Incomplete chains in each equivalence class are detected, and documents in an incomplete chain are marked so that they are not indexed. A representative is selected for each equivalence class.
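
A minimal sketch of these redirect-handling steps, assuming that each document redirects to at most one target, that an "incomplete chain" is one ending at a document that was never crawled, and that the representative is the chain's final, non-redirecting document. Equivalence classes are built with union-find. These choices and all names (`handle_redirects`, `redirects`, `crawled`) are hypothetical, not the patent's own.

```python
from collections import defaultdict


def handle_redirects(redirects, crawled):
    """redirects: source doc -> target doc; crawled: set of fetched docs.

    Returns (representatives, excluded): a map from each indexable document
    to its class representative, and the set of crawled documents to skip.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Build equivalence classes: documents joined by a redirect share a class.
    for doc in crawled:
        find(doc)
    for src, dst in redirects.items():
        parent[find(src)] = find(dst)

    classes = defaultdict(set)
    for doc in list(parent):
        classes[find(doc)].add(doc)

    representatives, excluded = {}, set()
    for members in classes.values():
        # With at most one outgoing redirect per document, a class holds either
        # exactly one non-redirecting document or exactly one cycle.
        terminals = [d for d in members if d not in redirects]
        if not terminals:
            excluded |= members & crawled  # cycle: no chain ever resolves
        elif terminals[0] not in crawled:
            excluded |= members & crawled  # incomplete chain: target missing
        else:
            for d in members:
                representatives[d] = terminals[0]
    return representatives, excluded
```

The single-outgoing-edge assumption is what makes the classification cheap: a class with n documents and no terminal must contain n redirect edges and therefore exactly one cycle, so no explicit cycle search is needed.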

    Architecture for an indexer
    7.
    Patent grant
    Legal status: lapsed

    Publication number: US07743060B2

    Publication date: 2010-06-22

    Application number: US11834556

    Filing date: 2007-08-06

    IPC classification: G06F7/00 G06F17/30

    Abstract: Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier indicating whether the section of the document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context section have the same document identifier; it is determined whether a data field associated with the token is fixed width; when the data field is fixed width, the token is designated as one for which a fixed width sort is to be performed; and, when the data field is variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
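
The sort-key idea can be sketched as follows, assuming the section indicator is the low-order bit of an integer key derived from the document identifier, that an integer value stands in for a fixed-width data field, and that any other value is variable length. Fixed-width records go through an LSD radix sort and variable-length records through an ordinary comparison sort. The key encoding, the choice of radix sort, and all names here are illustrative assumptions, not details from the abstract.

```python
CONTEXT, ANCHOR = 0, 1  # hypothetical section flags (low-order key bit)


def make_sort_key(doc_id, section):
    """Both sections of a document share doc_id, so their keys are adjacent."""
    return (doc_id << 1) | section


def radix_sort(records, key_bytes=8):
    """Stable LSD radix sort of (key, payload) records on a non-negative int key."""
    for shift in range(0, key_bytes * 8, 8):
        buckets = [[] for _ in range(256)]
        for rec in records:
            buckets[(rec[0] >> shift) & 0xFF].append(rec)
        records = [rec for bucket in buckets for rec in bucket]
    return records


def sort_postings(postings):
    """postings: (token, doc_id, section, value) tuples.

    An int value stands in for a fixed-width field; anything else is treated
    as variable length (an assumption made for this sketch).
    """
    fixed, variable = [], []
    for token, doc_id, section, value in postings:
        record = (make_sort_key(doc_id, section), (token, value))
        (fixed if isinstance(value, int) else variable).append(record)
    fixed_sorted = radix_sort(fixed)                        # fixed width sort
    variable_sorted = sorted(variable, key=lambda r: r[0])  # variable width sort
    return fixed_sorted, variable_sorted
```

Shifting the key right by one bit (`key >> 1`) recovers the shared document identifier, so after sorting, the context and anchor text records of each document sit next to each other and can be merged in a single pass.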

    EFFICIENT MULTIFACETED SEARCH IN INFORMATION RETRIEVAL SYSTEMS
    9.
    Patent application
    Legal status: lapsed

    Publication number: US20080133473A1

    Publication date: 2008-06-05

    Application number: US11564915

    Filing date: 2006-11-30

    IPC classification: G06F7/06

    Abstract: A method for querying multifaceted information. An inverted index is constructed to include unique indexed tokens associated with posting lists of one or more documents. An indexed token is either a facet token included in a document as an annotation or a path prefix of the facet token. The annotation indicates a path within a tree structure representing a facet that includes the document. The tree structure includes nodes representing categories of documents. Constructing the inverted index includes generating a full path token and an associated full path token posting list. A query is received that includes constraints on documents. The constraints are associated with indexed tokens and corresponding posting lists. An execution of the query includes identifying the corresponding posting lists by utilizing the constraints and the inverted index and intersecting the posting lists to obtain a query result.

    Optimizing cursor movement in holistic twig joins
    10.
    Patent application
    Legal status: in force

    Publication number: US20080010302A1

    Publication date: 2008-01-10

    Application number: US11475807

    Filing date: 2006-06-27

    IPC classification: G06F7/00

    Abstract: A holistic twig join method with optimal cursor movement is disclosed. In one aspect, the method minimizes the number of cursor moves by looking more globally at the query's state to determine which cursor to move next, and by making virtual moves where a physical move is not needed. In another aspect, the method reduces the number of cursor moves by skipping over nodes that do not need to be output.
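
Skipping nodes that cannot be output is easiest to see in a simpler setting than a full holistic twig join. The sketch below is a two-list ancestor/descendant structural join over hypothetical (start, end) region labels, sorted by start position and properly nested. The `Cursor` class counts each position change as one physical move, and `skip_to` uses binary search to jump past descendants that precede the current ancestor region. It illustrates the cursor-skipping idea only; it is not the patent's algorithm and does not model virtual moves.

```python
from bisect import bisect_left


class Cursor:
    """Cursor over (start, end) region labels sorted by start; counts moves."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.starts = [s for s, _ in nodes]
        self.pos = 0
        self.moves = 0

    def at_end(self):
        return self.pos >= len(self.nodes)

    def current(self):
        return self.nodes[self.pos]

    def advance(self):
        self.pos += 1
        self.moves += 1

    def skip_to(self, start):
        """Jump to the first node whose start >= start, as one physical move."""
        new_pos = bisect_left(self.starts, start, self.pos)
        if new_pos != self.pos:
            self.pos = new_pos
            self.moves += 1


def structural_join(ancestors, descendants, use_skip=True):
    """Return descendants lying inside some ancestor region, plus move count."""
    a, d = Cursor(ancestors), Cursor(descendants)
    out = []
    while not a.at_end() and not d.at_end():
        a_start, a_end = a.current()
        d_start, _ = d.current()
        if d_start < a_start:
            # d precedes every remaining ancestor region: it cannot be output.
            if use_skip:
                d.skip_to(a_start)
            else:
                d.advance()
        elif d_start > a_end:
            # The current ancestor region cannot contain any later descendant.
            a.advance()
        else:
            out.append(d.current())
            d.advance()
    return out, a.moves + d.moves
```

Both modes return the same matches; they differ only in how many physical moves the descendant cursor makes, which shrinks most when long runs of descendants fall outside every ancestor region.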
