System and method for ranking logical directories
    1.
    Patent application
    Status: Lapsed

    Publication No.: US20050256887A1

    Publication date: 2005-11-17

    Application No.: US10847143

    Filing date: 2004-05-15

    IPC classification: G06F7/00 G06F17/30

    Abstract: A logical directory ranking system ranks documents or web pages utilizing logical directories. From the hierarchical structure represented in a URL string, URLs can often be grouped into “compound documents” that represent a single unit of information. Such compound documents tend to comprise URLs that agree up to a last delimiter such as a forward slash (/). The present system groups together compound documents as a single information node with one or more leaves, constructing a logical directory graph. URLs can be grouped at a level of granularity below an individual directory. For example, the URLs may be grouped together on the basis of hostname, domain, or any level of the hierarchy of the URLs. Edges in the logical directory graph are formed by links between the logical directories. Edges have weights corresponding to the number of links between logical directories. Nodes have weights corresponding to the number of web pages or leaves represented by a node. A ranking level is determined for each node as a function of the node weight and the edge weight. The ranking level is then applied to each URL that the node represents.
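    The abstract above leaves the ranking function unspecified, saying only that it depends on node weight and edge weight. The following is a minimal sketch under stated assumptions: URLs are grouped by the prefix up to the last forward slash, and a weighted PageRank-style iteration stands in for the unspecified function. The names `logical_directory`, `rank_urls`, `damping`, and `iterations` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def logical_directory(url):
    """Return host plus path up to and including the last '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return parts.netloc + path[: path.rfind("/") + 1]


def rank_urls(pages, links, damping=0.85, iterations=50):
    """pages: list of URLs; links: list of (source_url, target_url) pairs."""
    # Node weight: number of pages (leaves) represented by each node.
    node_weight = defaultdict(int)
    for url in pages:
        node_weight[logical_directory(url)] += 1

    # Edge weight: number of links between two distinct logical directories.
    edge_weight = defaultdict(int)
    for src, dst in links:
        a, b = logical_directory(src), logical_directory(dst)
        if a != b and a in node_weight and b in node_weight:
            edge_weight[(a, b)] += 1

    out_total = defaultdict(int)
    for (a, _b), w in edge_weight.items():
        out_total[a] += w

    # Assumed ranking function: weighted PageRank with node-weight priors.
    total_pages = sum(node_weight.values())
    base = {n: w / total_pages for n, w in node_weight.items()}
    score = dict(base)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * base[n] for n in node_weight}
        for (a, b), w in edge_weight.items():
            nxt[b] += damping * score[a] * w / out_total[a]
        # Nodes without outgoing edges spread their score by node weight.
        dangling = sum(score[n] for n in node_weight if out_total.get(n, 0) == 0)
        for n in node_weight:
            nxt[n] += damping * dangling * base[n]
        score = nxt

    # The node's ranking level is applied to every URL the node represents.
    return {url: score[logical_directory(url)] for url in pages}
```

    In this sketch every URL in the same logical directory receives the same score, which mirrors the last sentence of the abstract.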

    System, method, and computer program product for identifying multi-page documents in hypertext collections
    2.
    Patent application
    Status: Under examination (published)

    Publication No.: US20050071310A1

    Publication date: 2005-03-31

    Application No.: US10676918

    Filing date: 2003-09-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F16/24566

    Abstract: A system, method, and computer program product for identifying compound documents as a coherent body of hyperlinked material on a single topic as created by an author or collaborating authors, analyzing the content and structure of the compound documents and related hyperlinks, and responsively selecting a preferred entry point at which to begin processing such documents. The body of material may include the internet, an intranet, or other digital library that typically has content distributed over several separate pages or URLs, sometimes in a hierarchical directory structure. The processing may include creating at least one taxonomy, as well as searching or indexing the compound documents. The identification and analysis schemes include an observation of a number of heuristics run on component documents in the compound documents.
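    The abstract does not name the heuristics it relies on. As an illustration only, the sketch below picks a preferred entry point using two assumed heuristics: a conventional index file name, and failing that, the component page that receives the most links from outside the compound document. `INDEX_NAMES` and `pick_entry_point` are hypothetical names, and neither heuristic is claimed to be the one used in the patent.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

# Assumed set of file names that conventionally mark a document's front page.
INDEX_NAMES = {"", "index.html", "index.htm", "default.htm", "toc.html"}


def pick_entry_point(component_urls, links):
    """Choose one URL of a compound document as its preferred entry point.

    component_urls: URLs believed to form one compound document.
    links: (source_url, target_url) pairs from the whole collection.
    """
    members = set(component_urls)
    # Heuristic 1 (assumed): a conventional index file name marks the entry.
    for url in sorted(members):
        if posixpath.basename(urlsplit(url).path).lower() in INDEX_NAMES:
            return url
    # Heuristic 2 (assumed): links from outside usually target the entry page.
    external_in = Counter(
        dst for src, dst in links if dst in members and src not in members
    )
    if external_in:
        return max(sorted(external_in), key=lambda u: external_in[u])
    # Fallback: a deterministic choice when no heuristic fires.
    return min(members)
```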

    Index server architecture using tiered and sharded phrase posting lists
    3.
    Granted patent
    Status: In force

    Publication No.: US08682901B1

    Publication date: 2014-03-25

    Application No.: US13332278

    Filing date: 2011-12-20

    IPC classification: G01F7/00

    Abstract: An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
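    To make "tiered into groups, and sharded into partitions" concrete, here is a small sketch. It assumes that tiers are assigned by a per-document score and that shards are assigned by a hash of the document identifier; the abstract states neither rule. `NUM_SHARDS`, `TIER_CUTOFF`, `shard_of`, `build_index`, and `lookup` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4   # assumed number of partitions per posting list
TIER_CUTOFF = 2  # assumed: the top-scoring postings of a phrase go to tier 0


def shard_of(doc_id):
    """Stable shard assignment derived from the document identifier."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def build_index(phrase_postings):
    """phrase_postings: {phrase: [(doc_id, score), ...]}.

    Returns {(tier, shard): {phrase: [doc_id, ...]}}; each key stands for
    one index server in the cluster.
    """
    servers = defaultdict(lambda: defaultdict(list))
    for phrase, postings in phrase_postings.items():
        ranked = sorted(postings, key=lambda p: (-p[1], p[0]))
        for position, (doc_id, _score) in enumerate(ranked):
            tier = 0 if position < TIER_CUTOFF else 1
            servers[(tier, shard_of(doc_id))][phrase].append(doc_id)
    return servers


def lookup(servers, phrase, max_tier=0):
    """Collect the postings for a phrase from every shard up to max_tier."""
    hits = []
    for (tier, _shard), lists in servers.items():
        if tier <= max_tier:
            hits.extend(lists.get(phrase, []))
    return sorted(hits)
```

    Querying only tier 0 first and touching tier 1 on demand is one way a schedule could skip work on some index servers; that policy is an assumption of this sketch.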

    Index server architecture using tiered and sharded phrase posting lists
    4.
    Granted patent
    Status: In force

    Publication No.: US07693813B1

    Publication date: 2010-04-06

    Application No.: US11694780

    Filing date: 2007-03-30

    IPC classification: G06F17/30

    Abstract: An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
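    The abstract mentions identifying query phrases "based on possible phrasifications" without defining the procedure. One plausible reading, sketched here as an assumption, is to enumerate every split of the query into known multi-word phrases and single words. `phrasifications` and `known_phrases` are hypothetical names.

```python
def phrasifications(words, known_phrases):
    """Yield every way to split words into known phrases and single words."""
    if not words:
        yield []
        return
    for end in range(1, len(words) + 1):
        candidate = " ".join(words[:end])
        # A single word is always allowed; longer spans must be known phrases.
        if end == 1 or candidate in known_phrases:
            for rest in phrasifications(words[end:], known_phrases):
                yield [candidate] + rest
```

    For the query "new york restaurants" with the known phrases "new york" and "york restaurants", this yields three candidate phrasifications, from which a scheduler could then choose.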

    System and method and computer program product for ranking logical directories
    5.
    Granted patent
    Status: Lapsed

    Publication No.: US07464076B2

    Publication date: 2008-12-09

    Application No.: US10847143

    Filing date: 2004-05-15

    IPC classification: G06F17/30

    Abstract: A logical directory ranking system ranks documents or web pages utilizing logical directories. The present system groups together compound documents as a single information node with one or more leaves, constructing a logical directory graph. URLs can be grouped at a level of granularity below an individual directory. For example, the URLs may be grouped together on the basis of hostname, domain, or any level of the hierarchy of the URLs. Edges in the logical directory graph are formed by links between the logical directories. Edges have weights corresponding to the number of links between logical directories. Nodes have weights corresponding to the number of web pages or leaves represented by a node. A ranking level is determined for each node as a function of the node weight and the edge weight. The ranking level is then applied to each URL that the node represents.

    Method and framework to support indexing and searching taxonomies in large scale full text indexes
    8.
    Patent application
    Status: In force

    Publication No.: US20070078880A1

    Publication date: 2007-04-05

    Application No.: US11241687

    Filing date: 2005-09-30

    IPC classification: G06F7/00

    CPC classification: G06F17/30734

    Abstract: A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.
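    The indexing scheme in the abstract can be pictured with a toy inverted index in which each posting entry stores a location plus associated data, and a group name (or alias) is indexed at the same location as the entity, carrying the entity's name as its data. This is a simplified sketch, not the patented implementation; the class `InvertedIndex` and its methods are hypothetical, and entities are treated as single terms for brevity.

```python
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # term -> list of (doc_id, position, data) posting entries
        self.postings = defaultdict(list)

    def add(self, term, doc_id, position, data=None):
        """Store the location of a term together with its associated data."""
        self.postings[term.lower()].append((doc_id, position, data))

    def add_entity(self, entity, groups, doc_id, position):
        """Index an entity and each enclosing group name at the same location."""
        self.add(entity, doc_id, position)
        for group in groups:
            # The group's entry carries the name of the concrete entity as data.
            self.add(group, doc_id, position, data=entity)

    def search(self, term):
        """Return the posting entries for a term, or an empty list."""
        return self.postings.get(term.lower(), [])
```

    Searching for a group name such as "city" then returns every location of a member entity, and the data field reveals which entity matched at each location.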
