Hybrid-distribution model for search engine indexes
    1.
    Type: Invention patent (granted)
    Status: In force

    Publication No.: US09424351B2

    Publication date: 2016-08-23

    Application No.: US12951815

    Filing date: 2010-11-22

    IPC classification: G06F7/00, G06F17/30

    Abstract: Methods and systems are provided for using a hybrid-distribution system to identify relevant documents based on a search query. A group of documents is assigned to a particular segment. The group of documents is indexed both by atom and by document to form a reverse index and a forward index. Both indexes are divided amongst each node in that segment so that each node is responsible for storing and accessing a different portion of both the reverse and forward indexes. The reverse index portion is accessed on each of a first set of nodes to identify a first set of documents that is relevant to a particular search query. Document identifications associated with the first set of documents are used to identify a second set of nodes that access their forward index portions to limit the number of relevant documents to a second set of documents.
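
The indexing arrangement described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two indexes are divided among nodes, so the partitioning rules here (a stable hash of the atom for the reverse index, document ID modulo node count for the forward index), the whitespace tokenizer, and every function name are assumptions made for illustration only.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def node_for_atom(atom: str, num_nodes: int) -> int:
    # Assumed rule: a stable hash, so an atom always maps to the same node.
    return zlib.crc32(atom.encode("utf-8")) % num_nodes


def node_for_doc(doc_id: int, num_nodes: int) -> int:
    # Assumed rule: documents are spread over nodes by ID.
    return doc_id % num_nodes


def build_segment(
    docs: Dict[int, str], num_nodes: int
) -> Tuple[List[Dict[str, Set[int]]], List[Dict[int, Set[str]]]]:
    """Index one segment by atom and by document, split across its nodes."""
    reverse = [defaultdict(set) for _ in range(num_nodes)]  # atom -> doc IDs
    forward = [dict() for _ in range(num_nodes)]            # doc ID -> atoms
    for doc_id, text in docs.items():
        atoms = set(text.lower().split())  # naive tokenizer, illustration only
        forward[node_for_doc(doc_id, num_nodes)][doc_id] = atoms
        for atom in atoms:
            reverse[node_for_atom(atom, num_nodes)][atom].add(doc_id)
    return reverse, forward


docs = {1: "hybrid index model", 2: "search index", 3: "matching funnel"}
reverse, forward = build_segment(docs, num_nodes=2)
```

Each node ends up holding a different portion of both indexes: the postings for the atoms hashed to it, and the forward entries for the documents assigned to it.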

    Matching funnel for large document index
    2.
    Type: Invention patent (granted)
    Status: In force

    Publication No.: US08620907B2

    Publication date: 2013-12-31

    Application No.: US12951528

    Filing date: 2010-11-22

    IPC classification: G06F7/00

    CPC classification: G06F17/30864

    Abstract: Search results are identified and returned in response to search queries by evaluating and pruning candidate documents in multiple stages. The process employs a search index that indexes atoms found in documents and pre-computed scores for document/atom pairs. When a search query is received, atoms are identified from the search query and a reformulated query is generated based on the identified atoms. The reformulated query is used to identify matching documents, and a preliminary score is generated for matching documents using a simplified scoring function and pre-computed scores in the search index. Documents are pruned based on preliminary scores, and the remaining documents are evaluated using a final ranking algorithm that provides a final set of ranked documents, which is used to generate search results to return in response to the search query.
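
The staged evaluation in this abstract (match, preliminary score, prune, final rank) can be sketched as follows. The abstract names neither the simplified scoring function nor the final ranking algorithm, so the sum of pre-computed scores, the top-k pruning rule, the `features` boost, and the sample `INDEX` data are all hypothetical stand-ins.

```python
# Hypothetical search index: atom -> {doc ID: pre-computed document/atom score}.
INDEX = {
    "matching": {1: 0.9, 2: 0.4, 3: 0.1},
    "funnel": {1: 0.8, 2: 0.5, 3: 0.2, 4: 0.7},
}


def match(atoms):
    """Stage 1: documents that contain every query atom."""
    postings = [set(INDEX.get(atom, {})) for atom in atoms]
    return set.intersection(*postings) if postings else set()


def preliminary_score(doc_id, atoms):
    # Assumed simplified scoring: add up the pre-computed scores.
    return sum(INDEX[atom][doc_id] for atom in atoms)


def prune(doc_ids, atoms, keep):
    """Stage 2: keep only the `keep` best documents by preliminary score."""
    ranked = sorted(
        doc_ids, key=lambda d: preliminary_score(d, atoms), reverse=True
    )
    return ranked[:keep]


def final_rank(doc_ids, atoms, features):
    """Stage 3: stand-in for a costlier ranker run on the survivors only."""
    return sorted(
        doc_ids,
        key=lambda d: preliminary_score(d, atoms) + features.get(d, 0.0),
        reverse=True,
    )


query_atoms = ["matching", "funnel"]
candidates = match(query_atoms)                  # documents 1, 2 and 3
survivors = prune(candidates, query_atoms, keep=2)
results = final_rank(survivors, query_atoms, features={2: 1.0})
```

The point of the funnel shape is that the expensive final stage only ever sees the few documents that survive the cheap stages.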

    HYBRID-DISTRIBUTION MODEL FOR SEARCH ENGINE INDEXES
    5.
    Type: Invention patent application
    Status: In force

    Publication No.: US20120130997A1

    Publication date: 2012-05-24

    Application No.: US12951815

    Filing date: 2010-11-22

    IPC classification: G06F17/30

    Abstract: Methods and systems are provided for using a hybrid-distribution system to identify relevant documents based on a search query. A group of documents is assigned to a particular segment. The group of documents is indexed both by atom and by document to form a reverse index and a forward index. Both indexes are divided amongst each node in that segment so that each node is responsible for storing and accessing a different portion of both the reverse and forward indexes. The reverse index portion is accessed on each of a first set of nodes to identify a first set of documents that is relevant to a particular search query. Document identifications associated with the first set of documents are used to identify a second set of nodes that access their forward index portions to limit the number of relevant documents to a second set of documents.
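
The query-time flow in this abstract, where reverse index portions yield a first set of documents and forward index portions narrow it to a second set, might look like the sketch below. The literal `REVERSE` and `FORWARD` portions, the modulo routing of document IDs, the conjunctive matching, and the `keep` predicate are assumptions; the abstract does not specify how the forward index is used to limit the candidates.

```python
# Hypothetical per-node index portions for a two-node segment.
REVERSE = [
    {"search": {1, 2, 4}},   # node 0 stores the postings for "search"
    {"index": {1, 2, 3}},    # node 1 stores the postings for "index"
]
FORWARD = [
    {2: {"search", "index", "spam"}, 4: {"search", "engine"}},   # even doc IDs
    {1: {"search", "index", "hybrid"}, 3: {"index", "funnel"}},  # odd doc IDs
]


def first_stage(query_atoms):
    """Use the reverse index portions to find documents with every atom."""
    candidates = None
    for atom in query_atoms:
        postings = set()
        for portion in REVERSE:          # each node consults its own portion
            postings |= portion.get(atom, set())
        candidates = postings if candidates is None else candidates & postings
    return candidates or set()


def second_stage(candidates, keep):
    """Route document IDs to the nodes that own them, then filter there."""
    by_node = {}
    for doc_id in candidates:
        by_node.setdefault(doc_id % len(FORWARD), []).append(doc_id)
    survivors = set()
    for node, doc_ids in by_node.items():   # only owning nodes are contacted
        portion = FORWARD[node]
        survivors.update(d for d in doc_ids if keep(portion[d]))
    return survivors


first = first_stage(["search", "index"])
second = second_stage(first, keep=lambda atoms: "spam" not in atoms)
```

In this toy run the first stage yields documents 1 and 2, and the forward index check on node 0 drops document 2, leaving document 1 as the second set.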

    MATCHING FUNNEL FOR LARGE DOCUMENT INDEX
    6.
    Type: Invention patent application
    Status: In force

    Publication No.: US20120130994A1

    Publication date: 2012-05-24

    Application No.: US12951528

    Filing date: 2010-11-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Search results are identified and returned in response to search queries by evaluating and pruning candidate documents in multiple stages. The process employs a search index that indexes atoms found in documents and pre-computed scores for document/atom pairs. When a search query is received, atoms are identified from the search query and a reformulated query is generated based on the identified atoms. The reformulated query is used to identify matching documents, and a preliminary score is generated for matching documents using a simplified scoring function and pre-computed scores in the search index. Documents are pruned based on preliminary scores, and the remaining documents are evaluated using a final ranking algorithm that provides a final set of ranked documents, which is used to generate search results to return in response to the search query.
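
The first step of this abstract, identifying atoms in the query and generating a reformulated query from them, is sketched below. The abstract does not define how atoms are recognized, so this example assumes that atoms are single words or two-word phrases listed in a hypothetical vocabulary, matched greedily with the longest phrase first, and that the reformulated query is simply the ordered list of distinct atoms.

```python
# Hypothetical vocabulary of atoms known to the search index.
KNOWN_ATOMS = {"search engine", "search", "engine", "index", "large"}


def identify_atoms(query, vocabulary=KNOWN_ATOMS, max_len=2):
    """Greedy longest-match segmentation of the query into known atoms."""
    words = query.lower().split()
    atoms, i = [], 0
    while i < len(words):
        for n in range(max_len, 0, -1):          # prefer the longest atom
            window = words[i:i + n]
            if len(window) == n and " ".join(window) in vocabulary:
                atoms.append(" ".join(window))
                i += n
                break
        else:
            i += 1                               # word has no indexed atom
    return atoms


def reformulate(atoms):
    """Assumed reformulation: distinct atoms, original order preserved."""
    return list(dict.fromkeys(atoms))


atoms = identify_atoms("the large search engine index")
reformulated = reformulate(atoms)
```

Here "search engine" is kept as one atom instead of two separate words, which is the kind of choice a reformulation step would make before matching against the index.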
