-
Publication No.: US09424351B2
Publication date: 2016-08-23
Application No.: US12951815
Filing date: 2010-11-22
Applicant: Knut Magne Risvik, Michael Hopcroft, John Bennett, Karthik Kalyanaraman, Trishul Chilimbi, Chad P. Walters, Vishesh Parikh, Jan Otto Pedersen
Inventor: Knut Magne Risvik, Michael Hopcroft, John Bennett, Karthik Kalyanaraman, Trishul Chilimbi, Chad P. Walters, Vishesh Parikh, Jan Otto Pedersen
CPC classification: G06F17/30864, G06F17/30194, G06F17/30442, G06F17/3053
Abstract: Methods and systems are provided for using a hybrid-distribution system to identify relevant documents based on a search query. A group of documents is assigned to a particular segment. The group of documents is indexed both by atom and by document to form a reverse index and a forward index. Both indexes are divided amongst each node in that segment so that each node is responsible for storing and accessing a different portion of both the reverse and forward indexes. The reverse index portion is accessed on each of a first set of nodes to identify a first set of documents that is relevant to a particular search query. Document identifications associated with the first set of documents are used to identify a second set of nodes that access their forward index portions to limit the number of relevant documents to a second set of documents.
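Illustrative sketch (not taken from the patent text): the two-phase lookup described in the abstract above can be pictured as follows. Every name here (`Node`, `atom_node`, `build_segment`, `search`) is hypothetical, atoms are placed on nodes by a CRC32 hash and documents by `doc_id % num_nodes` purely for illustration, and the second-phase filter is assumed to be an exact phrase check; the abstract only says that the forward index is used to narrow the first set of documents.

```python
import zlib
from collections import defaultdict


class Node:
    """One node of a segment; it holds a slice of BOTH indexes."""

    def __init__(self):
        self.reverse = defaultdict(set)  # atom -> doc ids, for atoms owned by this node
        self.forward = {}                # doc id -> token list, for docs owned by this node


def atom_node(atom, num_nodes):
    # Assumed placement rule: a stable hash of the atom picks the owning node.
    return zlib.crc32(atom.encode("utf-8")) % num_nodes


def build_segment(docs, num_nodes):
    """Index one group of documents by atom and by document across the nodes."""
    nodes = [Node() for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        atoms = text.lower().split()
        nodes[doc_id % num_nodes].forward[doc_id] = atoms          # forward index portion
        for atom in set(atoms):
            nodes[atom_node(atom, num_nodes)].reverse[atom].add(doc_id)  # reverse index portion
    return nodes


def contains_phrase(tokens, atoms):
    width = len(atoms)
    return any(tokens[i:i + width] == atoms for i in range(len(tokens) - width + 1))


def search(nodes, query):
    atoms = query.lower().split()
    if not atoms:
        return set(), set()
    n = len(nodes)
    # Phase 1: the nodes owning the query atoms read their reverse index portions.
    postings = [nodes[atom_node(a, n)].reverse.get(a, set()) for a in atoms]
    first = set.intersection(*postings)
    # Phase 2: the document ids decide which nodes consult their forward portions.
    by_node = defaultdict(list)
    for doc_id in first:
        by_node[doc_id % n].append(doc_id)
    second = set()
    for node_idx, doc_ids in by_node.items():
        forward = nodes[node_idx].forward
        second.update(d for d in doc_ids if contains_phrase(forward[d], atoms))
    return first, second


docs = {1: "fast search engine", 2: "engine for fast search", 3: "search fast"}
first, second = search(build_segment(docs, num_nodes=2), "fast search")
# first == {1, 2, 3}; second == {1, 2}
```

The point of the split is that the reverse index answers "which documents contain these atoms" without touching document bodies, while the forward index is only read for the candidates that survive the first phase.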
-
Publication No.: US08620907B2
Publication date: 2013-12-31
Application No.: US12951528
Filing date: 2010-11-22
Applicant: Knut Magne Risvik, Michael Hopcroft, John G. Bennett, Karthik Kalyanaraman, Trishul Chilimbi, Chad P. Walters, Jan Otto Pedersen
Inventor: Knut Magne Risvik, Michael Hopcroft, John G. Bennett, Karthik Kalyanaraman, Trishul Chilimbi, Chad P. Walters, Jan Otto Pedersen
IPC classification: G06F7/00
CPC classification: G06F17/30864
Abstract: Search results are identified and returned in response to search queries by evaluating and pruning candidate documents in multiple stages. The process employs a search index that indexes atoms found in documents and pre-computed scores for document/atom pairs. When a search query is received, atoms are identified from the search query and a reformulated query is generated based on the identified atoms. The reformulated query is used to identify matching documents, and a preliminary score is generated for matching documents using a simplified scoring function and pre-computed scores in the search index. Documents are pruned based on preliminary scores, and the remaining documents are evaluated using a final ranking algorithm that provides a final set of ranked documents, which is used to generate search results to return in response to the search query.
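Illustrative sketch (not taken from the patent text): the staged evaluation in the abstract above could look roughly like this. All names (`build_index`, `staged_search`, `keep`) are hypothetical. The pre-computed score is assumed to be a simple term-frequency share, query reformulation is reduced to de-duplicating atoms, and the "final ranking algorithm" is a stand-in that adds a bonus when a query atom appears early in the document; the abstract specifies none of these details.

```python
import heapq


def build_index(docs):
    """Map atom -> {doc_id: pre-computed score} (score assumed: term-frequency share)."""
    index = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for atom in set(tokens):
            index.setdefault(atom, {})[doc_id] = tokens.count(atom) / len(tokens)
    return index


def staged_search(index, docs, query, keep=2):
    # Stage 0: identify atoms; de-duplication stands in for query reformulation.
    atoms = sorted(set(query.lower().split()))
    if not atoms:
        return []
    # Stage 1: matching documents must contain every atom.
    postings = [index.get(a, {}) for a in atoms]
    matching = set(postings[0]).intersection(*postings[1:])
    # Stage 2: cheap preliminary score built only from pre-computed values.
    prelim = {d: sum(p[d] for p in postings) for d in matching}
    # Stage 3: prune to the `keep` best candidates.
    survivors = heapq.nlargest(keep, prelim, key=lambda d: (prelim[d], -d))

    # Stage 4: costlier final ranking that reads the document itself.
    def final_score(d):
        tokens = docs[d].lower().split()
        first_pos = min(tokens.index(a) for a in atoms)
        return prelim[d] + 1.0 / (1 + first_pos)

    return sorted(survivors, key=lambda d: (-final_score(d), d))


docs = {
    1: "search index search",
    2: "an index for web search results",
    3: "index",
    4: "web search index",
}
ranked = staged_search(build_index(docs), docs, "search index", keep=2)
# ranked == [1, 4]: document 3 never matches, document 2 is pruned before final ranking
```

Only the survivors of the pruning stage pay for the expensive scoring step, which is the motivation for keeping the preliminary score cheap.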
-
Publication No.: US20120130995A1
Publication date: 2012-05-24
Application No.: US12951747
Filing date: 2010-11-22
Applicant: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
Inventor: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30613
Abstract: Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, in addition to static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of document is identified and a position of the data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index.
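Illustrative sketch (not taken from the patent text): one way to picture the forward-index entry described in the abstract above. `ForwardEntry`, `make_entry`, and the feature names `language` and `static_rank` are hypothetical, tokenization is assumed to be whitespace splitting, and "relevant data" is assumed to be a caller-supplied set of atoms whose token offsets are recorded.

```python
from dataclasses import dataclass


@dataclass
class ForwardEntry:
    doc_id: int
    tokens: list            # token stream of the document
    static_features: dict   # query-independent features
    positions: dict         # relevant atom -> token offsets in the stream


def make_entry(doc_id, text, static_features, relevant_atoms):
    """Build one forward-index entry from a document and its static features."""
    tokens = text.lower().split()
    positions = {}
    for offset, token in enumerate(tokens):
        if token in relevant_atoms:
            positions.setdefault(token, []).append(offset)
    return ForwardEntry(doc_id, tokens, dict(static_features), positions)


forward_index = {}
entry = make_entry(
    7,
    "Patent search and patent ranking",
    {"language": "en", "static_rank": 0.4},
    relevant_atoms={"patent", "ranking"},
)
forward_index[entry.doc_id] = entry  # the entry is stored under its document id
# entry.positions == {"patent": [0, 3], "ranking": [4]}
```

Keeping the token stream and the positions together in one entry means a ranker can look up a document by id and read everything it needs without consulting the reverse index.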
-
Publication No.: US08713024B2
Publication date: 2014-04-29
Application No.: US12951747
Filing date: 2010-11-22
Applicant: Knut Magne Risvik, Michael Hopcroft, John G. Bennett, Karthik Kalyanaraman, Trishul Chilimbi, Chad P. Walters, Vishesh Parikh, Jan Otto Pedersen
Inventor: Knut Magne Risvik, Michael Hopcroft, John G. Bennett, Karthik Kalyanaraman, Trishul Chilimbi, Chad P. Walters, Vishesh Parikh, Jan Otto Pedersen
CPC classification: G06F17/30864, G06F17/30613
Abstract: Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, in addition to static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of document is identified and a position of the data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index.
-
Publication No.: US20120130997A1
Publication date: 2012-05-24
Application No.: US12951815
Filing date: 2010-11-22
Applicant: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
Inventor: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30194, G06F17/30442, G06F17/3053
Abstract: Methods and systems are provided for using a hybrid-distribution system to identify relevant documents based on a search query. A group of documents is assigned to a particular segment. The group of documents is indexed both by atom and by document to form a reverse index and a forward index. Both indexes are divided amongst each node in that segment so that each node is responsible for storing and accessing a different portion of both the reverse and forward indexes. The reverse index portion is accessed on each of a first set of nodes to identify a first set of documents that is relevant to a particular search query. Document identifications associated with the first set of documents are used to identify a second set of nodes that access their forward index portions to limit the number of relevant documents to a second set of documents.
-
Publication No.: US20120130994A1
Publication date: 2012-05-24
Application No.: US12951528
Filing date: 2010-11-22
Applicant: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, JAN OTTO PEDERSEN
Inventor: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, JAN OTTO PEDERSEN
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: Search results are identified and returned in response to search queries by evaluating and pruning candidate documents in multiple stages. The process employs a search index that indexes atoms found in documents and pre-computed scores for document/atom pairs. When a search query is received, atoms are identified from the search query and a reformulated query is generated based on the identified atoms. The reformulated query is used to identify matching documents, and a preliminary score is generated for matching documents using a simplified scoring function and pre-computed scores in the search index. Documents are pruned based on preliminary scores, and the remaining documents are evaluated using a final ranking algorithm that provides a final set of ranked documents, which is used to generate search results to return in response to the search query.