-
Publication No.: US20120130995A1
Publication date: 2012-05-24
Application No.: US12951747
Filing date: 2010-11-22
Applicant: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
Inventor: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30613
Abstract: Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, along with static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of the document is identified, and the position of that data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index.
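The entry-generation steps in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `ForwardIndexEntry`, `build_entry`, and `add_document` are hypothetical, whitespace tokenization stands in for whatever parser the patent uses, and "relevant data" is assumed here to be a caller-supplied set of terms.

```python
from dataclasses import dataclass


@dataclass
class ForwardIndexEntry:
    """One forward-index record (hypothetical layout, for illustration only)."""
    doc_id: int
    tokens: list[str]                 # token stream of the document
    static_features: dict[str, float]  # query-independent features
    positions: dict[str, list[int]]    # relevant term -> token offsets


def build_entry(doc_id, text, static_features, relevant_terms):
    # Assumption: lowercase whitespace splitting stands in for the real parser.
    tokens = text.lower().split()
    positions = {}
    for offset, token in enumerate(tokens):
        # Assumption: "relevant data" is modeled as membership in a term set.
        if token in relevant_terms:
            positions.setdefault(token, []).append(offset)
    return ForwardIndexEntry(doc_id, tokens, dict(static_features), positions)


def add_document(index, doc_id, text, static_features, relevant_terms):
    # The forward index is keyed by document identification.
    entry = build_entry(doc_id, text, static_features, relevant_terms)
    index[doc_id] = entry
    return entry
```

Keying the index by document identification means every feature needed to rank one document can be read with a single lookup, which is the usual reason for keeping a forward index next to a reverse index.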
-
Publication No.: US20120130997A1
Publication date: 2012-05-24
Application No.: US12951815
Filing date: 2010-11-22
Applicant: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
Inventor: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, CHAD P. WALTERS, VISHESH PARIKH, JAN OTTO PEDERSEN
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30194, G06F17/30442, G06F17/3053
Abstract: Methods and systems are provided for using a hybrid-distribution system to identify relevant documents based on a search query. A group of documents is assigned to a particular segment. The group of documents is indexed both by atom and by document to form a reverse index and a forward index. Both indexes are divided among the nodes in that segment so that each node is responsible for storing and accessing a different portion of both the reverse and forward indexes. The reverse index portion is accessed on each of a first set of nodes to identify a first set of documents that is relevant to a particular search query. Document identifications associated with the first set of documents are used to identify a second set of nodes, which access their forward index portions to narrow the relevant documents down to a second set of documents.
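The two-stage lookup in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented system: `Node` and `Segment` are hypothetical names, atoms are assumed to be lowercase whitespace-separated words, nodes are chosen by simple modulo hashing, and the second stage scores documents by raw term counts.

```python
from collections import defaultdict


class Node:
    """Holds one portion of the reverse index and one of the forward index."""
    def __init__(self):
        self.reverse = defaultdict(set)  # atom -> document ids
        self.forward = {}                # document id -> token list


class Segment:
    def __init__(self, num_nodes):
        self.nodes = [Node() for _ in range(num_nodes)]

    def _atom_node(self, atom):
        # Assumption: a deterministic byte-sum hash picks the reverse-index node.
        return self.nodes[sum(atom.encode()) % len(self.nodes)]

    def _doc_node(self, doc_id):
        # Assumption: the forward index is split by document id modulo node count.
        return self.nodes[doc_id % len(self.nodes)]

    def add(self, doc_id, text):
        tokens = text.lower().split()
        self._doc_node(doc_id).forward[doc_id] = tokens
        for atom in set(tokens):
            self._atom_node(atom).reverse[atom].add(doc_id)

    def search(self, query, limit):
        atoms = query.lower().split()
        # Stage 1: reverse-index nodes yield documents containing every atom.
        candidates = None
        for atom in atoms:
            postings = self._atom_node(atom).reverse.get(atom, set())
            candidates = set(postings) if candidates is None else candidates & postings
        candidates = candidates or set()
        # Stage 2: forward-index nodes score candidates and cut the set down.
        scored = []
        for doc_id in candidates:
            tokens = self._doc_node(doc_id).forward[doc_id]
            scored.append((sum(tokens.count(a) for a in atoms), doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:limit]]
```

In this sketch the first stage only touches the nodes that own the query's atoms, and the second stage only touches the nodes that own the surviving document ids, so neither stage needs to scan the whole segment.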
-
Publication No.: US20120130925A1
Publication date: 2012-05-24
Application No.: US12951659
Filing date: 2010-11-22
Applicant: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, VISHESH PARIKH
Inventor: KNUT MAGNE RISVIK, MICHAEL HOPCROFT, JOHN G. BENNETT, KARTHIK KALYANARAMAN, TRISHUL CHILIMBI, VISHESH PARIKH
CPC classification: G06F17/3053, G06F17/30864
Abstract: Methods and computer storage media are provided for generating an algorithm used to provide preliminary rankings to candidate documents. A final ranking function that provides final rankings for documents is analyzed to identify potential preliminary ranking features, such as static ranking features that are query-independent and dynamic atom-isolated components that are related to a single atom. Preliminary ranking features are selected from the potential preliminary ranking features based on many factors. Using these selected features, an algorithm is generated to provide a preliminary ranking to the candidate documents before the most relevant documents are passed to the final ranking stage.
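The feature-selection idea in this abstract can be illustrated with a small sketch. The abstract does not say which factors drive the selection, so everything specific here is an assumption: the `Feature` record, its `kind`, `weight`, and `cost` fields, the weight-per-cost greedy rule, and the weighted-sum scorer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str      # "static", "atom_isolated", or "cross_atom" (assumed labels)
    weight: float  # assumed contribution inside the final ranking function
    cost: float    # assumed relative cost of computing the feature


def select_preliminary_features(final_features, budget):
    # Only query-independent and single-atom features are eligible.
    eligible = [f for f in final_features if f.kind in ("static", "atom_isolated")]
    # Assumption: prefer features with the most weight per unit of cost.
    eligible.sort(key=lambda f: abs(f.weight) / f.cost, reverse=True)
    chosen, spent = [], 0.0
    for feature in eligible:
        if spent + feature.cost <= budget:
            chosen.append(feature)
            spent += feature.cost
    return chosen


def make_preliminary_ranker(features):
    # The generated "algorithm" is modeled as a weighted sum of selected features.
    def score(values):
        return sum(f.weight * values.get(f.name, 0.0) for f in features)
    return score
```

A cheap scorer built this way can be applied to every candidate, leaving the costlier cross-atom features to the final ranking stage, which only sees the documents that survive the preliminary cut.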
-
-