Efficient retrieval algorithm by query term discrimination

Granted invention patent

US07822752B2 Efficient retrieval algorithm by query term discrimination (in force)



Patent title: Efficient retrieval algorithm by query term discrimination
Patent title (Chinese): 通过查询词辨别的有效检索算法
Application number: US11804627

Filing date: 2007-05-18
Publication number: US07822752B2

Publication date: 2010-10-26
Inventors: Chenxi Lin, Lei Ji, Huajun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
Applicants: Chenxi Lin, Lei Ji, Huajun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
Applicant address: Redmond, WA, US
Assignee: Microsoft Corporation
Current assignee: Microsoft Corporation
Current assignee address: Redmond, WA, US
Main classification: G06F7/00
IPC classifications: G06F7/00; G06F17/30


Abstract:

Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverted document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.
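The retrieval flow in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the patented implementation. Everything named here is hypothetical: the functions `idf`, `seek`, and `retrieve`, the index layout (each term maps to a sorted list of document identifiers plus a parallel list of precomputed scores), and the smoothed IDF formula, which the abstract does not specify. Candidate documents are taken to be those appearing in the rows of the most discriminative terms, and each candidate is then located in every query-term row by a forward pointer jump followed by a binary search.

```python
import math
from bisect import bisect_left


def idf(num_docs, doc_freq):
    # Smoothed IDF (an assumed formula; the abstract does not give one).
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)


def seek(ids, target, start):
    """Return the first index >= start whose doc id is >= target.

    Jumps the pointer forward in doubling steps, then binary-searches
    the bracketed range.
    """
    n = len(ids)
    lo = hi = start
    step = 1
    while hi < n and ids[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    return bisect_left(ids, target, lo, min(hi, n))


def retrieve(index, num_docs, query_terms, top_terms=2, top_n=10):
    """index maps term -> (sorted doc ids, parallel precomputed scores)."""
    # Drop duplicate and unknown terms, then rank by IDF, rarest first.
    terms = [t for t in dict.fromkeys(query_terms) if t in index]
    ranked = sorted(
        terms, key=lambda t: idf(num_docs, len(index[t][0])), reverse=True
    )
    important = ranked[:top_terms]

    # Only documents listed under an important term are considered.
    candidates = sorted(set().union(*(index[t][0] for t in important)))

    cursor = {t: 0 for t in ranked}  # one forward-only pointer per row
    totals = []
    for doc_id in candidates:
        total = 0.0
        for t in ranked:
            ids, scores = index[t]
            i = seek(ids, doc_id, cursor[t])
            cursor[t] = i
            if i < len(ids) and ids[i] == doc_id:
                total += scores[i]
        totals.append((doc_id, total))

    # Highest summed score first; ties broken by doc id.
    totals.sort(key=lambda pair: (-pair[1], pair[0]))
    return totals[:top_n]
```

In this sketch, a query such as `retrieve(index, num_docs, ["cheap", "flights", "seattle"])` never visits documents that appear only under the most common term, which is where the saving over scanning every row in full would come from. Summing the per-term scores is also an assumption; the abstract says only that the scores are used to rank the results.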


Publication/grant documents

US20080288483A1 Efficient retrieval algorithm by query term discrimination. Publication/grant date: 2008-11-20

Information lookup

Espacenet

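IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits: see H03K19/00)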