-
1.
Publication No.: US20090083214A1
Publication date: 2009-03-26
Application No.: US11858920
Filing date: 2007-09-21
Applicant: Arnd C. Konig, Surajit Chaudhuri, Kenneth Church, Liying Sui
Inventor: Arnd C. Konig, Surajit Chaudhuri, Kenneth Church, Liying Sui
IPC classification: G06F17/30
CPC classification: G06F16/3331, G06F16/313
Abstract: Index structures and a query processing framework that enforce a given threshold on the overhead of computing conjunctive keyword queries. This includes a keyword processing algorithm, logic to determine which indexes to materialize, and a probabilistic approach to reducing the overhead for determining which indexes to build. The index structures leverage the fact that the frequency distribution of natural-language text follows a power law. Given a document collection, a set of indexes is proposed for materialization so that the time for intersecting keywords does not exceed a given threshold Δ. When considering the associated space requirement, the additional indexes are limited. Materialization of such a set of indexes for reasonable values of Δ (e.g., the time required to scan 20% of the largest inverted index), at least for a collection of short documents, is distributed by the power law.
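The idea in the abstract, bounding the cost of intersecting keyword lists by a threshold Δ and materializing extra indexes only where that bound would be exceeded, can be illustrated with a minimal sketch. This is not the patented method: the cost model (cost proportional to the shorter posting list), the restriction to keyword pairs, and all function names (`build_inverted_index`, `choose_pairs_to_materialize`, `query`) are assumptions made for illustration, and the probabilistic index-selection step is omitted.

```python
from bisect import bisect_left
from itertools import combinations


def build_inverted_index(docs):
    """Map each keyword to the sorted list of ids of documents containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return index  # ids are appended in increasing order, so each list is sorted


def intersect(a, b):
    """Intersect two sorted posting lists by probing the longer with the shorter.

    Assumed cost model: work is proportional to the shorter list
    (ignoring the logarithmic factor of the binary search).
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    out = []
    for doc_id in short:
        k = bisect_left(long_, doc_id)
        if k < len(long_) and long_[k] == doc_id:
            out.append(doc_id)
    return out


def threshold(index, fraction=0.2):
    """Delta expressed as a fraction of the largest posting list (hypothetical helper)."""
    return fraction * max(len(p) for p in index.values())


def choose_pairs_to_materialize(index, delta):
    """Precompute intersections for pairs whose assumed cost would exceed delta.

    Under the assumed cost model a pair is expensive only if BOTH lists are
    longer than delta; with power-law frequencies few keywords qualify.
    """
    heavy = sorted(w for w, p in index.items() if len(p) > delta)
    return {(u, v): intersect(index[u], index[v]) for u, v in combinations(heavy, 2)}


def query(index, pair_index, keywords):
    """Answer a conjunctive query, using a materialized pair index when one exists."""
    words = sorted(set(k.lower() for k in keywords), key=lambda w: len(index.get(w, [])))
    if not words or any(w not in index for w in words):
        return []
    result, rest = index[words[0]], words[1:]
    if len(words) >= 2:
        key = tuple(sorted(words[:2]))
        if key in pair_index:  # both shortest lists are heavy: skip the costly scan
            result, rest = pair_index[key], words[2:]
    for w in rest:
        result = intersect(result, index[w])
    return result
```

For example, with `index = build_inverted_index(docs)` and `pairs = choose_pairs_to_materialize(index, threshold(index))`, a call such as `query(index, pairs, ["the", "cat"])` either starts from a posting list no longer than Δ or reads a precomputed pair intersection, so under the stated cost model no single intersection step scans more than Δ entries.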