Granted invention patent
US08001128B2 Selection of a set of optimal n-grams for indexing string data in a DBMS system under space constraints introduced by the system
Lapsed
- Title: Selection of a set of optimal n-grams for indexing string data in a DBMS system under space constraints introduced by the system
- Application number: US12264899
- Filing date: 2008-11-04
- Publication number: US08001128B2
- Publication date: 2011-08-16
- Inventors: Vahit Hakan Hacigumus, Balakrishna Raghavendra Iyer, Sharad Mehrotra
- Applicants: Vahit Hakan Hacigumus, Balakrishna Raghavendra Iyer, Sharad Mehrotra
- Applicant address: Armonk, NY, US
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: Armonk, NY, US
- Agent: Sawyer Law Group, P.C.
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
The present invention provides a computer-readable medium and system for selecting a set of n-grams for indexing string data in a DBMS system. Aspects of the invention include providing a set of candidate n-grams, each n-gram comprising a sequence of characters; identifying sample queries having character strings containing the candidate n-grams; and, based on the set of candidate n-grams, the sample queries, the database records, and an n-gram space constraint, automatically selecting a minimal set of n-grams from the set of candidate n-grams that, within the space constraint, minimizes the number of false hits the sample queries would produce if they were executed against the database records.
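
The selection problem described in the abstract can be illustrated with a small sketch. The abstract does not say how the selection is carried out, so the greedy heuristic below is an assumption made for illustration only, not the patented method. Further assumptions: queries are substring searches, the space constraint is modeled as a maximum number of indexed n-grams (`budget`), and a false hit is a record that the index returns as a candidate but that does not actually contain the query string. All function names are hypothetical.

```python
from __future__ import annotations


def false_hits(selected, queries, records, postings):
    """Count records the index would return that do not match the query."""
    total = 0
    for query in queries:
        # Start from all records, then intersect the posting lists of every
        # selected n-gram that occurs in the query string.
        candidates = set(range(len(records)))
        for gram in selected:
            if gram in query:
                candidates &= postings[gram]
        # A false hit is a candidate record that lacks the query substring.
        total += sum(1 for r in candidates if query not in records[r])
    return total


def select_ngrams(candidate_grams, queries, records, budget):
    """Greedy sketch (an assumption, not the patented algorithm):
    repeatedly add the n-gram that lowers total false hits the most,
    stopping at the budget or when no n-gram helps."""
    # Posting list: indices of the records that contain each n-gram.
    postings = {
        g: {i for i, rec in enumerate(records) if g in rec}
        for g in candidate_grams
    }
    selected = []
    best = false_hits(selected, queries, records, postings)
    while len(selected) < budget:
        scored = [
            (false_hits(selected + [g], queries, records, postings), g)
            for g in sorted(candidate_grams)
            if g not in selected
        ]
        if not scored:
            break
        score, gram = min(scored)
        if score >= best:  # no remaining n-gram reduces false hits
            break
        selected.append(gram)
        best = score
    return selected, best


if __name__ == "__main__":
    records = ["apple pie", "pineapple", "grape", "maple syrup"]
    queries = ["apple", "maple"]
    grams = {"ap", "pp", "pl", "le", "ma"}
    print(select_ngrams(grams, queries, records, budget=2))
```

In this toy data set the n-gram `ap` occurs in every record and therefore filters nothing, which is why a selection step that scores candidates against sample queries can do better than indexing n-grams indiscriminately. Because the loop stops as soon as no candidate reduces the false-hit count, the sketch also tends toward a small set, though a greedy pass does not guarantee the minimal one.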