Granted invention patent
- Patent title: Efficient fuzzy match for evaluating data records
- Application number: US10600083
- Filing date: 2003-06-20
- Publication (grant) number: US07296011B2
- Publication (grant) date: 2007-11-13
- Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
- Applicants: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
- Applicant address: Redmond, WA, US
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: Redmond, WA, US
- Main classification: G06F7/00
- IPC classifications: G06F7/00; G06F17/30
Abstract:
To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, the product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate (fuzzy) match operation that can effectively clean an incoming tuple when it fails to match any tuple in the reference relation exactly. A disclosed similarity function that uses token substrings called q-grams overcomes limitations of prior-art similarity functions while performing the fuzzy match process efficiently.
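The abstract does not spell out the patented similarity function, so the following is only a minimal sketch of the general idea under stated assumptions: an input tuple that matches a reference tuple exactly is accepted as-is; otherwise every field is split into tokens, tokens are compared by the Jaccard similarity of their padded 3-gram sets, and the best-scoring reference tuple is returned if its score clears a threshold. The helper names (`tokenize`, `qgrams`, `token_similarity`, `record_similarity`, `fuzzy_match`), the symmetric averaging, the 0.6 threshold, and the sample reference rows are all hypothetical choices for illustration. This is not the patent's similarity function or its indexing scheme.

```python
from typing import Optional, Sequence

Record = tuple[str, ...]  # hypothetical: a tuple is a sequence of string fields


def tokenize(record: Record) -> list[str]:
    """Split every field of a record into lowercase, whitespace-delimited tokens."""
    return [tok.lower() for field in record for tok in field.split()]


def qgrams(token: str, q: int = 3) -> set[str]:
    """Return the token's q-grams, padded with '#' so short tokens still yield grams."""
    padded = "#" * (q - 1) + token + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def token_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard similarity of two tokens' q-gram sets (1.0 for identical tokens)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def directed_score(src: list[str], dst: list[str], q: int) -> float:
    """Average, over tokens in src, of the best q-gram similarity to any token in dst."""
    return sum(max(token_similarity(s, d, q) for d in dst) for s in src) / len(src)


def record_similarity(inp: Record, ref: Record, q: int = 3) -> float:
    """Symmetric tuple similarity: mean of the two directed token-level scores."""
    a, b = tokenize(inp), tokenize(ref)
    if not a or not b:
        return 0.0
    return (directed_score(a, b, q) + directed_score(b, a, q)) / 2


def fuzzy_match(inp: Record, reference: Sequence[Record],
                threshold: float = 0.6, q: int = 3) -> tuple[Optional[Record], float]:
    """Return (best reference tuple, score), or (None, best score) if below threshold."""
    if inp in reference:                # exact match: no cleaning needed
        return inp, 1.0
    best_ref, best_score = None, 0.0
    for ref in reference:               # naive linear scan over the reference relation
        score = record_similarity(inp, ref, q)
        if score > best_score:
            best_ref, best_score = ref, score
    if best_score >= threshold:
        return best_ref, best_score
    return None, best_score


if __name__ == "__main__":
    reference = [
        ("Boeing Company", "Seattle", "WA", "98004"),
        ("Bon Corporation", "Seattle", "WA", "98014"),
        ("Companions", "Seattle", "WA", "98024"),
    ]
    match, score = fuzzy_match(("Beoing Company", "Seattle", "WA", "98004"), reference)
    print(match)            # ('Boeing Company', 'Seattle', 'WA', '98004')
    print(round(score, 3))  # 0.867
```

Here the misspelled token "beoing" still shares 4 of 12 distinct 3-grams with "boeing", so the Boeing tuple wins even though no field containing that token matches exactly. The linear scan compares the input against every reference tuple, which grows with the size of the reference relation; a system aiming for efficiency would presumably use an index to shortlist candidate tuples before scoring them.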
Publication/grant documents
- US20040260694A1 Efficient fuzzy match for evaluating data records, publication/grant date: 2004-12-23