Invention Grant
- Patent Title: Efficient fuzzy match for evaluating data records
- Patent Title (中): 用于评估数据记录的高效模糊匹配
- Application No.: US10600083
- Application Date: 2003-06-20
- Publication No.: US07296011B2
- Publication Date: 2007-11-13
- Inventor: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
- Applicant: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, the product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate, or fuzzy, match operation that can effectively clean an incoming tuple when it fails to match any tuple in the reference relation exactly. A disclosed similarity function based on token substrings, referred to as q-grams, overcomes limitations of prior art similarity functions while performing the fuzzy match process efficiently.
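The idea of matching an input tuple against a reference relation through q-grams can be illustrated with a minimal sketch. This is not the similarity function disclosed in the patent: it swaps in plain Jaccard overlap of q-gram sets and a linear scan of the reference table. The names `qgrams`, `jaccard`, and `fuzzy_match`, the `threshold` parameter, and the sample records are all assumptions made up for illustration.

```python
def qgrams(text, q=3):
    """Return the set of length-q substrings of each whitespace-separated token.

    Tokens of length q or less are kept whole (an assumption of this sketch).
    """
    grams = set()
    for token in text.lower().split():
        if len(token) <= q:
            grams.add(token)
        else:
            for i in range(len(token) - q + 1):
                grams.add(token[i:i + q])
    return grams


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_match(input_tuple, reference, q=3, threshold=0.5):
    """Return (closest reference tuple, score), or (None, score) if below threshold."""
    query = qgrams(" ".join(input_tuple), q)
    best, best_score = None, 0.0
    for ref in reference:  # linear scan; a real system would use an index
        score = jaccard(query, qgrams(" ".join(ref), q))
        if score > best_score:
            best, best_score = ref, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical reference relation and a misspelled input tuple.
reference = [
    ("Boeing Company", "Seattle", "WA"),
    ("Bon Corporation", "Seattle", "WA"),
    ("Companions", "Seattle", "WA"),
]
match, score = fuzzy_match(("Beoing Company", "Seattle", "WA"), reference)
print(match, round(score, 3))
```

Because a transposition such as "Beoing" disturbs only a few q-grams of one token, the misspelled tuple still shares most of its q-grams with the intended reference tuple, which is why the closest match can be recovered even though an exact comparison fails.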
Public/Granted literature
- US20040260694A1 Efficient fuzzy match for evaluating data records (Public/Granted day: 2004-12-23)