Efficient fuzzy match for evaluating data records

Granted invention patent

US07296011B2 Efficient fuzzy match for evaluating data records (legal status: in force)

Patent title: Efficient fuzzy match for evaluating data records
Application number: US10600083
Filing date: 2003-06-20
Publication (grant) number: US07296011B2
Publication (grant) date: 2007-11-13
Inventors: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
Applicants: Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
Applicant address: Redmond, WA, US
Patentee: Microsoft Corporation
Current patentee: Microsoft Corporation
Current patentee address: Redmond, WA, US
Main classification: G06F7/00
IPC classifications: G06F7/00; G06F17/30

Abstract:

To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, the product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate, or fuzzy, match operation that can effectively clean an incoming tuple if it fails to match exactly any of the tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior-art similarity functions while efficiently performing a fuzzy match process.
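The abstract names q-gram-based similarity but does not define the patented similarity function, so the following is only an illustrative sketch under stated assumptions. It splits each record into tokens, scores token pairs by plain Jaccard similarity over character 3-grams, averages each input token's best match against a reference tuple's tokens, and accepts the best-scoring reference tuple only if it clears a threshold. The function names (`qgram_set`, `token_sim`, `record_sim`, `fuzzy_match`), the padding characters, q = 3, and the 0.5 threshold are all hypothetical choices, not taken from the patent.

```python
import re

# Illustrative q-gram fuzzy match. ASSUMPTION: Jaccard over padded character
# q-grams, averaged per input token. This is NOT the patented similarity function.


def tokenize(record):
    """Lowercase a record and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", record.lower())


def qgram_set(token, q=3):
    """Set of q-grams of a token; padding lets token edges contribute grams."""
    padded = "#" * (q - 1) + token + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def token_sim(a, b, q=3):
    """Jaccard similarity of the two tokens' q-gram sets (1.0 for identical tokens)."""
    ga, gb = qgram_set(a, q), qgram_set(b, q)
    return len(ga & gb) / len(ga | gb)


def record_sim(input_tokens, ref_tokens, q=3):
    """Average, over input tokens, of each token's best similarity to any reference token."""
    if not input_tokens:
        return 0.0
    total = sum(
        max((token_sim(t, r, q) for r in ref_tokens), default=0.0)
        for t in input_tokens
    )
    return total / len(input_tokens)


def fuzzy_match(record, reference, threshold=0.5, q=3):
    """Return (best reference tuple, score), or (None, score) below the threshold.

    Scans every reference tuple linearly; a real system would need an index.
    """
    tokens = tokenize(record)
    best, best_score = None, 0.0
    for ref in reference:
        score = record_sim(tokens, tokenize(ref), q)
        if score > best_score:
            best, best_score = ref, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


reference = [
    "Boeing Company, Seattle, WA",
    "Bon Corporation, Seattle, WA",
    "Companions, Seattle, WA",
]
best, score = fuzzy_match("Beoing Company, Seattle, WA", reference)
print(best, round(score, 3))  # prints: Boeing Company, Seattle, WA 0.833
```

In this example the misspelled token "beoing" still shares 4 of 12 distinct 3-grams with "boeing" (similarity 1/3), and the other three tokens match exactly, so the Boeing tuple scores (1/3 + 3)/4 ≈ 0.833 and wins. Because the sketch compares the input against every reference tuple, it does not reflect the efficiency the abstract claims; that would require some index over the reference relation, which is not shown here.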

Related publications:

US20040260694A1 Efficient fuzzy match for evaluating data records (publication date: 2004-12-23)

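IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits, see H03K19/00)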