-
Publication No.: US20130226879A1
Publication date: 2013-08-29
Application No.: US13434647
Filing date: 2012-03-29
IPC classification: G06F17/30
CPC classification: G06F17/30303, G06F17/30286, G06F21/60, G06F21/6227, G06F21/6245
Abstract: A computer-implemented method for detecting a set of inconsistent data records in a database including multiple records comprises selecting a data quality rule representing a functional dependency for the database, transforming the data quality rule into at least one rule vector with hashed components, selecting a set of attributes of the database, transforming at least one record of the database, selected on the basis of the selected attributes, into a record vector with hashed components, and computing a dot product of the rule and record vectors to generate a measure representing violation of the data quality rule by the record.
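The abstract does not spell out the encoding, so the following is only a minimal sketch of one way hashed rule and record vectors could be compared with a dot product. It assumes a single rule instance of the form "left-hand attributes determine one right-hand value", a non-empty left-hand side, and sparse vectors stored as dictionaries. The names `DIM`, `bucket`, `rule_vector`, `record_vector`, `dot`, and `violates`, and the weighting scheme itself, are hypothetical and not taken from the patent.

```python
import hashlib

DIM = 2**32  # assumed vector dimension; large enough to make collisions unlikely


def bucket(attr, value):
    """Hash an (attribute, value) pair to a component index."""
    digest = hashlib.sha256(f"{attr}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DIM


def rule_vector(lhs, rhs_attr, rhs_value):
    """Encode the rule instance lhs -> (rhs_attr = rhs_value) as a sparse vector.

    Each left-hand pair gets weight +1; the expected right-hand pair gets
    weight -len(lhs). This weighting is an illustrative assumption.
    """
    vec = {}
    for attr, value in lhs.items():
        idx = bucket(attr, value)
        vec[idx] = vec.get(idx, 0) + 1
    idx = bucket(rhs_attr, rhs_value)
    vec[idx] = vec.get(idx, 0) - len(lhs)
    return vec


def record_vector(record, attributes):
    """Encode the selected attributes of a record as a sparse 0/1 vector."""
    vec = {}
    for attr in attributes:
        if attr in record:
            idx = bucket(attr, record[attr])
            vec[idx] = vec.get(idx, 0) + 1
    return vec


def dot(u, v):
    """Dot product of two sparse vectors stored as {index: weight}."""
    return sum(weight * v.get(idx, 0) for idx, weight in u.items())


def violates(record, lhs, rhs_attr, rhs_value):
    """True when the record matches every left-hand pair but not the right-hand value."""
    attributes = list(lhs) + [rhs_attr]
    score = dot(rule_vector(lhs, rhs_attr, rhs_value),
                record_vector(record, attributes))
    # Full left-hand match contributes len(lhs); a matching right-hand value cancels it.
    return score == len(lhs)
```

Under this scheme the dot product equals the number of left-hand attributes only when the record agrees with the rule's left-hand side and disagrees with its right-hand side, so that value serves as the violation measure. Hash collisions could distort the score, which is why the sketch uses a large index space.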
-
Publication No.: US20120296879A1
Publication date: 2012-11-22
Application No.: US13110246
Filing date: 2011-05-18
Applicant: Mohamed YAKOUT, Ahmed K. ELMAGARMID, Jennifer NEVILLE, Mourad OUZZANI, Ihab Francis Ilyas KALDAS
Inventor: Mohamed YAKOUT, Ahmed K. ELMAGARMID, Jennifer NEVILLE, Mourad OUZZANI, Ihab Francis Ilyas KALDAS
IPC classification: G06F17/30
CPC classification: G06F17/30303
Abstract: A computer-implemented method for correcting records in a database, comprising generating, using a processor, respective candidate replacement entries for multiple inconsistent records of the database, grouping the candidate replacement entries to provide multiple groups of related candidate updates for the database, ranking the groups according to a loss function to quantify database quality, receiving input for a selected group, sorting candidate replacement entries in the selected group, and applying updates from the selected group to the database to correct entries of the inconsistent records.
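The group, rank, confirm, and apply steps named in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: candidates are grouped by (attribute, new value), the loss function is replaced by a hypothetical stand-in (the summed per-candidate score), candidates within a group are sorted by that same score, and the "input for a selected group" is modelled as a `confirm` callback. All names here (`CandidateUpdate`, `group_updates`, `estimated_benefit`, `rank_groups`, `apply_group`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateUpdate:
    record_id: int
    attribute: str
    new_value: str
    score: float  # hypothetical estimate that the replacement is correct


def group_updates(candidates):
    """Group candidates that propose the same value for the same attribute."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[(cand.attribute, cand.new_value)].append(cand)
    return dict(groups)


def estimated_benefit(group):
    """Stand-in for a loss-based quality measure: the summed candidate scores."""
    return sum(cand.score for cand in group)


def rank_groups(groups):
    """Return group keys ordered from highest to lowest estimated benefit."""
    return sorted(groups, key=lambda key: estimated_benefit(groups[key]), reverse=True)


def apply_group(database, group, confirm):
    """Sort a group's candidates by score and apply those the caller confirms.

    `database` maps record_id -> {attribute: value}; `confirm` stands in for
    the input received for the selected group and returns True or False.
    """
    applied = []
    for cand in sorted(group, key=lambda c: c.score, reverse=True):
        if confirm(cand):
            database[cand.record_id][cand.attribute] = cand.new_value
            applied.append(cand)
    return applied
```

A caller would typically take the first key from `rank_groups`, present that group for review, and pass the reviewer's decisions to `apply_group` through `confirm`.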
-