-
Publication No.: US20140059015A1
Publication date: 2014-02-27
Application No.: US13593508
Filing date: 2012-08-23
Applicants: Yaron Zinar, Efim Hudis, Yifat Orlin, Gal Novik, Yuri Gurevich, Gad Peleg
Inventors: Yaron Zinar, Efim Hudis, Yifat Orlin, Gal Novik, Yuri Gurevich, Gad Peleg
IPC classification: G06F17/30
CPC classification: G06F17/30156, G06F3/0641, G06F17/30138, G06F17/30303
Abstract: The present invention extends to methods, systems, and computer program products for selecting candidate records for deduplication from a table. A table can be processed to compute an inverse index for each field of the table. A deduplication algorithm can traverse the inverse indices in accordance with a flexible user-defined policy to identify candidate records for deduplication. Both exact matches and approximate matches can be found.
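Illustrative note (not part of the patent record): the abstract's idea of per-field inverse indices traversed under a user-defined policy can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed method. The function names, the sample records, the policy format (a list of field groups, where two records become candidates if they agree on every field of at least one group), and the `normalize` key used as a crude stand-in for approximate matching are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def normalize(value):
    # Hypothetical approximate-match key: lowercase, keep only letters and digits.
    return "".join(ch for ch in str(value).lower() if ch.isalnum())


def build_inverse_indices(records, fields):
    # One inverse index per field: normalized value -> set of record ids.
    indices = {field: defaultdict(set) for field in fields}
    for rid, record in enumerate(records):
        for field in fields:
            indices[field][normalize(record[field])].add(rid)
    return indices


def pairs_for_field(index):
    # All record-id pairs that share a (normalized) value in one field.
    return {
        pair
        for ids in index.values()
        for pair in combinations(sorted(ids), 2)
    }


def candidate_pairs(indices, policy):
    # Assumed policy format: list of field groups. A pair is a candidate
    # if it matches on every field of at least one group.
    result = set()
    for group in policy:
        per_field = [pairs_for_field(indices[field]) for field in group]
        result |= set.intersection(*per_field)
    return result


# Hypothetical sample table.
records = [
    {"name": "John Smith", "email": "js@example.com", "city": "Seattle"},
    {"name": "john  smith", "email": "john@example.com", "city": "Seattle"},
    {"name": "Jane Doe", "email": "js@example.com", "city": "Redmond"},
    {"name": "J. Smith", "email": "other@example.com", "city": "Seattle"},
]
policy = [("name", "city"), ("email",)]

indices = build_inverse_indices(records, ["name", "email", "city"])
candidates = candidate_pairs(indices, policy)
```

With this sample data, records 0 and 1 are paired through the name-and-city group (their names differ only in case and spacing), and records 0 and 2 are paired through the shared email. Enumerating all pairs per index bucket is quadratic in bucket size, which is acceptable for a sketch but would need pruning on large tables.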
-
Publication No.: US08719236B2
Publication date: 2014-05-06
Application No.: US13593508
Filing date: 2012-08-23
Applicants: Yaron Zinar, Efim Hudis, Yifat Orlin, Gal Novik, Yuri Gurevich, Gad Peleg
Inventors: Yaron Zinar, Efim Hudis, Yifat Orlin, Gal Novik, Yuri Gurevich, Gad Peleg
CPC classification: G06F17/30156, G06F3/0641, G06F17/30138, G06F17/30303
Abstract: The present invention extends to methods, systems, and computer program products for selecting candidate records for deduplication from a table. A table can be processed to compute an inverse index for each field of the table. A deduplication algorithm can traverse the inverse indices in accordance with a flexible user-defined policy to identify candidate records for deduplication. Both exact matches and approximate matches can be found.
-