Selecting candidate rows for deduplication
    Granted invention patent (status: in force)

    Publication number: US08719236B2

    Publication date: 2014-05-06

    Application number: US13593508

    Filing date: 2012-08-23

    IPC classification: G06F17/30 G06F3/06

    Abstract: The present invention extends to methods, systems, and computer program products for selecting candidate records for deduplication from a table. A table can be processed to compute an inverse index for each field of the table. A deduplication algorithm can traverse the inverse indices in accordance with a flexible user-defined policy to identify candidate records for deduplication. Both exact matches and approximate matches can be found.
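
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function names (`normalize`, `build_inverse_indices`, `candidate_pairs`) and the policy, a simple "at least N fields must agree" threshold, are assumptions made for illustration. Approximate matching is only hinted at here, by normalizing case and whitespace before indexing.

```python
from collections import defaultdict
from itertools import combinations


def normalize(value):
    """Hypothetical key function: lowercase and collapse whitespace."""
    return " ".join(str(value).lower().split())


def build_inverse_indices(rows, fields):
    """For each field, map each normalized value to the ids of rows holding it."""
    indices = {field: defaultdict(set) for field in fields}
    for row_id, row in enumerate(rows):
        for field in fields:
            indices[field][normalize(row[field])].add(row_id)
    return indices


def candidate_pairs(indices, min_shared_fields):
    """Assumed policy: a pair of rows is a candidate if it agrees on at
    least `min_shared_fields` fields."""
    shared = defaultdict(int)
    for postings in indices.values():
        for row_ids in postings.values():
            # Every pair of rows under the same value agrees on this field.
            for pair in combinations(sorted(row_ids), 2):
                shared[pair] += 1
    return {pair for pair, count in shared.items() if count >= min_shared_fields}
```

Only rows that share a value in some index are ever compared, so this avoids checking every pair of rows in the table. A different policy, such as weighting fields or requiring specific field combinations, would replace the counting step in `candidate_pairs`.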
