Column weight calculation for data deduplication

Invention Grant

US10452627B2 Column weight calculation for data deduplication (Active)

Patent Title: Column weight calculation for data deduplication
Application No.: US15171200

Application Date: 2016-06-02
Publication No.: US10452627B2

Publication Date: 2019-10-22
Inventor: Namit Kabra, Yannick Saillet
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agent: Peter K. Suchecki
Main IPC: G06F7/00
IPC: G06F7/00; G06F16/215; G06F16/21; G06F16/174

Abstract:

A computer system with the capability to identify potentially duplicative records in a data set is provided. A computer may collect a data profile for the data set that provides descriptive information with regard to attributes of the data set. Based, at least in part, on the data profile, weights are determined for the attributes. As values of a data record are compared to values of the same respective attributes in other records, the overall likelihood of a match or duplicate, as indicated by the degree of similarity between values, is modified based on the determined weights associated with the respective attributes.
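
The flow described in the abstract (profile the data set, derive per-attribute weights, then weight the value-by-value similarity) might be sketched roughly as follows. This is only an illustrative sketch, not the patented method: the abstract does not say which profile statistics are used or how weights are computed from them. The function names, the choice of completeness and distinct-value ratio as profile statistics, the weight formula, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def profile_columns(records, columns):
    """Collect a simple per-column profile (assumed statistics, not from the patent):
    completeness (share of non-empty values) and distinct-value ratio."""
    profile = {}
    total = len(records)
    for col in columns:
        values = [r.get(col) for r in records if r.get(col) not in (None, "")]
        completeness = len(values) / total if total else 0.0
        distinct_ratio = len(set(values)) / len(values) if values else 0.0
        profile[col] = {"completeness": completeness, "distinct_ratio": distinct_ratio}
    return profile


def column_weights(profile):
    """Hypothetical heuristic: weight = completeness * distinct_ratio,
    normalized so that the weights sum to 1."""
    raw = {c: p["completeness"] * p["distinct_ratio"] for c, p in profile.items()}
    total = sum(raw.values())
    if total == 0:
        # No informative column: fall back to equal weights.
        return {c: 1.0 / len(raw) for c in raw} if raw else {}
    return {c: w / total for c, w in raw.items()}


def match_score(a, b, weights):
    """Weighted sum of per-column string similarities between two records.
    Columns with a missing value on either side contribute nothing."""
    score = 0.0
    for col, w in weights.items():
        va, vb = a.get(col), b.get(col)
        if va in (None, "") or vb in (None, ""):
            continue
        similarity = SequenceMatcher(None, str(va).lower(), str(vb).lower()).ratio()
        score += w * similarity
    return score


# Usage with made-up sample data.
records = [
    {"name": "John Smith", "email": "john@example.com", "country": "US"},
    {"name": "Jon Smith", "email": "john@example.com", "country": "US"},
    {"name": "Mary Jones", "email": "mary@example.com", "country": "US"},
]
weights = column_weights(profile_columns(records, ["name", "email", "country"]))
print(weights)
print(match_score(records[0], records[1], weights))
print(match_score(records[0], records[2], weights))
```

Under this heuristic, a column whose values are nearly all the same (such as `country` in the sample data) receives a low weight, so agreement on it adds little to the overall match score, while agreement on a highly distinctive column adds more.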

Public/Granted literature

US20170351717A1 COLUMN WEIGHT CALCULATION FOR DATA DEDUPLICATION Public/Granted day: 2017-12-07

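IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits: see H03K19/00)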