-
1.
Publication No.: US10409788B2
Publication date: 2019-09-10
Application No.: US15413144
Filing date: 2017-01-23
Applicant: SAP SE
Inventor: Larissa Heissler, Andre Adam, Philipp Mail, Florian Hoffmann
IPC: G06F16/215
Abstract: Systems and methods are provided herein for multi-pass duplicate identification using sorted neighborhoods. Data comprising a plurality of data records is received. Neighborhood records are generated by merging the plurality of data records with reference records stored in a remote data store. A resource identification field is assigned to each reference record. A pair distance, for each pair of neighborhood records having different resource identification fields, is determined by calculating a standard deviation of distances between each attribute of the pair scaled by a filled pairs quote value. Possible duplicate records are identified by evaluating each pair distance against a threshold, each possible duplicate having grouped attributes. Final duplicate records are identified by matching each group to a key.
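The pair-distance step in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the per-attribute string distance (`difflib.SequenceMatcher`), the reading of the "filled pairs quote value" as the fraction of attributes filled in both records, and the choice to multiply by it are all assumptions made for illustration. The function and field names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import pstdev
from typing import Dict, List, Optional


def attribute_distance(a: str, b: str) -> float:
    """Distance in [0, 1]; 0.0 means identical strings (assumed metric)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def pair_distance(rec_a: Dict[str, str], rec_b: Dict[str, str],
                  attributes: List[str]) -> Optional[float]:
    """Standard deviation of per-attribute distances, scaled by the
    share of attributes that are filled in both records.

    Assumption: "scaled by" is read as multiplication; the abstract
    does not say which direction the scaling goes.
    """
    distances = []
    for name in attributes:
        a, b = rec_a.get(name), rec_b.get(name)
        if a and b:  # only attributes filled on both sides count
            distances.append(attribute_distance(a, b))
    if not distances:
        return None  # no comparable attributes at all
    filled_pairs_quote = len(distances) / len(attributes)
    return pstdev(distances) * filled_pairs_quote
```

A caller would then compare the returned value against a threshold to flag possible duplicates; the threshold itself is not given in the abstract.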
-
2.
Publication No.: US20180210903A1
Publication date: 2018-07-26
Application No.: US15413144
Filing date: 2017-01-23
Applicant: SAP SE
Inventor: Larissa Heissler, Andre Adam, Philipp Mail, Florian Hoffmann
IPC: G06F17/30
CPC classification number: G06F16/215
Abstract: Systems and methods are provided herein for multi-pass duplicate identification using sorted neighborhoods. Data comprising a plurality of data records is received. Neighborhood records are generated by merging the plurality of data records with reference records stored in a remote data store. A resource identification field is assigned to each reference record. A pair distance, for each pair of neighborhood records having different resource identification fields, is determined by calculating a standard deviation of distances between each attribute of the pair scaled by a filled pairs quote value. Possible duplicate records are identified by evaluating each pair distance against a threshold, each possible duplicate having grouped attributes. Final duplicate records are identified by matching each group to a key.
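The multi-pass sorted-neighborhood idea named in the abstract can be sketched in Python as follows. This shows the general technique (sort by a key, slide a fixed window, repeat with other keys), not the claimed method. The window size, the record layout, and the `id` and `resource_id` field names are hypothetical; only the rule that a pair must have different resource identification fields comes from the abstract.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def candidate_pairs(records: List[Dict[str, object]],
                    sort_keys: List[Callable[[Dict[str, object]], object]],
                    window: int = 3) -> Set[FrozenSet[object]]:
    """Collect candidate pairs over several sorting passes.

    Each pass sorts the merged records by one key and pairs every
    record with the next (window - 1) records in that order.
    """
    pairs: Set[FrozenSet[object]] = set()
    for key in sort_keys:  # one pass per sorting key
        ordered = sorted(records, key=key)
        for i, left in enumerate(ordered):
            for right in ordered[i + 1:i + window]:
                # Skip pairs that share a resource identification field.
                if left["resource_id"] != right["resource_id"]:
                    pairs.add(frozenset((left["id"], right["id"])))
    return pairs


records = [
    {"id": 1, "resource_id": "input", "name": "Adam", "city": "Ulm"},
    {"id": 2, "resource_id": "reference", "name": "Adams", "city": "Bonn"},
    {"id": 3, "resource_id": "reference", "name": "Zeller", "city": "Ulm"},
]
passes = [lambda r: r["name"], lambda r: r["city"]]
found = candidate_pairs(records, passes, window=2)
```

Running more than one pass lets records that sort far apart under one key (here, `name`) still meet under another (`city`), which is the usual motivation for a multi-pass variant.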
-