Invention Grant
- Patent Title: Multi-pass duplicate identification using sorted neighborhoods and aggregation techniques
- Application No.: US15413144
- Application Date: 2017-01-23
- Publication No.: US10409788B2
- Publication Date: 2019-09-10
- Inventor: Larissa Heissler, Andre Adam, Philipp Mail, Florian Hoffmann
- Applicant: SAP SE
- Applicant Address: DE Walldorf
- Assignee: SAP SE
- Current Assignee: SAP SE
- Current Assignee Address: DE Walldorf
- Agency: Jones Day
- Main IPC: G06F16/215
- IPC: G06F16/215

Abstract:
Systems and methods are provided herein for multi-pass duplicate identification using sorted neighborhoods. Data comprising a plurality of data records is received. Neighborhood records are generated by merging the plurality of data records with reference records stored in a remote data store. A resource identification field is assigned to each reference record. A pair distance, for each pair of neighborhood records having different resource identification fields, is determined by calculating a standard deviation of distances between each attribute of the pair scaled by a filled pairs quote value. Possible duplicate records are identified by evaluating each pair distance against a threshold, each possible duplicate having grouped attributes. Final duplicate records are identified by matching each group to a key.
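
The pair-distance and threshold steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made for illustration only: the field names (`id`, `resource_id`, `name`, `city`), the window size, the threshold, and the string distance based on `difflib.SequenceMatcher` are all hypothetical. The "standard deviation of distances" is read here as the root-mean-square deviation of the attribute distances from zero, and the "filled pairs quote" as the fraction of attributes populated in both records. The final step of matching each group to a key is omitted.

```python
from difflib import SequenceMatcher
from math import sqrt


def attribute_distance(a: str, b: str) -> float:
    """Hypothetical per-attribute distance in [0, 1]; 0 means identical."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_distance(rec_a: dict, rec_b: dict, attributes: list) -> float:
    """RMS of attribute distances, scaled by the share of filled attribute pairs.

    Assumption: dividing by the filled-pairs ratio penalizes sparse pairs.
    """
    filled = [k for k in attributes if rec_a.get(k) and rec_b.get(k)]
    if not filled:
        return float("inf")  # nothing to compare
    quote = len(filled) / len(attributes)
    squares = [attribute_distance(rec_a[k], rec_b[k]) ** 2 for k in filled]
    return sqrt(sum(squares) / len(filled)) / quote


def possible_duplicates(records, attributes, sort_key, window=3, threshold=0.3):
    """One sorted-neighborhood pass: compare each record with its window neighbors."""
    ordered = sorted(records, key=lambda r: r.get(sort_key) or "")
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1 : i + window]:
            # Only pairs with different resource identification fields are compared.
            if left["resource_id"] == right["resource_id"]:
                continue
            d = pair_distance(left, right, attributes)
            if d <= threshold:
                pairs.append((left["id"], right["id"], d))
    return pairs


# Merged neighborhood of incoming and reference records (made-up sample data).
records = [
    {"id": 1, "resource_id": "incoming", "name": "Jon Smith", "city": "Berlin"},
    {"id": 2, "resource_id": "reference", "name": "John Smith", "city": "Berlin"},
    {"id": 3, "resource_id": "reference", "name": "Maria Lopez", "city": "Madrid"},
]
candidates = possible_duplicates(records, ["name", "city"], sort_key="name")
```

A multi-pass variant would call `possible_duplicates` once per sort key and merge the resulting candidate pairs, so that records separated under one ordering can still meet under another.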
Public/Granted literature
- US20180210903A1 Multi-Pass Duplicate Identification Using Sorted Neighborhoods and Aggregation Techniques, Public/Granted day: 2018-07-26