Patent search ap:("SAP SE") AND inv:"Philipp Mail" Page 1

1.

Invention application
Multi-Pass Duplicate Identification Using Sorted Neighborhoods and Aggregation Techniques (status: under examination, published)

Publication number: US20180210903A1

Publication date: 2018-07-26

Application number: US15413144

Filing date: 2017-01-23

Applicant: SAP SE

Inventor: Larissa Heissler, Andre Adam, Philipp Mail, Florian Hoffmann

IPC: G06F17/30

CPC classification number: G06F16/215

Abstract: Systems and methods are provided herein for multi-pass duplicate identification using sorted neighborhoods. Data comprising a plurality of data records is received. Neighborhood records are generated by merging the plurality of data records with reference records stored in a remote data store. A resource identification field is assigned to each reference record. A pair distance, for each pair of neighborhood records having different resource identification fields, is determined by calculating a standard deviation of distances between each attribute of the pair scaled by a filled pairs quote value. Possible duplicate records are identified by evaluating each pair distance against a threshold, each possible duplicate having grouped attributes. Final duplicate records are identified by matching each group to a key.
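
The windowed comparison step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the attribute names, the `rid` and `id` fields, the 0/1 attribute distance, the window size, and the threshold are all hypothetical. Two readings are assumptions as well: the "standard deviation of distances" is taken as the root-mean-square of the per-attribute distances (deviation from a perfect match of 0), and the filled-pairs value is taken as the share of attributes filled in both records, with the distance divided by it. The grouping and key-matching steps are not covered.

```python
from math import inf, sqrt

ATTRS = ("name", "city", "zip")  # hypothetical attribute names


def attr_distance(a, b):
    # Hypothetical per-attribute distance: 0.0 on a case-insensitive match, else 1.0.
    return 0.0 if a.strip().lower() == b.strip().lower() else 1.0


def pair_distance(r1, r2):
    # Only attributes that are filled in both records contribute a distance.
    dists = [attr_distance(r1[k], r2[k]) for k in ATTRS if r1.get(k) and r2.get(k)]
    if not dists:
        return inf  # nothing comparable, treat as maximally distant
    quota = len(dists) / len(ATTRS)  # assumed meaning of the filled-pairs value
    rms = sqrt(sum(d * d for d in dists) / len(dists))  # assumed reading of "standard deviation"
    return rms / quota  # assumed direction of scaling


def candidate_duplicates(records, sort_key, window=3, threshold=0.5):
    # One sorted-neighborhood pass: sort by a key, compare records inside a sliding window.
    ordered = sorted(records, key=lambda r: r.get(sort_key) or "")
    found = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1 : i + window]:
            if left["rid"] == right["rid"]:
                continue  # only pairs with different resource identification fields
            d = pair_distance(left, right)
            if d <= threshold:
                found.append((left["id"], right["id"], d))
    return found
```

A multi-pass run would call `candidate_duplicates` once per sort key (for example `"name"`, then `"zip"`) and take the union of the returned pairs, so that records separated under one ordering can still meet under another.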

2.

Granted invention patent
Multi-pass duplicate identification using sorted neighborhoods and aggregation techniques (status: in force)

Publication number: US10409788B2

Publication date: 2019-09-10

Application number: US15413144

Filing date: 2017-01-23

Applicant: SAP SE

Inventor: Larissa Heissler, Andre Adam, Philipp Mail, Florian Hoffmann

IPC: G06F16/215

Abstract: Systems and methods are provided herein for multi-pass duplicate identification using sorted neighborhoods. Data comprising a plurality of data records is received. Neighborhood records are generated by merging the plurality of data records with reference records stored in a remote data store. A resource identification field is assigned to each reference record. A pair distance, for each pair of neighborhood records having different resource identification fields, is determined by calculating a standard deviation of distances between each attribute of the pair scaled by a filled pairs quote value. Possible duplicate records are identified by evaluating each pair distance against a threshold, each possible duplicate having grouped attributes. Final duplicate records are identified by matching each group to a key.
