Invention Grant
- Patent Title: Efficient duplicate detection for machine learning data sets
- Application No.: US14569458
- Application Date: 2014-12-12
- Publication No.: US10963810B2
- Publication Date: 2021-03-30
- Inventors: Leo Parker Dirac, Aleksandr Mikhaylovich Ingerman
- Applicant: Amazon Technologies, Inc.
- Applicant Address: US NV Reno
- Assignee: Amazon Technologies, Inc.
- Current Assignee: Amazon Technologies, Inc.
- Current Assignee Address: US NV Reno
- Agency: Kowert, Hood, Munyon, Rankin & Goetzel, P.C.
- Agent: Robert C. Kowert
- Main IPC: G06N20/00
- IPC: G06N20/00

Abstract:
At a machine learning service, a determination is made that an analysis is to be performed to detect whether at least a portion of the contents of one or more observation records of a first data set is duplicated in a second set of observation records. A duplication metric is obtained that indicates a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as transmitting a notification to a client of the service.
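The abstract does not say how the duplication metric is computed. As a hedged illustration only, the Python sketch below assumes one plausible approach: build a Bloom filter over the records of the first data set, then count how many records of the second set the filter reports as possibly present. A Bloom filter can return false positives but never false negatives, so each hit means "probably a duplicate", which fits the abstract's notion of a non-zero probability. The names `BloomFilter`, `record_key`, `duplication_metric`, `check_duplicates`, and the `notify` callback are hypothetical and not taken from the patent.

```python
import hashlib
from typing import Callable, Iterable, Iterator, Optional, Sequence


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def record_key(record: Sequence[object]) -> str:
    # Hypothetical canonical form: join all field values with a unit separator.
    # A real service might key on selected columns only ("a portion of contents").
    return "\x1f".join(str(v) for v in record)


def duplication_metric(
    first: Iterable[Sequence[object]],
    second: Iterable[Sequence[object]],
    bloom: Optional[BloomFilter] = None,
) -> float:
    """Fraction of `second` records that the filter says may appear in `first`."""
    if bloom is None:
        bloom = BloomFilter()
    for rec in first:
        bloom.add(record_key(rec))
    second = list(second)
    if not second:
        return 0.0
    hits = sum(1 for rec in second if bloom.might_contain(record_key(rec)))
    return hits / len(second)


def check_duplicates(
    first: Iterable[Sequence[object]],
    second: Iterable[Sequence[object]],
    threshold: float,
    notify: Callable[[str], None],
) -> float:
    """Compute the metric and invoke a responsive action if it meets the threshold."""
    metric = duplication_metric(first, second)
    if metric >= threshold:
        notify(f"about {metric:.0%} of records in the second set may duplicate the first set")
    return metric
```

For example, `check_duplicates(train_rows, test_rows, 0.1, print)` would print a warning when roughly 10% or more of the test rows appear to overlap the training rows. Because false positives can only add hits, this metric may overestimate the true duplicate fraction; a larger `num_bits` reduces that bias at the cost of memory.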
Public/Granted literature:
- US20150379430A1: EFFICIENT DUPLICATE DETECTION FOR MACHINE LEARNING DATA SETS (Public/Granted day: 2015-12-31)