Efficient duplicate detection for machine learning data sets

Invention Grant

US10963810B2 Efficient duplicate detection for machine learning data sets (Active)

Patent Title: Efficient duplicate detection for machine learning data sets
Application No.: US14569458

Application Date: 2014-12-12
Publication No.: US10963810B2

Publication Date: 2021-03-30
Inventor: Leo Parker Dirac, Aleksandr Mikhaylovich Ingerman
Applicant: Amazon Technologies, Inc.
Applicant Address: US NV Reno
Assignee: Amazon Technologies, Inc.
Current Assignee: Amazon Technologies, Inc.
Current Assignee Address: US NV Reno
Agency: Kowert, Hood, Munyon, Rankin & Goetzel, P.C.
Agent: Robert C. Kowert
Main IPC: G06N20/00
IPC: G06N20/00

Abstract:

At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed. A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as the transmission of a notification to a client of the service.
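
The flow described in the abstract (compute a duplication metric between two sets of observation records, compare it against a threshold, then trigger a responsive action) can be illustrated with a minimal sketch. The abstract does not say how the metric is computed, so everything below is an assumption made for illustration: the exact-match fingerprinting by SHA-256 hash, the trim-and-lowercase normalization, the "fraction of second-set records also found in the first set" definition of the metric, and the hypothetical names `record_fingerprint`, `duplication_metric`, and `check_for_duplicates`. It is not the method claimed in the patent.

```python
import hashlib
from typing import Callable, Iterable, Sequence


def record_fingerprint(record: Sequence[str]) -> str:
    """Hash a record's fields after light normalization (assumed: trim + lowercase)."""
    # Join fields with a unit-separator character so ("ab", "c") != ("a", "bc").
    normalized = "\x1f".join(field.strip().lower() for field in record)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplication_metric(
    first_set: Iterable[Sequence[str]],
    second_set: Iterable[Sequence[str]],
) -> float:
    """Fraction of second-set records whose fingerprint also occurs in the first set."""
    seen = {record_fingerprint(r) for r in first_set}
    second = list(second_set)
    if not second:
        return 0.0
    duplicates = sum(1 for r in second if record_fingerprint(r) in seen)
    return duplicates / len(second)


def check_for_duplicates(
    first_set: Iterable[Sequence[str]],
    second_set: Iterable[Sequence[str]],
    threshold: float,
    notify: Callable[[str], None],
) -> float:
    """Compute the metric and call `notify` when it meets the threshold criterion."""
    metric = duplication_metric(first_set, second_set)
    if metric >= threshold:
        notify(f"{metric:.0%} of second-set records duplicate first-set records")
    return metric


# Illustrative data only: a tiny training set and test set.
training = [("alice", "30"), ("bob", "41"), ("carol", "27")]
test = [("Bob ", "41"), ("dave", "19")]

messages: list[str] = []
metric = check_for_duplicates(training, test, threshold=0.25, notify=messages.append)
print(metric, messages)
```

Here the notification is just a callback that appends to a list; in a service setting it would stand in for whatever channel delivers the message to the client. A set of fingerprints keeps the lookup cost per second-set record roughly constant, at the price of catching only exact matches after normalization.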

Public/Granted literature

US20150379430A1 EFFICIENT DUPLICATE DETECTION FOR MACHINE LEARNING DATA SETS Public/Granted day: 2015-12-31

IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N20/00	Machine learning