Automatic selection of high quality training data using an adaptive oracle-trained learning framework

    Publication No.: US10657457B1

    Publication Date: 2020-05-19

    Application No.: US14578200

    Filing Date: 2014-12-19

    Applicant: GROUPON, INC.

    IPC Classification: G06N20/00

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in deriving a model. Once a model is trained, the framework monitors the performance of the model and, in embodiments, leverages active learning and the oracle to generate feedback about the changing data for modifying training data sets while maintaining data quality to enable incremental adaptation of the model.
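The abstract above mentions active learning with an oracle but does not say how samples are chosen. The following is a minimal sketch of one common choice, uncertainty sampling, and is not taken from the patent. The function names, the 0.5 decision boundary, and the `oracle` callable are all assumptions made for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple


def select_for_oracle(probs: Sequence[float], budget: int) -> List[int]:
    """Pick the `budget` samples whose predicted probability is closest to 0.5.

    Assumption: `probs` are binary-class probabilities from some trained model,
    so values near 0.5 are the ones the model is least sure about.
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:budget]


def extend_training_set(
    training: List[Tuple[Any, Any]],
    pool: Sequence[Any],
    probs: Sequence[float],
    oracle: Callable[[Any], Any],
    budget: int,
) -> List[Tuple[Any, Any]]:
    """Ask the (hypothetical) oracle to label the most uncertain pool samples
    and return a new training set that includes those labeled samples."""
    chosen = select_for_oracle(probs, budget)
    return training + [(pool[i], oracle(pool[i])) for i in chosen]
```

In this sketch the oracle is only consulted for the samples the model is least certain about, which is one way to keep labeling cost bounded while the training set is updated.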

    Dynamic Clustering For Streaming Data
    Document type: Invention application

    Publication No.: US20200050614A1

    Publication Date: 2020-02-13

    Application No.: US16418267

    Filing Date: 2019-05-21

    Applicant: Groupon, Inc.

    IPC Classification: G06F16/28

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for modeling multi-dimensional, dynamically evolving data using dynamic clustering. In one aspect, a method includes receiving a core group of clusters of objects, each object being represented by a corresponding instance of a multi-dimensional feature vector including a dimension k; receiving a stream of data points representing a group of objects, each data point respectively representing an instance of dimension k describing a feature of an object within the group of objects; and, for each data point, adding an object described by the data point to a first cluster of objects within the core group of clusters; updating properties of the first cluster of objects in response to adding the object; and determining whether to update the core group of clusters using the updated properties of the first cluster of objects.
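The per-point loop in the abstract above (add to a cluster, update its properties, decide whether the core group needs updating) can be sketched as follows. This is an illustrative reading, not the patented method: the choice of nearest-mean assignment, the use of Welford's running variance as the "properties", and the `max_variance` trigger are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Running statistics of one cluster along a single dimension k."""
    mean: float
    count: int = 1      # assumption: each core cluster starts from one seed value
    m2: float = 0.0     # sum of squared deviations from the mean (Welford)

    def add(self, value: float) -> None:
        """Fold one new value into the running mean and variance."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self.m2 / self.count


def process_point(clusters: List[Cluster], value: float, max_variance: float) -> bool:
    """Assign `value` to the nearest cluster, update that cluster, and report
    whether its variance now exceeds the (hypothetical) `max_variance` limit,
    which this sketch treats as the signal to revisit the core group."""
    nearest = min(clusters, key=lambda c: abs(c.mean - value))
    nearest.add(value)
    return nearest.variance > max_variance
```

Keeping only a count, a mean, and a sum of squared deviations per cluster means each streamed point is handled in constant memory, which is why running statistics were chosen for this sketch.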

    Automated adaptive data analysis using dynamic data quality assessment

    Publication No.: US10262277B2

    Publication Date: 2019-04-16

    Application No.: US15427908

    Filing Date: 2017-02-08

    Applicant: Groupon, Inc.

    IPC Classification: G06F17/30 G06N99/00

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
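The first aspect in the abstract above can be sketched as a reservoir that only consults the oracle for samples it has decided to keep. The abstract does not state how that decision is made; this sketch assumes classic reservoir sampling (Algorithm R). The class name, the `oracle` callable, and the `capacity` parameter are hypothetical.

```python
import random
from typing import Any, Callable, List, Optional, Tuple


class QualityReservoir:
    """Fixed-size reservoir of (sample, quality estimate) pairs."""

    def __init__(self, capacity: int, oracle: Callable[[Any], float],
                 rng: Optional[random.Random] = None) -> None:
        self.capacity = capacity
        self.oracle = oracle                  # hypothetical quality-verification callable
        self.rng = rng or random.Random()
        self.items: List[Tuple[Any, float]] = []
        self.seen = 0                         # number of samples offered so far

    def offer(self, sample: Any) -> bool:
        """Decide whether `sample` enters the reservoir; query the oracle only if it does."""
        self.seen += 1
        if len(self.items) < self.capacity:
            slot = len(self.items)            # still filling: always accept
        else:
            slot = self.rng.randrange(self.seen)
            if slot >= self.capacity:
                return False                  # rejected: no oracle request is sent
        estimate = self.oracle(sample)        # quality verification request
        if slot == len(self.items):
            self.items.append((sample, estimate))
        else:
            self.items[slot] = (sample, estimate)
        return True
```

Making the keep-or-drop decision before calling the oracle matches the order of actions in the abstract and avoids paying for estimates on samples that would be discarded.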

    Automated Adaptive Data Analysis Using Dynamic Data Quality Assessment

    Publication No.: US20190258954A1

    Publication Date: 2019-08-22

    Application No.: US16279731

    Filing Date: 2019-02-19

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.

    DYNAMIC CLUSTERING FOR STREAMING DATA

    Publication No.: US20170124178A1

    Publication Date: 2017-05-04

    Application No.: US15259630

    Filing Date: 2016-09-08

    Applicant: Groupon, Inc.

    IPC Classification: G06F17/30

    CPC Classification: G06F17/30598 G06F17/30592

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for modeling multi-dimensional, dynamically evolving data using dynamic clustering. In one aspect, a method includes receiving a core group of clusters of objects, each object being represented by a corresponding instance of a multi-dimensional feature vector including a dimension k; receiving a stream of data points representing a group of objects, each data point respectively representing an instance of dimension k describing a feature of an object within the group of objects; and, for each data point, adding an object described by the data point to a first cluster of objects within the core group of clusters; updating properties of the first cluster of objects in response to adding the object; and determining whether to update the core group of clusters using the updated properties of the first cluster of objects.

    SYSTEMS, APPARATUS, AND METHODS OF PROGRAMMATICALLY DETERMINING UNIQUE CONTACTS

    Publication No.: US20220318826A1

    Publication Date: 2022-10-06

    Application No.: US17648504

    Filing Date: 2022-01-20

    Applicant: GROUPON, INC.

    Abstract: Systems, apparatus, and methods for determining unique contacts from a collection or pool of merchant data are discussed herein. Some embodiments may provide for an apparatus including circuitry configured to: access first merchant data associated with a first merchant; access second merchant data associated with a second merchant; determine a match score based on the first merchant data and the second merchant data indicating a likelihood of the first merchant being the same as the second merchant; determine a match score threshold; determine whether the match score exceeds the match score threshold; and in response to determining the match score fails to exceed the match score threshold, determine the first merchant as being different from the second merchant. Some embodiments may provide for techniques for machine learning with merchant data training sets to determine match scores.
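The score-then-threshold decision in the abstract above can be sketched as follows. The abstract does not say how the match score is computed (it mentions machine learning on merchant training sets); this sketch substitutes a simple Jaccard similarity over name tokens purely for illustration. The `name` field, the function names, and the default threshold of 0.5 are assumptions.

```python
from typing import Dict


def match_score(first: Dict[str, str], second: Dict[str, str]) -> float:
    """Stand-in score: Jaccard similarity of lower-cased name tokens, in [0, 1]."""
    a = set(first.get("name", "").lower().split())
    b = set(second.get("name", "").lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_same_merchant(first: Dict[str, str], second: Dict[str, str],
                     threshold: float = 0.5) -> bool:
    """Treat the two records as the same merchant only if the score exceeds
    the threshold; otherwise they are determined to be different merchants."""
    return match_score(first, second) > threshold
```

For example, under these assumptions `{"name": "Joe's Pizza"}` and `{"name": "joe's pizza"}` score 1.0 and are treated as the same merchant, while `{"name": "Joes Pizza Palace"}` shares only one of four distinct tokens with the first record and falls below the threshold.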

    AUTOMATED DYNAMIC DATA QUALITY ASSESSMENT

    Publication No.: US20220300828A1

    Publication Date: 2022-09-22

    Application No.: US17684935

    Filing Date: 2022-03-02

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.

    Systems, apparatus, and methods of programmatically determining unique contacts based on crowdsourced error correction

    Publication No.: US10410225B1

    Publication Date: 2019-09-10

    Application No.: US14788488

    Filing Date: 2015-06-30

    Applicant: Groupon, Inc.

    Abstract: Systems, apparatus, and methods for determining unique contacts from a collection or pool of merchant data are discussed herein. Some embodiments may provide for an apparatus including circuitry configured to determine programmatic match results indicating whether different instances of merchant data match (e.g., describe the same contact). The circuitry may further determine probabilities of precision or recall errors with the programmatic match results. Programmatic match results having a high probability of error may be annotated by a user to generate user match results. The user match results may be used to generate a more reliable contacts database including unique contacts, as well as to train and/or update the match scoring algorithm. As such, the accuracy of machine-implemented binary classification is improved.
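One way to read the routing step in the abstract above is sketched below. It assumes, beyond what the abstract states, that each programmatic result carries a calibrated match probability: a predicted match is then wrong with probability 1 - p (a precision error) and a predicted non-match is wrong with probability p (a recall error). The function names and the `cutoff` parameter are hypothetical.

```python
from typing import List, Tuple


def error_probability(match_probability: float, predicted_match: bool) -> float:
    """Probability that the programmatic decision is wrong, assuming
    `match_probability` is a calibrated estimate that the records match."""
    return 1.0 - match_probability if predicted_match else match_probability


def needs_review(results: List[Tuple[float, bool]], cutoff: float) -> List[int]:
    """Return indices of results whose error probability exceeds `cutoff`,
    i.e. the ones this sketch would send to a user for annotation."""
    return [
        i for i, (prob, predicted) in enumerate(results)
        if error_probability(prob, predicted) > cutoff
    ]
```

The annotated results could then be fed back as training data for the scoring model, as the abstract describes, though that retraining step is outside this sketch.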

    Dynamic Clustering For Streaming Data
    Document type: Invention application

    Publication No.: US20180293293A1

    Publication Date: 2018-10-11

    Application No.: US15815299

    Filing Date: 2017-11-16

    Applicant: Groupon, Inc.

    IPC Classification: G06F17/30

    CPC Classification: G06F16/285 G06F16/283

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for modeling multi-dimensional, dynamically evolving data using dynamic clustering. In one aspect, a method includes receiving a core group of clusters of objects, each object being represented by a corresponding instance of a multi-dimensional feature vector including a dimension k; receiving a stream of data points representing a group of objects, each data point respectively representing an instance of dimension k describing a feature of an object within the group of objects; and, for each data point, adding an object described by the data point to a first cluster of objects within the core group of clusters; updating properties of the first cluster of objects in response to adding the object; and determining whether to update the core group of clusters using the updated properties of the first cluster of objects.

    Automated Dynamic Data Quality Assessment
    Document type: Invention application

    Publication No.: US20180150767A1

    Publication Date: 2018-05-31

    Application No.: US15619786

    Filing Date: 2017-06-12

    Applicant: Groupon, Inc.

    IPC Classification: G06N99/00 G06F17/30

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.