-
Publication No.: US10657457B1
Publication date: 2020-05-19
Application No.: US14578200
Filing date: 2014-12-19
Applicant: GROUPON, INC.
IPC classification: G06N20/00
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in deriving a model. Once a model is trained, the framework monitors the performance of the model and, in embodiments, leverages active learning and the oracle to generate feedback about the changing data for modifying training data sets while maintaining data quality to enable incremental adaptation of the model.
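The abstract combines active learning with an oracle to adapt a trained model as the data changes. The following is a minimal sketch of that loop, built on several assumptions:

- Uncertainty sampling stands in for the unspecified active-learning strategy.
- A toy one-dimensional threshold classifier stands in for the model.
- The hypothetical `oracle` callable stands in for the crowd.
- `ThresholdModel`, `adapt`, and `margin` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical oracle signature: given an input, return its true label (0 or 1).
Oracle = Callable[[float], int]


@dataclass
class ThresholdModel:
    """Toy 1-D classifier: predicts label 1 when x >= threshold."""
    threshold: float = 0.0

    def fit(self, data: List[Tuple[float, int]]) -> None:
        negatives = [x for x, y in data if y == 0]
        positives = [x for x, y in data if y == 1]
        if negatives and positives:
            # Place the decision boundary midway between the two classes.
            self.threshold = (max(negatives) + min(positives)) / 2

    def confidence(self, x: float) -> float:
        # Distance from the boundary serves as a simple confidence proxy.
        return abs(x - self.threshold)


def adapt(model: ThresholdModel, training: List[Tuple[float, int]],
          stream: Iterable[float], oracle: Oracle, margin: float) -> int:
    """Monitor a data stream, query the oracle on uncertain items, and retrain.

    Returns the number of oracle queries issued.
    """
    queries = 0
    for x in stream:
        if model.confidence(x) < margin:     # uncertainty sampling
            training.append((x, oracle(x)))  # oracle supplies a trusted label
            queries += 1
            model.fit(training)              # incremental adaptation
    return queries
```

Consider a run where the true boundary has drifted to 7.0. The model is seeded with the points 0.0 and 10.0, the stream is `[6.0, 6.5, 9.0, 1.0]`, and `margin=1.5`. The loop issues two oracle queries and moves the threshold from 5.0 to 7.5. Only the uncertain items cost an oracle call.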
-
Publication No.: US20200050614A1
Publication date: 2020-02-13
Application No.: US16418267
Filing date: 2019-05-21
Applicant: Groupon, Inc.
Inventors: Matthew DeLand , Chandler J. Iyer
IPC classification: G06F16/28
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for modeling multi-dimensional, dynamically evolving data using dynamic clustering. In one aspect, a method includes receiving a core group of clusters of objects, each object being represented by a corresponding instance of a multi-dimensional feature vector including a dimension k; receiving a stream of data points representing a group of objects, each data point respectively representing an instance of dimension k describing a feature of an object within the group of objects; and, for each data point, adding an object described by the data point to a first cluster of objects within the core group of clusters; updating properties of the first cluster of objects in response to adding the object; and determining whether to update the core group of clusters using the updated properties of the first cluster of objects.
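The three per-point steps in the abstract can be sketched as: add the object to a cluster, update that cluster's properties, then decide whether the core group needs updating. The sketch rests on these assumptions:

- The "first cluster" is taken to be the one with the nearest centroid.
- The cluster properties are a running mean and a running sum of squared deviations, maintained with Welford's method.
- "Update the core group" is reduced to flagging any cluster whose spread exceeds a hypothetical `max_variance`.
- `Cluster` and `process_stream` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    mean: List[float]   # centroid, seeded from one existing object
    count: int = 1
    m2: float = 0.0     # sum of squared deviations from the mean, over all dimensions

    def add(self, point: Sequence[float]) -> None:
        """Welford's online update of the cluster's mean and spread."""
        self.count += 1
        delta = [p - m for p, m in zip(point, self.mean)]
        self.mean = [m + d / self.count for m, d in zip(self.mean, delta)]
        delta2 = [p - m for p, m in zip(point, self.mean)]
        self.m2 += sum(d1 * d2 for d1, d2 in zip(delta, delta2))

    @property
    def variance(self) -> float:
        # Total (summed over dimensions) population variance of the cluster.
        return self.m2 / self.count


def process_stream(core: List[Cluster], stream: List[Sequence[float]],
                   max_variance: float) -> List[int]:
    """Assign each point to its nearest cluster and flag clusters needing a core-group update."""
    flagged: List[int] = []
    for point in stream:
        nearest = min(range(len(core)), key=lambda i: math.dist(point, core[i].mean))
        core[nearest].add(point)                    # add object, update properties
        if core[nearest].variance > max_variance:   # decide whether to update the core group
            flagged.append(nearest)
    return flagged
```

The properties are updated incrementally, so each point costs O(k) per cluster comparison. The full cluster membership never has to be rescanned.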
-
Publication No.: US10262277B2
Publication date: 2019-04-16
Application No.: US15427908
Filing date: 2017-02-08
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly , Shawn Ryan Jeffery , Matthew DeLand , Nick Pendar , Andrew James , David Johnston
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
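Both aspects of the abstract can be sketched as a single reservoir class. The abstract does not say how the system decides whether a sample is "determined to be added", so this sketch assumes the following:

- Classic reservoir sampling (Algorithm R) makes the admission decision, and the oracle is queried only for admitted samples.
- For the second aspect, a judgment is sent to the oracle only when the model's confidence falls below a hypothetical `min_confidence`.
- The sample is then stored only if the oracle's estimate disagrees with the judgment by more than a hypothetical `tolerance`.
- All names are illustrative.

```python
import random
from typing import Callable, List, Optional, Tuple

# Each reservoir entry: (sample, oracle quality estimate, model judgment or None).
Entry = Tuple[str, float, Optional[float]]


class QualityReservoir:
    def __init__(self, capacity: int, oracle: Callable[[str], float],
                 rng: Optional[random.Random] = None) -> None:
        self.capacity = capacity
        self.oracle = oracle          # hypothetical: returns a quality estimate in [0, 1]
        self.rng = rng or random.Random()
        self.samples: List[Entry] = []
        self.seen = 0                 # candidates considered for the reservoir so far
        self.oracle_calls = 0

    def _slot_for_new_sample(self) -> Optional[int]:
        """Reservoir sampling (Algorithm R): pick a slot for the new sample, or None."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            return len(self.samples)
        j = self.rng.randrange(self.seen)   # uniform over [0, seen)
        return j if j < self.capacity else None

    def _store(self, slot: int, entry: Entry) -> None:
        if slot == len(self.samples):
            self.samples.append(entry)
        else:
            self.samples[slot] = entry      # evict the sample currently in that slot

    def submit(self, sample: str) -> bool:
        """First aspect: decide admission first, then ask the oracle only if admitted."""
        slot = self._slot_for_new_sample()
        if slot is None:
            return False
        self.oracle_calls += 1
        self._store(slot, (sample, self.oracle(sample), None))
        return True

    def submit_judgment(self, sample: str, judgment: float, confidence: float,
                        min_confidence: float = 0.8, tolerance: float = 0.2) -> bool:
        """Second aspect: verify only low-confidence model judgments with the oracle."""
        if confidence >= min_confidence:
            return False                    # trust confident judgments; no oracle call
        self.oracle_calls += 1
        estimate = self.oracle(sample)
        if abs(estimate - judgment) <= tolerance:
            return False                    # oracle agrees with the model: little new information
        slot = self._slot_for_new_sample()
        if slot is None:
            return False
        self._store(slot, (sample, estimate, judgment))
        return True
```

Algorithm R keeps every candidate in the reservoir with equal probability, using fixed memory. Gating the oracle behind admission or low confidence bounds how many (possibly costly) crowd requests are made.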
-
Publication No.: US20190258954A1
Publication date: 2019-08-22
Application No.: US16279731
Filing date: 2019-02-19
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly , Shawn Ryan Jeffery , Matthew DeLand , Nick Pendar , Andrew James , David Johnston
IPC classification: G06N20/00 , G06F16/23 , G06F16/215
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
-
Publication No.: US20170124178A1
Publication date: 2017-05-04
Application No.: US15259630
Filing date: 2016-09-08
Applicant: Groupon, Inc.
Inventors: Matthew DeLand , Chander Iyer
IPC classification: G06F17/30
CPC classification: G06F17/30598 , G06F17/30592
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for modeling multi-dimensional, dynamically evolving data using dynamic clustering. In one aspect, a method includes receiving a core group of clusters of objects, each object being represented by a corresponding instance of a multi-dimensional feature vector including a dimension k; receiving a stream of data points representing a group of objects, each data point respectively representing an instance of dimension k describing a feature of an object within the group of objects; and, for each data point, adding an object described by the data point to a first cluster of objects within the core group of clusters; updating properties of the first cluster of objects in response to adding the object; and determining whether to update the core group of clusters using the updated properties of the first cluster of objects.
-
Publication No.: US20220318826A1
Publication date: 2022-10-06
Application No.: US17648504
Filing date: 2022-01-20
Applicant: GROUPON, INC.
IPC classification: G06Q30/02 , G06F16/2457 , G06F16/215 , G06N20/00
Abstract: Systems, apparatus, and methods for determining unique contacts from a collection or pool of merchant data are discussed herein. Some embodiments may provide for an apparatus including circuitry configured to: access first merchant data associated with a first merchant; access second merchant data associated with a second merchant; determine a match score based on the first merchant data and the second merchant data indicating a likelihood of the first merchant being the same as the second merchant; determine a match score threshold; determine whether the match score exceeds the match score threshold; and in response to determining the match score fails to exceed the match score threshold, determine the first merchant as being different from the second merchant. Some embodiments may provide for techniques for machine learning with merchant data training sets to determine match scores.
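The score-then-threshold decision in this abstract can be sketched as follows. The patent learns its match scores from merchant training data. This sketch instead makes several hypothetical stand-in choices:

- The match score is a fixed weighted mix of three signals: name-token Jaccard similarity, normalized phone equality, and postal-code equality.
- The threshold is a fixed `0.7`.
- The `Merchant` fields are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class Merchant:
    name: str
    phone: str
    postal_code: str


def _name_tokens(name: str) -> set:
    # Lower-case, drop apostrophes, split on anything non-alphanumeric.
    return set(re.findall(r"[a-z0-9]+", name.lower().replace("'", "")))


def _digits(text: str) -> str:
    # Keep only digits so "(312) 555-0100" and "312-555-0100" compare equal.
    return re.sub(r"\D", "", text)


def match_score(a: Merchant, b: Merchant) -> float:
    """Weighted similarity in [0, 1]; the features and weights are illustrative."""
    ta, tb = _name_tokens(a.name), _name_tokens(b.name)
    name_sim = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0   # Jaccard
    phone_a = _digits(a.phone)
    phone_eq = 1.0 if phone_a and phone_a == _digits(b.phone) else 0.0
    zip_eq = 1.0 if a.postal_code.strip() == b.postal_code.strip() else 0.0
    return 0.5 * name_sim + 0.3 * phone_eq + 0.2 * zip_eq


def same_merchant(a: Merchant, b: Merchant, threshold: float = 0.7) -> bool:
    # A score that fails to exceed the threshold means "different merchants".
    return match_score(a, b) > threshold
```

In a learned version, a trained classifier's output would replace `match_score`, and the threshold could be tuned on labeled merchant pairs. The structure of the decision stays the same.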
-
Publication No.: US20220300828A1
Publication date: 2022-09-22
Application No.: US17684935
Filing date: 2022-03-02
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly , Shawn Ryan Jeffery , Matthew DeLand , Nick Pendar , Andrew James , David Johnston
IPC classification: G06N5/02 , G06N20/00 , G06F16/215 , G06F16/23
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
-
Publication No.: US10410225B1
Publication date: 2019-09-10
Application No.: US14788488
Filing date: 2015-06-30
Applicant: Groupon, Inc.
IPC classification: G06Q30/00 , G06F16/00 , G06Q30/02 , G06F16/2455
Abstract: Systems, apparatus, and methods for determining unique contacts from a collection or pool of merchant data are discussed herein. Some embodiments may provide for an apparatus including circuitry configured to determine programmatic match results indicating whether different instances of merchant data match (e.g., describe the same contact). The circuitry may further determine probabilities of precision or recall errors with the programmatic match results. Programmatic match results having a high probability of error may be annotated by a user to generate user match results. The user match results may be used to generate a more reliable contacts database including unique contacts, as well as to train and/or update the match scoring algorithm. As such, the accuracy of machine-implemented binary classification is improved.
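The routing step in this abstract can be sketched as follows: estimate the error probability of each programmatic match result, send the risky ones to a user, and feed the user's labels back as training data. The sketch assumes:

- Each match score is a calibrated probability `p_match`.
- A predicted match can only be a precision error, with probability `1 - p_match`.
- A predicted non-match can only be a recall error, with probability `p_match`.
- Results whose error probability exceeds a hypothetical `max_error` go to review.
- `triage` and `resolve` are illustrative names.

```python
from typing import Dict, List, Tuple

# (pair_id, predicted_match, error_probability, error_kind)
Result = Tuple[str, bool, float, str]


def triage(scored_pairs: List[Tuple[str, float]], max_error: float = 0.2
           ) -> Tuple[List[Result], List[Result]]:
    """Split programmatic match results into auto-accepted and user-review queues."""
    auto: List[Result] = []
    review: List[Result] = []
    for pair_id, p_match in scored_pairs:
        predicted = p_match >= 0.5
        # A predicted match can only be a precision error (false positive);
        # a predicted non-match can only be a recall error (false negative).
        p_error = 1.0 - p_match if predicted else p_match
        kind = "precision" if predicted else "recall"
        queue = review if p_error > max_error else auto
        queue.append((pair_id, predicted, p_error, kind))
    return auto, review


def resolve(auto: List[Result], review: List[Result], user_labels: Dict[str, bool]
            ) -> Tuple[Dict[str, bool], List[Tuple[str, bool]]]:
    """Merge programmatic and user match results; collect user labels for retraining."""
    decisions = {pair_id: predicted for pair_id, predicted, _, _ in auto}
    training: List[Tuple[str, bool]] = []
    for pair_id, _, _, _ in review:
        label = user_labels[pair_id]        # user match result overrides the programmatic one
        decisions[pair_id] = label
        training.append((pair_id, label))   # fed back to update the match scorer
    return decisions, training
```

Human effort goes only to the uncertain pairs. Those same pairs are the most informative examples for retraining the binary classifier.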
-
Publication No.: US20180293293A1
Publication date: 2018-10-11
Application No.: US15815299
Filing date: 2017-11-16
Applicant: Groupon, Inc.
Inventors: Matthew DeLand , Chander Iyer
IPC classification: G06F17/30
CPC classification: G06F16/285 , G06F16/283
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for modeling multi-dimensional, dynamically evolving data using dynamic clustering. In one aspect, a method includes receiving a core group of clusters of objects, each object being represented by a corresponding instance of a multi-dimensional feature vector including a dimension k; receiving a stream of data points representing a group of objects, each data point respectively representing an instance of dimension k describing a feature of an object within the group of objects; and, for each data point, adding an object described by the data point to a first cluster of objects within the core group of clusters; updating properties of the first cluster of objects in response to adding the object; and determining whether to update the core group of clusters using the updated properties of the first cluster of objects.
-
Publication No.: US20180150767A1
Publication date: 2018-05-31
Application No.: US15619786
Filing date: 2017-06-12
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly , Shawn Ryan Jeffery , Matthew DeLand , Nick Pendar , Andrew James , David Johnston
CPC classification: G06N20/00 , G06F16/215 , G06F16/2358 , G06F16/2365
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
-