-
Publication No.: US20240029086A1
Publication date: 2024-01-25
Application No.: US18365775
Filing date: 2023-08-04
Applicant: Groupon, Inc.
IPC classification: G06Q30/0201, G06F16/951, G06Q10/0637
CPC classification: G06Q30/0201, G06F16/951, G06Q10/0637
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
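The source-specific analysis described in this abstract can be pictured as a dispatch table that maps each online source to its own scoring function. The sketch below is only an illustration: the source names, the keyword list, the record field `status`, and the scoring rules are all hypothetical and do not come from the patent.

```python
# Hypothetical phrases that might signal a new business in free text.
OPENING_PHRASES = ("grand opening", "now open", "opening soon")


def score_unstructured(text):
    """Score free text by counting hypothetical opening phrases (capped at 1.0)."""
    lowered = text.lower()
    hits = sum(phrase in lowered for phrase in OPENING_PHRASES)
    return min(1.0, hits / 2)


def score_structured(record):
    """Score a structured record using a hypothetical 'status' field."""
    return 1.0 if record.get("status") == "new_license" else 0.0


# Each source gets its own algorithm and its own input representation.
SCORERS = {
    "news_feed": score_unstructured,        # hypothetical unstructured source
    "license_registry": score_structured,   # hypothetical structured source
}


def new_business_likelihood(source, content):
    """Return a likelihood in [0, 1] that the content references a new business."""
    return SCORERS[source](content)
```

Keeping one scorer per source lets structured and unstructured content share a single entry point while each source keeps its own representation.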
-
Publication No.: US10853401B2
Publication date: 2020-12-01
Application No.: US16511045
Filing date: 2019-07-15
Applicant: Groupon, Inc.
Inventor: Nick Pendar
IPC classification: G06F16/30, G06F16/35, G06F16/93, G06F16/31, G06F16/338
Abstract: Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating a query based on extracted features and the documents, generating a precision score for at least a portion of the generated query and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.
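A minimal sketch of the two methods in this abstract, under stated assumptions: a query is modeled as a set of terms that must all appear in a document, and precision is the fraction of matched documents that carry the query's label. The `Query` type, the whitespace tokenization, and the sample data are hypothetical; the patent does not specify them here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    terms: frozenset  # every term must occur in a document for a match
    label: str        # label the query is meant to indicate


def matches(query, tokens):
    """A conjunctive query matches when all of its terms are present."""
    return query.terms <= tokens


def precision(query, corpus):
    """Fraction of documents matched by the query that carry its label."""
    matched = [labels for tokens, labels in corpus if matches(query, tokens)]
    if not matched:
        return 0.0
    return sum(query.label in labels for labels in matched) / len(matched)


def select_queries(queries, corpus, threshold):
    """Keep only the queries whose precision satisfies the threshold."""
    return [q for q in queries if precision(q, corpus) >= threshold]


def tag(text, selected):
    """Assign labels to unlabeled text by applying the selected queries."""
    tokens = set(text.lower().split())
    return {q.label for q in selected if matches(q, tokens)}


# Hypothetical labeled corpus: (token set, label set) pairs.
corpus = [
    ({"deep", "tissue", "massage"}, {"massage"}),
    ({"hot", "stone", "massage"}, {"massage"}),
    ({"massage", "chair", "sale"}, {"retail"}),
    ({"oil", "change"}, {"auto"}),
]
broad = Query(frozenset({"massage"}), "massage")             # precision 2/3
narrow = Query(frozenset({"tissue", "massage"}), "massage")  # precision 1/1
selected = select_queries([broad, narrow], corpus, threshold=0.9)
```

With a threshold of 0.9 only `narrow` survives, and `tag("Deep tissue massage for two", selected)` yields the label `massage`.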
-
Publication No.: US20190258954A1
Publication date: 2019-08-22
Application No.: US16279731
Filing date: 2019-02-19
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly, Shawn Ryan Jeffery, Matthew DeLand, Nick Pendar, Andrew James, David Johnston
IPC classification: G06N20/00, G06F16/23, G06F16/215
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
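The first aspect above can be sketched as follows. The abstract does not say how the system decides whether a sample enters the reservoir; this sketch assumes classic uniform reservoir sampling (Algorithm R) as a stand-in. The class name, the `oracle` callable, and `mean_quality` are hypothetical.

```python
import random


class QualityReservoir:
    """Fixed-size sample reservoir; the oracle is consulted only for admitted samples."""

    def __init__(self, capacity, oracle, rng=None):
        self.capacity = capacity
        self.oracle = oracle            # callable: sample -> quality estimate
        self.rng = rng or random.Random()
        self.items = []                 # list of (sample, quality_estimate)
        self.seen = 0

    def offer(self, sample):
        """Process one new sample; return True if it was added to the reservoir."""
        self.seen += 1
        if len(self.items) < self.capacity:
            slot = len(self.items)      # still filling: always admit
        else:
            # Assumed rule: admit with probability capacity / seen.
            slot = self.rng.randrange(self.seen)
            if slot >= self.capacity:
                return False            # rejected, so no oracle request is sent
        estimate = self.oracle(sample)  # quality verification request
        if slot == len(self.items):
            self.items.append((sample, estimate))
        else:
            self.items[slot] = (sample, estimate)
        return True

    def mean_quality(self):
        """Average oracle estimate over the reservoir, or None if it is empty."""
        if not self.items:
            return None
        return sum(q for _, q in self.items) / len(self.items)
```

Deferring the oracle call until after the admission decision keeps the number of verification requests proportional to admissions rather than to the size of the stream.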
-
Publication No.: US11907277B2
Publication date: 2024-02-20
Application No.: US18165156
Filing date: 2023-02-06
Applicant: Groupon, Inc.
Inventor: Nick Pendar
IPC classification: G06F16/30, G06F16/35, G06F16/93, G06F16/31, G06F16/338
CPC classification: G06F16/35, G06F16/328, G06F16/338, G06F16/355, G06F16/93
Abstract: Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating a query based on extracted features and the documents, generating a precision score for at least a portion of the generated query and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.
-
Publication No.: US11244328B2
Publication date: 2022-02-08
Application No.: US16660517
Filing date: 2019-10-22
Applicant: Groupon, Inc.
IPC classification: G06Q30/02, G06F16/951, G06Q10/06
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
-
Publication No.: US10657457B1
Publication date: 2020-05-19
Application No.: US14578200
Filing date: 2014-12-19
Applicant: GROUPON, INC.
IPC classification: G06N20/00
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in deriving a model. Once a model is trained, the framework monitors the performance of the model and, in embodiments, leverages active learning and the oracle to generate feedback about the changing data for modifying training data sets while maintaining data quality to enable incremental adaptation of the model.
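The active-learning feedback loop mentioned in this abstract might look roughly like the sketch below. It assumes uncertainty sampling for a binary model (pick the samples whose predicted probability is closest to 0.5), which is one common active-learning strategy and not necessarily the one the patent uses. The function names and the `budget` parameter are hypothetical.

```python
def select_for_labeling(samples, predict_proba, budget):
    """Pick the `budget` samples the model is least certain about.

    Assumes `predict_proba(sample)` returns the probability of the positive class.
    """
    ranked = sorted(samples, key=lambda s: abs(predict_proba(s) - 0.5))
    return ranked[:budget]


def adapt_training_set(training_set, samples, predict_proba, oracle, budget):
    """Ask the oracle to label the most uncertain samples and append the results."""
    for sample in select_for_labeling(samples, predict_proba, budget):
        training_set.append((sample, oracle(sample)))
    return training_set
```

A model retrained on the extended training set would then be monitored again, which closes the incremental adaptation loop the abstract describes.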
-
Publication No.: US10262277B2
Publication date: 2019-04-16
Application No.: US15427908
Filing date: 2017-02-08
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly, Shawn Ryan Jeffery, Matthew DeLand, Nick Pendar, Andrew James, David Johnston
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
-
Publication No.: US09652527B2
Publication date: 2017-05-16
Application No.: US15198461
Filing date: 2016-06-30
Applicant: Groupon, Inc.
Inventor: Nick Pendar
CPC classification: G06F17/3064, G06F17/30011, G06F17/30705, G06N99/005
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for generating an optimal classifying query set for categorizing and/or labeling textual data based on a query subsumption calculus to determine, given two queries, whether one of the queries subsumes another. In one aspect, a method includes generating a group of determining queries based on analyzing text within a document; receiving a group of classifying queries; and, for each determining query within the group of determining queries, determining whether at least one of the classifying queries is subsumed by the determining query; and updating the group of classifying queries in an instance in which the classifying query is subsumed by the determining query.
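To make the subsumption test concrete, the sketch below assumes purely conjunctive queries represented as sets of required terms. Under that assumption, a query subsumes another when its terms are a subset of the other's, because any document matching the more specific query also matches the more general one. The update rule shown (replace subsumed classifying queries with the subsuming determining query) is one possible reading; the abstract does not spell it out, and the function names are hypothetical.

```python
def subsumes(general, specific):
    """True if every document matching `specific` also matches `general`.

    Queries are frozensets of required terms (conjunctive queries, an assumption).
    """
    return general <= specific


def update_classifying(classifying, determining):
    """Replace each subsumed classifying query with the determining query that subsumes it."""
    updated = set(classifying)
    for d in determining:
        subsumed = {c for c in updated if subsumes(d, c)}
        if subsumed:
            updated -= subsumed
            updated.add(d)
    return updated


classifying = {
    frozenset({"hot", "stone", "massage"}),
    frozenset({"oil", "change"}),
}
determining = [frozenset({"stone", "massage"}), frozenset({"yoga"})]
result = update_classifying(classifying, determining)
```

Here the query {stone, massage} subsumes {hot, stone, massage} and replaces it, while {yoga} subsumes nothing and leaves the set unchanged.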
-
Publication No.: US20230009563A1
Publication date: 2023-01-12
Application No.: US17842682
Filing date: 2022-06-16
Applicant: Groupon, Inc.
Inventors: Mark Thomas Daly, Shawn Ryan Jeffery, Matthew DeLand, Nick Pendar, Andrew James, David Johnston
IPC classification: G06N5/02, G06N20/00, G06F16/215, G06F16/23
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
-
Publication No.: US20220230189A1
Publication date: 2022-07-21
Application No.: US17563847
Filing date: 2021-12-28
Applicant: Groupon, Inc.
IPC classification: G06Q30/02, G06F16/951, G06Q10/06
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.