DISCOVERY OF NEW BUSINESS OPENINGS USING WEB CONTENT ANALYSIS

    Publication No.: US20240029086A1

    Publication Date: 2024-01-25

    Application No.: US18365775

    Filing Date: 2023-08-04

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
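
The source-specific analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the source names (`news`, `license`), the cue phrases, the field names (`body`, `status`), and the scoring constants are all hypothetical, chosen only to show one scorer per online source, each taking its own representation of the content and returning a likelihood.

```python
from typing import Callable, Dict


def score_news_article(rep: dict) -> float:
    """Unstructured content: count hypothetical opening-related cue phrases."""
    cues = ("grand opening", "now open", "opening soon")
    text = rep.get("body", "").lower()
    hits = sum(cue in text for cue in cues)
    # Two or more cues saturate the score at 1.0 (an arbitrary assumption).
    return min(1.0, hits / 2)


def score_license_record(rep: dict) -> float:
    """Structured content: treat an issued license as a strong signal."""
    return 0.9 if rep.get("status") == "issued" else 0.1


# Registry mapping each (hypothetical) source to its own scorer.
SCORERS: Dict[str, Callable[[dict], float]] = {
    "news": score_news_article,
    "license": score_license_record,
}


def new_business_likelihood(source: str, rep: dict) -> float:
    """Dispatch to the scorer registered for this source."""
    if source not in SCORERS:
        raise KeyError(f"no scorer registered for source {source!r}")
    return SCORERS[source](rep)
```

Keeping one scorer per source lets structured records and free text be handled by different logic while the caller sees a single likelihood value.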

    Method, apparatus, and computer program product for classification and tagging of textual data

    Publication No.: US10853401B2

    Publication Date: 2020-12-01

    Application No.: US16511045

    Filing Date: 2019-07-15

    Applicant: Groupon, Inc.

    Inventor: Nick Pendar

    Abstract: Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating a query based on extracted features and the documents, generating a precision score for at least a portion of the generated query and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.
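
The precision-threshold selection described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: queries are modeled as tuples of terms that must all appear in a document, precision is the fraction of matched documents carrying the target label, and the corpus, labels, and function names are hypothetical.

```python
from typing import List, Set, Tuple

Document = Tuple[str, Set[str]]          # (text, labels)
Candidate = Tuple[Tuple[str, ...], str]  # (query terms, target label)


def matches(query: Tuple[str, ...], text: str) -> bool:
    """A query matches when every one of its terms occurs in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query)


def precision(query: Tuple[str, ...], label: str, corpus: List[Document]) -> float:
    """Fraction of documents matched by the query that carry the label."""
    matched = [labels for text, labels in corpus if matches(query, text)]
    if not matched:
        return 0.0
    return sum(label in labels for labels in matched) / len(matched)


def select_queries(candidates: List[Candidate], corpus: List[Document],
                   threshold: float) -> List[Candidate]:
    """Keep only the candidates whose precision meets the threshold."""
    return [(q, l) for q, l in candidates if precision(q, l, corpus) >= threshold]


corpus = [
    ("deep tissue massage and facial", {"massage", "facial"}),
    ("swedish massage", {"massage"}),
    ("massage chair sale", {"retail"}),
    ("hot stone massage", {"massage"}),
]
candidates = [(("massage",), "massage"), (("deep", "tissue"), "massage")]
selected = select_queries(candidates, corpus, threshold=0.8)
```

Here the single-term query matches four documents, three of them labeled `massage` (precision 0.75), so it is dropped at a threshold of 0.8, while the two-term query is kept.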

    Automated Adaptive Data Analysis Using Dynamic Data Quality Assessment

    Publication No.: US20190258954A1

    Publication Date: 2019-08-22

    Application No.: US16279731

    Filing Date: 2019-02-19

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
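
The first aspect of this abstract (decide whether a sample enters the reservoir, and only then ask the oracle for a quality estimate) can be sketched in Python. The abstract does not say how the admission decision is made; this sketch assumes classic reservoir sampling (Algorithm R) as a stand-in, and the class name, `offer` method, and `oracle` callable are hypothetical.

```python
import random
from typing import Any, Callable, List, Optional, Tuple


class QualityReservoir:
    """Fixed-size sample reservoir that consults an oracle only on admission."""

    def __init__(self, capacity: int, oracle: Callable[[Any], float],
                 seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.oracle = oracle          # hypothetical quality-verification callback
        self.items: List[Tuple[Any, float]] = []  # (sample, quality estimate)
        self.seen = 0                 # number of samples offered so far
        self._rng = random.Random(seed)

    def offer(self, sample: Any) -> bool:
        """Return True if the sample was admitted to the reservoir."""
        self.seen += 1
        if len(self.items) < self.capacity:
            slot = len(self.items)    # still filling: always admit
        else:
            # Algorithm R: admit with probability capacity / seen.
            slot = self._rng.randrange(self.seen)
            if slot >= self.capacity:
                return False          # rejected, so the oracle is never asked
        estimate = self.oracle(sample)
        if slot == len(self.items):
            self.items.append((sample, estimate))
        else:
            self.items[slot] = (sample, estimate)
        return True
```

Deferring the oracle call until after the admission decision limits verification requests to samples that will actually be stored, which matters when the oracle is slow or costly (for example, a human reviewer).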

    Method, apparatus, and computer program product for classification and tagging of textual data

    Publication No.: US11907277B2

    Publication Date: 2024-02-20

    Application No.: US18165156

    Filing Date: 2023-02-06

    Applicant: Groupon, Inc.

    Inventor: Nick Pendar

    Abstract: Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating a query based on extracted features and the documents, generating a precision score for at least a portion of the generated query and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.

    Discovery of new business openings using web content analysis

    Publication No.: US11244328B2

    Publication Date: 2022-02-08

    Application No.: US16660517

    Filing Date: 2019-10-22

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.

    Automatic selection of high quality training data using an adaptive oracle-trained learning framework

    Publication No.: US10657457B1

    Publication Date: 2020-05-19

    Application No.: US14578200

    Filing Date: 2014-12-19

    Applicant: GROUPON, INC.

    IPC Classification: G06N20/00

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in deriving a model. Once a model is trained, the framework monitors the performance of the model and, in embodiments, leverages active learning and the oracle to generate feedback about the changing data for modifying training data sets while maintaining data quality to enable incremental adaptation of the model.
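
The active-learning step mentioned in this abstract can be illustrated with a small Python sketch. The abstract does not specify a selection strategy; the sketch assumes uncertainty sampling for a binary classifier, where the samples whose predicted probability lies closest to 0.5 are sent to the oracle first. The function name and the `budget` parameter are hypothetical.

```python
from typing import Any, List, Tuple


def select_for_oracle(scored: List[Tuple[Any, float]], budget: int) -> List[Any]:
    """Pick the `budget` samples the model is least certain about.

    `scored` pairs each sample with the model's predicted probability of the
    positive class; distance from 0.5 is used as the uncertainty measure.
    """
    ranked = sorted(scored, key=lambda pair: abs(pair[1] - 0.5))
    return [sample for sample, _ in ranked[:budget]]


scored = [("a", 0.95), ("b", 0.52), ("c", 0.10), ("d", 0.45)]
to_label = select_for_oracle(scored, budget=2)
```

With this input, samples `b` and `d` are selected because their scores sit nearest the decision boundary; the oracle's labels for them would then be folded back into the training set.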

    Automated adaptive data analysis using dynamic data quality assessment

    Publication No.: US10262277B2

    Publication Date: 2019-04-16

    Application No.: US15427908

    Filing Date: 2017-02-08

    Applicant: Groupon, Inc.

    IPC Classification: G06F17/30 G06N99/00

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.

    Multi-term query subsumption for document classification

    Publication No.: US09652527B2

    Publication Date: 2017-05-16

    Application No.: US15198461

    Filing Date: 2016-06-30

    Applicant: Groupon, Inc.

    Inventor: Nick Pendar

    IPC Classification: G06F17/30 G06N99/00

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for generating an optimal classifying query set for categorizing and/or labeling textual data based on a query subsumption calculus to determine, given two queries, whether one of the queries subsumes another. In one aspect, a method includes generating a group of determining queries based on analyzing text within a document; receiving a group of classifying queries; and, for each determining query within the group of determining queries, determining whether at least one of the classifying queries is subsumed by the determining query; and updating the group of classifying queries in an instance in which the classifying query is subsumed by the determining query.
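
The subsumption check in this abstract can be sketched in Python for the simplest case. The abstract does not define the query language or the update rule, so two assumptions are made here: queries are purely conjunctive term sets (so a query with fewer terms matches every document a more specific superset query matches), and the update removes each classifying query that some determining query subsumes. The names `subsumes` and `drop_subsumed` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Query = FrozenSet[str]  # conjunctive query: all terms must be present


def subsumes(general: Query, specific: Query) -> bool:
    """True when every document matching `specific` also matches `general`.

    For conjunctive term sets this holds exactly when the general query's
    terms are a subset of the specific query's terms.
    """
    return general <= specific


def drop_subsumed(determining: Iterable[Query],
                  classifying: Iterable[Query]) -> Set[Query]:
    """One possible update: remove classifying queries that are subsumed."""
    determining = list(determining)
    return {c for c in classifying
            if not any(subsumes(d, c) for d in determining)}


determining = {frozenset({"massage"})}
classifying = {frozenset({"massage", "thai"}), frozenset({"pizza"})}
remaining = drop_subsumed(determining, classifying)
```

In this example the query `{massage, thai}` is subsumed by `{massage}` and removed, while `{pizza}` is unaffected. A real query calculus with disjunction, negation, or phrase operators would need a richer subsumption test than a subset check.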

    AUTOMATED ADAPTIVE DATA ANALYSIS USING DYNAMIC DATA QUALITY ASSESSMENT

    Publication No.: US20230009563A1

    Publication Date: 2023-01-12

    Application No.: US17842682

    Filing Date: 2022-06-16

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.

    DISCOVERY OF NEW BUSINESS OPENINGS USING WEB CONTENT ANALYSIS

    Publication No.: US20220230189A1

    Publication Date: 2022-07-21

    Application No.: US17563847

    Filing Date: 2021-12-28

    Applicant: Groupon, Inc.

    Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.