-
Publication number: US12045732B2
Publication date: 2024-07-23
Application number: US17684935
Filing date: 2022-03-02
Applicant: ByteDance Inc.
Inventor: Mark Thomas Daly, Shawn Ryan Jeffery, Matthew DeLand, Nick Pendar, Andrew James, David Johnston
IPC: G06F16/00, G06F16/215, G06F16/23, G06N5/02, G06N20/00
CPC classification number: G06N5/02, G06F16/215, G06F16/2358, G06F16/2365, G06N20/00
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
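The first aspect of the abstract above can be illustrated with a minimal sketch. The abstract does not say how a sample "is determined to be added" to the reservoir, so this example assumes classic reservoir sampling (Algorithm R) as a stand-in decision rule. The class name `QualityReservoir`, the `oracle` callable, and the `process` method are hypothetical names chosen for illustration, not terms from the patent.

```python
import random
from typing import Callable, List, Tuple


class QualityReservoir:
    """Fixed-size reservoir of (sample, quality_estimate) pairs.

    Assumption: selection uses Algorithm R; the patent abstract does not
    specify the selection rule.
    """

    def __init__(self, capacity: int, oracle: Callable[[str], float], seed: int = 0):
        self.capacity = capacity
        self.oracle = oracle  # hypothetical stand-in for the oracle
        self.items: List[Tuple[str, float]] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def _slot_for_next(self) -> int:
        """Return the slot for the next sample, or -1 if it is skipped."""
        self.seen += 1
        if len(self.items) < self.capacity:
            return len(self.items)
        j = self.rng.randrange(self.seen)
        return j if j < self.capacity else -1

    def process(self, sample: str) -> bool:
        """Handle one data quality job; return True if the sample was stored."""
        slot = self._slot_for_next()
        if slot < 0:
            return False  # not selected, so the oracle is never consulted
        estimate = self.oracle(sample)  # quality verification request
        if slot == len(self.items):
            self.items.append((sample, estimate))
        else:
            self.items[slot] = (sample, estimate)
        return True
```

In this sketch the oracle is consulted only for samples that will actually enter the reservoir, which mirrors the ordering of steps in the abstract: decide, request verification, receive the estimate, then store the sample together with its estimate.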
-
Publication number: US12175483B2
Publication date: 2024-12-24
Application number: US18365775
Filing date: 2023-08-04
Applicant: Bytedance Inc.
Inventor: Shawn Ryan Jeffery, Nick Pendar, Richard Clark Barber
IPC: G06Q30/0201, G06F16/951, G06Q10/0637
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for identifying a new business based on programmatically analyzing content received from online sources and, as a result, discovering one or more references to the business. In embodiments, the system stores historical data representing previously identified new businesses and then uses attributes of those businesses in search queries to receive related content. Additionally or alternatively, the system stores data representing online sources that historically provided content containing references to new businesses and then continues to access those sources for additional content. In embodiments, the system performs content analysis on structured and/or unstructured content. In some embodiments, analysis of content received from a particular online source includes a source-specific algorithm that takes a source-specific representation of the content as input and produces a result indicating the likelihood that the content includes a new business reference.
-
Publication number: US20240419981A1
Publication date: 2024-12-19
Application number: US18750363
Filing date: 2024-06-21
Applicant: ByteDance Inc.
Inventor: Mark Thomas Daly, Shawn Ryan Jeffery, Matthew DeLand, Nick Pendar, Andrew James, David Johnston
IPC: G06N5/02, G06F16/215, G06F16/23, G06N20/00
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
-
Publication number: US12174872B2
Publication date: 2024-12-24
Application number: US18409278
Filing date: 2024-01-10
Applicant: Bytedance Inc.
Inventor: Nick Pendar
IPC: G06F16/30, G06F16/31, G06F16/338, G06F16/35, G06F16/93
Abstract: Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating a query based on extracted features and the documents, generating a precision score for at least a portion of the generated query and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.
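The query selection and tagging steps in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: a query is modeled as a set of terms that must all appear in a document, precision is the fraction of matched documents carrying the target label, and "satisfying" the threshold is taken to mean greater than or equal to it. The names `precision`, `select_queries`, and `tag` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Doc = Tuple[Set[str], Set[str]]        # (tokens, labels)
Candidate = Tuple[FrozenSet[str], str]  # (query terms, label the query predicts)


def precision(query: FrozenSet[str], label: str, corpus: List[Doc]) -> float:
    """Fraction of documents matched by the query that carry the label."""
    matched = [labels for tokens, labels in corpus if query <= tokens]
    if not matched:
        return 0.0
    return sum(label in labels for labels in matched) / len(matched)


def select_queries(
    candidates: List[Candidate], corpus: List[Doc], threshold: float
) -> Dict[Candidate, float]:
    """Keep only the candidates whose precision meets the threshold."""
    selected: Dict[Candidate, float] = {}
    for query, label in candidates:
        score = precision(query, label, corpus)
        if score >= threshold:
            selected[(query, label)] = score
    return selected


def tag(tokens: Set[str], selected: Dict[Candidate, float]) -> Set[str]:
    """Assign labels to unlabeled text using the selected queries."""
    return {label for (query, label) in selected if query <= tokens}
```

A broad single-term query tends to match many documents with mixed labels and so scores lower, while a more specific multi-term query matches fewer documents but with higher precision. The threshold filters out the former before any unlabeled text is tagged.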
-