SYSTEMS AND METHODS FOR VALIDATING DATA

    Publication No.: US20220138034A1

    Publication Date: 2022-05-05

    Application No.: US17573580

    Application Date: 2022-01-11

    Abstract: Systems and methods are provided for validating data in a data set. A data set including data to validate and a validator to use in validating the data are selected based on user input generated from a user's interactions with a graphical user interface. The validator is applied to the data to determine whether one or more statistics generated through application of the validator to the data are valid or invalid based on a validation routine associated with the validator. A data quality report indicating whether the data set is valid or invalid, based on a determination of whether the one or more statistics are valid or invalid, is generated and selectively presented to the user through the graphical user interface.
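
The validation flow in this abstract (apply a validator, compute statistics, judge each statistic with a validation routine, then summarize in a report) can be illustrated with a minimal Python sketch. The abstract does not disclose the actual implementation, so every name here (`Validator`, `compute_statistic`, `is_valid`, `build_quality_report`) and the report layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Validator:
    """Hypothetical validator: a statistic plus a routine that judges it."""
    name: str
    compute_statistic: Callable[[List[float]], float]
    is_valid: Callable[[float], bool]  # the "validation routine"


def build_quality_report(data: List[float], validators: List[Validator]) -> Dict:
    """Apply each validator and mark the data set valid only if all statistics pass."""
    statistics = {}
    for validator in validators:
        value = validator.compute_statistic(data)
        statistics[validator.name] = {
            "statistic": value,
            "valid": validator.is_valid(value),
        }
    return {
        "data_set_valid": all(entry["valid"] for entry in statistics.values()),
        "statistics": statistics,
    }


# Illustrative validators; the thresholds are arbitrary assumptions.
mean_in_range = Validator("mean", mean, lambda m: 0.0 <= m <= 10.0)
max_at_most_3 = Validator("max", max, lambda m: m <= 3.0)

report = build_quality_report([1.0, 2.0, 3.0, 4.0], [mean_in_range, max_at_most_3])
```

In this sketch the data set is reported invalid because one statistic (the maximum) fails its routine, even though the mean passes. Whether the patented system combines statistics this way is not stated in the abstract.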

    Systems and methods for selecting machine learning training data

    Publication No.: US10325224B1

    Publication Date: 2019-06-18

    Application No.: US15644231

    Application Date: 2017-07-07

    Abstract: Systems and methods are provided for selecting training examples to increase the efficiency of supervised active machine learning processes. Training examples for presentation to a user may be selected according to a measure of the model's uncertainty in labeling the examples. The number of training examples may be chosen to minimize user downtime in the machine learning process, increasing efficiency between the user and the processing system.
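
As a rough illustration of uncertainty-based selection, the Python sketch below ranks unlabeled examples by how close a binary model's predicted probability is to 0.5 and picks the most uncertain ones. The abstract does not specify the uncertainty measure or how the batch size is chosen; the margin-style score and the `batch_size_for_downtime` heuristic (label enough examples to keep the user busy while the model retrains) are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def uncertainty(p_positive: float) -> float:
    """Assumed score for a binary model: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def select_batch(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` most uncertain examples."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: uncertainty(probabilities[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def batch_size_for_downtime(retrain_seconds: float, seconds_per_label: float) -> int:
    """Assumed heuristic: enough examples that labeling spans one retraining pass."""
    return max(1, math.ceil(retrain_seconds / seconds_per_label))


# Hypothetical model outputs for five unlabeled examples.
predicted = [0.9, 0.55, 0.1, 0.48, 0.7]
chosen = select_batch(predicted, batch_size=2)
size = batch_size_for_downtime(retrain_seconds=30.0, seconds_per_label=4.0)
```

Here the examples with predicted probabilities 0.48 and 0.55 are selected first, since the model is least sure about them. Other measures, such as prediction entropy, would fit the same structure.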

    FEATURE CLUSTERING OF USERS, USER CORRELATION DATABASE ACCESS, AND USER INTERFACE GENERATION SYSTEM

    Publication No.: US20190108249A1

    Publication Date: 2019-04-11

    Application No.: US16198614

    Application Date: 2018-11-21

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for a feature clustering of users, user correlation database access, and user interface generation system. The system can obtain information stored in different databases located across geographic regions, and determine unique users from the different information. The information can be included in unique records in the databases, with each record describing a particular user, and with each user described with imperfect identifying information. The system can analyze the different information utilizing machine learning models, and can associate each record with a particular unique user. The system can obtain identifications of items associated with each user, and determine the propensity of the user to disassociate with one or more items, or determine likelihoods of future association with different items not presently associated with the user.

    Feature clustering of users, user correlation database access, and user interface generation system

    Publication No.: US10140327B2

    Publication Date: 2018-11-27

    Application No.: US15239585

    Application Date: 2016-08-17

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for a feature clustering of users, user correlation database access, and user interface generation system. The system can obtain information stored in different databases located across geographic regions, and determine unique users from the different information. The information can be included in unique records in the databases, with each record describing a particular user, and with each user described with imperfect identifying information. The system can analyze the different information utilizing machine learning models, and can associate each record with a particular unique user. The system can obtain identifications of items associated with each user, and determine the propensity of the user to disassociate with one or more items, or determine likelihoods of future association with different items not presently associated with the user.

    SYSTEMS AND METHODS FOR SELECTING MACHINE LEARNING TRAINING DATA

    Publication No.: US20180330280A1

    Publication Date: 2018-11-15

    Application No.: US16027161

    Application Date: 2018-07-03

    CPC classification number: G06N99/005 G06N5/04

    Abstract: Systems and methods are provided for selecting training examples to increase the efficiency of supervised active machine learning processes. Training examples for presentation to a user may be selected according to a measure of the model's uncertainty in labeling the examples. The number of training examples may be chosen to minimize user downtime in the machine learning process, increasing efficiency between the user and the processing system.
