Systems and methods for validating data

    公开(公告)号:US11221898B2

    公开(公告)日:2022-01-11

    申请号:US16675056

    申请日:2019-11-05

    Abstract: Systems and methods are validating data in a data set. A data set including data to validate and a validator to use in validating the data is selected based on user input generated based on interactions of a user with a graphical user interface. The validator is applied to the data to determine whether one or more statistics generated through application of the validator to the data is valid or invalid based on a validation routine associated with the validator. A data quality report indicating whether the data set is valid or invalid, based on a determination of whether the one or more statistics is valid or invalid, is generated and selectively presented to the user through the graphical user interface.

    SYSTEMS AND METHODS FOR VALIDATING DATA
    23.
    发明申请

    公开(公告)号:US20200073743A1

    公开(公告)日:2020-03-05

    申请号:US16675056

    申请日:2019-11-05

    Abstract: Systems and methods are validating data in a data set. A data set including data to validate and a validator to use in validating the data is selected based on user input generated based on interactions of a user with a graphical user interface. The validator is applied to the data to determine whether one or more statistics generated through application of the validator to the data is valid or invalid based on a validation routine associated with the validator. A data quality report indicating whether the data set is valid or invalid, based on a determination of whether the one or more statistics is valid or invalid, is generated and selectively presented to the user through the graphical user interface.

    Techniques for configuring and validating a data pipeline deployment

    公开(公告)号:US10534595B1

    公开(公告)日:2020-01-14

    申请号:US15977666

    申请日:2018-05-11

    Abstract: Techniques for configuring and validating a data pipeline system deployment are described. In an embodiment, a template is a file or data object that describes a package of related jobs. For example, a template may describe a set of jobs necessary for deduplication of data records or a set of jobs performing machine learning on a set of data records. The template can be defined in a file, such as a JSON blob or XML file. For each job specified in the template, the template may identify a set of dataset dependencies that are needed as input for the processing of that job. For each job specified in the template, the template may further identify a set of configuration parameters needed for deployment of the job. In an embodiment, a server uses the template and the configuration parameter values collected via the GUI to generate code for the package of jobs. The code may be stored in a version control system. In an embodiment, the code may be compiled, executed, and deployed to a server for processing the data.

Patent Agency Ranking