SYSTEMS AND METHODS FOR GENERATING SYNTHETIC TABULAR DATA FOR MACHINE LEARNING AND OTHER APPLICATIONS

    Publication number: US20240330682A1

    Publication date: 2024-10-03

    Application number: US18295094

    Application date: 2023-04-03

    Applicant: Adobe Inc.

    CPC classification number: G06N3/08 G06N3/0455

    Abstract: Systems and methods for generating synthetic tabular data for machine learning and other applications are provided. In some embodiments, a variational autoencoder is trained to learn inter-feature correlations found in tabular data collected from real data sources. The trained variational autoencoder is used to train a generator model of a Generative Adversarial Network (GAN) to generate synthetic tabular data that exhibits the inter-feature correlation distribution found in the tabular data collected from real data sources. In some embodiments, processing devices perform operations comprising: receiving a set of tabular data records, each record comprising a plurality of features; training a first machine learning model using the tabular data records to learn correlations between the plurality of features; and training a second machine learning model, using the first machine learning model, to generate synthetic tabular data records based at least on the correlations between the plurality of features.
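    Illustrative sketch (not part of the published record): the abstract turns on synthetic records preserving the inter-feature correlations of the real records. One way to make that property concrete is to compare the feature-by-feature correlation matrices of a real table and a synthetic table. The abstract does not name a correlation measure, so the use of Pearson correlation here is an assumption, and the helper names pearson, correlation_matrix, and correlation_gap are hypothetical. The sketch does not implement the variational autoencoder or the GAN; it only shows a check that a correlation-preserving generator would be expected to pass on toy data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: report 0 by convention (assumption)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(records):
    """Feature-by-feature correlation matrix for a list of numeric records."""
    columns = list(zip(*records))  # transpose rows into feature columns
    return [[pearson(a, b) for b in columns] for a in columns]


def correlation_gap(real, synthetic):
    """Mean absolute difference between the two correlation matrices."""
    real_corr = correlation_matrix(real)
    synth_corr = correlation_matrix(synthetic)
    k = len(real_corr)
    total = sum(
        abs(real_corr[i][j] - synth_corr[i][j])
        for i in range(k)
        for j in range(k)
    )
    return total / (k * k)


def correlated_table(rng, n):
    """Toy two-feature table where the second feature depends on the first."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        rows.append((x, 2 * x + rng.gauss(0, 0.5)))
    return rows


rng = random.Random(0)
real = correlated_table(rng, 500)

# Stand-in for a generator that preserves the dependence between features.
faithful = correlated_table(rng, 500)

# Stand-in for a generator that samples each feature independently.
independent = [(rng.gauss(0, 1), rng.gauss(0, 2)) for _ in range(500)]

gap_faithful = correlation_gap(real, faithful)
gap_independent = correlation_gap(real, independent)
```

    In this toy setup the table that keeps the dependence between the two features yields a much smaller gap than the table whose features are sampled independently, which is the kind of difference the abstract's correlation-preserving generator is meant to avoid.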
