GENERATING DATASETS FOR MACHINE LEARNING SYSTEMS

    Publication number: US20230297841A1

    Publication date: 2023-09-21

    Application number: US18317803

    Application date: 2023-05-15

    CPC classification number: G06N3/088 G06N3/045

    Abstract: Disclosed herein are embodiments of systems, methods, and products comprising an analytic server that automates training dataset generation for different application areas. The server may perform an automated, iterative refinement process to build a collection of dataset generator models over time. The server may receive a set of seed examples in a domain and generate candidate examples based on the features of the seed examples using data synthesis techniques. The server may execute a pre-trained label discriminator (LD) and domain discriminator (D2) on the candidate examples. The LD may identify and reject mislabeled data. The D2 may identify and reject out-of-domain data. The analytic server may regenerate labeled data based on the feedback of the LD and D2. The analytic server may train a dataset generator by iteratively performing these steps for refinement until the regenerated candidate examples reach a pass rate threshold.
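The refinement loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `synthesize` and `refine_generator`, the toy one-dimensional examples, the rule-based stand-ins for the LD and D2 (which in the abstract are discriminator models), and the feedback rule of halving the synthesis noise after a failed round.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # toy representation: (feature value, label)


def synthesize(seeds: List[Example], n: int, noise: float,
               rng: random.Random) -> List[Example]:
    """Toy data synthesis: perturb a seed's feature and keep its label."""
    candidates = []
    for _ in range(n):
        x, y = rng.choice(seeds)
        candidates.append((x + rng.gauss(0.0, noise), y))
    return candidates


def refine_generator(seeds: List[Example],
                     label_ok: Callable[[Example], bool],
                     in_domain: Callable[[Example], bool],
                     pass_rate_threshold: float = 0.9,
                     n: int = 200,
                     noise: float = 1.0,
                     max_rounds: int = 20,
                     seed: int = 0) -> Tuple[float, float, int]:
    """Regenerate candidates until enough of them pass both checks.

    Returns (final noise, final pass rate, rounds used).
    """
    rng = random.Random(seed)
    pass_rate = 0.0
    for round_idx in range(1, max_rounds + 1):
        candidates = synthesize(seeds, n, noise, rng)
        # Reject candidates flagged as mislabeled (LD) or out-of-domain (D2).
        accepted = [c for c in candidates if label_ok(c) and in_domain(c)]
        pass_rate = len(accepted) / n
        if pass_rate >= pass_rate_threshold:
            return noise, pass_rate, round_idx
        # Assumed feedback rule: too many rejections, so stay closer to seeds.
        noise *= 0.5
    return noise, pass_rate, max_rounds


# Hypothetical stand-ins for the pre-trained discriminators.
def toy_label_discriminator(example: Example) -> bool:
    x, y = example
    return y == (1 if x >= 0 else 0)  # label must agree with the sign of x


def toy_domain_discriminator(example: Example) -> bool:
    x, _ = example
    return abs(x) <= 3.0  # the toy domain is the interval [-3, 3]


seeds = [(-1.0, 0), (1.0, 1)]
final_noise, final_rate, rounds = refine_generator(
    seeds, toy_label_discriminator, toy_domain_discriminator)
print(f"noise={final_noise}, pass_rate={final_rate:.2f}, rounds={rounds}")
```

In this sketch the generator's only tunable parameter is the noise scale, so "training" reduces to shrinking it until the pass rate clears the threshold. A real dataset generator would presumably update model parameters from the discriminator feedback instead.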
