Entropy Based Synthetic Data Generation For Augmenting Classification System Training Data

    Publication No.: US20210117718A1

    Publication Date: 2021-04-22

    Application No.: US16659147

    Filing Date: 2019-10-21

    Applicant: Adobe Inc.

    Abstract: A data classification system is trained to classify input data into multiple classes. The system is initially trained by adjusting weights within the system based on a set of training data that includes multiple tuples, each being a training instance and corresponding training label. Two training instances, one from a minority class and one from a majority class, are selected from the set of training data based on entropies for the training instances. A synthetic training instance is generated by combining the two selected training instances and a corresponding training label is generated. A tuple including the synthetic training instance and the synthetic training label is added to the set of training data, resulting in an augmented training data set. One or more such synthetic training instances can be added to the augmented training data set and the system is then re-trained on the augmented training data set.
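
The selection and combination steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method. The abstract does not say how entropy drives the selection or how the two instances are combined, so three things here are assumptions: entropy is taken as the Shannon entropy of the classifier's predicted class probabilities, the highest-entropy instance of each class is the one selected, and the combination is a convex blend with a hypothetical mixing weight `lam`. The function names `prediction_entropy` and `synthesize` are likewise hypothetical.

```python
import numpy as np


def prediction_entropy(probs):
    """Shannon entropy (in nats) of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)


def synthesize(X, y, probs, minority, majority, lam=0.5, num_classes=2):
    """Build one synthetic (instance, label) tuple from two selected instances.

    Assumptions (not stated in the abstract): the highest-entropy instance of
    each class is selected, and the two are combined by a convex blend in
    which `lam` weights the minority instance.
    """
    ent = prediction_entropy(probs)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)

    # Select one instance per class based on entropy.
    i = min_idx[np.argmax(ent[min_idx])]
    j = maj_idx[np.argmax(ent[maj_idx])]

    # Combine the two instances, and blend their one-hot labels the same way.
    one_hot = np.eye(num_classes)
    x_new = lam * X[i] + (1.0 - lam) * X[j]
    y_new = lam * one_hot[minority] + (1.0 - lam) * one_hot[majority]
    return x_new, y_new
```

Under these assumptions, the returned tuple would be appended to the training set, and the classifier re-trained on the augmented set, as the abstract describes.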

    Entropy based synthetic data generation for augmenting classification system training data

    Publication No.: US11423264B2

    Publication Date: 2022-08-23

    Application No.: US16659147

    Filing Date: 2019-10-21

    Applicant: Adobe Inc.

    Abstract: A data classification system is trained to classify input data into multiple classes. The system is initially trained by adjusting weights within the system based on a set of training data that includes multiple tuples, each being a training instance and corresponding training label. Two training instances, one from a minority class and one from a majority class, are selected from the set of training data based on entropies for the training instances. A synthetic training instance is generated by combining the two selected training instances and a corresponding training label is generated. A tuple including the synthetic training instance and the synthetic training label is added to the set of training data, resulting in an augmented training data set. One or more such synthetic training instances can be added to the augmented training data set and the system is then re-trained on the augmented training data set.
