Systems and Methods for Active Curriculum Learning

    Publication No.: US20220366282A1

    Publication Date: 2022-11-17

    Application No.: US17738057

    Filing Date: 2022-05-06

    IPC Classification: G06N5/04 G06N20/00 G06F40/211

    Abstract: Computer systems and computer-implemented methods for training a machine learning model are provided. The methods include: selecting seed data from an unlabeled dataset; labeling the seed data and storing the labeled seed data in a data store; training the machine learning model in an initial iteration using the labeled seed data, where the machine learning model is trained to select a next subset of the unlabeled dataset; selecting a next subset of the unlabeled dataset; computing difficulty scores for at least the next subset of the unlabeled dataset; labeling the next subset of the unlabeled dataset; and training the machine learning model in a second iteration using the labeled next subset of the unlabeled dataset. The machine learning model is generally trained to select the next subset of the unlabeled dataset for a subsequent training iteration by presenting the labeled next subset of the unlabeled dataset in an order sorted based on the difficulty scores.
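
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not say how the next subset is selected or how difficulty is scored, so the sketch assumes uncertainty sampling for selection and a distance-to-boundary heuristic for difficulty. The names `oracle_label`, `difficulty`, `TinyLogistic`, and `active_curriculum` are hypothetical, and the one-feature logistic model stands in for whatever model is actually trained.

```python
import math
import random


def oracle_label(x):
    """Hypothetical annotator standing in for the labeling step."""
    return 1 if x > 0.5 else 0


def difficulty(x):
    """Assumed difficulty heuristic: points closer to 0.5 score as harder."""
    return 1.0 - abs(x - 0.5) * 2.0


class TinyLogistic:
    """One-feature logistic regression trained by per-example SGD."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def prob(self, x):
        return 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))

    def fit_in_order(self, examples, lr=0.5, epochs=20):
        # Examples are visited in the order given, so presentation order matters.
        for _ in range(epochs):
            for x, y in examples:
                err = self.prob(x) - y
                self.w -= lr * err * x
                self.b -= lr * err

    def uncertainty(self, x):
        # Assumed selection criterion: 1.0 at p = 0.5, 0.0 at p = 0 or 1.
        return 1.0 - abs(self.prob(x) - 0.5) * 2.0


def active_curriculum(unlabeled, seed_size=4, batch_size=4, rounds=3, rng=None):
    rng = rng or random.Random(0)
    pool = list(unlabeled)
    rng.shuffle(pool)

    # Select seed data, label it, and store it in the labeled data store.
    seed, pool = pool[:seed_size], pool[seed_size:]
    store = [(x, oracle_label(x)) for x in seed]

    # Initial training iteration on the labeled seed data.
    model = TinyLogistic()
    model.fit_in_order(store)

    history = []
    for _ in range(rounds):
        if not pool:
            break
        # Select the next subset: the points the model is least sure about.
        pool.sort(key=model.uncertainty, reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]

        # Compute difficulty scores and order the subset from easy to hard.
        ordered = sorted(batch, key=difficulty)

        # Label the subset, store it, and train on it in sorted order.
        labeled = [(x, oracle_label(x)) for x in ordered]
        store.extend(labeled)
        model.fit_in_order(labeled)
        history.append(ordered)

    return model, store, history
```

Calling `active_curriculum([i / 39 for i in range(40)])` returns the trained model, the labeled data store, and the per-round subsets in the order they were presented. Sorting from easy to hard is one possible reading of "sorted based on the difficulty scores"; the reverse order would fit the abstract equally well.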