SIMULATING TRAINING DATA TO MITIGATE BIASES IN MACHINE LEARNING MODELS

    Publication number: US20230350977A1

    Publication date: 2023-11-02

    Application number: US17661026

    Filing date: 2022-04-27

    CPC classification number: G06K9/6257 G06K9/6289 G06N20/00

    Abstract: A method performed by a processing system including at least one processor includes identifying an insufficiency in a representation of a subpopulation in training data for a machine learning model, generating simulated data to mitigate the insufficiency in the representation, and training the machine learning model using an enhanced training data set that includes the training data and the simulated data to produce a trained machine learning model. In some examples, the generating and the training may be repeated in response to determining that an output of the trained machine learning model still reflects the insufficiency in the representation of the subpopulation or reflects an insufficiency in a representation of another subpopulation. In other examples, the simulated data may be stored for future reuse.
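
The loop described in the abstract (identify an underrepresented subpopulation, generate simulated data, train on the enhanced set, repeat if needed, keep the simulated data for reuse) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration and does not come from the patent: the record fields `group`, `x` and `label`, the `min_share` threshold, the noise-based `simulate` helper, and the caller-supplied `train` function. The sketch also simplifies the repeat condition: it re-checks subpopulation shares in the data before a single training call, whereas the abstract describes re-checking the output of the trained model.

```python
import math
import random
from collections import Counter


def underrepresented(records, min_share):
    """Map each subpopulation whose share of records is below min_share to its count."""
    counts = Counter(r["group"] for r in records)
    return {g: n for g, n in counts.items() if n / len(records) < min_share}


def simulate(records, group, n_new, rng, noise=0.05):
    """Hypothetical simulator: copy real records of one group and jitter the feature."""
    pool = [r for r in records if r["group"] == group]
    out = []
    for _ in range(n_new):
        base = rng.choice(pool)
        out.append({
            "group": group,
            "x": base["x"] + rng.gauss(0, noise),
            "label": base["label"],
            "simulated": True,  # flag so simulated rows can be stored and reused
        })
    return out


def mitigate(records, min_share, train, rng, max_rounds=5):
    """Add simulated records until no group is below min_share, then train once."""
    simulated = []
    for _ in range(max_rounds):
        enhanced = records + simulated
        gaps = underrepresented(enhanced, min_share)
        if not gaps:
            break
        total = len(enhanced)
        for group, n in gaps.items():
            # Smallest k with (n + k) / (total + k) >= min_share, at least 1.
            needed = max(1, math.ceil((min_share * total - n) / (1 - min_share)))
            simulated += simulate(records, group, needed, rng)
    model = train(records + simulated)  # enhanced set = original + simulated data
    return model, simulated
```

In this sketch `mitigate` returns the simulated records separately from the model, so a caller could persist them for later training runs. A real system would replace `simulate` with a domain-appropriate generator and would evaluate the trained model per subpopulation before deciding whether to repeat.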
