DATA DIVERSITY VISUALIZATION AND QUANTIFICATION FOR MACHINE LEARNING MODELS

    Publication No.: US20220351055A1

    Publication date: 2022-11-03

    Application No.: US17243046

    Filing date: 2021-04-28

    Abstract: Systems and techniques that facilitate data diversity visualization and/or quantification for machine learning models are provided. In various embodiments, a processor can access a first dataset and a second dataset, where a machine learning (ML) model is trained on the first dataset. In various instances, the processor can obtain a first set of latent activations generated by the ML model based on the first dataset, and a second set of latent activations generated by the ML model based on the second dataset. In various aspects, the processor can generate, via dimensionality reduction, a first set of compressed data points based on the first set of latent activations and a second set of compressed data points based on the second set of latent activations. In various instances, a diversity component can compute a diversity score based on the first set of compressed data points and the second set of compressed data points.
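
    The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the latent activations are simulated with random numbers instead of being read from a trained model, PCA (computed with NumPy's SVD) stands in for the unspecified dimensionality reduction, and the function names fit_pca, compress, and diversity_score are hypothetical. The score used, the mean distance from each second-set point to its nearest first-set point, is one plausible choice, not the formula claimed in the patent.

```python
import numpy as np


def fit_pca(activations: np.ndarray, n_components: int = 2):
    """Fit PCA via SVD and return (mean, components)."""
    mean = activations.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(activations - mean, full_matrices=False)
    return mean, vt[:n_components]


def compress(activations: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project activations onto the fitted principal directions."""
    return (activations - mean) @ components.T


def diversity_score(first: np.ndarray, second: np.ndarray) -> float:
    """Hypothetical score: mean distance from each second-set point
    to its nearest neighbour in the first set."""
    diffs = second[:, None, :] - first[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float(dists.min(axis=1).mean())


# Simulated latent activations (assumption: a real system would read
# these from a hidden layer of the trained ML model).
rng = np.random.default_rng(0)
first_latents = rng.normal(0.0, 1.0, size=(200, 16))   # from the training dataset
second_latents = rng.normal(3.0, 1.0, size=(50, 16))   # from the second dataset

# Fit the reduction on the first set, then compress both sets with it.
mean, components = fit_pca(first_latents)
first_pts = compress(first_latents, mean, components)
second_pts = compress(second_latents, mean, components)

score = diversity_score(first_pts, second_pts)
print(f"diversity score: {score:.3f}")
```

    Fitting the reduction on the first set only and reusing it for the second keeps both sets of compressed points in the same coordinate system, so they can be plotted together and compared by distance. A score near zero would suggest the second dataset falls within the region the model saw during training.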
