OUT-OF-DISTRIBUTION DETECTION WITH GENERATIVE MODELS

    Publication No.: US20250103961A1

    Publication date: 2025-03-27

    Application No.: US18893616

    Filing date: 2024-09-23

    Abstract: Generative models are used to determine whether a data sample is in-distribution or out-of-distribution with respect to a training data set. Generative models can erroneously attribute high likelihoods to known out-of-distribution data samples; to address this, the local intrinsic dimensionality is evaluated for a data sample in addition to its likelihood. A data sample is determined to belong to the distribution of the training data when it has both sufficient likelihood and sufficient local intrinsic dimensionality around its region in the generative model. Different actions may then be determined for the data sample with respect to a data application model based on whether the data sample is in- or out-of-distribution.
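The two-condition test described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `Thresholds`, `is_in_distribution`, and `choose_action`, the threshold values, and the action labels are all hypothetical, and the log-likelihood and local intrinsic dimensionality (LID) values are assumed to have been computed elsewhere by the generative model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical cut-offs, e.g. chosen from statistics of the training data."""
    min_log_likelihood: float
    min_lid: float


def is_in_distribution(log_likelihood: float, lid: float, t: Thresholds) -> bool:
    # In-distribution only if BOTH conditions hold: a high likelihood alone
    # is not trusted, because the model may assign it to a sample lying in
    # a region of low local intrinsic dimensionality.
    return log_likelihood >= t.min_log_likelihood and lid >= t.min_lid


def choose_action(log_likelihood: float, lid: float, t: Thresholds) -> str:
    # Hypothetical downstream actions for a data application model.
    if is_in_distribution(log_likelihood, lid, t):
        return "use_application_model"
    return "flag_as_out_of_distribution"
```

A sample with a high likelihood but a low LID estimate is rejected under this rule, which is the failure case the abstract is aimed at.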

    TABULAR DATA GENERATION
    Type: Invention application

    Publication No.: US20250124220A1

    Publication date: 2025-04-17

    Application No.: US18911044

    Filing date: 2024-10-09

    Abstract: A tabular data model, which may be pre-trained on a different data set, is used to generate data samples for a target class with a given set of context data points. The tabular data model is trained to predict class membership of a given data point with a set of context data points. Rather than use the predicted class directly, the class predictions are used to determine a class-conditional energy for a synthetic data point with respect to the target class. The synthetic data point may then be updated based on the class-conditional energy with a stochastic update algorithm, such as stochastic gradient Langevin dynamics or Adaptive Moment Estimation with noise. The value of the synthetic data point is sampled as a data point for the target class. This permits effective data augmentation for tabular data for downstream models.
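The sampling loop described in this abstract can be sketched with stochastic gradient Langevin dynamics on a toy problem. Everything below is an assumption made for illustration: the two-class "classifier" is a hand-written function rather than a pre-trained tabular model, the context data points are omitted, the energy is taken to be the negative target-class logit (one common choice, not necessarily the one claimed), and gradients are obtained by finite differences.

```python
import math
import random

# Hypothetical stand-in for a tabular classifier: one logit per class,
# largest near that class's centre.
CLASS_CENTRES = {"a": (0.0, 0.0), "b": (3.0, -2.0)}


def class_logits(x):
    return {
        c: -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        for c, mu in CLASS_CENTRES.items()
    }


def energy(x, target):
    # Assumed class-conditional energy: negative logit of the target class.
    return -class_logits(x)[target]


def grad_energy(x, target, eps=1e-4):
    # Central finite differences, so the sketch works with any logit function.
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((energy(hi, target) - energy(lo, target)) / (2 * eps))
    return grad


def sgld_sample(target, x0, steps=200, step_size=0.1, rng=None):
    """Run SGLD from x0 and return the final point as the synthetic sample."""
    rng = rng or random.Random()
    x = list(x0)
    for _ in range(steps):
        g = grad_energy(x, target)
        # Langevin update: half-step down the energy gradient plus Gaussian noise.
        x = [
            xi - 0.5 * step_size * gi + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x
```

Calling `sgld_sample("b", (0.0, 0.0))` repeatedly yields points scattered around the centre of class "b", which could then be appended to the training set of a downstream model.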

    MODELING DISJOINT MANIFOLDS
    Type: Invention publication

    Publication No.: US20230386190A1

    Publication date: 2023-11-30

    Application No.: US18202455

    Filing date: 2023-05-26

    CPC classification number: G06V10/82 G06V10/7625

    Abstract: A computer model is trained to treat data samples in a high-dimensional space as lying on different manifolds, so that the data set as a whole is represented as a union of manifolds rather than as a single manifold. Data samples that may be expected to belong to the same underlying manifold are identified by grouping the data. For generative models, a generative model may be trained that includes a sub-model for each group, trained on that group's data samples, such that each sub-model can account for the manifold of that group. The overall generative model includes information describing how frequently to sample from each sub-model so that sampling correctly represents the data set as a whole. Multi-class classification models may also use the grouping to improve classification accuracy by weighting group data samples according to the estimated latent dimensionality of the group.
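The structure of such a generative model, one sub-model per group plus a sampling frequency for each, can be sketched as below. This is only an illustration under strong assumptions: the group labels are given rather than discovered, each sub-model is a one-dimensional Gaussian placeholder rather than a model of a manifold, and the names `fit_union_model` and `sample_union_model` are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def fit_union_model(samples, groups):
    """Fit one placeholder sub-model per group and record its sampling weight."""
    by_group = defaultdict(list)
    for x, g in zip(samples, groups):
        by_group[g].append(x)
    total = len(samples)
    return {
        g: {
            "weight": len(xs) / total,       # how often to sample this sub-model
            "mean": statistics.fmean(xs),    # placeholder sub-model parameters
            "std": statistics.pstdev(xs),
        }
        for g, xs in by_group.items()
    }


def sample_union_model(model, rng):
    """Pick a sub-model according to its weight, then sample from it."""
    names = list(model)
    weights = [model[name]["weight"] for name in names]
    group = rng.choices(names, weights=weights)[0]
    sub = model[group]
    return group, rng.gauss(sub["mean"], sub["std"])
```

Storing the weights alongside the sub-models is what lets the combined model reproduce the group proportions of the original data set when sampling.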

    LEARNED DENSITY ESTIMATION WITH IMPLICIT MANIFOLDS

    Publication No.: US20230385693A1

    Publication date: 2023-11-30

    Application No.: US18202450

    Filing date: 2023-05-26

    CPC classification number: G06N20/00 G06N7/01

    Abstract: Probability density modeling, such as for generative modeling, for data on a manifold of a high-dimensional space is performed with an implicitly-defined manifold, such that the points belonging to the manifold are the zero set of a manifold-defining function. An energy function is trained that, evaluated on the manifold, describes a probability density for the manifold. As such, the relevant portions of the energy function are “filtered through” the defined manifold for training and in application. The combined energy function and manifold-defining function provide an “energy-based implicit manifold” that can more effectively model probability densities of a manifold in the high-dimensional space. As the manifold-defining function and the energy function are defined across the high-dimensional space, they may more effectively learn geometries and avoid distortions due to change in dimension that occur for models that model the manifold in a lower-dimensional space.
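The idea of an energy function "filtered through" an implicitly defined manifold can be illustrated on a toy case. In the sketch below, both functions are hand-written assumptions rather than trained networks: the manifold-defining function has the unit circle as its zero set, the energy is defined on the whole plane, and the density on the circle is obtained by evaluating the energy only at points of the zero set and normalising numerically over arc length.

```python
import math


def manifold_fn(x, y):
    # Hypothetical manifold-defining function: its zero set is the unit circle.
    return x * x + y * y - 1.0


def energy(x, y):
    # Hypothetical energy defined on the whole plane; lower energy near x = 1.
    return -2.0 * x


def circle_density(n=1000):
    """Density on the zero set, proportional to exp(-energy), w.r.t. arc length."""
    thetas = [2.0 * math.pi * i / n for i in range(n)]
    points = [(math.cos(t), math.sin(t)) for t in thetas]
    unnorm = [math.exp(-energy(x, y)) for x, y in points]
    d_theta = 2.0 * math.pi / n
    z = sum(unnorm) * d_theta  # numerical normalising constant
    return thetas, points, [u / z for u in unnorm]
```

Only the values of the energy on the circle affect the resulting density; its behaviour elsewhere in the plane is irrelevant, which is the sense in which the manifold filters the energy.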
