-
Publication Number: US20250103961A1
Publication Date: 2025-03-27
Application Number: US18893616
Application Date: 2024-09-23
Applicant: THE TORONTO-DOMINION BANK
Inventor: Jesse Cole Cresswell , Brendan Leigh Ross , Gabriel Loaiza Ganem , Anthony Lawrence Caterini , Hamidreza Kamkari
IPC: G06N20/00
Abstract: Generative models are used to determine whether a data sample is in-distribution or out-of-distribution with respect to a training data set. To address potential errors in generative models that attribute high likelihoods to known out-of-distribution data samples, the local intrinsic dimensionality is evaluated for a data sample in addition to its likelihood. A data sample is determined to belong to the distribution of the training data when it has both sufficient likelihood and sufficient local intrinsic dimensionality in its region of the generative model. Different actions may then be determined for the data sample with respect to a data application model based on whether the data sample is in- or out-of-distribution.
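A minimal sketch of the dual-test idea described above: accept a sample as in-distribution only if both its model likelihood and its local intrinsic dimensionality (LID) are high enough. A Gaussian KDE stands in for the trained generative model and the Levina-Bickel k-nearest-neighbour estimator stands in for the model-derived LID; the thresholds and data are illustrative assumptions, not values from the patent.

```python
# Dual-threshold OOD test: in-distribution requires high likelihood AND high LID.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 5))            # in-distribution training data
kde = gaussian_kde(train.T)                   # stand-in for the generative model
knn = NearestNeighbors(n_neighbors=20).fit(train)

def lid_estimate(x, k=20):
    """Levina-Bickel maximum-likelihood estimate of LID around x."""
    dists, _ = knn.kneighbors(x[None, :], n_neighbors=k)
    d = dists[0]
    return -1.0 / np.mean(np.log(d[:-1] / d[-1]))

def is_in_distribution(x, min_log_lik=-12.0, min_lid=3.0):
    """Accept only samples with both sufficient likelihood and sufficient LID."""
    log_lik = kde.logpdf(x[None, :].T)[0]
    return bool(log_lik >= min_log_lik and lid_estimate(x) >= min_lid)

print(is_in_distribution(rng.normal(size=5)))   # typically True
print(is_in_distribution(np.full(5, 8.0)))      # typically False (low likelihood)
```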
-
Publication Number: US20250124220A1
Publication Date: 2025-04-17
Application Number: US18911044
Application Date: 2024-10-09
Applicant: THE TORONTO-DOMINION BANK
Inventor: Guangwei Yu , Junwei Ma , Anthony Lawrence Caterini , George Frazer Stein
IPC: G06F40/177
Abstract: A tabular data model, which may be pre-trained on a different data set, is used to generate data samples for a target class with a given set of context data points. The tabular data model is trained to predict class membership of a given data point with a set of context data points. Rather than use the predicted class directly, the class predictions are used to determine a class-conditional energy for a synthetic data point with respect to the target class. The synthetic data point may then be updated based on the class-conditional energy with a stochastic update algorithm, such as stochastic gradient Langevin dynamics or Adaptive Moment Estimation with noise. The value of the synthetic data point is sampled as a data point for the target class. This permits effective data augmentation for tabular data for downstream models.
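A hedged sketch of the sampling loop described above: a synthetic row is refined by stochastic gradient Langevin dynamics (SGLD) on a class-conditional energy derived from a classifier's predictions. A small MLP stands in for the pre-trained, context-conditioned tabular model; the step size, noise scale, and iteration count are illustrative assumptions.

```python
# SGLD sampling of a synthetic tabular row for a target class.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_features, num_classes = 8, 3
classifier = nn.Sequential(                      # stand-in for the tabular model
    nn.Linear(num_features, 64), nn.ReLU(), nn.Linear(64, num_classes)
)

def class_conditional_energy(x, target_class):
    """Low energy where the model is confident x belongs to the target class."""
    return -torch.log_softmax(classifier(x), dim=-1)[:, target_class].sum()

def sample_synthetic_row(target_class, steps=200, step_size=1e-2):
    x = torch.randn(1, num_features, requires_grad=True)   # random initial row
    for _ in range(steps):
        energy = class_conditional_energy(x, target_class)
        grad, = torch.autograd.grad(energy, x)
        with torch.no_grad():
            # SGLD step: descend the energy and inject Gaussian noise.
            x += -0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
    return x.detach()

print(sample_synthetic_row(target_class=1))
```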
-
Publication Number: US20230386190A1
Publication Date: 2023-11-30
Application Number: US18202455
Application Date: 2023-05-26
Applicant: THE TORONTO-DOMINION BANK
Inventor: Jesse Cole Cresswell , Brendan Leigh Ross , Anthony Lawrence Caterini , Gabriel Loaiza Ganem , Bradley Craig Anderson Brown
IPC: G06V10/82 , G06V10/762
CPC classification number: G06V10/82 , G06V10/7625
Abstract: A computer model is trained to account for data samples in a high-dimensional space as lying on different manifolds, rather than on a single manifold representing the data set, treating the data set as a whole as a union of manifolds. Data samples that may be expected to belong to the same underlying manifold are determined by grouping the data. For generative modeling, a generative model may be trained that includes a sub-model for each group trained on that group's data samples, such that each sub-model can account for the manifold of that group. The overall generative model includes information describing the frequency with which to sample from each sub-model so that sampling correctly represents the data set as a whole. Multi-class classification models may also use the grouping to improve classification accuracy by weighing group data samples according to the estimated latent dimensionality of the group.
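A minimal sketch of the union-of-manifolds generative recipe described above: group the data, fit one sub-model per group, and sample groups in proportion to their frequency. KMeans and per-cluster Gaussian mixtures are stand-ins for the grouping step and the per-group generative sub-models; the toy data and cluster count are assumptions.

```python
# Union-of-manifolds sampling: per-group sub-models, sampled by group frequency.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data drawn from two separated blobs, i.e. two underlying "manifolds".
data = np.vstack([rng.normal(0, 1, (1500, 4)), rng.normal(6, 1, (500, 4))])

groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
sub_models, weights = [], []
for g in range(2):
    members = data[groups == g]
    sub_models.append(GaussianMixture(n_components=2, random_state=0).fit(members))
    weights.append(len(members) / len(data))   # sampling frequency per group

def sample(n):
    """Pick a sub-model per sample according to group frequency, then sample it."""
    choices = rng.choice(len(sub_models), size=n, p=weights)
    return np.vstack([sub_models[g].sample(1)[0] for g in choices])

print(sample(5))
```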
-
Publication Number: US20240419978A1
Publication Date: 2024-12-19
Application Number: US18738557
Application Date: 2024-06-10
Applicant: THE TORONTO-DOMINION BANK
Inventor: George Frazer Stein , Jesse Cole Cresswell , Rasa Hosseinzadeh , Yi Sui , Brendan Leigh Ross , Valentin Victor Villecroze , Zhaoyan Liu , Anthony Lawrence Caterini , Joseph Eric Timothy Taylor , Gabriel Loaiza Ganem
IPC: G06N3/09 , G06V10/774
Abstract: A variety of generative models are trained on a reference data set. The generative models are evaluated by candidate metrics to determine the relative rankings of the models as evaluated by the different candidate metrics. The rankings generated by the candidate metrics are compared with a simulated human evaluation of the generated results, and the candidate metrics that most align with the human evaluation may then be used to automatically evaluate subsequent generative models. The candidate metrics may include various types of encoding models trained for non-generative purposes, such that the selected candidate metric may represent an encoding model that performs well on the generated data.
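A hedged sketch of the metric-selection step described above: rank the models under each candidate metric, compare each ranking with the (simulated) human ranking via Spearman rank correlation, and keep the best-aligned metric. The model names and score tables are synthetic placeholders, not results from the patent.

```python
# Select the candidate metric whose model ranking best matches human evaluation.
import numpy as np
from scipy.stats import spearmanr

human_scores = np.array([0.91, 0.62, 0.75, 0.40, 0.85])   # simulated human eval
candidate_metric_scores = {
    "metric_a": np.array([0.88, 0.70, 0.72, 0.35, 0.90]),
    "metric_b": np.array([0.50, 0.65, 0.60, 0.55, 0.58]),
    "metric_c": np.array([0.93, 0.60, 0.74, 0.42, 0.80]),
}

def select_metric(human, candidates):
    """Pick the metric whose model ranking agrees most with the human ranking."""
    correlations = {name: spearmanr(human, scores)[0]
                    for name, scores in candidates.items()}
    best = max(correlations, key=correlations.get)
    return best, correlations

best_metric, corrs = select_metric(human_scores, candidate_metric_scores)
print(best_metric, corrs)
```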
-
Publication Number: US20230385693A1
Publication Date: 2023-11-30
Application Number: US18202450
Application Date: 2023-05-26
Applicant: THE TORONTO-DOMINION BANK
Inventor: Jesse Cole Cresswell , Brendan Leigh Ross , Anthony Lawrence Caterini , Gabriel Loaiza Ganem
Abstract: Probability density modeling, such as for generative modeling, for data on a manifold of a high-dimensional space is performed with an implicitly-defined manifold such that the points belonging to the manifold are the zero set of a manifold-defining function. An energy function is trained that, evaluated on the manifold, describes a probability density over the manifold. As such, the relevant portions of the energy function are "filtered through" the defined manifold during training and in application. The combined energy function and manifold-defining function provide an "energy-based implicit manifold" that can more effectively model probability densities on a manifold in the high-dimensional space. As the manifold-defining function and the energy function are defined across the high-dimensional space, they may more effectively learn geometries and avoid the distortions due to change in dimension that occur for models that represent the manifold in a lower-dimensional space.
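A toy sketch of the energy-based implicit manifold idea above: the manifold is the zero set of a manifold-defining function F, and an energy E gives the density on that set, with p(x) proportional to exp(-E(x)) for x satisfying F(x) = 0. Here F and E are analytic toys (the unit circle with an energy favoring one side), and sampling uses Langevin dynamics with a soft penalty holding iterates near F(x) = 0; both are illustrative stand-ins for the learned networks and training procedure described in the abstract.

```python
# Langevin sampling from exp(-E) concentrated on the zero set of F (unit circle).
import torch

torch.manual_seed(0)

def manifold_fn(x):          # F: zero exactly on the unit circle in R^2
    return (x ** 2).sum(dim=-1) - 1.0

def energy_fn(x):            # E: low energy near the point (1, 0) on the circle
    return -2.0 * x[..., 0]

def sample(n=512, steps=500, step_size=1e-2, penalty=50.0):
    x = torch.randn(n, 2, requires_grad=True)
    for _ in range(steps):
        # Penalised objective: energy plus a term pulling points onto F(x) = 0.
        obj = (energy_fn(x) + penalty * manifold_fn(x) ** 2).sum()
        grad, = torch.autograd.grad(obj, x)
        with torch.no_grad():
            x += -0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
    return x.detach()

pts = sample()
print(pts.norm(dim=-1).mean())   # close to 1: samples concentrate on the circle
print(pts.mean(dim=0))           # biased toward (1, 0) by the energy
```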
-
Publication Number: US20230244917A1
Publication Date: 2023-08-03
Application Number: US18083345
Application Date: 2022-12-16
Applicant: THE TORONTO-DOMINION BANK
Inventor: Gabriel Loaiza Ganem , Brendan Leigh Ross , Jesse Cole Cresswell , Anthony Lawrence Caterini
IPC: G06N3/047 , G06N3/0455 , G06N3/088
CPC classification number: G06N3/047 , G06N3/0455 , G06N3/088
Abstract: To effectively learn a probability density from a data set in a high-dimensional space without manifold overfitting, a computer model first learns an autoencoder model that can transform data from the high-dimensional space to a low-dimensional space, and then learns a probability density model that may be effectively trained with maximum likelihood. Separating these components allows different types of models to be employed for each portion (e.g., manifold learning and density learning) and permits effective modeling of high-dimensional data sets that lie along a manifold representable with fewer dimensions, thus learning both the density and the manifold and enabling effective data generation and density estimation.
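A minimal sketch of the two-step recipe above: first learn a low-dimensional representation (an autoencoder in the abstract; PCA is the stand-in here), then fit a likelihood-based density model on the codes. Generation samples the latent density and decodes; the density estimate below is the latent likelihood only and omits any manifold change-of-variables correction, so it is an illustrative simplification.

```python
# Two-step density learning: dimension reduction, then max-likelihood density fit.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# High-dimensional data lying near a 3-dimensional linear manifold.
latent_true = rng.normal(size=(2000, 3))
mixing = rng.normal(size=(3, 50))
data = latent_true @ mixing + 0.01 * rng.normal(size=(2000, 50))

# Step 1: "autoencoder" (encoder = transform, decoder = inverse_transform).
encoder = PCA(n_components=3).fit(data)
codes = encoder.transform(data)

# Step 2: maximum-likelihood density model on the low-dimensional codes.
density = GaussianMixture(n_components=5, random_state=0).fit(codes)

def generate(n):
    z, _ = density.sample(n)             # sample the latent density
    return encoder.inverse_transform(z)  # decode back to the data space

def latent_log_likelihood(x):
    return density.score_samples(encoder.transform(x))

print(generate(3).shape)                 # (3, 50)
print(latent_log_likelihood(data[:5]))
```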