Systematic approach for explaining machine learning predictions

    Publication No.: US12265889B2

    Publication Date: 2025-04-01

    Application No.: US17083148

    Filing Date: 2020-10-28

    Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample. Combining this systematic local data sample generation and a supervised neighborhood selection approach to weighting generated data samples relative to the target data sample achieves high explanation fidelity, locality, and repeatability when generating explanations for specific predictions from a given model.
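The sample-generation, weighting, and surrogate-training loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: `black_box`, the hypersphere radius, and the Gaussian distance weighting are all hypothetical choices, and a weighted linear fit stands in for the surrogate ML model.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical stand-in for any classifier/regressor score function.
    return X[:, 0] * 2.0 + X[:, 1] ** 2

def sample_hypersphere(center, radius, n):
    """Systematically draw n points uniformly from a hypersphere
    (the data sample neighborhood) around `center`."""
    d = center.shape[0]
    dirs = rng.normal(size=(n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Scale radii so points fill the ball uniformly, not just its surface.
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return center + dirs * radii

def explain(target, radius=0.5, n=500):
    """Fit a distance-weighted linear surrogate around `target`;
    its coefficients are the local per-feature attributions."""
    X = sample_hypersphere(target, radius, n)
    y = black_box(X)
    # Weight generated samples by proximity to the target sample.
    w = np.exp(-np.linalg.norm(X - target, axis=1) ** 2 / radius ** 2)
    A = np.hstack([X, np.ones((n, 1))])  # linear model with intercept
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # drop the intercept

target = np.array([1.0, 0.5])
attribution = explain(target)
```

Because the toy `black_box` is locally linear near the target, the surrogate's coefficients recover roughly the local gradient (about 2.0 for the first feature, about 1.0 for the second), illustrating the fidelity-through-locality idea.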

    USING GENERATIVE ADVERSARIAL NETWORKS TO CONSTRUCT REALISTIC COUNTERFACTUAL EXPLANATIONS FOR MACHINE LEARNING MODELS

    Publication No.: US20220188645A1

    Publication Date: 2022-06-16

    Application No.: US17124018

    Filing Date: 2020-12-16

    Abstract: Herein are counterfactual explanations of machine learning (ML) inferencing provided by generative adversarial networks (GANs) that ensure realistic counterfactuals and use latent spaces to optimize perturbations. In an embodiment, a first computer trains a generator model in a GAN. A same or second computer hosts a classifier model that inferences an original label for original feature values respectively for many features. Runtime ML explainability (MLX) occurs on the first or second or a third computer as follows. The generator model from the GAN generates a sequence of revised feature values that are based on noise. The noise is iteratively optimized based on a distance between the original feature values and current revised feature values in the sequence of revised feature values. The classifier model inferences a current label respectively for each counterfactual in the sequence of revised feature values. Satisfactory discovered counterfactuals are promoted as explanations of behavior of the classifier model.
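The latent-space search for a label-flipping counterfactual can be sketched as below. This is a toy illustration, not the patented system: `generator` and `classifier` are hypothetical stand-ins for a trained GAN generator and the hosted classifier model, and the noise is optimized by simple random-restart perturbation rather than gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

def generator(z):
    # Hypothetical stand-in for a trained GAN generator: latent z -> features.
    # Its bounded range keeps generated counterfactuals "realistic" by design.
    return np.tanh(z)

def classifier(x):
    # Hypothetical binary classifier: label 1 iff the feature sum is positive.
    return int(x.sum() > 0.0)

def find_counterfactual(x_orig, steps=2000, sigma=0.1):
    """Iteratively perturb latent noise, keeping the generated sample that
    flips the original label while minimizing distance to the original."""
    orig_label = classifier(x_orig)
    z = rng.normal(size=x_orig.shape)
    best, best_dist = None, np.inf
    for _ in range(steps):
        z_try = z + rng.normal(scale=sigma, size=z.shape)
        x_try = generator(z_try)
        dist = np.linalg.norm(x_try - x_orig)
        if classifier(x_try) != orig_label and dist < best_dist:
            best, best_dist, z = x_try, dist, z_try
    return best, best_dist

x_original = np.array([-0.8, -0.3])  # original sample, inferred label 0
cf, cf_dist = find_counterfactual(x_original)
```

The returned `cf` is a generated feature vector the classifier labels differently from the original, found by searching the latent space so it stays on the generator's learned data manifold; minimizing `cf_dist` captures the "closest counterfactual" objective.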

    SYSTEMATIC APPROACH FOR EXPLAINING MACHINE LEARNING PREDICTIONS

    Publication No.: US20220129791A1

    Publication Date: 2022-04-28

    Application No.: US17083148

    Filing Date: 2020-10-28

    Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample. Combining this systematic local data sample generation and a supervised neighborhood selection approach to weighting generated data samples relative to the target data sample achieves high explanation fidelity, locality, and repeatability when generating explanations for specific predictions from a given model.

    ADAPTIVE SAMPLING TO COMPUTE GLOBAL FEATURE EXPLANATIONS WITH SHAPLEY VALUES

    Publication No.: US20240086763A1

    Publication Date: 2024-03-14

    Application No.: US17944949

    Filing Date: 2022-09-14

    CPC classification number: G06N20/00 G06N5/042

    Abstract: Techniques for computing global feature explanations using adaptive sampling are provided. In one technique, first and second samples from a dataset are identified. A first set of feature importance values (FIVs) is generated based on the first sample and a machine-learned model. A second set of FIVs is generated based on the second sample and the model. If a result of a comparison between the first and second FIV sets does not satisfy certain criteria, then: (i) an aggregated FIV set is generated based on the last two FIV sets; (ii) a new sample that is double the size of the previous sample is identified from the dataset; (iii) a current FIV set is generated based on the new sample and the model; and (iv) it is determined whether a result of a comparison between the current and aggregated FIV sets satisfies the criteria. Steps (i)-(iv) are repeated until the result of the last comparison satisfies the criteria.
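The doubling-and-aggregation control flow of steps (i)-(iv) can be sketched as follows. This focuses on the adaptive-sampling loop only: `WEIGHTS`, the synthetic dataset, the tolerance-based comparison, and the closed-form per-feature importance (which coincides with Shapley values for an additive linear model with zero-mean features) are all assumptions standing in for a full Shapley computation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear model and synthetic dataset; feature 2 is irrelevant.
WEIGHTS = np.array([3.0, 1.0, 0.0])
DATASET = rng.normal(size=(100_000, 3))

def feature_importance(sample):
    """Per-feature global importance on a sample: mean absolute contribution
    (for this additive linear model, equivalent to mean |Shapley value|)."""
    return np.abs(sample * WEIGHTS).mean(axis=0)

def adaptive_global_fiv(start=128, tol=0.05):
    n = start
    prev = feature_importance(DATASET[rng.choice(len(DATASET), n)])
    n *= 2
    curr = feature_importance(DATASET[rng.choice(len(DATASET), n)])
    while np.abs(curr - prev).max() > tol:   # comparison criteria
        prev = (prev + curr) / 2.0           # (i) aggregate last two FIV sets
        n *= 2                               # (ii) double the sample size
        sample = DATASET[rng.choice(len(DATASET), n)]
        curr = feature_importance(sample)    # (iii) current FIV set
        # (iv) the loop condition re-compares current vs. aggregated FIVs
    return curr

fiv = adaptive_global_fiv()
```

The loop stops once successive importance estimates agree within the tolerance, so the sample size grows only as far as the estimate's variance requires; the irrelevant feature correctly receives zero importance.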

    LOCAL PERMUTATION IMPORTANCE: A STABLE, LINEAR-TIME LOCAL MACHINE LEARNING FEATURE ATTRIBUTOR

    Publication No.: US20220366297A1

    Publication Date: 2022-11-17

    Application No.: US17319729

    Filing Date: 2021-05-13

    Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. For each feature, and for each of many original tuples, the computer: a) randomly selects many perturbed values from original values of the feature in the original tuples, b) generates perturbed tuples that are based on the original tuple and a respective perturbed value, c) causes the ML model to infer a respective perturbed inference for each perturbed tuple, and d) measures a respective difference between each perturbed inference of the perturbed tuples and the particular inference. For each feature, a respective importance of the feature is calculated based on the differences measured for the feature. Feature importances may be used to rank features by influence and/or generate a local ML explainability (MLX) explanation.
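Steps (a)-(d) of the abstract can be sketched as below, one pass per feature. This is a minimal illustration under assumptions: `model`, the dataset, and the target tuple are hypothetical, and the per-feature importance is taken as the mean absolute difference between perturbed inferences and the target's inference.

```python
import numpy as np

rng = np.random.default_rng(3)

def model(X):
    # Hypothetical regressor: feature 0 is highly influential, feature 1 barely.
    return 5.0 * X[:, 0] + 0.5 * X[:, 1]

def local_permutation_importance(model, X_orig, target, n_perturb=200):
    """For each feature: (a) sample perturbed values from that feature's
    original values, (b) build perturbed tuples around the target,
    (c) re-infer, (d) average the absolute inference difference."""
    base = model(target[None, :])[0]  # the particular inference
    importances = []
    for j in range(X_orig.shape[1]):
        perturbed_vals = rng.choice(X_orig[:, j], size=n_perturb)  # (a)
        perturbed = np.tile(target, (n_perturb, 1))                # (b)
        perturbed[:, j] = perturbed_vals
        diffs = np.abs(model(perturbed) - base)                    # (c), (d)
        importances.append(diffs.mean())
    return np.array(importances)

X = rng.normal(size=(1000, 2))       # original tuples
target = np.array([0.2, -0.1])       # the particular tuple to explain
imp = local_permutation_importance(model, X, target)
```

Because each feature is perturbed independently while the rest of the target tuple is held fixed, the cost is linear in the number of features, and drawing perturbed values from the feature's own empirical distribution keeps perturbed tuples realistic; the resulting importances can then rank features for a local MLX explanation.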
