-
Publication No.: US12265889B2
Publication date: 2025-04-01
Application No.: US17083148
Filing date: 2020-10-28
Applicant: Oracle International Corporation
Inventor: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal
IPC: G06N20/00, G06F18/214, G06F18/2411, G06F18/2413, G06F18/243
Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample. Combining this systematic local data sample generation and a supervised neighborhood selection approach to weighting generated data samples relative to the target data sample achieves high explanation fidelity, locality, and repeatability when generating explanations for specific predictions from a given model.
-
Publication No.: US11720751B2
Publication date: 2023-08-08
Application No.: US17146375
Filing date: 2021-01-11
Applicant: Oracle International Corporation
Inventor: Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
IPC: G06F17/00, G06F40/284, G06F40/30, G06F40/166, G06N20/00
CPC classification number: G06F40/284, G06F40/166, G06F40/30, G06N20/00
Abstract: A model-agnostic global explainer for textual data processing (NLP) machine learning (ML) models, “NLP-MLX”, is described herein. NLP-MLX explains global behavior of arbitrary NLP ML models by identifying globally-important tokens within a textual dataset containing text data. NLP-MLX accommodates any arbitrary combination of training dataset pre-processing operations used by the NLP ML model. NLP-MLX includes four main stages. A Text Analysis stage converts text in documents of a target dataset into tokens. A Token Extraction stage uses pre-processing techniques to efficiently pre-filter the complete list of tokens into a smaller set of candidate important tokens. A Perturbation Generation stage perturbs tokens within documents of the dataset to help evaluate the effect of different tokens, and combinations of tokens, on the model's predictions. Finally, a Token Evaluation stage uses the ML model and perturbed documents to evaluate the impact of each candidate token relative to predictions for the original documents.
-
Publication No.: US20220188645A1
Publication date: 2022-06-16
Application No.: US17124018
Filing date: 2020-12-16
Applicant: Oracle International Corporation
Inventor: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal
Abstract: Herein are counterfactual explanations of machine learning (ML) inferencing provided by generative adversarial networks (GANs) that ensure realistic counterfactuals and use latent spaces to optimize perturbations. In an embodiment, a first computer trains a generator model in a GAN. A same or second computer hosts a classifier model that inferences an original label for original feature values respectively for many features. Runtime ML explainability (MLX) occurs on the first or second or a third computer as follows. The generator model from the GAN generates a sequence of revised feature values that are based on noise. The noise is iteratively optimized based on a distance between the original feature values and current revised feature values in the sequence of revised feature values. The classifier model inferences a current label respectively for each counterfactual in the sequence of revised feature values. Satisfactory discovered counterfactuals are promoted as explanations of behavior of the classifier model.
-
Publication No.: US20220129791A1
Publication date: 2022-04-28
Application No.: US17083148
Filing date: 2020-10-28
Applicant: Oracle International Corporation
Inventor: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal
Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample. Combining this systematic local data sample generation and a supervised neighborhood selection approach to weighting generated data samples relative to the target data sample achieves high explanation fidelity, locality, and repeatability when generating explanations for specific predictions from a given model.
-
Publication No.: US20250094777A1
Publication date: 2025-03-20
Application No.: US18821539
Filing date: 2024-08-30
Applicant: Oracle International Corporation
Inventor: Anatoly Yakovlev, Sandeep R. Agrawal, Karoon Rashedi Nia, Ridha Chahed, Sanjay Jinturkar, Nipun Agarwal
IPC: G06N3/0455
Abstract: The present disclosure relates to LLM orchestration with vector store generation. An embeddings model may be selected to generate an embedding for a digital artifact. Metadata for the digital artifact may also be generated and stored in a vector store in association with the embedding. A user query may be received and categorized. One of a plurality of machine learning models may be selected based on the categorization of the user query. A prompt may be generated based at least in part on the user query, and the selected machine learning model may generate a response to the user query based at least in part on the prompt.
-
Publication No.: US20240086763A1
Publication date: 2024-03-14
Application No.: US17944949
Filing date: 2022-09-14
Applicant: Oracle International Corporation
Inventor: Jeremy Plassmann, Anatoly Yakovlev, Sandeep R. Agrawal, Ali Moharrer, Sanjay Jinturkar, Nipun Agarwal
Abstract: Techniques for computing global feature explanations using adaptive sampling are provided. In one technique, first and second samples from a dataset are identified. A first set of feature importance values (FIVs) is generated based on the first sample and a machine-learned model. A second set of FIVs is generated based on the second sample and the model. If a result of a comparison between the first and second FIV sets does not satisfy criteria, then: (i) an aggregated set is generated based on the last two FIV sets; (ii) a new sample that is double the size of a previous sample is identified from the dataset; (iii) a current FIV set is generated based on the new sample and the model; and (iv) it is determined whether a result of a comparison between the current and aggregated FIV sets satisfies the criteria. Steps (i)-(iv) are repeated until the result of the last comparison satisfies the criteria.
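The doubling loop in steps (i)-(iv) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method claimed in the application: the FIV estimator (`shuffle_importances`, an accuracy drop after shuffling one feature column), the averaging used for aggregation, the maximum-absolute-difference convergence test, and the parameters `start`, `tol`, and `seed` are all hypothetical choices. Rows are assumed to be tuples.

```python
import random

def shuffle_importances(model, rows, labels, rng):
    """Hypothetical FIV estimator: accuracy drop when one feature column is shuffled."""
    def accuracy(data):
        return sum(model(r) == y for r, y in zip(data, labels)) / len(data)

    base = accuracy(rows)
    fivs = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        fivs.append(base - accuracy(shuffled))
    return fivs

def adaptive_global_fivs(model, rows, labels, start=50, tol=0.05, seed=0):
    """Double the sample size until successive FIV sets agree within `tol`."""
    rng = random.Random(seed)

    def fivs_for(size):
        idx = rng.sample(range(len(rows)), min(size, len(rows)))
        return shuffle_importances(
            model, [rows[i] for i in idx], [labels[i] for i in idx], rng
        )

    def close(a, b):
        # Assumed criterion: largest per-feature difference is at most tol.
        return max(abs(x - y) for x, y in zip(a, b)) <= tol

    size = start
    previous, current = fivs_for(size), fivs_for(size)
    converged = close(previous, current)
    while not converged and size < len(rows):
        # (i) aggregate the last two FIV sets (here: element-wise mean)
        aggregated = [(x + y) / 2 for x, y in zip(previous, current)]
        # (ii) draw a new sample twice the size of the previous one
        size *= 2
        # (iii) compute the current FIV set on the new sample
        newest = fivs_for(size)
        # (iv) compare the current set with the aggregated set
        converged = close(newest, aggregated)
        previous, current = current, newest
    return current, min(size, len(rows))
```

The loop also stops once the sample would cover the whole dataset, which is a safeguard added for the sketch rather than something stated in the abstract.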
-
Publication No.: US20220366297A1
Publication date: 2022-11-17
Application No.: US17319729
Filing date: 2021-05-13
Applicant: Oracle International Corporation
Inventor: Yasha Pushak, Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Sanjay Jinturkar, Nipun Agarwal
Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. For each feature, and for each of many original tuples, the computer: a) randomly selects many perturbed values from original values of the feature in the original tuples, b) generates perturbed tuples that are based on the original tuple and a respective perturbed value, c) causes the ML model to infer a respective perturbed inference for each perturbed tuple, and d) measures a respective difference between each perturbed inference of the perturbed tuples and the particular inference. For each feature, a respective importance of the feature is calculated based on the differences measured for the feature. Feature importances may be used to rank features by influence and/or generate a local ML explainability (MLX) explanation.
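A simplified reading of steps a) through d) is sketched below in Python. It is an assumption-laden illustration, not the claimed method: each perturbed inference is compared with the model's inference for the unperturbed original tuple, the importance is taken as the mean absolute difference, and the function name `perturbation_importances` and the parameters `n_perturb` and `seed` are hypothetical. Tuples are assumed to be Python tuples and the model is assumed to return a number.

```python
import random

def perturbation_importances(model, tuples, n_perturb=20, seed=0):
    """Mean absolute change in the model output when one feature is resampled."""
    rng = random.Random(seed)
    importances = []
    for j in range(len(tuples[0])):
        # a) perturbed values are drawn from the feature's own observed values
        pool = [t[j] for t in tuples]
        total, count = 0.0, 0
        for t in tuples:
            baseline = model(t)
            for value in rng.choices(pool, k=n_perturb):
                # b) perturbed tuple: the original tuple with feature j replaced
                perturbed = t[:j] + (value,) + t[j + 1:]
                # c) + d) infer on the perturbed tuple and measure the difference
                total += abs(model(perturbed) - baseline)
                count += 1
        importances.append(total / count)
    return importances
```

Sorting the returned list in descending order gives a feature ranking by influence, which matches the use mentioned at the end of the abstract.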
-
Publication No.: US20220019784A1
Publication date: 2022-01-20
Application No.: US16929949
Filing date: 2020-07-15
Applicant: Oracle International Corporation
Inventor: Jian Wen, Hamed Ahmadi, Sanjay Jinturkar, Nipun Agarwal, Lijian Wan, Shrikumar Hariharasubrahmanian
IPC: G06K9/00, G06K9/62, G06F16/13, G06F40/289, G06F21/62
Abstract: Herein is a probabilistic indexing technique for searching semi-structured text documents in columnar storage formats such as Parquet, using columnar input/output (I/O) avoidance, and needing minimal storage overhead. In an embodiment, a computer associates columns with text strings that occur in semi-structured documents. Text words that occur in the text strings are detected. Respectively for each text word, a bitmap, of a plurality of bitmaps, that contains a respective bit for each column is generated. Based on at least one of the bitmaps, some of the columns or some of the semi-structured documents are accessed.
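The idea of one bitmap per word with one bit per column can be sketched in Python as below. The abstract does not say what makes the index probabilistic; this sketch assumes that words are hashed into a fixed number of bitmap slots, so a lookup may return extra columns (false positives) but never misses a column that contains the word. The class name `WordColumnIndex`, the slot count, the CRC32 hash, and whitespace tokenization are all hypothetical choices.

```python
import zlib

class WordColumnIndex:
    """Word -> bitmap with one bit per column, with words hashed into fixed slots."""

    def __init__(self, columns, n_bitmaps=1024):
        self.columns = list(columns)
        self.position = {name: i for i, name in enumerate(self.columns)}
        self.bitmaps = [0] * n_bitmaps  # each int is a bitmap over the columns

    def _slot(self, word):
        # Deterministic hash; distinct words may share a slot (assumed design).
        return zlib.crc32(word.lower().encode("utf-8")) % len(self.bitmaps)

    def add(self, column, text):
        """Record every whitespace-separated word of `text` as occurring in `column`."""
        bit = 1 << self.position[column]
        for word in text.split():
            self.bitmaps[self._slot(word)] |= bit

    def candidate_columns(self, word):
        """Columns that may contain `word`; all other columns can be skipped."""
        bitmap = self.bitmaps[self._slot(word)]
        return [name for i, name in enumerate(self.columns) if (bitmap >> i) & 1]
```

A reader would then load only the candidate columns from the columnar file and verify the match there, which is where the I/O avoidance comes from.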
-
Publication No.: US20250094787A1
Publication date: 2025-03-20
Application No.: US18808300
Filing date: 2024-08-19
Applicant: Oracle International Corporation
Inventor: Karoon Rashedi Nia, Anatoly Yakovlev, Sandeep R. Agrawal, Ridha Chahed, Sanjay Jinturkar, Nipun Agarwal
IPC: G06N3/0475, G06F21/62, G06N3/092
Abstract: Disclosed herein are various approaches for sharing knowledge within and between organizations while protecting sensitive data. A machine learning model may be trained using training prompts querying a vector store to prevent unauthorized user disclosure of data derived from the vector store. A prompt may be received and a response to the prompt may be generated using the machine learning model based at least in part on the vector store.
-
Publication No.: US11989657B2
Publication date: 2024-05-21
Application No.: US17071285
Filing date: 2020-10-15
Applicant: Oracle International Corporation
Inventor: Nikan Chavoshi, Anatoly Yakovlev, Hesam Fathi Moghadam, Venkatanathan Varadarajan, Sandeep Agrawal, Ali Moharrer, Jingxiao Cai, Sanjay Jinturkar, Nipun Agarwal
Abstract: Herein, a computer generates and evaluates many preprocessor configurations for a window preprocessor that transforms a training timeseries dataset for an ML model. With each preprocessor configuration, the window preprocessor is configured. The window preprocessor then converts the training timeseries dataset into a configuration-specific point-based dataset that is based on the preprocessor configuration. The ML model is trained based on the configuration-specific point-based dataset to calculate a score for the preprocessor configuration. Based on the scores of the many preprocessor configurations, an optimal preprocessor configuration is selected for finally configuring the window preprocessor, after which, the window preprocessor can optimally transform a new timeseries dataset such as in an offline or online production environment such as for real-time processing of a live streaming timeseries.
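The configure, transform, train, and score loop can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented procedure: the only configuration parameter is a window length, the stand-in model is a one-dimensional least-squares fit from the window mean to the next value, the score is the negative mean squared error on a trailing holdout, and all function names are hypothetical. The series is assumed to be long enough that the holdout is non-empty for every candidate window.

```python
def window_dataset(series, window):
    """Window preprocessor: each point is (previous `window` values, next value)."""
    return [(series[i - window:i], series[i]) for i in range(window, len(series))]

def fit_mean_regressor(points):
    """Stand-in model: least-squares line from the window mean to the next value."""
    xs = [sum(w) / len(w) for w, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    intercept = my - slope * mx
    return lambda w: slope * (sum(w) / len(w)) + intercept

def score_config(series, window, holdout=0.25):
    """Train on the leading points and return the negative MSE on the trailing ones."""
    points = window_dataset(series, window)
    split = int(len(points) * (1 - holdout))
    model = fit_mean_regressor(points[:split])
    errors = [(model(w) - y) ** 2 for w, y in points[split:]]
    return -sum(errors) / len(errors)

def select_window(series, candidates):
    """Score every candidate configuration and keep the best-scoring one."""
    scores = {w: score_config(series, w) for w in candidates}
    return max(scores, key=scores.get), scores
```

After selection, `window_dataset(new_series, best_window)` would be applied to new data with the chosen configuration, mirroring the final configuration step in the abstract.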