Patent search ap:("Oracle International Corporation") AND inv:"Sanjay Jinturkar" Page 1

1.

Granted invention patent
Systematic approach for explaining machine learning predictions (Valid)

Publication (announcement) number: US12265889B2

Publication (announcement) date: 2025-04-01

Application number: US17083148

Application date: 2020-10-28

Applicant: Oracle International Corporation

Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N20/00, G06F18/214, G06F18/2411, G06F18/2413, G06F18/243

Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample. Combining this systematic local data sample generation and a supervised neighborhood selection approach to weighting generated data samples relative to the target data sample achieves high explanation fidelity, locality, and repeatability when generating explanations for specific predictions from a given model.
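The abstract does not disclose the exact sampling or weighting scheme, so the following is only a minimal LIME-style sketch under stated assumptions. It generates samples deterministically on concentric circles inside a 2-D hypersphere (a disk) around the target, which makes the explanation repeatable. It weights each sample with a Gaussian kernel on its distance to the target and fits a weighted linear surrogate to the black-box model's outputs. The names `ring_samples_2d`, `solve_linear`, `fit_local_surrogate`, `radius` and `kernel_width` are hypothetical and are not taken from the patent.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def ring_samples_2d(target: Point, radius: float, n_rings: int = 3, per_ring: int = 8) -> List[Point]:
    """Deterministic samples on concentric circles inside a 2-D hypersphere (a disk)."""
    samples = [tuple(target)]
    for k in range(1, n_rings + 1):
        r = radius * k / n_rings
        for j in range(per_ring):
            theta = 2 * math.pi * j / per_ring
            samples.append((target[0] + r * math.cos(theta), target[1] + r * math.sin(theta)))
    return samples


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_local_surrogate(model: Callable[[Sequence[float]], float], target: Point,
                        radius: float, kernel_width: float) -> List[float]:
    """Weighted least-squares linear surrogate; returns [intercept, coef_x1, coef_x2]."""
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for s in ring_samples_2d(target, radius):
        # Samples closer to the target get larger weights (Gaussian kernel).
        w = math.exp(-(math.dist(s, target) ** 2) / kernel_width ** 2)
        row = (1.0, s[0], s[1])
        y = model(s)  # query the black-box model being explained
        for i in range(3):
            xtwy[i] += w * row[i] * y
            for j in range(3):
                xtwx[i][j] += w * row[i] * row[j]
    return solve_linear(xtwx, xtwy)
```

For example, `fit_local_surrogate(lambda x: x[0] ** 2 + x[1], (1.0, 0.0), radius=0.3, kernel_width=0.3)` returns coefficients of about 2 and 1, which is the local gradient of the model at the target. The symmetric ring layout cancels the odd-order terms, so the surrogate recovers the local slope exactly for this quadratic.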

2.

Granted invention patent
Global, model-agnostic machine learning explanation technique for textual data (Valid)

Publication (announcement) number: US11720751B2

Publication (announcement) date: 2023-08-08

Application number: US17146375

Application date: 2021-01-11

Applicant: Oracle International Corporation

Inventors: Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal

IPC: G06F17/00, G06F40/284, G06F40/30, G06F40/166, G06N20/00

CPC classification number: G06F40/284, G06F40/166, G06F40/30, G06N20/00

Abstract: A model-agnostic global explainer for textual data processing (NLP) machine learning (ML) models, “NLP-MLX”, is described herein. NLP-MLX explains global behavior of arbitrary NLP ML models by identifying globally-important tokens within a textual dataset containing text data. NLP-MLX accommodates any arbitrary combination of training dataset pre-processing operations used by the NLP ML model. NLP-MLX includes four main stages. A Text Analysis stage converts text in documents of a target dataset into tokens. A Token Extraction stage uses pre-processing techniques to efficiently pre-filter the complete list of tokens into a smaller set of candidate important tokens. A Perturbation Generation stage perturbs tokens within documents of the dataset to help evaluate the effect of different tokens, and combinations of tokens, on the model's predictions. Finally, a Token Evaluation stage uses the ML model and perturbed documents to evaluate the impact of each candidate token relative to predictions for the original documents.
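The four NLP-MLX stages can be illustrated with a toy pipeline. This is a sketch under assumptions, not the patented method. Text Analysis is taken to be lower-case word tokenization, and Token Extraction keeps the `top_k` tokens by document frequency as the pre-filter. Perturbation Generation deletes a token from every document, and Token Evaluation scores each candidate by the mean absolute change in the model output. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def tokenize(doc: str) -> List[str]:
    """Text Analysis: lower-case word tokens."""
    return re.findall(r"[a-z']+", doc.lower())


def candidate_tokens(docs: List[str], top_k: int) -> List[str]:
    """Token Extraction: keep the top_k tokens by document frequency (assumed pre-filter)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return [tok for tok, _ in df.most_common(top_k)]


def remove_token(doc: str, token: str) -> str:
    """Perturbation Generation: delete every occurrence of `token`."""
    return " ".join(t for t in tokenize(doc) if t != token)


def global_token_importance(model: Callable[[str], float], docs: List[str],
                            top_k: int = 10) -> Dict[str, float]:
    """Token Evaluation: mean absolute change in model output when a token is removed."""
    # Compare against normalized text so only the token removal changes the input.
    normalized = [" ".join(tokenize(d)) for d in docs]
    base = [model(d) for d in normalized]
    scores = {}
    for tok in candidate_tokens(docs, top_k):
        deltas = [abs(model(remove_token(d, tok)) - b) for d, b in zip(normalized, base)]
        scores[tok] = sum(deltas) / len(docs)
    # Most globally important tokens first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Because the model is only called as a black box on (perturbed) text, the same loop works for any NLP model, which is the model-agnostic property the abstract describes.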

3.

Invention application
USING GENERATIVE ADVERSARIAL NETWORKS TO CONSTRUCT REALISTIC COUNTERFACTUAL EXPLANATIONS FOR MACHINE LEARNING MODELS (Valid)

Publication (announcement) number: US20220188645A1

Publication (announcement) date: 2022-06-16

Application number: US17124018

Application date: 2020-12-16

Applicant: Oracle International Corporation

Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Yasha Pushak, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N3/08, G06N3/04

Abstract: Herein are counterfactual explanations of machine learning (ML) inferencing provided by generative adversarial networks (GANs) that ensure realistic counterfactuals and use latent spaces to optimize perturbations. In an embodiment, a first computer trains a generator model in a GAN. A same or second computer hosts a classifier model that inferences an original label for original feature values respectively for many features. Runtime ML explainability (MLX) occurs on the first or second or a third computer as follows. The generator model from the GAN generates a sequence of revised feature values that are based on noise. The noise is iteratively optimized based on a distance between the original feature values and current revised feature values in the sequence of revised feature values. The classifier model inferences a current label respectively for each counterfactual in the sequence of revised feature values. Satisfactory discovered counterfactuals are promoted as explanations of behavior of the classifier model.

4.

Invention application
SYSTEMATIC APPROACH FOR EXPLAINING MACHINE LEARNING PREDICTIONS (Valid)

Publication (announcement) number: US20220129791A1

Publication (announcement) date: 2022-04-28

Application number: US17083148

Application date: 2020-10-28

Applicant: Oracle International Corporation

Inventors: Karoon Rashedi Nia, Tayler Hetherington, Zahra Zohrevand, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N20/00, G06K9/62

Abstract: A systematic explainer is described herein, which comprises local, model-agnostic, surrogate ML model-based explanation techniques that faithfully explain predictions from any machine learning classifier or regressor. The systematic explainer systematically generates local data samples around a given target data sample, which improves on exhaustive or random data sample generation algorithms. Specifically, using principles of locality and approximation of local decision boundaries, techniques described herein identify a hypersphere (or data sample neighborhood) over which to train the surrogate ML model such that the surrogate ML model produces valuable, high-quality information explaining data samples in the neighborhood of the target data sample. Combining this systematic local data sample generation and a supervised neighborhood selection approach to weighting generated data samples relative to the target data sample achieves high explanation fidelity, locality, and repeatability when generating explanations for specific predictions from a given model.

5.

Invention application
AUTOMATED SELECTION OF EMBEDDING AND GENERATIVE MODELS WITH VECTOR STORE (Valid)

Publication (announcement) number: US20250094777A1

Publication (announcement) date: 2025-03-20

Application number: US18821539

Application date: 2024-08-30

Applicant: Oracle International Corporation

Inventors: Anatoly Yakovlev, Sandeep R. Agrawal, Karoon Rashedi Nia, Ridha Chahed, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N3/0455

Abstract: The present disclosure relates to LLM orchestration with vector store generation. An embeddings model may be selected to generate an embedding for a digital artifact. Metadata for the digital artifact may also be generated and stored in a vector store in association with the embedding. A user query may be received and categorized. One of a plurality of machine learning models may be selected based on the categorization of the user query. A prompt may be generated based at least in part on the user query, and the selected machine learning model may generate a response to the user query based at least in part on the prompt.

6.

Invention publication
ADAPTIVE SAMPLING TO COMPUTE GLOBAL FEATURE EXPLANATIONS WITH SHAPLEY VALUES (Pending: published)

Publication (announcement) number: US20240086763A1

Publication (announcement) date: 2024-03-14

Application number: US17944949

Application date: 2022-09-14

Applicant: Oracle International Corporation

Inventors: Jeremy Plassmann, Anatoly Yakovlev, Sandeep R. Agrawal, Ali Moharrer, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N20/00, G06N5/04

CPC classification number: G06N20/00, G06N5/042

Abstract: Techniques for computing global feature explanations using adaptive sampling are provided. In one technique, first and second samples from a dataset are identified. A first set of feature importance values (FIVs) is generated based on the first sample and a machine-learned model. A second set of FIVs is generated based on the second sample and the model. If a result of a comparison between the first and second FIV sets does not satisfy criteria, then: (i) an aggregated set is generated based on the last two FIV sets; (ii) a new sample that is double the size of the previous sample is identified from the dataset; (iii) a current FIV set is generated based on the new sample and the model; and (iv) it is determined whether a result of a comparison between the current and aggregated FIV sets satisfies the criteria. Steps (i)-(iv) are repeated until the result of the last comparison satisfies the criteria.
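The doubling loop in steps (i)-(iv) can be sketched as follows, under several assumptions the abstract leaves open. Both initial samples have the same size. The aggregated set is the element-wise mean of the two most recent FIV sets. The stopping criterion is a maximum absolute difference within `tol`, and the loop also stops once the whole dataset is sampled. A real implementation would compute Shapley values for `fiv_fn`; here `mean_abs_importance` is a hypothetical stand-in used only to make the sketch runnable.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
FIV = List[float]


def max_abs_diff(a: FIV, b: FIV) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def adaptive_global_fiv(rows: List[Row], fiv_fn: Callable[[List[Row]], FIV],
                        initial_size: int, tol: float, seed: int = 0) -> Tuple[FIV, int]:
    """Returns (final FIV set, size of the sample it was computed on)."""
    rng = random.Random(seed)
    size = min(initial_size, len(rows))
    prev = fiv_fn(rng.sample(rows, size))  # first sample
    cur = fiv_fn(rng.sample(rows, size))   # second sample
    if max_abs_diff(prev, cur) <= tol:
        return cur, size
    while True:
        aggregated = [(p + c) / 2 for p, c in zip(prev, cur)]  # (i) aggregate last two sets
        size = min(2 * size, len(rows))                         # (ii) double the sample size
        prev, cur = cur, fiv_fn(rng.sample(rows, size))         # (iii) FIVs on the new sample
        # (iv) stop when current and aggregated sets agree, or the dataset is exhausted
        if max_abs_diff(cur, aggregated) <= tol or size == len(rows):
            return cur, size


def mean_abs_importance(sample: List[Row]) -> FIV:
    """Hypothetical stand-in for a Shapley-based FIV computation: mean |value| per column."""
    return [sum(abs(r[j]) for r in sample) / len(sample) for j in range(len(sample[0]))]
```

The design goal is that stable explanations stop early on small samples, while only unstable ones pay for progressively larger samples.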

7.

Invention application
LOCAL PERMUTATION IMPORTANCE: A STABLE, LINEAR-TIME LOCAL MACHINE LEARNING FEATURE ATTRIBUTOR (Valid)

Publication (announcement) number: US20220366297A1

Publication (announcement) date: 2022-11-17

Application number: US17319729

Application date: 2021-05-13

Applicant: Oracle International Corporation

Inventors: Yasha Pushak, Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N20/00, G06N5/04, G06K9/62

Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. For each feature, and for each of many original tuples, the computer: a) randomly selects many perturbed values from original values of the feature in the original tuples, b) generates perturbed tuples that are based on the original tuple and a respective perturbed value, c) causes the ML model to infer a respective perturbed inference for each perturbed tuple, and d) measures a respective difference between each perturbed inference of the perturbed tuples and the particular inference. For each feature, a respective importance of the feature is calculated based on the differences measured for the feature. Feature importances may be used to rank features by influence and/or generate a local ML explainability (MLX) explanation.
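Steps a) through d) can be sketched as below. The sketch reads the abstract as perturbing the tuple being explained, replacing one feature at a time with values drawn at random from that feature's values in the original tuples. It measures the mean absolute change from the particular inference. That reading, and the names `local_permutation_importance` and `n_perturb`, are assumptions. The cost is `len(target) * n_perturb` model calls, so it grows linearly with the number of features.

```python
import random
from typing import Callable, List, Sequence


def local_permutation_importance(model: Callable[[Sequence[float]], float],
                                 target: Sequence[float],
                                 original_rows: List[Sequence[float]],
                                 n_perturb: int = 20, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    base = model(target)  # the particular inference being explained
    importances = []
    for j in range(len(target)):
        column = [row[j] for row in original_rows]  # original values of feature j
        diffs = []
        for _ in range(n_perturb):
            perturbed = list(target)
            perturbed[j] = rng.choice(column)            # a) + b) build a perturbed tuple
            diffs.append(abs(model(perturbed) - base))  # c) + d) infer and measure difference
        importances.append(sum(diffs) / n_perturb)
    return importances
```

Sorting feature indices by the returned values, for example `sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)`, gives the influence ranking mentioned in the abstract.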

8.

Invention application
PROBABILISTIC TEXT INDEX FOR SEMI-STRUCTURED DATA IN COLUMNAR ANALYTICS STORAGE FORMATS (Valid)

Publication (announcement) number: US20220019784A1

Publication (announcement) date: 2022-01-20

Application number: US16929949

Application date: 2020-07-15

Applicant: Oracle International Corporation

Inventors: Jian Wen, Hamed Ahmadi, Sanjay Jinturkar, Nipun Agarwal, Lijian Wan, Shrikumar Hariharasubrahmanian

IPC: G06K9/00, G06K9/62, G06F16/13, G06F40/289, G06F21/62

Abstract: Herein is a probabilistic indexing technique for searching semi-structured text documents in columnar storage formats such as Parquet, using columnar input/output (I/O) avoidance, and needing minimal storage overhead. In an embodiment, a computer associates columns with text strings that occur in semi-structured documents. Text words that occur in the text strings are detected. For each text word, a respective bitmap (one of a plurality of bitmaps) containing one bit per column is generated. Based on at least one of the bitmaps, some of the columns or some of the semi-structured documents are accessed.
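The per-word column bitmaps can be sketched with Python integers used as bitmaps. To bound storage, this sketch assumes that words are hashed into a fixed number of buckets. Colliding words therefore share a bitmap, so a lookup can return false-positive columns but never misses a column that does contain the word. The hashing scheme, bucket count and class name are illustrative assumptions, not details from the patent. Only the returned columns would then need to be read, which is the I/O avoidance the abstract refers to.

```python
import re
import zlib
from typing import List


class ProbabilisticColumnIndex:
    """Hashes each word to one of num_buckets bitmaps; bit c set means column c may contain it."""

    def __init__(self, num_buckets: int = 64):
        self.num_buckets = num_buckets
        self.bitmaps = [0] * num_buckets  # Python ints used as arbitrary-width bitmaps
        self.num_columns = 0

    def _bucket(self, word: str) -> int:
        # crc32 is stable across runs, unlike Python's salted hash() for str
        return zlib.crc32(word.encode("utf-8")) % self.num_buckets

    def add(self, column: int, text: str) -> None:
        self.num_columns = max(self.num_columns, column + 1)
        for word in re.findall(r"\w+", text.lower()):
            self.bitmaps[self._bucket(word)] |= 1 << column

    def candidate_columns(self, query: str) -> List[int]:
        """Columns that may contain every query word (false positives possible, no false negatives)."""
        mask = (1 << self.num_columns) - 1
        for word in re.findall(r"\w+", query.lower()):
            mask &= self.bitmaps[self._bucket(word)]
        return [c for c in range(self.num_columns) if (mask >> c) & 1]
```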

9.

Invention application
PRIVACY-PROTECTIVE KNOWLEDGE SHARING USING A HIERARCHICAL VECTOR STORE (Valid)

Publication (announcement) number: US20250094787A1

Publication (announcement) date: 2025-03-20

Application number: US18808300

Application date: 2024-08-19

Applicant: Oracle International Corporation

Inventors: Karoon Rashedi Nia, Anatoly Yakovlev, Sandeep R. Agrawal, Ridha Chahed, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N3/0475, G06F21/62, G06N3/092

Abstract: Disclosed herein are various approaches for sharing knowledge within and between organizations while protecting sensitive data. A machine learning model may be trained using training prompts querying a vector store to prevent unauthorized user disclosure of data derived from the vector store. A prompt may be received and a response to the prompt may be generated using the machine learning model based at least in part on the vector store.

10.

Granted invention patent
Automated machine learning pipeline for timeseries datasets utilizing point-based algorithms (Valid)

Publication (announcement) number: US11989657B2

Publication (announcement) date: 2024-05-21

Application number: US17071285

Application date: 2020-10-15

Applicant: Oracle International Corporation

Inventors: Nikan Chavoshi, Anatoly Yakovlev, Hesam Fathi Moghadam, Venkatanathan Varadarajan, Sandeep Agrawal, Ali Moharrer, Jingxiao Cai, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N20/00, G06N3/088

CPC classification number: G06N3/088, G06N20/00

Abstract: Herein, a computer generates and evaluates many preprocessor configurations for a window preprocessor that transforms a training timeseries dataset for an ML model. With each preprocessor configuration, the window preprocessor is configured. The window preprocessor then converts the training timeseries dataset into a configuration-specific point-based dataset that is based on the preprocessor configuration. The ML model is trained based on the configuration-specific point-based dataset to calculate a score for the preprocessor configuration. Based on the scores of the many preprocessor configurations, an optimal preprocessor configuration is selected for finally configuring the window preprocessor, after which, the window preprocessor can optimally transform a new timeseries dataset such as in an offline or online production environment such as for real-time processing of a live streaming timeseries.
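The configure, convert, train and score loop can be sketched under simplifying assumptions. The only preprocessor configuration parameter is the window size. The point-based model is a 1-nearest-neighbour regressor, used as a hypothetical stand-in. Each configuration is scored by negative mean absolute error on a chronological holdout. The function names are illustrative and are not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def to_point_based(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Window preprocessor: each row holds the previous `window` values; the label is the next value."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(list(series[t - window:t]))
        y.append(series[t])
    return X, y


def nn_predict(train_X: List[List[float]], train_y: List[float], x: List[float]) -> float:
    # 1-nearest-neighbour regressor (hypothetical stand-in for the point-based ML model)
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def score_config(series: Sequence[float], window: int, train_frac: float = 0.7) -> float:
    X, y = to_point_based(series, window)
    split = int(len(X) * train_frac)  # chronological split: train on the past, score on the future
    errors = [abs(nn_predict(X[:split], y[:split], x) - target)
              for x, target in zip(X[split:], y[split:])]
    return -sum(errors) / len(errors)  # higher is better


def select_window(series: Sequence[float], candidate_windows: List[int]) -> Tuple[int, Dict[int, float]]:
    scores = {w: score_config(series, w) for w in candidate_windows}
    best = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best, scores
```

For the repeating series `[0, 1, 0, 2] * 20`, a window of 1 is ambiguous because a 0 can be followed by either 1 or 2. A window of 2 makes the next value fully determined, so `select_window(series, [1, 2, 3])` picks 2. The selected window would then be used to configure the preprocessor for new or streaming data.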
