Patent search ap:("ORACLE INTERNATIONAL CORPORATION") AND inv:"Tomas Karnagel" Page 1

1.

Granted patent

Automatic feature subset selection using feature ranking and scalable automatic search (status: in force)

Publication number: US11544630B2

Publication date: 2023-01-03

Application number: US16417145

Application date: 2019-05-20

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventor: Tomas Karnagel, Sam Idicula, Nipun Agarwal

IPC: G06N20/20, G06N5/04

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer calculates, for each feature of a training dataset, a relevance score based on: a relevance scoring function, and statistics of values, of the feature, that occur in the training dataset. A rank based on relevance scores of the features is calculated for each feature. A sequence of distinct subsets of the features, based on the ranks of the features, is generated. For each distinct subset of the sequence of distinct feature subsets, a fitness score is generated based on training a machine learning (ML) model that is configured for the distinct subset.
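
The rank-then-search flow in this abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the relevance function (gap between class means divided by the value range), the choice of top-k prefixes as the sequence of subsets, and the nearest-centroid classifier whose training accuracy stands in for the fitness score.

```python
from statistics import mean


def relevance(values, labels):
    """Hypothetical relevance score: gap between class means over the value range."""
    by_class = {}
    for v, y in zip(values, labels):
        by_class.setdefault(y, []).append(v)
    centers = [mean(vs) for vs in by_class.values()]
    spread = (max(values) - min(values)) or 1.0
    return (max(centers) - min(centers)) / spread


def rank_features(rows, labels):
    """Return feature indices ordered from most to least relevant."""
    n_features = len(rows[0])
    scores = [relevance([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def subset_sequence(ranking):
    """Distinct subsets derived from the ranking: top-1, top-2, ..., all features."""
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


def fitness(rows, labels, subset):
    """Train a nearest-centroid classifier on the subset; return training accuracy."""
    grouped = {}
    for r, y in zip(rows, labels):
        grouped.setdefault(y, []).append([r[j] for j in subset])
    centroids = {y: [mean(col) for col in zip(*pts)] for y, pts in grouped.items()}

    def predict(r):
        x = [r[j] for j in subset]
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


# Toy dataset (invented): feature 0 separates the classes, feature 1 is noise.
rows = [[0.1, 5.0, 3.0], [0.2, 1.0, 3.1], [0.9, 4.0, 2.9], [1.0, 2.0, 3.0]]
labels = [0, 0, 1, 1]

ranking = rank_features(rows, labels)
scores = {tuple(s): fitness(rows, labels, s) for s in subset_sequence(ranking)}
best = max(scores, key=scores.get)  # ties resolve to the smallest subset
```

Because the subsets are generated in rank order and `max` keeps the first maximum, ties in fitness favour the smaller subset, which matches the dimensionality-reduction goal stated in the abstract.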

2.

Granted patent

Automated provisioning for database performance (status: in force)

Publication number: US11782926B2

Publication date: 2023-10-10

Application number: US17573897

Application date: 2022-01-12

Applicant: Oracle International Corporation

Inventor: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency

IPC: G06F16/2453, G06N20/00, G06F16/21, G06N20/20

CPC classification number: G06F16/24545, G06F16/217, G06N20/00, G06N20/20

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

3.

Granted patent

Chaining bloom filters to estimate the number of keys with low frequencies in a dataset (status: in force)

Publication number: US11520834B1

Publication date: 2022-12-06

Application number: US17387841

Application date: 2021-07-28

Applicant: Oracle International Corporation

Inventor: Tomas Karnagel, Suratna Budalakoti, Onur Kocberber, Nipun Agarwal, Alan Wood

IPC: G06F16/00, G06F16/9035

Abstract: Techniques are described for generating an approximate frequency histogram using a series of Bloom filters (BF). For example, to estimate the f1 and f2 cardinalities in a dataset, an ordered chain of three BFs is established (“BF1”, “BF2”, and “BF3”). An insertion operation is performed for each datum in the dataset, whereby the BFs are tested in order (starting at BF1) for the datum. If the datum is represented in a currently-tested BF, the subsequent BF in the chain is tested for the datum. If the datum is not represented in the currently-tested BF, the datum is added to the BF, a counter for the BF is incremented, and the insertion operation for the current datum ends. To estimate the cardinality of f1-values in the dataset, the BF2-counter is subtracted from the BF1-counter. Similarly, to estimate the cardinality of f2-values in the dataset, the BF3-counter is subtracted from the BF2-counter.
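
The insertion and subtraction steps described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `BloomFilter` and `BloomChain` classes, the filter size, the number of hash functions, and the use of salted BLAKE2b hashes are all assumptions chosen to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; bit count and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            # A different salt per seed gives num_hashes independent-looking hashes.
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)


class BloomChain:
    """Ordered chain of Bloom filters, each paired with an insertion counter."""

    def __init__(self, length=3, **bf_kwargs):
        self.filters = [BloomFilter(**bf_kwargs) for _ in range(length)]
        self.counters = [0] * length

    def insert(self, datum):
        # Test the filters in order; add the datum to the first one that lacks it.
        for i, bf in enumerate(self.filters):
            if datum not in bf:
                bf.add(datum)
                self.counters[i] += 1
                return
        # Present in every filter: the datum has been seen at least `length` times.

    def estimate(self, k):
        """Estimated number of keys occurring exactly k times (1 <= k < length)."""
        return self.counters[k - 1] - self.counters[k]


chain = BloomChain(length=3)
for datum in ["a", "b", "b", "c", "c", "c", "d"]:
    chain.insert(datum)

f1 = chain.estimate(1)  # BF1 counter minus BF2 counter
f2 = chain.estimate(2)  # BF2 counter minus BF3 counter
```

In this toy run "a" and "d" occur once and "b" occurs twice, so the counters end at 4, 2, and 1. On real data the estimates are approximate, because a Bloom filter false positive pushes a datum further down the chain than its true frequency warrants.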

4.

Patent application

AUTOMATED PROVISIONING FOR DATABASE PERFORMANCE (status: in force)

Publication number: US20220138199A1

Publication date: 2022-05-05

Application number: US17573897

Application date: 2022-01-12

Applicant: Oracle International Corporation

Inventor: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency

IPC: G06F16/2453, G06N20/00, G06F16/21, G06N20/20

Abstract: Embodiments utilize trained query performance machine learning (QP-ML) models to predict an optimal compute node cluster size for a given in-memory workload. The QP-ML models include models that predict query task runtimes at various compute node cardinalities, and models that predict network communication time between nodes of the cluster. Embodiments also utilize an analytical model to predict overlap between predicted task runtimes and predicted network communication times. Based on this data, an optimal cluster size is selected for the workload. Embodiments further utilize trained data capacity machine learning (DC-ML) models to predict a minimum number of compute nodes needed to run a workload. The DC-ML models include models that predict the size of the workload dataset in a target data encoding, models that predict the amount of memory needed to run the queries in the workload, and models that predict the memory needed to accommodate changes to the dataset.

5.

Patent application

ESTIMATING NUMBER OF DISTINCT VALUES IN A DATA SET USING MACHINE LEARNING (status: in force)

Publication number: US20210365805A1

Publication date: 2021-11-25

Application number: US16877882

Application date: 2020-05-19

Applicant: Oracle International Corporation

Inventor: Tomas Karnagel, Onur Kocberber, Farhan Tauheed, Nipun Agarwal

IPC: G06N5/04, G06N20/00

Abstract: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.

6.

Granted patent

Enabling efficient machine learning model inference using adaptive sampling for autonomous database services (status: in force)

Publication number: US12014286B2

Publication date: 2024-06-18

Application number: US16914816

Application date: 2020-06-29

Applicant: Oracle International Corporation

Inventor: Farhan Tauheed, Onur Kocberber, Tomas Karnagel, Nipun Agarwal

IPC: G06N5/04, G06F16/22, G06N20/00

CPC classification number: G06N5/04, G06F16/2282, G06N20/00

Abstract: Herein are approaches for self-optimization of a database management system (DBMS) such as in real time. Adaptive just-in-time sampling techniques herein estimate database content statistics that a machine learning (ML) model may use to predict configuration settings that conserve computer resources such as execution time and storage space. In an embodiment, a computer repeatedly samples database content until a dynamic convergence criterion is satisfied. In each iteration of a series of sampling iterations, a subset of rows of a database table are sampled, and estimates of content statistics of the database table are adjusted based on the sampled subset of rows. Immediately or eventually after detecting dynamic convergence, a machine learning (ML) model predicts, based on the content statistic estimates, an optimal value for a configuration setting of the DBMS.
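
The sample-until-convergence loop in this abstract might look roughly like the Python sketch below. The abstract does not specify the statistic or the convergence criterion, so both are assumptions here: the statistic is the mean of a numeric column, and convergence means the running estimate changed by at most a relative `tolerance` for `patience` consecutive iterations. The function name and all parameters are hypothetical.

```python
import random


def sample_until_converged(column, batch_size=200, tolerance=0.01, patience=3, seed=0):
    """Sample rows in batches until the running mean stabilises.

    Returns (estimate, rows_sampled). The statistic and stopping rule are
    illustrative assumptions, not taken from the patent.
    """
    rng = random.Random(seed)
    order = list(range(len(column)))
    rng.shuffle(order)  # sample rows without replacement

    total, sampled, stable = 0.0, 0, 0
    estimate = None
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        total += sum(column[i] for i in batch)
        sampled += len(batch)
        new_estimate = total / sampled

        # Count consecutive iterations in which the estimate barely moved.
        if estimate is not None and abs(new_estimate - estimate) <= tolerance * max(
            abs(estimate), 1e-12
        ):
            stable += 1
        else:
            stable = 0
        estimate = new_estimate

        if stable >= patience:
            break  # convergence detected; stop sampling early
    return estimate, sampled


# A constant column converges after the minimum number of batches.
estimate, sampled = sample_until_converged([5.0] * 10_000)
```

In the flow the abstract describes, the converged estimates would then be passed as input features to a trained ML model that predicts a configuration setting; that model is outside the scope of this sketch.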

7.

Granted patent

Automatic feature subset selection based on meta-learning (status: in force)

Publication number: US11615265B2

Publication date: 2023-03-28

Application number: US16547312

Application date: 2019-08-21

Applicant: Oracle International Corporation

Inventor: Tomas Karnagel, Sam Idicula, Hesam Fathi Moghadam, Nipun Agarwal

IPC: G06F16/00, G06K9/62, G06N20/00

Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-features values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.

8.

Granted patent

Automated configuration parameter tuning for database performance (status: in force)

Publication number: US11567937B2

Publication date: 2023-01-31

Application number: US17318972

Application date: 2021-05-12

Applicant: Oracle International Corporation

Inventor: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency

IPC: G06F16/2453, G06N20/00, G06F16/21, G06N20/20

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters. The optimal set of configuration parameter values is automatically applied for the given workload.

9.

Granted patent

Automated configuration parameter tuning for database performance (status: in force)

Publication number: US11061902B2

Publication date: 2021-07-13

Application number: US16298837

Application date: 2019-03-11

Applicant: Oracle International Corporation

Inventor: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency

IPC: G06F16/2453, G06N20/00, G06F16/21, G06N20/20

Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters. The optimal set of configuration parameter values is automatically applied for the given workload.

10.

Granted patent

Estimating number of distinct values in a data set using machine learning (status: in force)

Publication number: US11620547B2

Publication date: 2023-04-04

Application number: US16877882

Application date: 2020-05-19

Applicant: Oracle International Corporation

Inventor: Tomas Karnagel, Onur Kocberber, Farhan Tauheed, Nipun Agarwal

IPC: G06N5/04, G06N20/00

Abstract: Techniques for estimating the number of distinct values in a data set using machine learning are provided. In one technique, a sample of a data set is retrieved where the sample is a strict subset of the data set. The sample is analyzed to identify feature values of multiple features of the sample. The feature values are inserted into a machine-learned model that computes a prediction regarding a number of distinct values in the data set. An estimated number of distinct values that is based on the prediction is stored in association with the data set.
