-
Publication No.: US12014286B2
Publication date: 2024-06-18
Application No.: US16914816
Application date: 2020-06-29
Applicant: Oracle International Corporation
Inventor: Farhan Tauheed, Onur Kocberber, Tomas Karnagel, Nipun Agarwal
CPC classification number: G06N5/04, G06F16/2282, G06N20/00
Abstract: Herein are approaches for self-optimization of a database management system (DBMS), including in real time. Adaptive just-in-time sampling techniques herein estimate database content statistics that a machine learning (ML) model may use to predict configuration settings that conserve computer resources such as execution time and storage space. In an embodiment, a computer repeatedly samples database content until a dynamic convergence criterion is satisfied. In each iteration of a series of sampling iterations, a subset of rows of a database table is sampled, and estimates of content statistics of the database table are adjusted based on the sampled subset of rows. Immediately or eventually after detecting dynamic convergence, an ML model predicts, based on the content statistic estimates, an optimal value for a configuration setting of the DBMS.
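The sampling loop in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the statistic (a column mean), the function name `adaptive_sample_mean`, the batch size, and the fixed relative tolerance standing in for the "dynamic convergence criterion" are all assumptions made for illustration. The ML prediction step is omitted.

```python
import random


def adaptive_sample_mean(column, batch_size=200, tolerance=0.01,
                         max_iterations=100, seed=0):
    """Sample batches of values until the running mean stabilizes.

    Returns (estimate, rows_sampled, converged). The convergence rule
    (relative change below `tolerance`) is an assumed stand-in for the
    dynamic criterion mentioned in the abstract.
    """
    rng = random.Random(seed)
    total = 0.0
    count = 0
    estimate = 0.0
    previous = None
    for _ in range(max_iterations):
        # Sample one subset of rows (with replacement, for simplicity).
        batch = [column[rng.randrange(len(column))] for _ in range(batch_size)]
        total += sum(batch)
        count += len(batch)
        # Adjust the statistic estimate using the new subset.
        estimate = total / count
        if previous is not None:
            change = abs(estimate - previous) / max(abs(previous), 1e-12)
            if change < tolerance:
                return estimate, count, True
        previous = estimate
    return estimate, count, False


# A constant column converges on the second iteration.
print(adaptive_sample_mean([10.0] * 1000))  # (10.0, 400, True)
```

In a real system the converged estimates would be passed as input features to a trained model that predicts the configuration setting.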
-
Publication No.: US20240086763A1
Publication date: 2024-03-14
Application No.: US17944949
Application date: 2022-09-14
Applicant: Oracle International Corporation
Inventor: Jeremy Plassmann, Anatoly Yakovlev, Sandeep R. Agrawal, Ali Moharrer, Sanjay Jinturkar, Nipun Agarwal
Abstract: Techniques for computing global feature explanations using adaptive sampling are provided. In one technique, first and second samples from a dataset are identified. A first set of feature importance values (FIVs) is generated based on the first sample and a machine-learned model. A second set of FIVs is generated based on the second sample and the model. If a result of a comparison between the first and second FIV sets does not satisfy criteria, then: (i) an aggregated set is generated based on the last two FIV sets; (ii) a new sample that is double the size of a previous sample is identified from the dataset; (iii) a current FIV set is generated based on the new sample and the model; (iv) it is determined whether a result of a comparison between the current and aggregated FIV sets satisfies criteria. Steps (i)-(iv) are repeated until the result of the last comparison satisfies the criteria.
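Steps (i)-(iv) can be sketched in Python as follows. This is a hedged illustration only. The names `adaptive_global_fivs` and `toy_fivs`, the element-wise averaging used for aggregation, the maximum absolute difference used for comparison, and the toy model are assumptions that do not come from the abstract. The sketch also treats "the last two FIV sets" as the two sets compared most recently, and it stops once the sample covers the whole dataset.

```python
import random


def toy_fivs(sample):
    """Hypothetical FIV routine for a fixed toy model f(x0, x1) = 3 * x0.

    The importance of feature j is the mean absolute change in the model
    output when feature j is replaced by its mean over the sample.
    """
    weights = (3.0, 0.0)
    fivs = []
    for j, weight in enumerate(weights):
        mean_j = sum(row[j] for row in sample) / len(sample)
        fivs.append(
            sum(abs(weight * (row[j] - mean_j)) for row in sample) / len(sample)
        )
    return fivs


def adaptive_global_fivs(dataset, compute_fivs, initial_size=16,
                         tolerance=0.05, seed=0):
    """Double the sample size until successive FIV sets agree."""
    rng = random.Random(seed)
    size = min(initial_size, len(dataset))
    reference = compute_fivs(rng.sample(dataset, size))
    current = compute_fivs(rng.sample(dataset, size))
    while (max(abs(a - b) for a, b in zip(reference, current)) > tolerance
           and size < len(dataset)):
        # (i) aggregate the two FIV sets that were just compared
        reference = [(a + b) / 2 for a, b in zip(reference, current)]
        # (ii) draw a new sample twice as large as the previous one
        size = min(2 * size, len(dataset))
        # (iii) compute the current FIV set on the new sample
        current = compute_fivs(rng.sample(dataset, size))
        # (iv) the loop condition compares current against the aggregate
    return current, size


dataset = [(float(i % 10), float(i % 7)) for i in range(256)]
fivs, final_size = adaptive_global_fivs(dataset, toy_fivs)
```

Because the toy model ignores the second feature, its importance is zero at every sample size; only the first feature's value drives the convergence check.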
-
Publication No.: US11868261B2
Publication date: 2024-01-09
Application No.: US17381072
Application date: 2021-07-20
Applicant: Oracle International Corporation
Inventor: Peyman Faizian, Mayur Bency, Onur Kocberber, Seema Sundara, Nipun Agarwal
IPC: G06F16/2455, G06F12/0842
CPC classification number: G06F12/0842, G06F16/24552, G06F2212/6022
Abstract: Techniques are described herein for prediction of a buffer pool size (BPS). Before performing BPS prediction, gathered data are used to determine whether a target workload is in a steady state. Historical utilization data gathered while the workload is in a steady state are used to predict object-specific BPS components for database objects, accessed by the target workload, that are identified for BPS analysis based on shares of the total disk I/O requests, for the workload, that are attributed to the respective objects. Preference of analysis is given to objects that are associated with larger shares of disk I/O activity. An object-specific BPS component is determined based on a coverage function that returns a percentage of the database object size (on disk) that should be available in the buffer pool for that database object. The percentage is determined using either a heuristic-based or a machine learning-based approach.
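The way object-specific components could add up to a predicted BPS is sketched below in Python. Everything concrete here is assumed for illustration: the `DbObject` type, the `min_io_share` cutoff, and the `heuristic_coverage` rule are hypothetical and are not taken from the patent. Steady-state detection is omitted.

```python
from dataclasses import dataclass


@dataclass
class DbObject:
    name: str
    size_bytes: int   # size of the object on disk
    io_share: float   # fraction of the workload's total disk I/O requests


def heuristic_coverage(obj):
    """Hypothetical coverage function: fraction of the object to keep cached.

    Objects with at least a 25% I/O share are fully covered; colder
    objects are covered in proportion to their share.
    """
    return 1.0 if obj.io_share >= 0.25 else obj.io_share / 0.25


def predict_bps(objects, min_io_share=0.05, coverage=heuristic_coverage):
    """Return (total_bytes, per_object_components)."""
    # Analyze objects with larger I/O shares first; skip negligible ones.
    selected = sorted(
        (o for o in objects if o.io_share >= min_io_share),
        key=lambda o: o.io_share,
        reverse=True,
    )
    components = {o.name: int(coverage(o) * o.size_bytes) for o in selected}
    return sum(components.values()), components


objects = [
    DbObject("orders", 1000, 0.5),
    DbObject("customers", 400, 0.125),
    DbObject("logs", 10000, 0.01),
]
print(predict_bps(objects))  # (1200, {'orders': 1000, 'customers': 200})
```

A machine learning-based coverage function could be passed through the `coverage` parameter in place of the heuristic.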
-
Publication No.: US11790242B2
Publication date: 2023-10-17
Application No.: US16166039
Application date: 2018-10-19
Applicant: Oracle International Corporation
Inventor: Sandeep Agrawal, Venkatanathan Varadarajan, Sam Idicula, Nipun Agarwal
Abstract: Techniques are described for generating and applying mini-machine learning variants of machine learning algorithms to save computational resources in tuning and selection of machine learning algorithms. In an embodiment, at least one of the hyper-parameter values for a reference variant is modified to a new hyper-parameter value, thereby generating a new variant of machine learning algorithm from the reference variant of machine learning algorithm. A performance score is determined for the new variant of machine learning algorithm using a training dataset, the performance score representing the accuracy of the new machine learning model for the training dataset. By performing training of the new variant of machine learning algorithm with the training dataset, a cost metric of the new variant of machine learning algorithm is measured by measuring the computing resources used for the training. Based on the cost metric of the new variant of machine learning algorithm and comparing the performance score for the new and reference variants, the system determines whether the modified reference machine algorithm is the mini-machine learning algorithm that is computationally less costly than the reference variant of machine learning algorithm but closely tracks the accuracy thereof.
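The acceptance test described in this abstract (cheaper than the reference while closely tracking its accuracy) can be sketched in Python. The function `find_mini_variant`, the `max_score_drop` threshold, and the lookup table standing in for real training runs are all hypothetical. An actual system would train each variant and measure its score and resource usage.

```python
def find_mini_variant(reference, candidates, evaluate, max_score_drop=0.02):
    """Pick the cheapest candidate whose score stays near the reference.

    `evaluate(hyperparams)` must return (score, cost); both are assumed
    to come from training the variant on the training dataset.
    Returns (hyperparams, score, cost) or None if no candidate qualifies.
    """
    ref_score, ref_cost = evaluate(reference)
    best = None
    for hyperparams in candidates:
        score, cost = evaluate(hyperparams)
        tracks_accuracy = ref_score - score <= max_score_drop
        if tracks_accuracy and cost < ref_cost:
            if best is None or cost < best[2]:
                best = (hyperparams, score, cost)
    return best


# Hypothetical measurements: n_estimators -> (accuracy, training cost).
MEASURED = {100: (0.90, 100.0), 50: (0.89, 50.0), 25: (0.885, 25.0), 10: (0.80, 10.0)}


def evaluate(hyperparams):
    return MEASURED[hyperparams["n_estimators"]]


reference = {"n_estimators": 100}
candidates = [{"n_estimators": n} for n in (50, 25, 10)]
print(find_mini_variant(reference, candidates, evaluate))
# ({'n_estimators': 25}, 0.885, 25.0)
```

The variant with 10 estimators is cheapest but is rejected because its score falls too far below the reference.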
-
Publication No.: US11615265B2
Publication date: 2023-03-28
Application No.: US16547312
Application date: 2019-08-21
Applicant: Oracle International Corporation
Inventor: Tomas Karnagel, Sam Idicula, Hesam Fathi Moghadam, Nipun Agarwal
Abstract: The present invention relates to dimensionality reduction for machine learning (ML) models. Herein are techniques that individually rank features and combine features based on their rank to achieve an optimal combination of features that may accelerate training and/or inferencing, prevent overfitting, and/or provide insights into somewhat mysterious datasets. In an embodiment, a computer ranks features of datasets of a training corpus. For each dataset and for each landmark percentage, a target ML model is configured to receive only a highest ranking landmark percentage of features, and a landmark accuracy achieved by training the ML model with the dataset is measured. Based on the landmark accuracies and meta-feature values of the dataset, a respective training tuple is generated for each dataset. Based on all of the training tuples, a regressor is trained to predict an optimal amount of features for training the target ML model.
-
Publication No.: US11567937B2
Publication date: 2023-01-31
Application No.: US17318972
Application date: 2021-05-12
Applicant: Oracle International Corporation
Inventor: Sam Idicula, Tomas Karnagel, Jian Wen, Seema Sundara, Nipun Agarwal, Mayur Bency
IPC: G06F16/2453, G06N20/00, G06F16/21, G06N20/20
Abstract: Embodiments implement a prediction-driven, rather than a trial-driven, approach to automate database configuration parameter tuning for a database workload. This approach uses machine learning (ML) models to test performance metrics resulting from application of particular database parameters to a database workload, and does not require live trials on the DBMS managing the workload. Specifically, automatic configuration (AC) ML models are trained, using a training corpus that includes information from workloads being run by DBMSs, to predict performance metrics based on workload features and configuration parameter values. The trained AC-ML models predict performance metrics resulting from applying particular configuration parameter values to a given database workload being automatically tuned. Based on correlating changes to configuration parameter values with changes in predicted performance metrics, an optimization algorithm is used to converge to an optimal set of configuration parameters. The optimal set of configuration parameter values is automatically applied for the given workload.
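The prediction-driven loop can be illustrated with a small Python sketch in which an optimizer queries a model rather than running live trials. The function `predicted_throughput` is a hypothetical stand-in for a trained AC-ML model, the parameter names are invented, and simple coordinate ascent is used only as one possible optimization algorithm; the abstract does not name one.

```python
def predicted_throughput(workload, config):
    """Hypothetical stand-in for a trained AC-ML model.

    Scores a configuration higher the closer it is to the workload's
    (assumed) working-set size and concurrency level.
    """
    return (-(config["buffer_pool_gb"] - workload["working_set_gb"]) ** 2
            - (config["worker_threads"] - workload["concurrency"]) ** 2)


def tune(workload, candidates, model):
    """Coordinate ascent over candidate values, using only model predictions."""
    config = {name: values[0] for name, values in candidates.items()}
    improved = True
    while improved:
        improved = False
        for name, values in candidates.items():
            for value in values:
                trial = {**config, name: value}
                # Keep a change only if the predicted metric improves.
                if model(workload, trial) > model(workload, config):
                    config = trial
                    improved = True
    return config


workload = {"working_set_gb": 8, "concurrency": 16}
candidates = {
    "buffer_pool_gb": [1, 2, 4, 8, 16],
    "worker_threads": [4, 8, 16, 32],
}
print(tune(workload, candidates, predicted_throughput))
# {'buffer_pool_gb': 8, 'worker_threads': 16}
```

No trial in this loop touches a running DBMS; only the final configuration would be applied to the workload.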
-
Publication No.: US20220366297A1
Publication date: 2022-11-17
Application No.: US17319729
Application date: 2021-05-13
Applicant: Oracle International Corporation
Inventor: Yasha Pushak, Zahra Zohrevand, Tayler Hetherington, Karoon Rashedi Nia, Sanjay Jinturkar, Nipun Agarwal
Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. For each feature, and for each of many original tuples, the computer: a) randomly selects many perturbed values from original values of the feature in the original tuples, b) generates perturbed tuples that are based on the original tuple and a respective perturbed value, c) causes the ML model to infer a respective perturbed inference for each perturbed tuple, and d) measures a respective difference between each perturbed inference of the perturbed tuples and the particular inference. For each feature, a respective importance of the feature is calculated based on the differences measured for the feature. Feature importances may be used to rank features by influence and/or generate a local ML explainability (MLX) explanation.
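The general idea of perturbation-based local importance can be shown with a short Python sketch. It deliberately simplifies the procedure in the abstract: sampled values are substituted directly into the particular tuple instead of building perturbed tuples from each original tuple, so it should be read as a related, simpler technique and not as the claimed method. The function name, the mean absolute difference used as the importance, and the toy model are assumptions.

```python
import random


def local_feature_importance(model, particular, originals,
                             n_perturbations=50, seed=0):
    """Estimate how much each feature influences model(particular).

    For each feature, values drawn from that feature's column in
    `originals` replace the particular tuple's value, and the mean
    absolute change in the model output is recorded.
    """
    rng = random.Random(seed)
    baseline = model(particular)
    importances = []
    for j in range(len(particular)):
        pool = [row[j] for row in originals]
        total = 0.0
        for _ in range(n_perturbations):
            perturbed = list(particular)
            perturbed[j] = rng.choice(pool)  # perturbed value for feature j
            total += abs(model(perturbed) - baseline)
        importances.append(total / n_perturbations)
    return importances


def toy_model(row):
    # Toy model that depends only on the first feature.
    return 3.0 * row[0]


originals = [(float(i), float(10 - i)) for i in range(10)]
scores = local_feature_importance(toy_model, (100.0, 5.0), originals)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Here the second feature receives an importance of zero because the toy model ignores it, so the ranking places the first feature on top.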
-
Publication No.: US20220107933A1
Publication date: 2022-04-07
Application No.: US17060999
Application date: 2020-10-01
Applicant: Oracle International Corporation
Inventor: Onur Kocberber, Mayur Bency, Marc Jolles, Seema Sundara, Nipun Agarwal
IPC: G06F16/23, G06F16/245
Abstract: Systems and methods for adjusting parameters for a spin-lock implementation of concurrency control are described herein. In an embodiment, a system continuously retrieves, from a resource management system, one or more state values defining a state of the resource management system. Based on the one or more state values, the system determines that the resource management system has reached a steady state and, in response, adjusts a plurality of parameters for spin-locking performed by said resource management system to identify optimal values for the plurality of parameters. After adjusting the plurality of parameters, the system detects, based on one or more current state values, a workload change in the resource management system and, in response, readjusts the plurality of parameters for spin-locking performed by said resource management system to identify new optimal values for the parameters.
-
Publication No.: US20220019784A1
Publication date: 2022-01-20
Application No.: US16929949
Application date: 2020-07-15
Applicant: Oracle International Corporation
Inventor: Jian Wen, Hamed Ahmadi, Sanjay Jinturkar, Nipun Agarwal, Lijian Wan, Shrikumar Hariharasubrahmanian
IPC: G06K9/00, G06K9/62, G06F16/13, G06F40/289, G06F21/62
Abstract: Herein is a probabilistic indexing technique for searching semi-structured text documents in columnar storage formats such as Parquet, using columnar input/output (I/O) avoidance, and needing minimal storage overhead. In an embodiment, a computer associates columns with text strings that occur in semi-structured documents. Text words that occur in the text strings are detected. Respectively for each text word, a bitmap, of a plurality of bitmaps, that contains a respective bit for each column is generated. Based on at least one of the bitmaps, some of the columns or some of the semi-structured documents are accessed.
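The per-word bitmap with one bit per column can be sketched in Python using integers as bitmaps. This is an exact index built for illustration and does not model the probabilistic aspect of the technique. The function names, the whitespace tokenization, and the sample column names are assumptions.

```python
def build_word_bitmaps(column_strings):
    """Build one bitmap per word, with one bit per column.

    `column_strings` maps a column name to the text strings stored in it.
    Returns (columns, bitmaps), where bit i of bitmaps[word] is set when
    the word occurs somewhere in columns[i].
    """
    columns = sorted(column_strings)
    bitmaps = {}
    for position, column in enumerate(columns):
        for text in column_strings[column]:
            for word in text.lower().split():
                bitmaps[word] = bitmaps.get(word, 0) | (1 << position)
    return columns, bitmaps


def columns_containing_all(columns, bitmaps, words):
    """Return the columns that contain every query word."""
    mask = (1 << len(columns)) - 1
    for word in words:
        mask &= bitmaps.get(word.lower(), 0)
    return [c for i, c in enumerate(columns) if (mask >> i) & 1]


column_strings = {
    "address.city": ["San Jose", "Austin"],
    "name": ["Jose Garcia"],
    "notes": ["moved to Austin"],
}
columns, bitmaps = build_word_bitmaps(column_strings)
print(columns_containing_all(columns, bitmaps, ["jose"]))
# ['address.city', 'name']
print(columns_containing_all(columns, bitmaps, ["jose", "austin"]))
# ['address.city']
```

Only the columns returned by the lookup need to be read, which is how an index of this kind can avoid columnar I/O for the rest.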
-
Publication No.: US20210390089A1
Publication date: 2021-12-16
Application No.: US17459447
Application date: 2021-08-27
Applicant: Oracle International Corporation
Inventor: Pit Fender, Felix Schmidt, Benjamin Schlegel, Matthias Brantner, Nipun Agarwal
Abstract: Techniques related to code dictionary generation based on non-blocking operations are disclosed. In some embodiments, a column of tokens includes a first token and a second token that are stored in separate rows. The column of tokens is correlated with a set of row identifiers including a first row identifier and a second row identifier that is different from the first row identifier. Correlating the column of tokens with the set of row identifiers involves: storing a correlation between the first token and the first row identifier, storing a correlation between the second token and the second row identifier if the first token and the second token have different values, and storing a correlation between the second token and the first row identifier if the first token and the second token have identical values. After correlating the column of tokens with the set of row identifiers, duplicate correlations are removed.
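The correlation rule in this abstract can be sketched in Python as a sequential pass. The sketch generalizes the two-token description to any number of tokens by mapping each token to the row identifier of the first row holding the same value, which is an assumption. It does not model the non-blocking execution that the abstract refers to, and the function name is hypothetical.

```python
def build_code_dictionary(tokens):
    """Correlate tokens with row identifiers, then drop duplicates.

    Row identifiers are taken to be list positions. A token whose value
    has been seen before is correlated with the row identifier of the
    first row that held that value.
    """
    first_row = {}
    correlations = []
    for row_id, token in enumerate(tokens):
        canonical_row = first_row.setdefault(token, row_id)
        correlations.append((token, canonical_row))
    # Remove duplicate correlations while keeping first-seen order.
    return list(dict.fromkeys(correlations))


print(build_code_dictionary(["red", "blue", "red", "green", "blue"]))
# [('red', 0), ('blue', 1), ('green', 3)]
```

After deduplication, each distinct token value is paired with exactly one row identifier, which can then serve as its dictionary code.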