Patent search: ap:("Oracle International Corporation") AND inv:"Jingxiao Cai"

1.

Patent application

AUTOMATED MACHINE LEARNING PIPELINE FOR TIMESERIES DATASETS UTILIZING POINT-BASED ALGORITHMS (Status: in force)

Publication number: US20220121955A1

Publication date: 2022-04-21

Application number: US17071285

Application date: 2020-10-15

Applicant: Oracle International Corporation

Inventors: Nikan Chavoshi, Anatoly Yakovlev, Hesam Fathi Moghadam, Venkatanathan Varadarajan, Sandeep Agrawal, Ali Moharrer, Jingxiao Cai, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N3/08, G06N20/00

Abstract: Herein, a computer generates and evaluates many preprocessor configurations for a window preprocessor that transforms a training timeseries dataset for an ML model. With each preprocessor configuration, the window preprocessor is configured. The window preprocessor then converts the training timeseries dataset into a configuration-specific point-based dataset that is based on the preprocessor configuration. The ML model is trained based on the configuration-specific point-based dataset to calculate a score for the preprocessor configuration. Based on the scores of the many preprocessor configurations, an optimal preprocessor configuration is selected for finally configuring the window preprocessor, after which, the window preprocessor can optimally transform a new timeseries dataset such as in an offline or online production environment such as for real-time processing of a live streaming timeseries.
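
The configuration search in this abstract can be illustrated with a small Python sketch. It is not Oracle's implementation: the names window_transform, nn_predict, score_config, and select_window are hypothetical, the only preprocessor configuration parameter varied is the window length, the ML model is swapped for a 1-nearest-neighbour regressor, and the score is the negative mean absolute error on a chronological holdout.

```python
"""Sketch: pick a window-preprocessor configuration by train-and-score.

All names are hypothetical. The window transform and the 1-nearest-neighbour
model are stand-ins chosen for brevity, not the patented components.
"""
from typing import List, Sequence, Tuple

Point = Tuple[Tuple[float, ...], float]  # (feature window, target value)


def window_transform(series: Sequence[float], window: int) -> List[Point]:
    """Convert a timeseries into point-based rows: `window` lagged values -> next value."""
    return [
        (tuple(series[i - window:i]), series[i])
        for i in range(window, len(series))
    ]


def nn_predict(train: List[Point], x: Tuple[float, ...]) -> float:
    """1-nearest-neighbour regression: return the target of the closest training window."""
    best = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return best[1]


def score_config(series: Sequence[float], window: int, holdout: float = 0.25) -> float:
    """Train on earlier rows, evaluate on later rows; higher is better (negative MAE)."""
    points = window_transform(series, window)
    split = int(len(points) * (1 - holdout))
    train, test = points[:split], points[split:]
    if not train or not test:
        return float("-inf")  # too few rows to evaluate this configuration
    mae = sum(abs(nn_predict(train, x) - y) for x, y in test) / len(test)
    return -mae


def select_window(series: Sequence[float], candidates: Sequence[int]) -> int:
    """Score every candidate configuration; return the best (first one on ties)."""
    return max(candidates, key=lambda w: score_config(series, w))
```

For example, on the series [0, 1, 0, 2] repeated ten times, a window of 1 cannot tell which value follows a 0, while windows of 2 or more can, so select_window(series, [1, 2, 3, 4]) returns 2. The selected window would then configure the preprocessor for new or streaming data.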

2.

Patent application

ADAPTIVE SAMPLING FOR IMBALANCE MITIGATION AND DATASET SIZE REDUCTION IN MACHINE LEARNING (Status: pending, published)

Publication number: US20200342265A1

Publication date: 2020-10-29

Application number: US16718164

Application date: 2019-12-17

Applicant: Oracle International Corporation

Inventors: Jingxiao Cai, Sandeep Agrawal, Sam Idicula, Venkatanathan Varadarajan, Anatoly Yakovlev, Nipun Agarwal

IPC: G06K9/62, G06N20/00

Abstract: According to an embodiment, a method includes generating a first dataset sample from a dataset, calculating a first validation score for the first dataset sample and a machine learning model, and determining whether a difference in validation score between the first validation score and a second validation score satisfies a first criterion. If the difference in validation score does not satisfy the first criterion, the method includes generating a second dataset sample from the dataset. If the difference in validation score does satisfy the first criterion, the method includes updating a convergence value and determining whether the updated convergence value satisfies a second criterion. If the updated convergence value satisfies the second criterion, the method includes returning the first dataset sample. If the updated convergence value does not satisfy the second criterion, the method includes generating the second dataset sample from the dataset.
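
The control flow in this abstract can be sketched as a Python loop. The abstract does not define the sampler, either criterion, or the convergence value, so all of these are assumptions in the sketch below: samples are drawn uniformly at random and grow geometrically, the first criterion is "the score changed by at most tol", the convergence value counts consecutive stable iterations (reset when the score moves), and the second criterion is "the count reached patience". The function name adaptive_sample is hypothetical, and uniform sampling is a stand-in that does not perform the imbalance mitigation named in the title.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def adaptive_sample(
    dataset: Sequence[T],
    score_fn: Callable[[List[T]], float],
    start: int = 100,
    growth: float = 2.0,
    tol: float = 0.01,
    patience: int = 2,
    seed: int = 0,
) -> List[T]:
    """Grow a random sample until its validation score stops changing (hypothetical sketch)."""
    rng = random.Random(seed)
    n = len(dataset)
    size = min(start, n)
    prev_score: Optional[float] = None  # plays the role of the "second validation score"
    converged = 0                       # plays the role of the "convergence value"
    while True:
        sample = rng.sample(list(dataset), size)  # uniform stand-in sampler
        score = score_fn(sample)                  # the "first validation score"
        # First criterion (assumed): the score moved by at most `tol`.
        if prev_score is not None and abs(score - prev_score) <= tol:
            converged += 1
            # Second criterion (assumed): stable for `patience` consecutive steps.
            if converged >= patience:
                return sample
        else:
            converged = 0  # assumption: count only consecutive stable steps
        if size >= n:
            return sample  # the whole dataset is already in use
        prev_score = score
        # Generate a larger next sample; always grow by at least one row.
        size = min(n, max(size + 1, int(size * growth)))
```

With a toy score of 1 - 100/len(sample) on 10,000 rows, the sample grows 100, 200, 400, and so on; with tol=0.05 and patience=2 the loop returns a 6,400-row sample, while the defaults run out of rows and return all 10,000.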

3.

Granted patent
Automated machine learning pipeline for timeseries datasets utilizing point-based algorithms (Status: in force)

Publication number: US11989657B2

Publication date: 2024-05-21

Application number: US17071285

Application date: 2020-10-15

Applicant: Oracle International Corporation

Inventors: Nikan Chavoshi, Anatoly Yakovlev, Hesam Fathi Moghadam, Venkatanathan Varadarajan, Sandeep Agrawal, Ali Moharrer, Jingxiao Cai, Sanjay Jinturkar, Nipun Agarwal

IPC: G06N20/00, G06N3/088

CPC classification number: G06N3/088, G06N20/00

Abstract: Herein, a computer generates and evaluates many preprocessor configurations for a window preprocessor that transforms a training timeseries dataset for an ML model. With each preprocessor configuration, the window preprocessor is configured. The window preprocessor then converts the training timeseries dataset into a configuration-specific point-based dataset that is based on the preprocessor configuration. The ML model is trained based on the configuration-specific point-based dataset to calculate a score for the preprocessor configuration. Based on the scores of the many preprocessor configurations, an optimal preprocessor configuration is selected for finally configuring the window preprocessor, after which, the window preprocessor can optimally transform a new timeseries dataset such as in an offline or online production environment such as for real-time processing of a live streaming timeseries.

4.

Granted patent

Adaptive sampling for imbalance mitigation and dataset size reduction in machine learning (Status: in force)

Publication number: US11562178B2

Publication date: 2023-01-24

Application number: US16718164

Application date: 2019-12-17

Applicant: Oracle International Corporation

Inventors: Jingxiao Cai, Sandeep Agrawal, Sam Idicula, Venkatanathan Varadarajan, Anatoly Yakovlev, Nipun Agarwal

IPC: G06K9/62, G06N20/00

Abstract: According to an embodiment, a method includes generating a first dataset sample from a dataset, calculating a first validation score for the first dataset sample and a machine learning model, and determining whether a difference in validation score between the first validation score and a second validation score satisfies a first criterion. If the difference in validation score does not satisfy the first criterion, the method includes generating a second dataset sample from the dataset. If the difference in validation score does satisfy the first criterion, the method includes updating a convergence value and determining whether the updated convergence value satisfies a second criterion. If the updated convergence value satisfies the second criterion, the method includes returning the first dataset sample. If the updated convergence value does not satisfy the second criterion, the method includes generating the second dataset sample from the dataset.

5.

Patent application

FAST, PREDICTIVE, AND ITERATION-FREE AUTOMATED MACHINE LEARNING PIPELINE (Status: in force)

Publication number: US20210390466A1

Publication date: 2021-12-16

Application number: US17086204

Application date: 2020-10-30

Applicant: Oracle International Corporation

Inventors: Venkatanathan Varadarajan, Sandeep R. Agrawal, Hesam Fathi Moghadam, Anatoly Yakovlev, Ali Moharrer, Jingxiao Cai, Sanjay Jinturkar, Nipun Agarwal, Sam Idicula, Nikan Chavoshi

IPC: G06N20/20, G06N5/04

Abstract: A proxy-based automatic non-iterative machine learning (PANI-ML) pipeline is described, which predicts machine learning model configuration performance and outputs an automatically-configured machine learning model for a target training dataset. Techniques described herein use one or more proxy models—which implement a variety of machine learning algorithms and are pre-configured with tuned hyperparameters—to estimate relative performance of machine learning model configuration parameters at various stages of the PANI-ML pipeline. The PANI-ML pipeline implements a radically new approach of rapidly narrowing the search space for machine learning model configuration parameters by performing algorithm selection followed by algorithm-specific adaptive data reduction (i.e., row- and/or feature-wise dataset sampling), and then hyperparameter tuning. Furthermore, because of the one-pass nature of the PANI-ML pipeline and because each stage of the pipeline has convergence criteria by design, the whole PANI-ML pipeline has a novel convergence property that stops the configuration search after one pass.
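
The one-pass structure of the PANI-ML pipeline (algorithm selection, then algorithm-specific data reduction, then hyperparameter tuning) can be sketched in Python. This is a hypothetical simplification, not the described system: each proxy model is reduced to a caller-supplied scoring function with pre-set hyperparameters, data reduction keeps the smallest row-wise prefix whose score is within tol of the full-data score (a stand-in for adaptive sampling), and tuning is a single sweep over a fixed candidate list. The names one_pass_pipeline, PipelineResult, and Scorer are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical proxy: scores one algorithm on a dataset with given hyperparameters.
Scorer = Callable[[Sequence, Dict[str, float]], float]


@dataclass
class PipelineResult:
    algorithm: str
    sample_size: int
    hyperparams: Dict[str, float]


def one_pass_pipeline(
    data: Sequence,
    proxies: Dict[str, Scorer],
    proxy_hparams: Dict[str, Dict[str, float]],
    hparam_candidates: Dict[str, List[Dict[str, float]]],
    sample_sizes: Sequence[int],
    tol: float = 0.01,
) -> PipelineResult:
    # Stage 1: algorithm selection. Score each algorithm once, using its
    # pre-tuned proxy hyperparameters, on the full dataset.
    algo = max(proxies, key=lambda a: proxies[a](data, proxy_hparams[a]))
    score = proxies[algo]

    # Stage 2: algorithm-specific data reduction. Stop at the first (smallest)
    # prefix whose score is within `tol` of the full-data score.
    full_score = score(data, proxy_hparams[algo])
    size = len(data)
    for n in sorted(sample_sizes):
        if n < len(data) and full_score - score(data[:n], proxy_hparams[algo]) <= tol:
            size = n
            break
    reduced = data[:size]

    # Stage 3: hyperparameter tuning on the reduced data, in a single sweep.
    best = max(hparam_candidates[algo], key=lambda hp: score(reduced, hp))
    return PipelineResult(algo, size, best)
```

No stage revisits an earlier one, so the search ends after one pass; each stage's own stopping rule (a finite sweep or the first sample within tol) is what bounds the total work.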
