-
Publication No.: US20220343207A1
Publication date: 2022-10-27
Application No.: US17237379
Filing date: 2021-04-22
Inventors: Long Vu, Saket Sathe, Bei Chen, Peter Daniel Kirchner
Abstract: In a method for ranking machine learning (ML) pipelines for a dataset, a processor receives first performance curves predicted by a meta learner model for a plurality of ML pipelines. A processor allocates a first subset of data points from the dataset to each of the plurality of ML pipelines. A processor receives first performance scores for each of the ML pipelines for the first subset of data points. A processor updates the meta learner model using the first performance scores. A processor receives second performance curves from the meta learner model updated with the first performance scores. A processor ranks the plurality of ML pipelines based on the second performance curves.
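The loop described in this abstract (predict, allocate a subset, score, update, re-predict, rank) can be sketched as follows. This is only an illustration under stated assumptions: `ToyMetaLearner`, its blending `weight`, and `rank_pipelines` are hypothetical names, and each predicted performance curve is collapsed to a single predicted score for brevity. The abstract does not specify how the meta learner is built or updated.

```python
from typing import Callable, Dict, List, Sequence

# A pipeline is modeled as a function that returns a score for a data subset.
Pipeline = Callable[[Sequence[float]], float]


class ToyMetaLearner:
    """Hypothetical stand-in for the meta learner: one predicted score per pipeline."""

    def __init__(self, prior: Dict[str, float], weight: float = 0.5) -> None:
        self.estimate = dict(prior)  # "first" predictions
        self.weight = weight         # assumed blending factor, not from the abstract

    def update(self, scores: Dict[str, float]) -> None:
        # Blend each observed score into the current prediction.
        for name, score in scores.items():
            old = self.estimate[name]
            self.estimate[name] = (1 - self.weight) * old + self.weight * score

    def predict(self) -> Dict[str, float]:
        return dict(self.estimate)


def rank_pipelines(pipelines: Dict[str, Pipeline], data: Sequence[float],
                   meta: ToyMetaLearner, subset_size: int) -> List[str]:
    subset = data[:subset_size]                               # allocate a first subset
    scores = {name: p(subset) for name, p in pipelines.items()}  # first scores
    meta.update(scores)                                       # update the meta learner
    predicted = meta.predict()                                # "second" predictions
    return sorted(predicted, key=predicted.get, reverse=True)  # best first
```

A real system would presumably repeat the allocate/score/update step with growing subsets before producing the final ranking.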
-
Publication No.: US11295242B2
Publication date: 2022-04-05
Application No.: US16682946
Filing date: 2019-11-13
Inventors: Yuan-Chi Chang, Deepak Srinivas Turaga, Long Vu, Venkata Nagaraju Pavuluri, Saket Sathe, Rodrigue Ngueyep Tzoumpe
Abstract: Split an input dataset into training and test datasets; the former includes a plurality of data examples, each represented as a feature vector, and having an associated true label. Split the training dataset into a plurality of training data subsets; for each, train a corresponding machine learning model to obtain a plurality of such models, and apply same to the test dataset to obtain a plurality of predicted labels and prediction scores. For each of the plurality of examples, compute an agreement metric based on a corresponding one of the associated true labels; corresponding ones of the predicted labels; and corresponding ones of the prediction scores. Based on the computed metric, select, for at least some of the true label values, appropriate ones of the data examples to be added to a regression set. Add the appropriate ones of the data examples from the test dataset to the regression set.
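The abstract does not define the agreement metric or the selection rule, so the sketch below makes both up for illustration: the metric is assumed to be the mean prediction score over models whose predicted label matches the true label, and selection is assumed to keep the top-scoring examples per true label. The names `agreement` and `select_regression_set` are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

# (example id, true label, labels predicted by each model, score from each model)
Example = Tuple[str, int, Sequence[int], Sequence[float]]


def agreement(true_label: int, predicted_labels: Sequence[int],
              scores: Sequence[float]) -> float:
    """Assumed metric: scores of models that agree with the true label, averaged over all models."""
    matched = sum(s for p, s in zip(predicted_labels, scores) if p == true_label)
    return matched / len(predicted_labels)


def select_regression_set(examples: List[Example], per_label: int = 1) -> List[str]:
    """Assumed rule: keep the `per_label` highest-agreement examples for each true label."""
    by_label = defaultdict(list)
    for ex_id, label, preds, scores in examples:
        by_label[label].append((agreement(label, preds, scores), ex_id))
    selected: List[str] = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], reverse=True)
        selected.extend(ex_id for _, ex_id in ranked[:per_label])
    return selected
```

Other rules, such as keeping low-agreement examples as hard cases, would fit the abstract's wording equally well.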
-
Publication No.: US11966340B2
Publication date: 2024-04-23
Application No.: US17654965
Filing date: 2022-03-15
Inventors: Long Vu, Bei Chen, Xuan-Hong Dang, Peter Daniel Kirchner, Syed Yousaf Shah, Dhavalkumar C. Patel, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Gregory Bramble, Horst Cornelius Samulowitz, Saket K. Sathe, Wesley M. Gifford, Petros Zerfos
IPC classification: G06F12/0871, G06N20/00
CPC classification: G06F12/0871, G06N20/00, G06F2212/604
Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.
-
Publication No.: US11868230B2
Publication date: 2024-01-09
Application No.: US17692268
Filing date: 2022-03-11
CPC classification: G06F11/3452, G06F11/3428, G06N20/00
Abstract: Computer hardware and/or software that performs the following operations: (i) assessing a performance of a plurality of unsupervised machine learning pipelines against a plurality of data sets; (ii) associating the performance with meta-features corresponding to respective pipeline/data set combinations; (iii) training a supervised meta-learning model using the associated performance and meta-features as training data; and (iv) utilizing the trained model to identify one or more pipelines for processing an input data set.
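Operations (ii) through (iv) can be illustrated with a very small supervised meta-model. The sketch assumes a 1-nearest-neighbor predictor over meta-features, which the abstract does not prescribe; `Record`, `predict_performance`, and `recommend_pipeline` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

# (meta-features of a data set, pipeline name, measured performance)
Record = Tuple[Sequence[float], str, float]


def predict_performance(records: List[Record], pipeline: str,
                        meta_features: Sequence[float]) -> float:
    """1-nearest-neighbor estimate: reuse the score seen on the most similar data set."""
    candidates = [r for r in records if r[1] == pipeline]
    nearest = min(candidates, key=lambda r: math.dist(r[0], meta_features))
    return nearest[2]


def recommend_pipeline(records: List[Record], meta_features: Sequence[float]) -> str:
    """Return the pipeline with the highest predicted performance for the input data set."""
    pipelines = sorted({r[1] for r in records})
    return max(pipelines,
               key=lambda p: predict_performance(records, p, meta_features))
```

Here "training" amounts to storing the records; any regressor trained on the same (meta-features, pipeline, performance) triples could take the place of the nearest-neighbor lookup.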
-
Publication No.: US11620582B2
Publication date: 2023-04-04
Application No.: US16942247
Filing date: 2020-07-29
Inventors: Bei Chen, Long Vu, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
IPC classification: G06N20/20
Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.
-
Publication No.: US20220358388A1
Publication date: 2022-11-10
Application No.: US17316103
Filing date: 2021-05-10
Inventors: Long Vu, Dharmashankar Subramanian, Peter Daniel Kirchner, Eliezer Segev Wasserkrug, Lan Ngoc Hoang, Alexander Zadorojniy
Abstract: Methods and systems for generating an environment include training transformer models from tabular data and relationship information about the training data. A directed acyclic graph is generated that includes the transformer models as nodes. The directed acyclic graph is traversed to identify a subset of transformers that are combined in order. An environment is generated using the subset of transformers.
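The graph-traversal step can be sketched with the standard-library `graphlib` module (Python 3.9+). The abstract does not say how the subset is chosen, so this example assumes the subset is everything a chosen target node depends on; `ordered_subset` and the node names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence


def ordered_subset(preds: Dict[str, Sequence[str]], target: str) -> List[str]:
    """Collect `target` and all of its ancestors, then order them so that
    every node appears after the nodes it depends on.

    `preds` maps each node to its direct predecessors in the DAG.
    """
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(preds.get(node, ()))
    # Restrict the graph to the needed nodes and sort it topologically.
    sub = {n: [p for p in preds.get(n, ()) if p in needed] for n in needed}
    return list(TopologicalSorter(sub).static_order())
```

For example, with `preds = {"c": ["b"], "b": ["a"], "d": ["a"]}`, calling `ordered_subset(preds, "c")` leaves out `"d"` and orders the rest so that `"a"` precedes `"b"`, which precedes `"c"`.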
-
Publication No.: US20220036610A1
Publication date: 2022-02-03
Application No.: US16942284
Filing date: 2020-07-29
Inventors: Dakuo Wang, Bei Chen, Ji Hui Yang, Abel Valente, Arunima Chaudhary, Chuang Gan, John Dillon Eversman, Voranouth Supadulya, Daniel Karl I. Weidele, Jun Wang, Jing James Xu, Dhavalkumar C. Patel, Long Vu, Syed Yousaf Shah, Si Er Han
IPC classification: G06T11/20, G06F3/0481
Abstract: Systems, computer-implemented methods, and computer program products to facilitate visualization of a model selection process are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise an interaction backend handler component that obtains one or more assessment metrics of a model pipeline candidate. The computer executable components can further comprise a visualization render component that renders a progress visualization of the model pipeline candidate based on the one or more assessment metrics.
-
Publication No.: US11099979B2
Publication date: 2021-08-24
Application No.: US16669761
Filing date: 2019-10-31
Inventors: Yuan-Chi Chang, Long Vu, Timothy R. Dinger, Venkata N. Pavuluri, Lingtao Cao
Abstract: A mechanism is provided to identify wall-clock time reference dependency in one or more software components of a data analytics solution. The data analytics solution is decomposed into a set of software components. A first software component of the set of software components is deployed to a first computer server and the remaining software components are deployed to a second computer server. A system clock time on the first computer server is changed to differ from the system clock of the second computer server. Based on executing a test on the data analytics solution, a determination is made of whether the first software component is wall-clock time independent. Responsive to the test of the software component failing, indicating that the wall-clock time of the software component is dependent on the system clock time difference, the software component is recorded as wall-clock time dependent and an administrator is notified.
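The core idea, running a component under a shifted clock and checking whether its result changes, can be shown in miniature. The sketch below is a simplification: instead of deploying components to two servers and changing a system clock, it injects a clock function into the component. `is_wall_clock_independent`, the fixed base date, and the default skew are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Any, Callable

Clock = Callable[[], datetime]
Component = Callable[[Clock], Any]  # a component that reads time only via the given clock


def is_wall_clock_independent(component: Component,
                              skew: timedelta = timedelta(days=30)) -> bool:
    """Run the component under a reference clock and a skewed clock.

    If both runs produce the same result, the component is treated as
    independent of wall-clock time for this test.
    """
    base = datetime(2021, 1, 1)
    reference = component(lambda: base)
    skewed = component(lambda: base + skew)
    return reference == skewed
```

A component that sums its inputs passes this check, while one that stamps its output with the current time does not. A failing component would then be recorded as wall-clock time dependent, as the abstract describes.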
-
Publication No.: US12066813B2
Publication date: 2024-08-20
Application No.: US17696840
Filing date: 2022-03-16
IPC classification: G05B19/4155, G06N20/00
CPC classification: G05B19/4155, G06N20/00, G05B2219/31449
Abstract: A relationship between an input, a set-point of a plurality of processes and an output of a corresponding process is learned using machine learning. A regression function is derived for each process based upon historical data. An autoencoder is trained for each process based upon the historical data to form a regularizer and the regression functions and regularizers are merged together into a unified optimization problem. System level optimization is performed using the regression functions and regularizers and a set of optimal set-points of a global optimal solution for operating the processes is determined. An industrial system is operated based on the set of optimal set-points.
-
Publication No.: US20230237385A1
Publication date: 2023-07-27
Application No.: US17583522
Filing date: 2022-01-25
Inventors: Lan Ngoc Hoang, Long Vu
IPC classification: G06N20/20
CPC classification: G06N20/20
Abstract: A computer-implemented method for configuring a plurality of machine learning pipelines into a machine learning pipeline ensemble is disclosed. The computer-implemented method includes determining, by a reinforcement learning agent coupled to a machine learning pipeline, performance information of the machine learning pipeline. The computer-implemented method further includes receiving, by the reinforcement learning agent, configuration parameter values of uncoupled machine learning pipelines of the plurality of machine learning pipelines. The computer-implemented method further includes adjusting, by the reinforcement learning agent, configuration parameter values of the machine learning pipeline based on the performance information of the machine learning pipeline and the configuration parameter values of the uncoupled machine learning pipelines.