-
Publication No.: US20220309391A1
Publication Date: 2022-09-29
Application No.: US17216475
Application Date: 2021-03-29
Inventors: Dhavalkumar C. PATEL, Si Er HAN, Jiang Bo KANG
Abstract: Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: examining an enterprise dataset, the enterprise dataset defined by enterprise collected data; selecting one or more synthetic datasets in dependence on the examining, the one or more synthetic datasets including data other than data collected by the enterprise; training a set of predictive models using data of the one or more synthetic datasets to provide a set of trained predictive models; testing the set of trained predictive models with use of holdout data of the one or more synthetic datasets; and presenting prompting data on a displayed user interface of a developer user in dependence on result data resulting from the testing, the prompting data prompting the developer user to direct action with respect to one or more models of the set of predictive models.
-
Publication No.: US20220012640A1
Publication Date: 2022-01-13
Application No.: US16924934
Application Date: 2020-07-09
Inventors: Arun Kwangil IYENGAR, Jeffrey Owen KEPHART, Dhavalkumar C. PATEL, Dung Tien PHAN, Chandrasekhara K. REDDY
Abstract: Techniques for model evaluation and selection are provided. A plurality of models trained to generate predictions at each of a plurality of intervals is received, and a plurality of model ensembles, each specifying one or more of the plurality of models for each of the plurality of intervals, is generated. A test data set is received, where the test data set includes values for at least a first interval of the plurality of intervals and does not include values for at least a second interval of the plurality of intervals. A first model ensemble, of the plurality of model ensembles, is selected based on processing the test data set using each of the plurality of model ensembles.
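One way to picture the selection step in this abstract is sketched below. It is only an illustration under stated assumptions: the scoring rule (mean absolute error over the intervals that the test set covers), the function names, and the sample numbers are all hypothetical and are not taken from the patent.

```python
def score_ensemble(ensemble, predictions, test_values):
    """Mean absolute error over the intervals the test set actually covers.

    Intervals whose test value is None (no data yet) are skipped.
    The error measure is an assumption made for this sketch.
    """
    errors = [
        abs(predictions[ensemble[interval]][interval] - actual)
        for interval, actual in test_values.items()
        if actual is not None
    ]
    return sum(errors) / len(errors)


def select_ensemble(ensembles, predictions, test_values):
    """Return the name of the ensemble with the lowest covered-interval error."""
    return min(
        ensembles,
        key=lambda name: score_ensemble(ensembles[name], predictions, test_values),
    )


# Hypothetical data: two models, two intervals, test values only for "h1".
predictions = {
    "a": {"h1": 10.0, "h2": 20.0},
    "b": {"h1": 12.0, "h2": 21.0},
}
ensembles = {
    "all_a": {"h1": "a", "h2": "a"},
    "mixed": {"h1": "b", "h2": "a"},
}
test_values = {"h1": 11.5, "h2": None}

chosen = select_ensemble(ensembles, predictions, test_values)
```

Here `chosen` is `"mixed"`, because model `b` is closer to the only observed value and the `h2` interval contributes nothing to the score.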
-
Publication No.: US20210357781A1
Publication Date: 2021-11-18
Application No.: US16875533
Application Date: 2020-05-15
Abstract: A processing system, a computer program product, and a method for efficiently determining a best imputation algorithm from a plurality of imputation algorithms. A method includes: providing a plurality of imputation algorithms; providing a time parameter tmax to limit an amount of time spent determining a best imputation algorithm; maintaining past information i on accuracy and execution time for at least one of the imputation algorithms; using said information i to compute a utility score for each of the at least one of the imputation algorithms; and testing imputation algorithms and associated parameters in an order based on said utility scores.
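The utility-ordered, time-limited search described in this abstract might look roughly like the sketch below. The utility formula (past accuracy divided by past run time), the two toy imputers, and the history values are assumptions made for illustration; the abstract does not say how the utility score is computed.

```python
import time


def mean_fill(values):
    """Replace each gap (None) with the mean of the observed values."""
    known = [v for v in values if v is not None]
    level = sum(known) / len(known)
    return [level if v is None else v for v in values]


def forward_fill(values):
    """Replace each gap with the last value seen (first value must be observed)."""
    filled = []
    for v in values:
        filled.append(filled[-1] if v is None else v)
    return filled


def utility(history, name):
    """Hypothetical utility: past accuracy per second of past run time."""
    accuracy, seconds = history[name]
    return accuracy / max(seconds, 1e-9)


def search_imputers(imputers, history, evaluate, t_max):
    """Test imputers in descending utility order until t_max seconds elapse."""
    order = sorted(imputers, key=lambda name: utility(history, name), reverse=True)
    start = time.monotonic()
    best_name, best_score = None, float("-inf")
    for name in order:
        if time.monotonic() - start >= t_max:
            break  # time budget spent; keep the best result so far
        score = evaluate(imputers[name])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, order


# Hypothetical data: hide two known values and score how well they are recovered.
truth = [1, 2, 3, 4, 5, 6, 7, 8]
observed = [1, 2, None, 4, 5, None, 7, 8]


def evaluate(impute):
    """Negative mean absolute error on the hidden positions (higher is better)."""
    filled = impute(observed)
    gaps = [i for i, v in enumerate(observed) if v is None]
    return -sum(abs(filled[i] - truth[i]) for i in gaps) / len(gaps)


imputers = {"mean_fill": mean_fill, "forward_fill": forward_fill}
history = {"mean_fill": (0.6, 0.1), "forward_fill": (0.8, 0.2)}  # (accuracy, seconds)
best, order = search_imputers(imputers, history, evaluate, t_max=5.0)
```

With these made-up numbers `mean_fill` is tried first because its past utility is higher, yet `forward_fill` wins on this series. Ordering by utility matters most when `t_max` is too small to try every candidate.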
-
Publication No.: US20240362458A1
Publication Date: 2024-10-31
Application No.: US18309268
Application Date: 2023-04-28
Inventors: Nam H. NGUYEN, Yuqi NIE, Chandrasekhara K. REDDY, Dhavalkumar C. PATEL, Anuradha BHAMIDIPATY, Jayant R. KALAGNANAM, Phanwadee SINTHONG
IPC Classification: G06N3/0455, G06N3/0895
CPC Classification: G06N3/0455, G06N3/0895
Abstract: A method, system, and computer program product that is configured to: receive an input time series from an external device in a first system, divide the input time series into a set of univariate time subseries, transform the set of univariate time subseries into a univariate prediction result series using a transformer model, concatenate the univariate prediction result series to a multivariate predictive result, and output the multivariate predictive result for providing time series forecasting to a second system.
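The divide, predict, and concatenate flow in this abstract can be sketched as follows. Two things here are assumptions: the sketch reads "univariate time subseries" as one series per variable, and it swaps the transformer model for a trivial stand-in forecaster (`naive_forecaster`, a hypothetical name) so that the example stays self-contained.

```python
def split_channels(series):
    """Turn T rows of C values into C univariate series of length T."""
    return [list(channel) for channel in zip(*series)]


def naive_forecaster(subseries, horizon):
    """Stand-in for the transformer model: repeat the mean of the last 3 points."""
    window = subseries[-3:]
    level = sum(window) / len(window)
    return [level] * horizon


def forecast_multivariate(series, horizon, model=naive_forecaster):
    """Forecast each variable independently, then stitch the results together."""
    per_channel = [model(channel, horizon) for channel in split_channels(series)]
    # Concatenate back into `horizon` rows with one value per variable.
    return [list(step) for step in zip(*per_channel)]


# Hypothetical input: 4 time steps, 2 variables.
history = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
forecast = forecast_multivariate(history, horizon=2)
```

Because the same `model` callable is applied to every variable, any univariate forecaster with the signature `(subseries, horizon)` can replace the stand-in without changing the surrounding code.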
-
Publication No.: US20240256943A1
Publication Date: 2024-08-01
Application No.: US18103057
Application Date: 2023-01-30
IPC Classification: G06N20/00
CPC Classification: G06N20/00
Abstract: A method includes obtaining, by a processor set, labeled training data associated with a system; identifying, by the processor set, a first region and a second region in the labeled training data, wherein the first region is associated with a failure of the system and the second region is exclusive of the first region; and creating, by the processor set, re-labeled training data by altering one or more labels of the labeled training data in the first region based on data in the second region.
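A rough sketch of the re-labeling idea follows. The abstract does not say how the regions are found or how labels are altered, so everything specific here is an assumption: the first region is taken as a fixed window ending at the failure, and a point in that window is re-labeled as normal (0) when its value lies within `k` standard deviations of the second region's mean.

```python
from statistics import mean, pstdev


def relabel(values, labels, failure_index, window, k=3.0):
    """Return a copy of `labels` with normal-looking pre-failure points reset to 0.

    First region: the `window` points before the failure plus the failure itself.
    Second region: every other point. Both choices are assumptions for this sketch.
    """
    first = range(max(0, failure_index - window), failure_index + 1)
    first_set = set(first)
    second_values = [v for i, v in enumerate(values) if i not in first_set]
    mu, sigma = mean(second_values), pstdev(second_values)

    new_labels = list(labels)
    for i in first:
        if abs(values[i] - mu) <= k * sigma:
            new_labels[i] = 0  # behaves like the second region, so treat as normal
    return new_labels


# Hypothetical sensor readings; the failure occurs at index 8.
values = [10, 11, 9, 10, 10, 11, 30, 35, 40]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
relabeled = relabel(values, labels, failure_index=8, window=4)
```

In this example the two points at indices 4 and 5 are re-labeled because they are indistinguishable from the normal readings, while the three clearly abnormal points keep their failure label.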
-
Publication No.: US20210357794A1
Publication Date: 2021-11-18
Application No.: US16875450
Application Date: 2020-05-15
IPC Classification: G06N7/00, G06F16/906, G06F17/18, G06F17/17
Abstract: A processing system, a computer program product, and a method for determining a best imputation algorithm from a plurality of imputation algorithms. A method includes: providing a plurality of imputation algorithms; defining a data analytics task in which at least one step of the data analytics task includes determining at least one missing data value by imputation; executing the data analytics task multiple times wherein each execution of the data analytics task uses a data imputation algorithm of the plurality of data imputation algorithms to determine at least one missing data value; determining an error for each execution of the data analytics task; and selecting an imputation algorithm which results in a least error for the data analytics task.
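The point of this abstract is that the imputer is judged by the error of the downstream task rather than by how well it recovers individual values. A minimal sketch, assuming a toy task (reporting the series total) and a known reference total; the task, the imputers, and the data are all hypothetical:

```python
def mean_fill(values):
    """Replace each gap (None) with the mean of the observed values."""
    known = [v for v in values if v is not None]
    level = sum(known) / len(known)
    return [level if v is None else v for v in values]


def forward_fill(values):
    """Replace each gap with the last value seen (first value must be observed)."""
    filled = []
    for v in values:
        filled.append(filled[-1] if v is None else v)
    return filled


def analytics_task(values, impute):
    """Hypothetical task: impute the gaps, then report the series total."""
    return sum(impute(values))


def select_imputer(values, imputers, reference):
    """Run the task once per imputer and keep the one with the least task error."""
    errors = {
        name: abs(analytics_task(values, impute) - reference)
        for name, impute in imputers.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


# Hypothetical data: the complete series 1..8 sums to 36.
observed = [1, 2, None, 4, 5, None, 7, 8]
imputers = {"mean_fill": mean_fill, "forward_fill": forward_fill}
best, errors = select_imputer(observed, imputers, reference=36)
```

Here `mean_fill` wins with zero task error, even though it is the less accurate imputer value by value on this series. That difference is what a task-level error captures.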
-
Publication No.: US20220327058A1
Publication Date: 2022-10-13
Application No.: US17654965
Application Date: 2022-03-15
Inventors: Long VU, Bei CHEN, Xuan-Hong DANG, Peter Daniel KIRCHNER, Syed Yousaf SHAH, Dhavalkumar C. PATEL, Si Er HAN, Ji Hui YANG, Jun WANG, Jing James XU, Dakuo WANG, Gregory BRAMBLE, Horst Cornelius SAMULOWITZ, Saket K. SATHE, Wesley M. GIFFORD, Petros ZERFOS
IPC Classification: G06F12/0871, G06N20/00
Abstract: To automate time series forecasting machine learning pipeline generation, a data allocation size of time series data may be determined based on one or more characteristics of a time series data set. The time series data may be allocated for use by candidate machine learning pipelines based on the data allocation size. Features for the time series data may be determined and cached by the candidate machine learning pipelines. Predictions of each of the candidate machine learning pipelines using at least the one or more features may be evaluated. A ranked list of machine learning pipelines may be automatically generated from the candidate machine learning pipelines for time series forecasting based upon evaluating predictions of each of the one or more candidate machine learning pipelines.
-
Publication No.: US20220261598A1
Publication Date: 2022-08-18
Application No.: US17452287
Application Date: 2021-10-26
Inventors: Bei CHEN, Long VU, Dhavalkumar C. PATEL, Syed Yousaf SHAH, Gregory BRAMBLE, Peter Daniel KIRCHNER, Horst Cornelius SAMULOWITZ, Xuan-Hong DANG, Petros ZERFOS
Abstract: To rank time series forecasting in machine learning pipelines, time series data may be incrementally allocated from a time series data set for testing by candidate machine learning pipelines based on seasonality or a degree of temporal dependence of the time series data. Intermediate evaluation scores may be provided by each of the candidate machine learning pipelines following each time series data allocation. One or more machine learning pipelines may be automatically selected from a ranked list of the one or more candidate machine learning pipelines based on a projected learning curve generated from the intermediate evaluation scores.
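The incremental allocation and projected learning curve in this abstract could be approximated as in the sketch below. Several choices are assumptions made for illustration: the allocation grows by a fixed `step` (in practice it might be tied to the seasonal period), each allocation is the most recent slice of the series, and the learning curve is projected with a straight least-squares line. The two "pipelines" are stub scoring functions, not real forecasters.

```python
def project(sizes, scores, full_size):
    """Fit a least-squares line through (size, score) and evaluate it at full_size."""
    n = len(sizes)
    mean_x, mean_y = sum(sizes) / n, sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, scores)) / sxx
    return mean_y + slope * (full_size - mean_x)


def rank_pipelines(series, pipelines, step, rounds):
    """Rank pipelines by their projected score on the full series (best first)."""
    sizes = [step * (r + 1) for r in range(rounds)]
    projected = {}
    for name, evaluate in pipelines.items():
        # One intermediate score per allocation, using the most recent `size` points.
        scores = [evaluate(series[-size:]) for size in sizes]
        projected[name] = project(sizes, scores, len(series))
    return sorted(projected, key=projected.get, reverse=True)


# Hypothetical stubs: one pipeline improves with more data, the other does not.
pipelines = {
    "flat": lambda data: 0.7,
    "learner": lambda data: 0.5 + 0.001 * len(data),
}
series = list(range(400))
ranking = rank_pipelines(series, pipelines, step=50, rounds=3)
```

At every observed allocation (50, 100, 150 points) `flat` scores higher, but the projection to 400 points puts `learner` first. This shows why a projected curve can rank pipelines differently from the latest intermediate score.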
-
Publication No.: US20220012641A1
Publication Date: 2022-01-13
Application No.: US16925013
Application Date: 2020-07-09
Inventors: Arun Kwangil IYENGAR, Jeffrey Owen KEPHART, Dhavalkumar C. PATEL, Dung Tien PHAN, Chandrasekhara K. REDDY
Abstract: Techniques for generating model ensembles are provided. A plurality of models trained to generate predictions at each of a plurality of intervals is received. A respective prediction accuracy of each respective model of the plurality of models is determined for a first interval of the plurality of intervals by processing labeled evaluation data using the respective model. Additionally, a model ensemble specifying one or more of the plurality of models for each of the plurality of intervals is generated, comprising selecting, for the first interval, a first model of the plurality of models based on (i) the respective prediction accuracies and (ii) at least one non-error metric.
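A per-interval choice that weighs accuracy against a non-error metric might be sketched as below. The specific rule is an assumption: among the models whose accuracy is within `tolerance` of the best for an interval, pick the one with the lowest cost. Cost stands in for the non-error metric, and all model names and numbers are hypothetical.

```python
def build_ensemble(accuracy, cost, intervals, tolerance=0.02):
    """Pick one model per interval: near-best accuracy first, then lowest cost."""
    ensemble = {}
    for interval in intervals:
        best = max(accuracy[model][interval] for model in accuracy)
        close = [
            model for model in accuracy
            if best - accuracy[model][interval] <= tolerance
        ]
        ensemble[interval] = min(close, key=lambda model: cost[model])
    return ensemble


# Hypothetical accuracies per model and interval, plus a per-model cost.
accuracy = {
    "arima": {"h1": 0.90, "h2": 0.70},
    "gbm": {"h1": 0.91, "h2": 0.80},
    "lstm": {"h1": 0.85, "h2": 0.81},
}
cost = {"arima": 1.0, "gbm": 5.0, "lstm": 20.0}

ensemble = build_ensemble(accuracy, cost, intervals=["h1", "h2"])
```

The result assigns `arima` to `h1` and `gbm` to `h2`: in both intervals the most accurate model is passed over for a cheaper one that is within the tolerance.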