-
Publication No.: US20250117443A1
Publication date: 2025-04-10
Application No.: US18482975
Filing date: 2023-10-09
Applicant: International Business Machines Corporation
Inventor: Lei Tian, Han Zhang, Jing James Xu, Xue Ying Zhang, Si Er Han
IPC: G06F18/2325
Abstract: A computer-implemented method for performing data difference evaluation is provided. Aspects include obtaining a first data set and a second data set, creating a first plurality of feature vectors by inputting the first data set into each of a plurality of models, and creating a second plurality of feature vectors by inputting the second data set into each of the plurality of models. Aspects also include identifying a mapping between elements of the first plurality of feature vectors and elements of the second plurality of feature vectors created by a same model of the plurality of models, calculating, for each of the plurality of models based at least in part on the mapping, a model distance between the first data set and the second data set, and calculating, based at least in part on the model distances, an ensemble distance between the first data set and the second data set.
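The abstract's flow (per-model feature vectors, a mapping between them, per-model distances, then an ensemble distance) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed method: the two toy models (`column_means`, `column_ranges`), the index-by-index mapping, the Euclidean metric, and the plain average used for the ensemble are all hypothetical choices that the abstract does not specify.

```python
import math
from statistics import mean


def model_distance(vectors_a, vectors_b):
    """Mean Euclidean distance between feature vectors paired by index.

    Pairing by index is an assumed mapping; the abstract does not say how
    elements are mapped.
    """
    return mean(math.dist(a, b) for a, b in zip(vectors_a, vectors_b))


def ensemble_distance(data_a, data_b, models):
    """Run both data sets through every model and average the model distances."""
    distances = [model_distance(m(data_a), m(data_b)) for m in models]
    return mean(distances), distances


# Hypothetical "models": each maps a data set (rows of numbers) to one
# single-element feature vector per column.
def column_means(data):
    return [[mean(col)] for col in zip(*data)]


def column_ranges(data):
    return [[max(col) - min(col)] for col in zip(*data)]


first = [[1.0, 10.0], [3.0, 14.0]]
second = [[2.0, 10.0], [6.0, 18.0]]
score, per_model = ensemble_distance(first, second, [column_means, column_ranges])
print(per_model, score)
```

A weighted average or a different per-model metric would fit the same structure; the plain mean is used here only to keep the sketch short.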
-
Publication No.: US20250094267A1
Publication date: 2025-03-20
Application No.: US18368656
Filing date: 2023-09-15
Applicant: International Business Machines Corporation
Inventor: Jun Wang, Jing Xu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Wen Pei Yu
IPC: G06F11/07
Abstract: A time series anomaly detection method, system, and computer program product that processes time series data includes absorbing profiles of the time series data and anomaly types of a model as features, optimizing biased ranks to create optimized ranks through merging initial ranks with new ranks generated by real anomalies, and auto-suggesting the optimized ranks for saving a predetermined amount of data operation.
-
Publication No.: US20230185879A1
Publication date: 2023-06-15
Application No.: US17644350
Filing date: 2021-12-15
Applicant: International Business Machines Corporation
Inventor: Si Er Han, Xue Ying Zhang, Jing Xu, Xiao Ming Ma, Ji Hui Yang
CPC classification number: G06K9/6228, G06K9/6261, G06K9/6262, G06N20/00
Abstract: A computer implemented technique including: splitting data of a historical time series data set into subsets; updating a time series model by backwards data selection to obtain an interim version of the time series model; exploring pattern changes in the new data to obtain new predictors of pattern change; and updating the interim version of the time series model by applying the new predictors of pattern change to obtain an updated version of the time series model.
-
Publication No.: US20230137184A1
Publication date: 2023-05-04
Application No.: US17453540
Filing date: 2021-11-04
Applicant: International Business Machines Corporation
Inventor: Si Er Han, Ji Hui Yang, Xiao Ming Ma, Jing Xu, Xue Ying Zhang
Abstract: A method, system, and computer program product for incremental machine learning for a parametric machine learning model are disclosed. The method may include processing samples comprising historical samples and new samples with an existing parametric machine learning model to obtain at least one prediction residual of each of the samples, wherein the existing parametric machine learning model was trained based on the historical samples. The method may further include clustering the samples based on the at least one prediction residual of each of the samples and features of each of the samples. The method may further include sampling samples in each cluster to ensure that each cluster includes a substantially similar number of sampled samples. The method may further include updating the existing parametric machine learning model to obtain an updated parametric machine learning model based on sampled samples in each cluster.
-
Publication No.: US11619225B2
Publication date: 2023-04-04
Application No.: US17114869
Filing date: 2020-12-08
Applicant: International Business Machines Corporation
Inventor: Yang Yang, Chong Liu, Si Er Han, Xiao Ming Ma, Jun Wang, Chun Lei Xu
Abstract: Methods, computer program products, and/or systems are provided that perform the following operations: obtaining a series of indicator diagrams corresponding to strokes of a pumpjack over a specific time duration, dividing each indicator diagram into a plurality of location segments in a direction of location of the rod; obtaining load difference features between upstroke loads and corresponding downstroke loads in the plurality of location segments; identifying a location segment with an abnormal load difference feature based on a time series data of load difference feature corresponding to one of the plurality of location segments, the time series data of load difference feature including a series of data points of load difference feature of the one of the plurality of location segments in time order; and providing an indication of a potential problem based, at least in part, on the identification of the location segment with an abnormal load difference feature.
-
Publication No.: US20230073137A1
Publication date: 2023-03-09
Application No.: US17447258
Filing date: 2021-09-09
Applicant: International Business Machines Corporation
Inventor: Jing Xu, Si Er Han, Xue Ying Zhang, Steven George Barbee, Ji Hui Yang
Abstract: A computer implemented method for machine learning model training. A number of processor units creates a cluster model comprising labeled samples and unlabeled samples. The number of processor units identifies cluster information for the labeled samples from the cluster model. The number of processor units adds a set of new features to a set of original features for the labeled samples using the cluster information to form an extended set of features for the labeled samples, wherein the labeled samples with the set of original features and the set of new features form a training data set for training a machine learning model.
-
Publication No.: US20220101044A1
Publication date: 2022-03-31
Application No.: US17035816
Filing date: 2020-09-29
Applicant: International Business Machines Corporation
Inventor: Jing Xu, Xue Ying Zhang, Si Er Han, Xiao Ming Ma, Ji Hui Yang
Abstract: A computer receives a general predictive model and training data. The computer builds a clustering feature tree model to condense the training data into data groups. The computer applies a leave-one-out evaluation method to determine an impact value for each data group with regard to said general predictive model. The computer identifies a diagnostic category for each data group selected from a list of categories including model-harmful data, model-neutral data, and model-helping data, in accordance with said impact value. The computer removes data in groups labelled as model-harmful from the training data and builds a modified general predictive model based on data in groups labelled as model-neutral or model-helping.
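The leave-one-out step in this abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the data groups are supplied directly instead of being built by a clustering feature tree, a predict-the-mean model stands in for the general predictive model, the impact value is taken as the change in validation error when a group is left out, and the `tolerance` threshold is hypothetical.

```python
from statistics import mean


def validation_error(train_targets, validation_targets):
    """Mean absolute error of a predict-the-mean stand-in model."""
    prediction = mean(train_targets)
    return mean(abs(v - prediction) for v in validation_targets)


def diagnose_groups(groups, validation_targets, tolerance=0.05):
    """Label each group by how the error changes when that group is left out."""
    all_targets = [t for g in groups.values() for t in g]
    baseline = validation_error(all_targets, validation_targets)
    labels = {}
    for name in groups:
        rest = [t for other, g in groups.items() if other != name for t in g]
        # Assumed impact value: error without the group minus error with it.
        impact = validation_error(rest, validation_targets) - baseline
        if impact < -tolerance:
            labels[name] = "model-harmful"   # leaving it out lowers the error
        elif impact > tolerance:
            labels[name] = "model-helping"   # leaving it out raises the error
        else:
            labels[name] = "model-neutral"
    return labels


# Hypothetical data groups (target values only) and a validation set.
groups = {"a": [1.0, 1.2], "b": [0.9, 1.1], "c": [9.0, 11.0]}
validation = [1.0, 1.1, 0.9]
labels = diagnose_groups(groups, validation)

# Rebuild the training data without the groups labelled model-harmful.
kept = [t for name, g in groups.items() if labels[name] != "model-harmful" for t in g]
print(labels, kept)
```

In this toy data, group `c` sits far from the validation targets, so leaving it out reduces the error and it is the only group dropped before retraining.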
-
Publication No.: US12298990B1
Publication date: 2025-05-13
Application No.: US18524131
Filing date: 2023-11-30
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Si Er Han, Xiao Ming Ma, Xue Ying Zhang, Jing Xu, Ji Hui Yang, Jun Wang
IPC: G06F16/00, G06F16/2458, G06F16/28
Abstract: An approach to time-series data point anomaly detection may be presented. Data point anomalies in time-series data can cause a cascade of incorrect predictions in a time-series data prediction model. Presented herein may be an approach to decompose a time-series training data set into elementary components, such as seasonal, trend and residual. The approach may determine one or more confidence intervals for elementary components of data points including level shift, variance, and outlier. From these confidence intervals, new data points can be analyzed and identified as anomaly data points. The approach may also prevent anomaly data points from being incorporated into a time series data prediction model, reducing prediction error in the prediction model.
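A much-simplified sketch of the decompose-then-check idea described above is shown below. It is an illustration under assumptions, not the patented approach: the trend is treated as a flat level, the seasonal component is a per-phase average, only a single residual interval is built (the abstract also mentions level shift and variance), and the function names and the `z` multiplier are hypothetical.

```python
from statistics import mean, stdev


def fit_residual_interval(series, period, z=3.0):
    """Split training data into level, seasonal, and residual parts.

    Returns the seasonal profile, the level, and a (low, high) interval for
    residuals. The flat level is a crude stand-in for a real trend estimate.
    """
    level = mean(series)
    seasonal = [mean(series[p::period]) - level for p in range(period)]
    residuals = [x - level - seasonal[i % period] for i, x in enumerate(series)]
    center, spread = mean(residuals), stdev(residuals)
    return seasonal, level, (center - z * spread, center + z * spread)


def is_anomaly(value, index, seasonal, level, bounds):
    """Flag a new point whose residual falls outside the fitted interval."""
    residual = value - level - seasonal[index % len(seasonal)]
    return not (bounds[0] <= residual <= bounds[1])


# Hypothetical training data with a period of 4.
training = [10, 12, 14, 12, 11, 13, 15, 13, 10, 12, 14, 12]
seasonal, level, bounds = fit_residual_interval(training, period=4)

# Only points that pass the check would be added to the training data.
incoming = [(12, 10.5), (13, 20.0)]
accepted = [v for i, v in incoming if not is_anomaly(v, i, seasonal, level, bounds)]
print(accepted)
```

Filtering `incoming` before extending the training series mirrors the abstract's last point: flagged points are kept out of the prediction model's data.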
-
Publication No.: US12293438B2
Publication date: 2025-05-06
Application No.: US18064959
Filing date: 2022-12-13
Applicant: International Business Machines Corporation
Inventor: Wen Pei Yu, Xiao Ming Ma, Xue Ying Zhang, Si Er Han, Jing James Xu, Jing Xu, Jun Wang
Abstract: In an approach for post-modeling data visualization and analysis, a processor presents a first visualization of a training dataset in a first plot. Responsive to receiving a selection of a data group of the training dataset to analyze, a processor identifies three or fewer key model features of the data group of the training dataset. A processor ascertains a representative record of each key model feature of the three or fewer key model features using a Local Interpretable Model-Agnostic Explanation technique. A processor presents a second visualization of the three or fewer key model features and the representative record of each key model feature in a second plot.
-
Publication No.: US12056524B2
Publication date: 2024-08-06
Application No.: US17443831
Filing date: 2021-07-28
Applicant: International Business Machines Corporation
Inventor: Jing Xu, Xue Ying Zhang, Xiao Ming Ma, Si Er Han, Ji Hui Yang
CPC classification number: G06F9/4887, G06F9/5005, G06F11/3423, G06F11/3452, G06F2209/501, G06F2209/5019
Abstract: Performing predictive analysis on running batch jobs is provided. A series of batch end time predictive models is retrieved according to a sequence of milestone jobs in a batch of jobs. Retrieved batch end time predictive models are assembled into an aggregate batch end time predictive model to increase accuracy and stability of an end time prediction of the batch of jobs. The aggregate batch end time predictive model is utilized to predict an end time of the batch of jobs during running of the batch of jobs to form a predicted end time of the batch of jobs.