DATA DIFFERENCE EVALUATION VIA MODEL COMPARISON

    Publication No.: US20250117443A1

    Publication Date: 2025-04-10

    Application No.: US18482975

    Filing Date: 2023-10-09

    Abstract: A computer-implemented method for performing data difference evaluation is provided. Aspects include obtaining a first data set and a second data set, creating a first plurality of feature vectors by inputting the first data set into each of a plurality of models, and creating a second plurality of feature vectors by inputting the second data set into each of the plurality of models. Aspects also include identifying a mapping between elements of the first plurality of feature vectors and elements of the second plurality of feature vectors created by a same model of the plurality of models, calculating, for each of the plurality of models based at least in part on the mapping, a model distance between the first data set and the second data set, and calculating, based at least in part on the model distances, an ensemble distance between the first data set and the second data set.
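
The flow described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the claimed method: the abstract does not say how the mapping, the model distance, or the ensemble distance are computed, so the sketch assumes an index-aligned mapping, a mean Euclidean distance per model, and an unweighted average across models. The names `raw_features` and `doubled_features` are hypothetical stand-ins for real models.

```python
import math
from statistics import mean


def model_distance(model, first, second):
    """Mean Euclidean distance between mapped feature vectors from one model."""
    first_vectors = [model(record) for record in first]
    second_vectors = [model(record) for record in second]
    # Assumed mapping: record i of the first set pairs with record i of the second.
    pairs = zip(first_vectors, second_vectors)
    return mean(math.dist(a, b) for a, b in pairs)


def ensemble_distance(models, first, second):
    """Unweighted average of the per-model distances (an assumed aggregation)."""
    return mean(model_distance(m, first, second) for m in models)


# Two toy "models" standing in for real feature extractors (hypothetical).
def raw_features(record):
    return (record[0], record[1])


def doubled_features(record):
    return (2 * record[0], 2 * record[1])


models = [raw_features, doubled_features]
first_set = [(0.0, 0.0), (1.0, 1.0)]
second_set = [(3.0, 4.0), (1.0, 1.0)]

per_model = [model_distance(m, first_set, second_set) for m in models]
overall = ensemble_distance(models, first_set, second_set)
print(per_model, overall)
```

Feeding both data sets through the same model before comparing keeps each distance within a single feature space; the ensemble step then combines the per-model views into one number.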

    INCREMENTAL MACHINE LEARNING FOR A PARAMETRIC MACHINE LEARNING MODEL

    Publication No.: US20230137184A1

    Publication Date: 2023-05-04

    Application No.: US17453540

    Filing Date: 2021-11-04

    Abstract: A method, system, and computer program product for incremental machine learning for a parametric machine learning model are disclosed. The method may include processing samples comprising historical samples and new samples with an existing parametric machine learning model to obtain at least one prediction residual of each of the samples, wherein the existing parametric machine learning model was trained based on the historical samples. The method may further include clustering the samples based on the at least one prediction residual of each of the samples and features of each of the samples. The method may further include sampling samples in each cluster to ensure that each cluster includes a substantially similar number of sampled samples. The method may further include updating the existing parametric machine learning model to obtain an updated parametric machine learning model based on sampled samples in each cluster.
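
A minimal sketch of the residual, cluster, sample, update loop follows. It rests on several assumptions that the abstract does not state: the parametric model is a one-feature line `(slope, intercept)`, clustering is replaced by a simple grid over (feature, residual) via the hypothetical `cluster_key` helper, and the update is a plain least-squares refit on the sampled points.

```python
import random
from collections import defaultdict


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def cluster_key(x, residual, x_width=5.0, r_width=1.0):
    # Stand-in for a real clustering step: a grid cell over (feature, residual).
    return (int(x // x_width), int(residual // r_width))


def balanced_sample(samples, model, per_cluster, seed=0):
    """Group samples by feature and residual, then draw equally from each group."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for x, y in samples:
        residual = y - predict(model, x)
        clusters[cluster_key(x, residual)].append((x, y))
    picked = []
    for members in clusters.values():
        picked.extend(rng.sample(members, min(per_cluster, len(members))))
    return picked, clusters


def fit_line(samples):
    """Ordinary least squares for a single feature."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


existing_model = (1.0, 0.0)  # assumed to have been trained on the historical samples
historical = [(float(x), float(x)) for x in range(10)]
new = [(float(x), float(x) + 3.0) for x in range(10, 13)]  # drifted behaviour

sampled, clusters = balanced_sample(historical + new, existing_model, per_cluster=3)
updated_model = fit_line(sampled)
print(len(clusters), len(sampled), updated_model)
```

Drawing the same number of samples from each cluster keeps the ten historical samples from outweighing the three new ones when the model is refit.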

    Identifying potential problems in a pumpjack

    Publication No.: US11619225B2

    Publication Date: 2023-04-04

    Application No.: US17114869

    Filing Date: 2020-12-08

    Abstract: Methods, computer program products, and/or systems are provided that perform the following operations: obtaining a series of indicator diagrams corresponding to strokes of a pumpjack over a specific time duration; dividing each indicator diagram into a plurality of location segments in a direction of location of the rod; obtaining load difference features between upstroke loads and corresponding downstroke loads in the plurality of location segments; identifying a location segment with an abnormal load difference feature based on time series data of the load difference feature corresponding to one of the plurality of location segments, the time series data including a series of data points of the load difference feature of the one of the plurality of location segments in time order; and providing an indication of a potential problem based, at least in part, on the identification of the location segment with an abnormal load difference feature.
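
The operations listed above might be sketched as follows. The abstract does not define the load difference feature or the abnormality test, so this example assumes the feature is mean upstroke load minus mean downstroke load per segment, and that a segment is abnormal when a later stroke deviates from an early baseline by more than three standard deviations. The diagram layout (`"up"` and `"down"` lists of `(position, load)` pairs with positions in [0, 1)) and the `make_diagram` generator are hypothetical.

```python
from statistics import mean, stdev


def load_differences(diagram, n_segments):
    """Per-segment difference between mean upstroke and mean downstroke load."""
    up = [[] for _ in range(n_segments)]
    down = [[] for _ in range(n_segments)]
    for buckets, points in ((up, diagram["up"]), (down, diagram["down"])):
        for position, load in points:
            index = min(int(position * n_segments), n_segments - 1)
            buckets[index].append(load)
    # Assumes every segment holds at least one upstroke and one downstroke point.
    return [mean(u) - mean(d) for u, d in zip(up, down)]


def abnormal_segments(diagrams, n_segments, baseline=5, threshold=3.0):
    """Flag segments whose later load differences leave the baseline band."""
    # Transpose stroke-by-segment rows into one time series per segment.
    series = list(zip(*(load_differences(d, n_segments) for d in diagrams)))
    flagged = []
    for segment, values in enumerate(series):
        reference = values[:baseline]
        center, spread = mean(reference), max(stdev(reference), 1e-9)
        if any(abs(v - center) > threshold * spread for v in values[baseline:]):
            flagged.append(segment)
    return flagged


def make_diagram(stroke, faulty_segment=None):
    """Synthetic indicator diagram with one sample per segment (illustrative)."""
    positions = [0.125, 0.375, 0.625, 0.875]
    jitter = 0.1 * (stroke % 3)
    up = [(p, (6.0 if i == faulty_segment else 10.0) + jitter)
          for i, p in enumerate(positions)]
    down = [(p, 4.0) for p in positions]
    return {"up": up, "down": down}


# Segment 2 loses upstroke load from the sixth stroke onward.
diagrams = [make_diagram(t, faulty_segment=2 if t >= 5 else None) for t in range(8)]
flagged = abnormal_segments(diagrams, n_segments=4)
print("segments with a potential problem:", flagged)
```

Tracking each segment as its own time series localizes the fault along the rod travel instead of reporting only that a stroke as a whole looks unusual.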

    Feature Generation for Training Data Sets Based on Unlabeled Data

    Publication No.: US20230073137A1

    Publication Date: 2023-03-09

    Application No.: US17447258

    Filing Date: 2021-09-09

    Abstract: A computer-implemented method for machine learning model training is provided. A number of processor units creates a cluster model comprising labeled samples and unlabeled samples. The number of processor units identifies cluster information for the labeled samples from the cluster model. The number of processor units adds a set of new features to a set of original features for the labeled samples using the cluster information to form an extended set of features for the labeled samples, wherein the labeled samples with the set of original features and the set of new features form a training data set for training a machine learning model.
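
One way to picture this idea is the sketch below, which makes assumptions the abstract leaves open: the cluster model is a small hand-written k-means with a deterministic initialization, and the "cluster information" added as new features is the cluster index plus the distance to the nearest centroid. The function names are illustrative only.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means with deterministic initialization (an assumption of this sketch)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def extend_features(labeled, unlabeled, k):
    """Cluster labeled and unlabeled samples together, then extend labeled ones."""
    centroids = kmeans([x for x, _ in labeled] + list(unlabeled), k)
    training = []
    for x, label in labeled:
        distances = [math.dist(x, c) for c in centroids]
        cluster_id = distances.index(min(distances))
        # Original features first, then the two new cluster-derived features.
        training.append((tuple(x) + (float(cluster_id), min(distances)), label))
    return training


labeled = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
unlabeled = [(0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]

training_set = extend_features(labeled, unlabeled, k=2)
for features, label in training_set:
    print(features, label)
```

The unlabeled samples never enter the training set directly; they only shape the clusters, which is how they contribute information to the labeled rows.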

    ARTIFICIAL INTELLIGENCE MODEL GENERATION USING DATA WITH DESIRED DIAGNOSTIC CONTENT

    Publication No.: US20220101044A1

    Publication Date: 2022-03-31

    Application No.: US17035816

    Filing Date: 2020-09-29

    Abstract: A computer receives a general predictive model and training data. The computer builds a clustering feature tree model to condense the training data into data groups. The computer applies a leave-one-out evaluation method to determine an impact value for each data group with regard to said general predictive model. The computer identifies a diagnostic category for each data group selected from a list of categories including model-harmful data, model-neutral data, and model-helping data, in accordance with said impact value. The computer removes data in groups labelled as model-harmful from the training data and builds a modified general predictive model based on data in groups labelled as model-neutral or model-helping.
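
The leave-one-out step can be illustrated with a small sketch. It does not build a clustering feature tree; the data groups are assumed to be given already. The predictive model is swapped for a 1-nearest-neighbour regressor, and the impact value is assumed to be the change in validation error when a group is left out. All names and data are hypothetical.

```python
def predict_1nn(training, x):
    """1-nearest-neighbour regression: return the target of the closest x."""
    return min(training, key=lambda row: abs(row[0] - x))[1]


def validation_error(training, validation):
    """Mean squared error of the 1-NN model on the validation rows."""
    return sum((y - predict_1nn(training, x)) ** 2 for x, y in validation) / len(validation)


def diagnose_groups(groups, validation, tolerance=1e-9):
    """Label each group by how the validation error moves when it is left out."""
    all_rows = [row for rows in groups.values() for row in rows]
    baseline = validation_error(all_rows, validation)
    categories = {}
    for name in groups:
        rest = [row for other, rows in groups.items() if other != name for row in rows]
        impact = validation_error(rest, validation) - baseline
        if impact > tolerance:
            categories[name] = "model-helping"   # error rises without the group
        elif impact < -tolerance:
            categories[name] = "model-harmful"   # error falls without the group
        else:
            categories[name] = "model-neutral"
    return categories


groups = {
    "A": [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],  # consistent with the validation data
    "B": [(100.0, 100.0), (101.0, 101.0)],      # far from every validation point
    "C": [(2.4, 9.0), (2.6, 9.0)],              # inconsistent targets
}
validation = [(1.1, 1.1), (2.5, 2.5), (2.9, 2.9)]

categories = diagnose_groups(groups, validation)
cleaned = [row for name, rows in groups.items()
           if categories[name] != "model-harmful" for row in rows]
print(categories)
print(validation_error(cleaned, validation))
```

Evaluating whole groups rather than single rows keeps the number of retraining runs equal to the number of groups, which is presumably why the training data is condensed first.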

    Optimization of time-series anomaly detection

    Publication No.: US12298990B1

    Publication Date: 2025-05-13

    Application No.: US18524131

    Filing Date: 2023-11-30

    Abstract: An approach to time-series data point anomaly detection may be presented. Data point anomalies in time-series data can cause a cascade of incorrect predictions in a time-series data prediction model. Presented herein may be an approach to decompose a time-series training data set into elementary components, such as seasonal, trend and residual. The approach may determine one or more confidence intervals for elementary components of data points including level shift, variance, and outlier. From these confidence intervals, new data points can be analyzed and identified as anomaly data points. The approach may also prevent anomaly data points from being incorporated into a time series data prediction model, reducing prediction error in the prediction model.
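
A rough sketch of the decompose, bound, filter sequence follows. It covers only the outlier case, not level shift or variance, and it assumes choices the abstract does not specify: a least-squares line for the trend, per-phase means for the seasonal component, and a band of three standard deviations around the mean residual as the confidence interval. The function names and the synthetic series are illustrative.

```python
from statistics import mean, stdev


def decompose(series, period):
    """Split a series into a linear trend, per-phase seasonal means, and residuals."""
    n = len(series)
    t_mean, y_mean = (n - 1) / 2, mean(series)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    detrended = [y - (intercept + slope * t) for t, y in enumerate(series)]
    seasonal = [mean(detrended[p::period]) for p in range(period)]
    residual = [d - seasonal[t % period] for t, d in enumerate(detrended)]
    return (slope, intercept), seasonal, residual


def filter_new_points(series, period, new_points, z=3.0):
    """Separate new points into accepted points and anomalies by residual band."""
    (slope, intercept), seasonal, residual = decompose(series, period)
    center, spread = mean(residual), stdev(residual)
    low, high = center - z * spread, center + z * spread
    accepted, anomalies = [], []
    for offset, y in enumerate(new_points):
        t = len(series) + offset
        r = y - (intercept + slope * t) - seasonal[t % period]
        (accepted if low <= r <= high else anomalies).append((t, y))
    return accepted, anomalies


# Synthetic history: linear trend, period-4 seasonality, small deterministic noise.
pattern = [1.0, -1.0, -1.0, 1.0]
noise = [0.1, -0.1, 0.0]
history = [0.5 * t + pattern[t % 4] + noise[t % 3] for t in range(24)]

accepted, anomalies = filter_new_points(history, period=4, new_points=[13.1, 20.0, 12.0])
print("accepted:", accepted)
print("anomalies:", anomalies)
```

Only the accepted points would then be appended to the training series, so a single bad reading does not distort later forecasts.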
