Histogram bin interval approximation

    Publication number: US12205203B2

    Publication date: 2025-01-21

    Application number: US18351288

    Application date: 2023-07-12

    Abstract: Using approximated bin intervals to label a histogram provides clarity and allows the histogram to be understood more intuitively. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.
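
    The steps in this abstract can be illustrated with a minimal Python sketch. The abstract does not define the "bin mask"; the sketch assumes it is the power of ten at the order of magnitude of the raw bin width, and that "applying" it means rounding the minimum down and the maximum up to multiples of the mask. The function name `approximate_bin_edges` and its parameters are hypothetical and are not taken from the patent.

```python
import math

def approximate_bin_edges(min_value, max_value, num_bins):
    """Return num_bins + 1 bin edges with rounded outer boundaries (sketch only)."""
    if num_bins <= 0 or max_value <= min_value:
        raise ValueError("need num_bins > 0 and max_value > min_value")
    # Raw bin width from the feature statistics (here: min and max).
    raw_width = (max_value - min_value) / num_bins
    # ASSUMPTION: the bin mask is the power of ten at the raw width's magnitude.
    mask = 10 ** math.floor(math.log10(raw_width))
    # ASSUMPTION: applying the mask rounds min down and max up to multiples of it.
    approx_min = math.floor(min_value / mask) * mask
    approx_max = math.ceil(max_value / mask) * mask
    # Approximated bin width from the number of bins and the approximated range.
    approx_width = (approx_max - approx_min) / num_bins
    return [approx_min + i * approx_width for i in range(num_bins + 1)]

# Values in [12, 187] split into 5 bins: raw width 35, mask 10, range [10, 190].
print(approximate_bin_edges(12, 187, 5))
```

    Under these assumptions the edges land on 10, 46, 82, 118, 154 and 190 rather than on 12, 47, 82, 117, 152 and 187, which gives cleaner outer labels.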

    MULTIPLE MACHINE LEARNING MODEL ANOMALY DETECTION FRAMEWORK

    Publication number: US20250013668A1

    Publication date: 2025-01-09

    Application number: US18754570

    Application date: 2024-06-26

    Abstract: Anomalies may be detected using a multiple machine learning model anomaly detection framework. A clustering model is trained using an unsupervised machine learning algorithm on a historical anomaly dataset. A plurality of clusters of records are determined by applying the historical anomaly dataset to the clustering model. It is then determined whether each cluster of the plurality of clusters is an anomaly-type cluster or a normal-type cluster. A plurality of labels for the plurality of records is updated based on each record's cluster classification. Non-pure clusters are determined from among the plurality of clusters based on a purity threshold. A supervised machine learning model is trained for each of the non-pure clusters using the records in the given cluster and the labels for each of those records. Then, predictions of an anomaly are made using the clustering model and the supervised machine learning models.
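
    The cluster classification and purity check can be sketched as follows. The abstract does not say how purity is computed; this sketch assumes purity is the share of a cluster's records carrying its majority label, and that a cluster's type is that majority label. The function name `summarize_clusters` and the default threshold of 0.9 are hypothetical. The clustering model and the per-cluster supervised models are out of scope here.

```python
from collections import Counter, defaultdict

def summarize_clusters(cluster_ids, labels, purity_threshold=0.9):
    """Map each cluster id to (majority_label, purity, is_pure). Sketch only."""
    by_cluster = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, labels):
        by_cluster[cluster_id].append(label)
    summary = {}
    for cluster_id, members in by_cluster.items():
        # ASSUMPTION: the cluster type is the majority label of its records.
        majority_label, count = Counter(members).most_common(1)[0]
        # ASSUMPTION: purity is the fraction of records with the majority label.
        purity = count / len(members)
        summary[cluster_id] = (majority_label, purity, purity >= purity_threshold)
    return summary

cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["anomaly"] * 4 + ["anomaly", "normal", "normal", "normal"]
summary = summarize_clusters(cluster_ids, labels)
# Clusters flagged as non-pure would each get their own supervised model.
non_pure = [cid for cid, (_, _, is_pure) in summary.items() if not is_pure]
print(summary, non_pure)
```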

    Histogram Bin Interval Approximation
    Invention publication

    Publication number: US20240020896A1

    Publication date: 2024-01-18

    Application number: US18351288

    Application date: 2023-07-12

    CPC classification number: G06T11/206 G06F18/2431

    Abstract: Using approximated bin intervals to label a histogram provides clarity and allows the histogram to be understood more intuitively. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.

    Histogram bin interval approximation

    Publication number: US11734864B2

    Publication date: 2023-08-22

    Application number: US17514801

    Application date: 2021-10-29

    CPC classification number: G06T11/206 G06F18/2431

    Abstract: Using approximated bin intervals to label a histogram provides clarity and allows the histogram to be understood more intuitively. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.

    Determination of candidate features for deviation analysis

    Publication number: US11681715B2

    Publication date: 2023-06-20

    Application number: US17342812

    Application date: 2021-06-09

    CPC classification number: G06F16/2462 G06F16/2465 G06F16/285

    Abstract: Systems and methods include determination, for each of a plurality of discrete features, of statistics for each discrete value of the discrete feature based on values of a continuous feature associated with the discrete value; determination, for each discrete feature, of first summary statistics based on the statistics determined for each discrete value of the discrete feature; determination, for each discrete feature, of a dissimilarity based on the first summary statistics determined for the discrete feature and on the statistics determined for each discrete value of the discrete feature; determination of candidate discrete features of the discrete features based on the determined dissimilarities, the candidate discrete features comprising less than all of the discrete features; determination, for each of the candidate discrete features, of second summary statistics based on values of the continuous feature associated with each discrete value of the candidate discrete feature; determination of a deviation score for each of the candidate discrete features based on the second summary statistics; and presentation of the candidate discrete features based on the determined deviation scores.
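
    The candidate-selection stage can be sketched in Python. The abstract names neither the statistics nor the dissimilarity measure, so this sketch assumes the per-value statistic is the mean of the continuous feature, the first summary statistic is the mean of those per-value means, and the dissimilarity is the largest absolute gap between a per-value mean and that summary. The function name `rank_discrete_features`, the record layout, and `top_k` are hypothetical. The second-stage deviation score is not shown.

```python
from collections import defaultdict
from statistics import mean

def rank_discrete_features(records, discrete_features, continuous_feature, top_k=1):
    """Return the top_k discrete features by dissimilarity (sketch only)."""
    scores = {}
    for feature in discrete_features:
        # Group the continuous values by each discrete value of this feature.
        grouped = defaultdict(list)
        for record in records:
            grouped[record[feature]].append(record[continuous_feature])
        # ASSUMPTION: the per-value statistic is the mean.
        per_value_means = [mean(values) for values in grouped.values()]
        # ASSUMPTION: the first summary statistic is the mean of those means.
        summary = mean(per_value_means)
        # ASSUMPTION: dissimilarity is the largest absolute gap from the summary.
        scores[feature] = max(abs(m - summary) for m in per_value_means)
    # Keep fewer than all features as candidates.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

records = [
    {"region": "A", "color": "red", "sales": 10},
    {"region": "A", "color": "blue", "sales": 10},
    {"region": "B", "color": "red", "sales": 100},
    {"region": "B", "color": "blue", "sales": 100},
]
candidates = rank_discrete_features(records, ["region", "color"], "sales")
print(candidates)
```

    In this toy data, sales differ sharply by region but not by color, so only `region` survives as a candidate.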

    DATA ANONYMIZATION FOR CLOUD ANALYTICS

    Publication number: US20220382906A1

    Publication date: 2022-12-01

    Application number: US17330997

    Application date: 2021-05-26

    Abstract: A system and method including receiving numeric data of a first dataset including a plurality of columns having numeric values with one of the plurality of columns specified as a target column; generating a trained generative model based on numeric values in non-target columns of the plurality of columns; generating a trained predictive model based on numeric values in non-target columns of the plurality of columns being input variables and the target column being a target variable; generating, by the trained generative model, a new set of numeric data for the non-target columns; generating predicted target values for the non-target columns by the trained predictive model using the new set of numeric data as an input to the predictive model; and generating anonymized numeric data for the first dataset by combining the new set of numeric data and the target column populated with the generated predicted target values.

    TOP CONTRIBUTOR RECOMMENDATION FOR CLOUD ANALYTICS

    Publication number: US20220382729A1

    Publication date: 2022-12-01

    Application number: US17329519

    Application date: 2021-05-25

    Abstract: A system and method including determining, for a specified target measure column of a first dataset including a plurality of records, the metadata of the first dataset, including a probability distribution for the specified target column and dimension scores for the dimensions for the first dataset conditioned on the specified target measure column, where the first dataset comprises a plurality of columns including the at least one target measure column and a plurality of non-numeric, dimension columns for the records of the first dataset; determining, for a subset of data of the first dataset based on one or more specified variables, dimension scores for the dimensions of the subset of data approximately derived from the determined metadata of the first dataset; and providing recommendations of top contributors based on the approximated dimension scores of dimensions of the subset of data.
