Composite relationship discovery framework

    Publication No.: US11693879B2

    Publication date: 2023-07-04

    Application No.: US17324667

    Filing date: 2021-05-19

    IPC classification: G06F16/26 G06F16/2458

    CPC classification: G06F16/26 G06F16/2465

    Abstract: Systems and methods include receiving a set of data including continuous features and a discrete feature, each continuous feature associated with a plurality of values and the discrete feature associated with a plurality of discrete values; determining, for each continuous feature, a relationship factor representing a relationship between the discrete feature and the continuous feature based on the plurality of values associated with the continuous feature and the plurality of discrete values; identifying the continuous feature associated with the largest of the determined relationship factors; generating, for each of the other continuous features, a correlation factor representing a correlation between that continuous feature and the identified continuous feature; determining, for each of the continuous features other than the identified continuous feature, a composite relationship score based on the relationship factor and the correlation factor associated with the feature; and presenting a visualization associated with the discrete feature, the identified continuous feature, and the continuous feature associated with the largest composite relationship score.
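
The scoring steps in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the relationship factor, the correlation factor, or their combination are computed, so everything concrete here is an assumption: the correlation ratio (eta squared) stands in for the relationship factor, the absolute Pearson coefficient for the correlation factor, and the composite score is taken as `relationship * (1 - |correlation|)` so that features redundant with the identified one are penalised. All function names are hypothetical.

```python
from statistics import mean


def correlation_ratio(categories, values):
    """Eta squared: share of the variance in `values` explained by `categories`."""
    overall = mean(values)
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    total = sum((v - overall) ** 2 for v in values)
    return between / total if total else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def composite_relationships(discrete, continuous):
    """Return (identified feature, best companion feature, composite scores).

    `continuous` maps feature name -> list of values and needs at least two
    entries. The combination rule below is an assumption, not the patented one.
    """
    relationship = {name: correlation_ratio(discrete, vals)
                    for name, vals in continuous.items()}
    identified = max(relationship, key=relationship.get)
    scores = {}
    for name, vals in continuous.items():
        if name == identified:
            continue
        correlation = abs(pearson(vals, continuous[identified]))
        scores[name] = relationship[name] * (1.0 - correlation)
    companion = max(scores, key=scores.get)
    return identified, companion, scores
```

With a discrete column `["a", "a", "b", "b"]` and continuous columns `x = [1, 1, 5, 5]`, `y = [1, 2, 5, 6]`, `z = [2, 4, 3, 6]`, the sketch identifies `x` and then prefers `z` over `y`, because `y` is almost a copy of `x`.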

    FEATURE CONTRIBUTION SCORE CLASSIFICATION
    Type: Published application

    Publication No.: US20240062101A1

    Publication date: 2024-02-22

    Application No.: US17890073

    Filing date: 2022-08-17

    Inventor: Paul O'Hara

    IPC classification: G06N20/00

    CPC classification: G06N20/00

    Abstract: A historical feature contribution score dataset comprising a number of sets of scores generated by a machine learning model may be obtained. Additional feature contribution score sets may be materialized such that the size of each additional feature contribution score set is based on a corresponding randomly selected value within a set-size range. A training dataset may be produced that includes feature contribution scores and corresponding classification labels extracted from the historical feature contribution score dataset and the additional feature contribution score sets. The classification labels may indicate an amount that the corresponding feature contribution scores contribute to a prediction of a target feature. A machine learning model may be trained to predict the classification labels using the training dataset. An input feature contribution score set may be applied to the machine learning model to obtain predicted classification labels.

    Histogram Bin Interval Approximation
    Type: Published application

    Publication No.: US20240020896A1

    Publication date: 2024-01-18

    Application No.: US18351288

    Filing date: 2023-07-12

    IPC classification: G06T11/20 G06F18/2431

    CPC classification: G06T11/206 G06F18/2431

    Abstract: Using approximated bin intervals to label a histogram provides clarity and allows the histogram to be more intuitively understood. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.
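
The sequence of steps in this abstract (bin width, bin mask, approximated range, approximated width, intervals) can be sketched as follows. The abstract does not define the bin mask, so this sketch assumes it is the largest power of ten not exceeding the raw bin width, and assumes the feature statistics are simply the minimum and maximum. The function name `approximate_bins` is hypothetical.

```python
import math


def approximate_bins(minimum, maximum, bins):
    """Return `bins` (low, high) intervals with rounded, readable edges."""
    if maximum <= minimum or bins < 1:
        raise ValueError("need maximum > minimum and at least one bin")
    # Raw bin width from the feature statistics and the number of bins.
    width = (maximum - minimum) / bins
    # Assumed bin mask: largest power of ten that does not exceed the width.
    mask = 10 ** math.floor(math.log10(width))
    # Approximated interval range: widen min/max outward to multiples of the mask.
    low = math.floor(minimum / mask) * mask
    high = math.ceil(maximum / mask) * mask
    # Approximated bin width and the resulting intervals.
    approx_width = (high - low) / bins
    return [(low + i * approx_width, low + (i + 1) * approx_width)
            for i in range(bins)]
```

For a feature ranging from 3.2 to 97.6 split into 5 bins, the raw width is 18.88, the assumed mask is 10, the range widens to 0 through 100, and the labels become 0 to 20, 20 to 40, and so on, rather than 3.2 to 22.08.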

    Automatic frequency recommendation for time series data

    Publication No.: US11321332B2

    Publication date: 2022-05-03

    Application No.: US16876463

    Filing date: 2020-05-18

    Abstract: The present disclosure involves systems, software, and computer-implemented methods for automatically recommending one or more frequencies for time series data. One example method includes receiving a request for an insight analysis for an input time series included in a dataset. For each of multiple frequencies to analyze, the input time series is transformed into a frequency time series. An absolute percentage change impact factor and an absolute trend impact factor are determined for each frequency time series. A frequency interest score is determined for each frequency time series based on the determined absolute percentage change impact factors and the determined absolute trend impact factors. The frequency interest score is provided for at least some of the frequency time series.
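
A rough Python sketch of this flow is given below. The abstract names the two impact factors but not their formulas, so the choices here are assumptions: the percentage change is taken between the last two periods, the trend is the least-squares slope relative to the series mean, and the interest score is the sum of both factors after scaling each by its maximum across frequencies. Aggregation by summing fixed-size blocks is also an assumption, and all names are hypothetical.

```python
def resample(values, period):
    """Sum consecutive blocks of `period` points, dropping an incomplete tail."""
    return [sum(values[i:i + period])
            for i in range(0, len(values) - period + 1, period)]


def abs_pct_change(series):
    """Absolute relative change between the last two points."""
    previous, last = series[-2], series[-1]
    return abs(last - previous) / abs(previous) if previous else 0.0


def abs_trend(series):
    """Absolute least-squares slope, relative to the series mean."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    slope = (sum((i - mean_x) * (y - mean_y) for i, y in enumerate(series))
             / sum((i - mean_x) ** 2 for i in range(n)))
    return abs(slope) / abs(mean_y) if mean_y else 0.0


def frequency_interest(values, periods):
    """Score each candidate frequency; `periods` maps name -> block length."""
    factors = {}
    for name, period in periods.items():
        series = resample(values, period)
        if len(series) >= 2:  # both factors need at least two points
            factors[name] = (abs_pct_change(series), abs_trend(series))
    if not factors:
        return {}
    # Scale each factor by its maximum so the two are comparable (assumed rule).
    max_pct = max(f[0] for f in factors.values()) or 1.0
    max_trend = max(f[1] for f in factors.values()) or 1.0
    return {name: pct / max_pct + trend / max_trend
            for name, (pct, trend) in factors.items()}
```

Calling `frequency_interest(list(range(1, 29)), {"daily": 1, "weekly": 7})` ranks the weekly series above the daily one, since weekly totals change far more between periods than single days do.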

    CATEGORY CLASSIFICATION SYSTEM FOR FEATURE CONTRIBUTION SCORES

    Publication No.: US20240193462A1

    Publication date: 2024-06-13

    Application No.: US18059852

    Filing date: 2022-11-29

    Inventor: Paul O'Hara

    IPC classification: G06N20/00 G06K9/62

    CPC classification: G06N20/00 G06K9/6256

    Abstract: A system may obtain a plurality of historical feature contribution score (FCS) datasets, each historical FCS dataset comprising a first plurality of feature contribution scores and a size of the historical FCS dataset. The system may apply default feature contribution category classification (FCCC) parameters to the plurality of historical FCS datasets and may optimize the default FCCC parameters to produce a plurality of optimized FCCC parameters. The system may produce a training dataset comprising the optimized FCCC parameters and use the training dataset to train a machine learning model to apply category classification labels. The system may apply a new FCS dataset to the machine learning model, the new FCS dataset comprising a second plurality of feature contribution scores and a size of the new FCS dataset, and provide the category classification labels for the new FCS dataset to a user interface.

    Continuous feature-independent determination of features for deviation analysis

    Publication No.: US11720579B2

    Publication date: 2023-08-08

    Application No.: US17367882

    Filing date: 2021-07-06

    Abstract: Systems and methods include determination, for each of a plurality of discrete features, of statistics based on a number of occurrences of each discrete value of the discrete feature in the data, determination of first summary statistics based on the determined statistics, determination of a dissimilarity for each discrete feature based on the first summary statistics and on the statistics determined for the discrete feature, determination of candidate discrete features based on the determined dissimilarities, determination, for each of the candidate discrete features, of second summary statistics based on values of a continuous feature associated with each discrete value of the candidate discrete feature, determination of a deviation score for each of the candidate discrete features based on the second summary statistics, and transmission of the candidate discrete features for display in association with the continuous feature based on the determined deviation scores.
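
The candidate-selection stage described here uses only occurrence counts, so it can run before any continuous feature is chosen. The sketch below illustrates that stage under several assumptions the abstract does not confirm: the per-feature statistic is the entropy of the value counts normalised by the logarithm of the row count, the first summary statistic is the median of those entropies, the dissimilarity is the absolute distance from that median, and the candidates are the features with the smallest dissimilarity. All names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def count_entropy(column):
    """Entropy of the value counts, normalised by log(number of rows)."""
    rows = len(column)
    counts = Counter(column).values()
    entropy = -sum((c / rows) * math.log(c / rows) for c in counts)
    return entropy / math.log(rows)


def candidate_features(table, keep):
    """Pick `keep` discrete features whose count profile is most typical.

    `table` maps feature name -> list of discrete values (two or more rows).
    Returns (candidate names, dissimilarity per feature).
    """
    stats = {name: count_entropy(column) for name, column in table.items()}
    centre = median(list(stats.values()))  # assumed first summary statistic
    dissimilarity = {name: abs(value - centre) for name, value in stats.items()}
    ranked = sorted(dissimilarity, key=dissimilarity.get)
    return ranked[:keep], dissimilarity
```

Under these assumptions an identifier-like column (every value unique) and a constant column both sit far from the median and are filtered out, while a column with a few well-populated values is kept for the later deviation scoring.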

    AUTOMATIC HOT AREA DETECTION IN HEAT MAP VISUALIZATIONS

    Publication No.: US20210349911A1

    Publication date: 2021-11-11

    Application No.: US16867036

    Filing date: 2020-05-05

    IPC classification: G06F16/25 G06F7/14 G06F16/28

    Abstract: The present disclosure involves systems, software, and computer-implemented methods for automatically detecting hot areas in heat map visualizations. One example method includes identifying a two-dimensional heat map. The identified two-dimensional heat map is converted to a one-dimensional heat map. Cells of the one-dimensional heat map are clustered using a density-based clustering algorithm to generate at least one dense region of cells. A mean value of cells in each dense region is calculated and the dense regions are sorted by mean value in descending order. An approach for identifying hot areas is selected and the selected approach is used to identify at least one dense region as a hot area of the one-dimensional heat map.
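
The pipeline in this abstract can be sketched in a few lines of Python. The abstract does not name the clustering algorithm or the selection approach, so this sketch makes two assumptions: a simplified one-dimensional stand-in for density-based clustering (sorted cell values are chained together while neighbours differ by at most `eps`, and chains shorter than `min_cells` are discarded), and a selection approach that simply takes the dense region with the highest mean. Function and parameter names are hypothetical.

```python
def flatten(heat_map):
    """Convert a 2D grid into a 1D list of (value, (row, column)) cells."""
    return [(value, (r, c))
            for r, row in enumerate(heat_map)
            for c, value in enumerate(row)]


def dense_regions(cells, eps, min_cells):
    """Chain sorted cells whose values differ by at most `eps` from a neighbour."""
    if not cells:
        return []
    ordered = sorted(cells)
    regions, current = [], [ordered[0]]
    for cell in ordered[1:]:
        if cell[0] - current[-1][0] <= eps:
            current.append(cell)
        else:
            regions.append(current)
            current = [cell]
    regions.append(current)
    # Keep only chains that are dense enough to count as a region.
    return [region for region in regions if len(region) >= min_cells]


def hot_area(heat_map, eps=1.0, min_cells=2):
    """Return the grid positions of the dense region with the highest mean."""
    regions = dense_regions(flatten(heat_map), eps, min_cells)
    # Sort dense regions by mean cell value in descending order.
    regions.sort(key=lambda region: sum(v for v, _ in region) / len(region),
                 reverse=True)
    return sorted(position for _, position in regions[0]) if regions else []
```

For the grid `[[1, 2, 1], [2, 9, 10], [1, 10, 9]]` the values split into a low chain (1s and 2s) and a high chain (9s and 10s), and the sketch reports the lower-right 2x2 block as the hot area.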

    Histogram bin interval approximation

    Publication No.: US11734864B2

    Publication date: 2023-08-22

    Application No.: US17514801

    Filing date: 2021-10-29

    IPC classification: G06T11/20 G06F18/2431

    CPC classification: G06T11/206 G06F18/2431

    Abstract: Using approximated bin intervals to label a histogram provides clarity and allows the histogram to be more intuitively understood. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.

    Determination of candidate features for deviation analysis

    Publication No.: US11681715B2

    Publication date: 2023-06-20

    Application No.: US17342812

    Filing date: 2021-06-09

    IPC classification: G06F16/2458 G06F16/28

    Abstract: Systems and methods include determination, for each of a plurality of discrete features, of statistics for each discrete value of the discrete feature based on values of a continuous feature associated with the discrete value, determination, for each discrete feature, of first summary statistics based on the statistics determined for each discrete value of the discrete feature, determination, for each discrete feature, of a dissimilarity based on the first summary statistics determined for the discrete feature and on the statistics determined for each discrete value of the discrete feature, determination of candidate discrete features of the discrete features based on the determined dissimilarities, the candidate discrete features comprising fewer than all of the discrete features, determination, for each of the candidate discrete features, of second summary statistics based on values of the continuous feature associated with each discrete value of the candidate discrete feature, determination of a deviation score for each of the candidate discrete features based on the second summary statistics, and presentation of the candidate discrete features based on the determined deviation scores.
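
The two-stage flow in this abstract (a cheap dissimilarity to shortlist discrete features, then a deviation score on the shortlist) can be sketched as below. None of the concrete statistics are given in the abstract, so these are assumptions: the per-value statistic is the mean of the continuous feature, the first summary statistic is the mean of those per-value means, the dissimilarity is the largest gap between a per-value mean and that summary, and the deviation score is the largest gap between a per-value mean and the overall mean, expressed in overall standard deviations. All names are hypothetical.

```python
from statistics import mean, pstdev


def group_means(categories, values):
    """Mean of the continuous values for each discrete value."""
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    return {category: mean(vals) for category, vals in groups.items()}


def rank_by_deviation(discrete, measure, keep):
    """Shortlist `keep` discrete features, then rank them by deviation score.

    `discrete` maps feature name -> list of discrete values; `measure` is the
    continuous feature. Returns [(name, score), ...], highest score first.
    """
    # Stage 1: dissimilarity per feature from per-value means (assumed form).
    dissimilarity = {}
    for name, column in discrete.items():
        means = list(group_means(column, measure).values())
        centre = mean(means)  # assumed first summary statistic
        dissimilarity[name] = max(abs(m - centre) for m in means)
    candidates = sorted(dissimilarity, key=dissimilarity.get, reverse=True)[:keep]

    # Stage 2: deviation score for candidates only (assumed form).
    overall = mean(measure)
    spread = pstdev(measure) or 1.0  # avoid division by zero for flat data
    scores = {}
    for name in candidates:
        means = group_means(discrete[name], measure).values()
        scores[name] = max(abs(m - overall) for m in means) / spread
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a measure of `[10, 12, 11, 50, 52, 51]`, a `region` column that separates the low and high values ranks first, a `colour` column that mixes them ranks second, and a constant column is dropped at the shortlist stage when `keep=2`.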