-
Publication number: US20210365471A1
Publication date: 2021-11-25
Application number: US16877909
Application date: 2020-05-19
Applicant: BUSINESS OBJECTS SOFTWARE LTD.
Inventor: Paul O'Hara, Robert McGrath, Ying Wu, Shekhar Chhabra, Eoin Goslin, Pat Connaughton, John Bowden, Alan Maher, David Hutchinson, Leanne Long, Malte Christian Kaufmann, Pukhraj Saxena, Priti Mulchandani, Anirban Banerjee
IPC: G06F16/26, G06F16/28, G06F16/2458
Abstract: The present disclosure involves systems, software, and computer implemented methods for generating insights based on numeric and categorical data. One example method includes receiving a request for an insight analysis for a dataset that includes at least one continuous feature and at least one categorical feature. Continuous features can have any value within a range of numerical values and categorical features are enumerated features that can have a value from a predefined set of values. A selection of a first continuous feature for analysis is received, and at least one categorical feature is identified for analysis. A deviation factor and a relationship factor are determined for each identified categorical feature. An insight score is determined for each identified categorical feature that combines the deviation factor and the relationship factor for the categorical feature. The insight score is provided for at least some of the identified categorical features.
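The scoring flow in this abstract can be sketched as follows. The abstract does not define the deviation factor, the relationship factor, or how they are combined, so every formula here is an assumption made for illustration: the deviation factor is taken as the largest relative gap between a category mean and the overall mean, the relationship factor as the correlation ratio (eta squared), and the insight score as their product. The function names and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_values(records, categorical, continuous):
    """Collect the continuous feature's values per category value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[categorical]].append(record[continuous])
    return groups


def relationship_factor(groups):
    """Assumed definition: correlation ratio (between-group / total sum of squares)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    total = sum((v - grand_mean) ** 2 for v in all_values)
    if total == 0:
        return 0.0
    between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    return between / total


def deviation_factor(groups):
    """Assumed definition: largest relative gap between a category mean and the overall mean."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    if grand_mean == 0:
        return 0.0
    return max(abs(mean(vs) - grand_mean) for vs in groups.values()) / abs(grand_mean)


def insight_scores(records, continuous, categoricals):
    """Score each categorical feature; the product is an assumed way to combine the factors."""
    scores = {}
    for feature in categoricals:
        groups = group_values(records, feature, continuous)
        scores[feature] = deviation_factor(groups) * relationship_factor(groups)
    # Highest-scoring features first.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

With records where one categorical feature separates the continuous values cleanly and another does not, the first feature should rank higher under these assumed definitions.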
-
Publication number: US12205203B2
Publication date: 2025-01-21
Application number: US18351288
Application date: 2023-07-12
Applicant: Business Objects Software Ltd
Inventor: Paul O'Hara, Malte Christian Kaufmann, Esther Rodrigo Ortiz, Conor White
IPC: G06T11/20, G06F18/2431
Abstract: Using approximated bin intervals to label the histograms provides clarity and allows for the histogram to be more intuitively understood. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.
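A minimal sketch of the binning steps named in this abstract, under stated assumptions. The abstract does not say how the bin mask is built, so this example assumes it is the power of ten at or below the raw bin width, and that the feature statistics are the minimum and maximum. The function names are hypothetical.

```python
import math


def approximated_bins(values, n_bins):
    """Return (edges, approx_width) with rounded bin edges for a continuous feature."""
    lo, hi = min(values), max(values)
    raw_width = (hi - lo) / n_bins
    if raw_width == 0:
        raise ValueError("all values are identical; no bin width can be derived")
    # Assumed bin mask: the power of ten at or below the raw bin width.
    mask = 10 ** math.floor(math.log10(raw_width))
    # Apply the mask to the feature statistics: widen the range outward
    # to the nearest multiples of the mask.
    approx_lo = math.floor(lo / mask) * mask
    approx_hi = math.ceil(hi / mask) * mask
    approx_width = (approx_hi - approx_lo) / n_bins
    edges = [approx_lo + i * approx_width for i in range(n_bins + 1)]
    return edges, approx_width


def histogram_counts(values, edges):
    """Count values per bin; the maximum value falls into the last bin."""
    n_bins = len(edges) - 1
    width = edges[1] - edges[0]
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - edges[0]) / width), n_bins - 1)
        counts[index] += 1
    return counts
```

For values between 3.7 and 97.4 with five bins, the raw width of 18.74 gives a mask of 10, so the range widens to 0 through 100 and the labels become multiples of 20 rather than awkward fractions.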
-
Publication number: US20250013668A1
Publication date: 2025-01-09
Application number: US18754570
Application date: 2024-06-26
Applicant: Business Objects Software Ltd.
Inventor: Paul O'Hara, Ying Wu, Malte Christian Kaufmann
Abstract: Anomalies may be detected using a multiple machine learning model anomaly detection framework. A clustering model is trained using an unsupervised machine learning algorithm on a historical anomaly dataset. A plurality of clusters of records are determined by applying the historical anomaly dataset to the clustering model. Then it is determined whether each cluster of the plurality of clusters is an anomaly-type cluster or a normal-type cluster. The plurality of labels for the plurality of records are updated based on the particular record's cluster classification. Non-pure clusters are determined from among the plurality of clusters based on a purity threshold. A supervised machine learning model is trained for each of the non-pure clusters using the records in the given cluster and the labels for each of those records. Then, predictions of an anomaly are made using the clustering model and the supervised machine learning models.
-
Publication number: US20240020896A1
Publication date: 2024-01-18
Application number: US18351288
Application date: 2023-07-12
Applicant: Business Objects Software Ltd
Inventor: Paul O'Hara, Malte Christian Kaufmann, Esther Rodrigo Ortiz, Conor White
IPC: G06T11/20, G06F18/2431
CPC classification number: G06T11/206, G06F18/2431
Abstract: Using approximated bin intervals to label the histograms provides clarity and allows for the histogram to be more intuitively understood. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.
-
Publication number: US20170364818A1
Publication date: 2017-12-21
Application number: US15185951
Application date: 2016-06-17
Applicant: Business Objects Software Ltd.
Inventor: Ying Wu, Malte Christian Kaufmann, Robert McGrath, Ulrich Schlueter, Simon Sitt
Abstract: For a plurality of sensors, a particular sensor is indicated as a target sensor and the other sensors as input sensors. A regression model is trained using historical data from the plurality of related sensors. The trained regression model is applied to the target sensor to generate a predicted target sensor value. A difference between an actual target sensor value and the predicted target sensor value is calculated. A probability of difference for the calculated difference between the actual target sensor value and the predicted target sensor value is compared against a threshold value.
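The steps in this abstract can be sketched as below. Several simplifications are assumptions for illustration only: a single input sensor instead of several, ordinary least squares as the regression model, and a normal distribution fitted to the historical residuals to turn a difference into a probability. The function names and the default threshold are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares for one input sensor: returns (slope, intercept)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_anomalous(history_in, history_target, new_in, new_target, threshold=0.01):
    """Flag a target reading whose difference from the prediction is improbable.

    Returns (flag, probability). The probability is the two-sided tail
    probability of a difference at least this large, assuming the historical
    residuals are normally distributed (an assumption of this sketch).
    """
    slope, intercept = fit_line(history_in, history_target)
    residuals = [y - (slope * x + intercept)
                 for x, y in zip(history_in, history_target)]
    resid_mean, resid_std = mean(residuals), stdev(residuals)

    predicted = slope * new_in + intercept
    difference = new_target - predicted
    z = abs(difference - resid_mean) / resid_std
    probability = 2 * (1 - NormalDist().cdf(z))
    return probability < threshold, probability
```

A reading close to the regression line yields a high probability and is not flagged, while a reading far from the line yields a probability near zero and falls below the threshold.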
-
Publication number: US11734864B2
Publication date: 2023-08-22
Application number: US17514801
Application date: 2021-10-29
Applicant: Business Objects Software Ltd.
Inventor: Paul O'Hara, Malte Christian Kaufmann, Esther Rodrigo Ortiz, Conor White
IPC: G06T11/20, G06F18/2431
CPC classification number: G06T11/206, G06F18/2431
Abstract: Using approximated bin intervals to label the histograms provides clarity and allows for the histogram to be more intuitively understood. A dataset may comprise a plurality of records having a plurality of features including one or more continuous features. A selection of a continuous feature may be obtained. A bin width based on a number of bins and feature statistics of the continuous feature may be determined. An approximated bin interval range is determined by applying a bin mask based on the bin width to the feature statistics. An approximated bin width is determined based on the number of bins and the approximated bin interval range. Approximated bin intervals for the histogram are determined based on the approximated bin width. A histogram is generated having bins with intervals based on the approximated bin intervals.
-
Publication number: US11681715B2
Publication date: 2023-06-20
Application number: US17342812
Application date: 2021-06-09
Applicant: BUSINESS OBJECTS SOFTWARE LTD.
Inventor: Paul O'Hara, Malte Christian Kaufmann, Anirban Banerjee, Ian Denver, Alan McShane
IPC: G06F16/2458, G06F16/28
CPC classification number: G06F16/2462, G06F16/2465, G06F16/285
Abstract: Systems and methods include determination, for each of a plurality of discrete features, of statistics for each discrete value of the discrete feature based on values of a continuous feature associated with the discrete value; determination, for each discrete feature, of first summary statistics based on the statistics determined for each discrete value of the discrete feature; determination, for each discrete feature, of a dissimilarity based on the first summary statistics determined for the discrete feature and on the statistics determined for each discrete value of the discrete feature; determination of candidate discrete features of the discrete features based on the determined dissimilarities, the candidate discrete features comprising less than all of the discrete features; determination, for each of the candidate discrete features, of second summary statistics based on values of the continuous feature associated with each discrete value of the candidate discrete feature; determination of a deviation score for each of the candidate discrete features based on the second summary statistics; and presentation of the candidate discrete features based on the determined deviation scores.
-
Publication number: US11675765B2
Publication date: 2023-06-13
Application number: US17329519
Application date: 2021-05-25
Applicant: BUSINESS OBJECTS SOFTWARE LTD.
Inventor: Ying Wu, Malte Christian Kaufmann, Alan McShane, Anirban Banerjee, Gareth Maguire
IPC: G06F16/22, G06F18/2113, G06F18/2321, G06F18/23213
CPC classification number: G06F16/2237, G06F16/2264, G06F18/2113, G06F18/2321, G06F18/23213
Abstract: A system and method including determining, for a specified target measure column of a first dataset including a plurality of records, the metadata of the first dataset, including a probability distribution for the specified target column and dimension scores for the dimensions for the first dataset conditioned on the specified target measure column, where the first dataset comprises a plurality of columns including the at least one target measure column and a plurality of non-numeric, dimension columns for the records of the first dataset; determining, for a subset of data of the first dataset based on one or more specified variables, dimension scores for the dimensions of the subset of data approximately derived from the determined metadata of the first dataset; and providing recommendations of top contributors based on the approximated dimension scores of dimensions of the subset of data.
-
Publication number: US20220382906A1
Publication date: 2022-12-01
Application number: US17330997
Application date: 2021-05-26
Applicant: BUSINESS OBJECTS SOFTWARE LTD.
Inventor: Ying Wu, Malte Christian Kaufmann
Abstract: A system and method including receiving numeric data of a first dataset including a plurality of columns having numeric values with one of the plurality of columns specified as a target column; generating a trained generative model based on numeric values in non-target columns of the plurality of columns; generating a trained predictive model based on numeric values in non-target columns of the plurality of columns being input variables and the target column being a target variable; generating, by the trained generative model, a new set of numeric data for the non-target columns; generating predicted target values for the non-target columns by the trained predictive model using the new set of numeric data as an input to the predictive model; and generating anonymized numeric data for the first dataset by combining the new set of numeric data and the target column populated with the generated predicted target values.
-
Publication number: US20220382729A1
Publication date: 2022-12-01
Application number: US17329519
Application date: 2021-05-25
Applicant: BUSINESS OBJECTS SOFTWARE LTD.
Inventor: Ying Wu, Malte Christian Kaufmann, Alan McShane, Anirban Banerjee, Gareth Maguire
Abstract: A system and method including determining, for a specified target measure column of a first dataset including a plurality of records, the metadata of the first dataset, including a probability distribution for the specified target column and dimension scores for the dimensions for the first dataset conditioned on the specified target measure column, where the first dataset comprises a plurality of columns including the at least one target measure column and a plurality of non-numeric, dimension columns for the records of the first dataset; determining, for a subset of data of the first dataset based on one or more specified variables, dimension scores for the dimensions of the subset of data approximately derived from the determined metadata of the first dataset; and providing recommendations of top contributors based on the approximated dimension scores of dimensions of the subset of data.