-
Publication number: US20220358432A1
Publication date: 2022-11-10
Application number: US17316058
Filing date: 2021-05-10
Applicant: SAP SE
Inventor: Francesco Alda, Amrit Raj, Sergey Smirnov, Evgeny Arnautov
Abstract: Technologies are described for identifying features that can be used to predict missing attribute values. For example, a set of structured data can be received comprising a plurality of features and one or more labels. The set of structured data can be pre-processed, comprising applying one or more cleaning policies, to produce a set of pre-processed features. The set of pre-processed features can be filtered using correlation-based filtering that uses one or more correlation estimation techniques to remove at least some highly correlated features. The correlation-based filtering can produce a set of filtered features. Feature subset selection can be performed comprising applying machine learning algorithms to the set of filtered features to determine relative importance among the set of filtered features. Based on the relative importance, a subset of the set of filtered features can be determined. The subset of the set of filtered features can be output.
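The filtering and selection steps in this abstract can be illustrated with a small sketch. It is not the patented method: Pearson correlation, the 0.9 threshold, the greedy keep-first rule, and the function and column names are all assumptions made for illustration. In particular, the abstract determines relative importance with machine learning algorithms, whereas this sketch substitutes a much simpler stand-in, the absolute correlation between each feature and the label.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def correlation_filter(features, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is <= threshold.

    The threshold and the keep-first rule are illustrative assumptions.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def rank_by_label_correlation(features, names, label):
    """Stand-in importance score (NOT an ML model): |r| between feature and label."""
    scores = {n: abs(pearson(features[n], label)) for n in names}
    return sorted(names, key=lambda n: scores[n], reverse=True)

# Made-up toy data: "price_cents" is a rescaled copy of "price".
features = {
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price_cents": [100.0, 200.0, 300.0, 400.0, 500.0],
    "weight": [2.0, 1.0, 4.0, 3.0, 5.0],
    "noise": [3.0, 1.0, 2.0, 1.0, 3.0],
}
label = [1.1, 1.9, 3.2, 3.9, 5.1]

filtered = correlation_filter(features)  # drops the redundant "price_cents"
subset = rank_by_label_correlation(features, filtered, label)[:2]
print(filtered, subset)
```

In a fuller pipeline, the ranking step would presumably be replaced by importance scores taken from one or more trained models, which is closer to what the abstract describes.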
-
Publication number: US20230325776A1
Publication date: 2023-10-12
Application number: US17716368
Filing date: 2022-04-08
Applicant: SAP SE
Inventor: Francesco Alda, Andrea Bruera, Francesco Di Cerbo
CPC classification number: G06Q10/1053, G06N7/005, G06N20/00
Abstract: In an example embodiment, a machine learning-based solution is provided for generating synthetic CVs that preserve the statistical properties of the original corpus while providing strong privacy guarantees. Because synthetic data do not refer to any natural person and can be generated from anonymized data, they are not subject to data protection regulations.
-
Publication number: US11983652B2
Publication date: 2024-05-14
Application number: US17316058
Filing date: 2021-05-10
Applicant: SAP SE
Inventor: Francesco Alda, Amrit Raj, Sergey Smirnov, Evgeny Arnautov
IPC: G06Q10/00, G06N5/022, G06Q10/0631
CPC classification number: G06Q10/06313, G06N5/022
Abstract: Technologies are described for identifying features that can be used to predict missing attribute values. For example, a set of structured data can be received comprising a plurality of features and one or more labels. The set of structured data can be pre-processed, comprising applying one or more cleaning policies, to produce a set of pre-processed features. The set of pre-processed features can be filtered using correlation-based filtering that uses one or more correlation estimation techniques to remove at least some highly correlated features. The correlation-based filtering can produce a set of filtered features. Feature subset selection can be performed comprising applying machine learning algorithms to the set of filtered features to determine relative importance among the set of filtered features. Based on the relative importance, a subset of the set of filtered features can be determined. The subset of the set of filtered features can be output.
-
Publication number: US11836612B2
Publication date: 2023-12-05
Application number: US16444222
Filing date: 2019-06-18
Applicant: SAP SE
Inventor: Francesco Alda, Evgeny Arnautov, Amrit Raj, Sergey Smirnov, Ekaterina Sutter
CPC classification number: G06N3/08, G06F16/258, G06F16/285
Abstract: Disclosed herein are system, method, and computer program product embodiments for classifying data objects using machine learning. In an embodiment, an artificial neural network may be trained to identify explained variable values corresponding to data object attributes. For example, the explained variables may be a category and a subcategory with the subcategory having a hierarchical relationship to the category. The artificial neural network may then receive a data record having one or more attribute values. The neural network may then identify a first and second explained variable value corresponding to the one or more attribute values based on the trained neural network model. The first and second explained variable values may then be associated with the data record. For example, if the data record is stored in a database, the record may be updated to include the first and second explained variable values.
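The last two steps of this abstract, identifying a category and a subcategory and then associating both with the data record, can be sketched as follows. The sketch does not train or run a neural network: the scores are made-up placeholders for the outputs of a trained model. The hierarchy, the field names, and the rule that the subcategory is chosen only among children of the selected category are assumptions for illustration. The abstract states only that the subcategory has a hierarchical relationship to the category.

```python
# Hypothetical two-level hierarchy: each subcategory belongs to one category.
HIERARCHY = {
    "Electronics": ["Laptops", "Phones"],
    "Furniture": ["Chairs", "Desks"],
}

def pick_labels(category_scores, subcategory_scores):
    """Choose the best-scoring category, then the best subcategory under it."""
    category = max(category_scores, key=category_scores.get)
    subcategory = max(HIERARCHY[category], key=lambda s: subcategory_scores[s])
    return category, subcategory

def annotate(record, category_scores, subcategory_scores):
    """Return a copy of the record with both predicted values attached."""
    category, subcategory = pick_labels(category_scores, subcategory_scores)
    return {**record, "category": category, "subcategory": subcategory}

# Placeholder scores standing in for the outputs of a trained model.
record = {"id": 17, "description": "14-inch notebook, 16 GB RAM"}
category_scores = {"Electronics": 0.8, "Furniture": 0.2}
subcategory_scores = {"Laptops": 0.35, "Phones": 0.15, "Chairs": 0.40, "Desks": 0.10}

updated = annotate(record, category_scores, subcategory_scores)
print(updated)
```

Here "Chairs" has the highest raw subcategory score, but restricting the choice to children of "Electronics" yields "Laptops", which keeps the two stored values consistent with the hierarchy.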