-
Publication Number: US12045734B2
Publication Date: 2024-07-23
Application Number: US18117923
Application Date: 2023-03-06
Applicant: SAP SE
Inventor: Jacques Doan Huu
CPC classification number: G06N5/02, G06F16/2228, G06F16/2365
Abstract: A Gradient Boosting Decision Tree (GBDT) model successively stacks many decision trees, each of which tries to correct the residual errors left by the previous steps. The final score produced by the GBDT for an input vector is the sum of the individual scores obtained from its decision trees. Overfitting in a GBDT can be reduced by removing from the training data the input variables that have the least impact on the output. One way to identify the input variable with the lowest predictive value is to find the variable that is used for the first time in the latest decision tree of the GBDT. Identifying the low-predictive features to be removed in this way does not require regenerating the earlier trees to produce the new GBDT: since the removed feature was never used in the earlier trees, those trees already ignore it.
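As the abstract notes, the final GBDT score for an input vector is just the sum of the scores returned by the individual trees. A minimal sketch of that scoring step, using a hypothetical TreeNode structure assumed for illustration rather than any SAP implementation:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TreeNode:
    """One node of a decision tree: an internal split or a leaf score."""
    feature: Optional[int] = None       # index of the input variable used for the split
    threshold: float = 0.0              # split threshold (x[feature] <= threshold goes left)
    left: Optional["TreeNode"] = None   # left child
    right: Optional["TreeNode"] = None  # right child
    value: float = 0.0                  # leaf score, used when feature is None


def tree_score(node: TreeNode, x: Sequence[float]) -> float:
    """Walk one decision tree and return its score for the input vector x."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def gbdt_score(trees: list, x: Sequence[float]) -> float:
    """The GBDT prediction is the sum of the individual tree scores."""
    return sum(tree_score(tree, x) for tree in trees)


if __name__ == "__main__":
    # Two toy trees; the second corrects part of the residual left by the first.
    t1 = TreeNode(feature=0, threshold=5.0,
                  left=TreeNode(value=1.0), right=TreeNode(value=3.0))
    t2 = TreeNode(feature=1, threshold=0.5,
                  left=TreeNode(value=-0.2), right=TreeNode(value=0.4))
    print(gbdt_score([t1, t2], [4.0, 0.9]))   # 1.0 + 0.4 = 1.4
```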
-
Publication Number: US11521089B2
Publication Date: 2022-12-06
Application Number: US16204000
Application Date: 2018-11-29
Applicant: SAP SE
Inventor: Scott Kumar Cameron , Olivier Hamon , Gabriel Kevorkian , Eric Gouthiere , Jacques Doan Huu
Abstract: A predictive model pipeline data store may contain electronic records defining a predictive model pipeline composed of operation nodes. Based on the information in the data store, an execution framework platform may calculate a hash value for each operation node by including all recursive dependencies using ancestor node hash values and current node parameters. The platform may then compare each computed hash value with a previously computed hash value associated with a prior execution of a prior version of the pipeline. Operation nodes that have an unchanged hash value may be tagged “idle.” Operation nodes that have a changed hash value may be tagged “train and apply” or “apply” based on current node parameters (and an “apply” tag may propagate backwards through the pipeline to ancestor nodes). The platform may then ignore the operation nodes tagged “idle” when creating a physical execution plan to be provided to a target platform.
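A rough sketch of the hash-and-tag idea described above: each operation node's hash folds in its own parameters and, recursively, the hashes of its ancestors, so any upstream change invalidates everything downstream, while unchanged nodes are tagged "idle" and skipped in the physical plan. The node class, hashing scheme, and tag handling below are illustrative assumptions (the backward propagation of "apply" tags to ancestors is omitted), not the patented implementation:

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class OperationNode:
    name: str
    params: dict                                   # current node parameters
    parents: list = field(default_factory=list)    # ancestor OperationNodes
    trainable: bool = False                        # whether the node has a training step


def node_hash(node: OperationNode) -> str:
    """Hash the node's own parameters together with all ancestor hashes (recursively)."""
    payload = {
        "params": node.params,
        "ancestors": sorted(node_hash(p) for p in node.parents),
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def tag_pipeline(nodes, previous_hashes):
    """Compare each node's hash with the one recorded for the prior pipeline run."""
    tags = {}
    for node in nodes:
        if previous_hashes.get(node.name) == node_hash(node):
            tags[node.name] = "idle"        # nothing changed: reuse the prior result
        else:
            tags[node.name] = "train and apply" if node.trainable else "apply"
    return tags


if __name__ == "__main__":
    src = OperationNode("read_sales", {"table": "SALES"})
    prep = OperationNode("impute", {"strategy": "mean"}, parents=[src])
    model = OperationNode("regression", {"lambda": 0.1}, parents=[prep], trainable=True)
    nodes = [src, prep, model]
    prior = {n.name: node_hash(n) for n in nodes}   # pretend this was stored last run
    prep.params["strategy"] = "median"              # change one upstream parameter
    print(tag_pipeline(nodes, prior))
    # read_sales stays 'idle'; impute and regression are re-executed downstream
```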
-
Publication Number: US20210334667A1
Publication Date: 2021-10-28
Application Number: US16858143
Application Date: 2020-04-24
Applicant: SAP SE
Inventor: Jacques Doan Huu
Abstract: A Gradient Boosting Decision Tree (GBDT) model successively stacks many decision trees, each of which tries to correct the residual errors left by the previous steps. The final score produced by the GBDT for an input vector is the sum of the individual scores obtained from its decision trees. Overfitting in a GBDT can be reduced by removing from the training data the input variables that have the least impact on the output. One way to identify the input variable with the lowest predictive value is to find the variable that is used for the first time in the latest decision tree of the GBDT. Identifying the low-predictive features to be removed in this way does not require regenerating the earlier trees to produce the new GBDT: since the removed feature was never used in the earlier trees, those trees already ignore it.
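The heuristic in this abstract, finding the variable that is used for the first time in the latest tree, can be illustrated by collecting the features split on by all earlier trees and subtracting them from those of the newest tree. The compact Node representation below is assumed purely for illustration:

```python
from collections import namedtuple

# Minimal tree node assumed for illustration: 'feature' is None for a leaf.
Node = namedtuple("Node", "feature threshold left right value",
                  defaults=(None, 0.0, None, None, 0.0))


def features_used(node):
    """Collect the indices of all input variables a single tree splits on."""
    if node is None or node.feature is None:
        return set()
    return {node.feature} | features_used(node.left) | features_used(node.right)


def first_time_features(trees):
    """Variables used by the latest tree but by none of the earlier trees.

    Under the abstract's heuristic these are the lowest-predictive-value
    candidates: boosting only reached for them once the stronger signals had
    already been exploited by the earlier trees.
    """
    earlier = set().union(*(features_used(tree) for tree in trees[:-1]))
    return features_used(trees[-1]) - earlier


if __name__ == "__main__":
    t1 = Node(feature=0, threshold=5.0, left=Node(value=1.0), right=Node(value=2.0))
    t2 = Node(feature=0, threshold=5.0, left=Node(value=-0.1),
              right=Node(feature=3, threshold=0.5, left=Node(value=0.0), right=Node(value=0.3)))
    print(first_time_features([t1, t2]))   # {3}: feature 3 first appears in the latest tree
```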
-
Publication Number: US11620537B2
Publication Date: 2023-04-04
Application Number: US16858143
Application Date: 2020-04-24
Applicant: SAP SE
Inventor: Jacques Doan Huu
Abstract: A Gradient Boosting Decision Tree (GBDT) model successively stacks many decision trees, each of which tries to correct the residual errors left by the previous steps. The final score produced by the GBDT for an input vector is the sum of the individual scores obtained from its decision trees. Overfitting in a GBDT can be reduced by removing from the training data the input variables that have the least impact on the output. One way to identify the input variable with the lowest predictive value is to find the variable that is used for the first time in the latest decision tree of the GBDT. Identifying the low-predictive features to be removed in this way does not require regenerating the earlier trees to produce the new GBDT: since the removed feature was never used in the earlier trees, those trees already ignore it.
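Because the flagged feature never appears in the earlier trees, only the latest boosting stage has to be redone. The schematic below makes that explicit; fit_tree and predict are hypothetical placeholders for whatever tree learner and ensemble scorer are in use, not functions from the patent:

```python
def refit_without_feature(trees, X, y, banned_feature, fit_tree, predict):
    """Rebuild a GBDT after flagging one low-predictive feature for removal.

    Earlier trees never split on 'banned_feature' (which is why it was flagged),
    so they are kept untouched; only the latest boosting stage is retrained on
    the residuals, restricted to the remaining features. 'fit_tree(X, residuals,
    allowed_features)' and 'predict(trees, row)' are placeholder callables.
    """
    kept = trees[:-1]                                    # earlier trees reused as-is
    residuals = [target - predict(kept, row) for row, target in zip(X, y)]
    allowed = [j for j in range(len(X[0])) if j != banned_feature]
    new_last = fit_tree(X, residuals, allowed)           # only the final tree is refit
    return kept + [new_last]
```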
-
Publication Number: US20230026391A1
Publication Date: 2023-01-26
Application Number: US17956120
Application Date: 2022-09-29
Applicant: SAP SE
Inventor: Jacques Doan Huu
IPC: G06N20/00
Abstract: Features are used to train one or more ML models in a modelling layer. In a feature selection layer, each generated ML model is analyzed to determine, for each input feature, the degree of importance of that feature to the results generated by the ML model. Features with low importance are identified, and this information is propagated backward to the data source and feature engineering layers. In response, those layers refrain from gathering or generating the unimportant features. Based on a confidence measure of the determination that each feature is important or unimportant, a number of periods between reevaluations of the feature importance is determined. After that number of periods has elapsed, a removed feature is restored to the pipeline.
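A toy version of the backward feedback loop: the feature selection layer flags features whose importance falls below a threshold, and the feature engineering layer consults that blocklist so the unimportant features are never computed or gathered. The importance scores, threshold, and layer interfaces are illustrative assumptions:

```python
import datetime


def flag_unimportant(importances, threshold=0.01):
    """Return the features whose importance (taken from the trained models) is negligible."""
    return {name for name, score in importances.items() if score < threshold}


class FeatureEngineeringLayer:
    """Skips gathering or generating any feature that downstream analysis has flagged."""

    def __init__(self, generators):
        self.generators = generators   # feature name -> function(raw_row) -> value
        self.blocked = set()           # filled in by backward propagation

    def block(self, features):
        self.blocked |= set(features)  # feedback arriving from the feature selection layer

    def build(self, raw_row):
        # Unimportant features are simply never computed.
        return {name: fn(raw_row) for name, fn in self.generators.items()
                if name not in self.blocked}


if __name__ == "__main__":
    layer = FeatureEngineeringLayer({
        "total": lambda r: r["qty"] * r["price"],
        "weekday": lambda r: r["date"].weekday(),
        "noise": lambda r: 42.0,
    })
    layer.block(flag_unimportant({"total": 0.61, "weekday": 0.38, "noise": 0.002}))
    print(layer.build({"qty": 3, "price": 9.5, "date": datetime.date(2020, 4, 24)}))
    # {'total': 28.5, 'weekday': 4} -- 'noise' is no longer generated
```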
-
Publication Number: US11561940B2
Publication Date: 2023-01-24
Application Number: US16947563
Application Date: 2020-08-06
Applicant: SAP SE
Inventor: Jacques Doan Huu
Abstract: Disclosed herein are system, method, and computer program product embodiments for generating a bridge between analytical models. In an embodiment, a server can extract a first variable dependency schema from a first model (e.g., a predictive model or business intelligence report) and a second variable dependency schema from a second model (e.g., a predictive model or business intelligence report). The first variable dependency schema includes a first definition of a relationship between a first variable and a second variable. The server can compare the first variable dependency schema and the second variable dependency schema. Furthermore, the server can generate a modification to be made to the second variable dependency schema based on the first definition of the relationship between the first and second variables, and output the modification to be made to the second variable dependency schema.
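The comparison step can be pictured as a diff over two sets of variable-to-variable relationships: dependencies that are missing from, or defined differently in, the second schema become proposed modifications. The dict-of-pairs schema representation below is assumed purely for illustration:

```python
def diff_dependency_schemas(first, second):
    """Propose modifications to the second schema based on definitions in the first.

    Each schema maps a (variable, depends_on) pair to a definition of their
    relationship; this representation is an assumption made for the sketch.
    """
    modifications = []
    for pair, definition in first.items():
        if pair not in second:
            modifications.append(("add", pair, definition))
        elif second[pair] != definition:
            modifications.append(("update", pair, definition))
    return modifications


if __name__ == "__main__":
    predictive_model = {
        ("revenue", "units_sold"): "revenue = units_sold * unit_price",
        ("churn_risk", "support_tickets"): "positively correlated",
    }
    bi_report = {("revenue", "units_sold"): "revenue = units_sold * list_price"}
    for change in diff_dependency_schemas(predictive_model, bi_report):
        print(change)
    # ('update', ...) for the revenue relationship, ('add', ...) for the churn dependency
```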
-
Publication Number: US20240202579A1
Publication Date: 2024-06-20
Application Number: US18083048
Application Date: 2022-12-16
Applicant: SAP SE
Inventor: Jacques Doan Huu
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: The present disclosure relates to computer-implemented methods, software, and systems for identifying data patterns based on data observations collected as time series data. A cross-validation assessment of a plurality of predictive models is performed, and based on that assessment a deviation risk is determined for each predictive model. The deviation risk is determined by comparing the forecasting variability distribution for a validation data set during the cross-validation assessment with the forecasting variability distribution for test values from a test data set, where the test data set represents forecasted values generated by the respective predictive model for a future horizon. A predictive model can be excluded based on evaluating the deviation risks of the predictive models. A candidate model is then selected from the remaining set of candidate predictive models based on an evaluation of their accuracy according to the cross-validation assessment.
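One way to read the deviation-risk idea: each candidate model yields a distribution of forecast variability across the cross-validation folds, and a model whose future-horizon forecasts vary very differently is excluded before the accuracy-based selection. The single-ratio risk measure and the threshold below are simplifying assumptions, not the disclosed method:

```python
from statistics import mean, stdev


def variability(series):
    """Spread of a forecast series (standard deviation used as a simple stand-in)."""
    return stdev(series) if len(series) > 1 else 0.0


def deviation_risk(validation_forecasts, future_forecast):
    """Ratio of future-horizon variability to the variability seen in cross-validation.

    'validation_forecasts' holds one forecast series per cross-validation fold;
    'future_forecast' is the series the same model produced for the future horizon.
    A ratio far from 1 means the future forecasts behave unlike anything observed
    during validation, so the model is at risk of deviating.
    """
    baseline = mean(variability(fold) for fold in validation_forecasts)
    return variability(future_forecast) / baseline if baseline else float("inf")


def select_model(candidates, max_risk=2.0):
    """Exclude high-risk candidates, then pick the most accurate remaining model."""
    kept = [c for c in candidates if c["risk"] <= max_risk] or candidates
    return min(kept, key=lambda c: c["validation_error"])


if __name__ == "__main__":
    folds = [[100, 104, 98, 101], [97, 103, 99, 102], [101, 99, 105, 100]]
    steady = deviation_risk(folds, [99, 103, 97, 102])    # similar spread: ratio close to 1
    erratic = deviation_risk(folds, [60, 150, 40, 170])   # exploding spread: large ratio
    print(round(steady, 2), round(erratic, 2))
    best = select_model([{"name": "stable_model", "risk": steady, "validation_error": 3.1},
                         {"name": "erratic_model", "risk": erratic, "validation_error": 2.4}])
    print(best["name"])   # the erratic model is excluded despite its lower validation error
```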
-
Publication Number: US11823073B2
Publication Date: 2023-11-21
Application Number: US16190518
Application Date: 2018-11-14
Applicant: SAP SE
Inventor: Jacques Doan Huu
Abstract: Provided are systems and methods for auto-completing debriefing processing for a machine learning model pipeline based on a type of predictive algorithm. In one example, the method may include one or more of: building a machine learning model pipeline via a user interface; detecting, via the user interface, a selection associated with a predictive algorithm included within the pipeline; in response to the selection, identifying debriefing components for the predictive algorithm based on its type from among a plurality of types of predictive algorithms; and automatically incorporating processing for the debriefing components within the pipeline such that values of the debriefing components are generated during training of the predictive algorithm within the machine learning model pipeline.
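The auto-completion amounts to a lookup from the predictive algorithm's type to the debriefing artifacts that should be computed while it trains, followed by wiring those computations into the pipeline. The type names and debrief components below are illustrative guesses, not SAP's actual catalog:

```python
# Hypothetical mapping from algorithm type to debriefing components; the names
# are illustrative guesses, not the components actually used by the patent.
DEBRIEF_COMPONENTS = {
    "classification": ["confusion_matrix", "roc_curve", "feature_importance"],
    "regression": ["residual_plot", "r_squared", "feature_importance"],
    "time_series": ["forecast_vs_actual", "horizon_error", "signal_decomposition"],
    "clustering": ["cluster_sizes", "silhouette_score", "centroid_profile"],
}


def autocomplete_debriefing(pipeline, selected_node):
    """Append the debrief steps matching the selected predictive node's algorithm type."""
    for component in DEBRIEF_COMPONENTS.get(selected_node["algorithm_type"], []):
        pipeline.append({"operation": "debrief",
                         "component": component,
                         "source": selected_node["name"]})
    return pipeline


if __name__ == "__main__":
    pipeline = [{"name": "prepare", "operation": "transform"},
                {"name": "auto_clf", "operation": "train",
                 "algorithm_type": "classification"}]
    for step in autocomplete_debriefing(pipeline, pipeline[1]):
        print(step)   # the three classification debrief steps are appended automatically
```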
-
Publication Number: US20230206083A1
Publication Date: 2023-06-29
Application Number: US18117923
Application Date: 2023-03-06
Applicant: SAP SE
Inventor: Jacques Doan Huu
CPC classification number: G06N5/02, G06F16/2228, G06F16/2365
Abstract: A Gradient Boosting Decision Tree (GBDT) model successively stacks many decision trees, each of which tries to correct the residual errors left by the previous steps. The final score produced by the GBDT for an input vector is the sum of the individual scores obtained from its decision trees. Overfitting in a GBDT can be reduced by removing from the training data the input variables that have the least impact on the output. One way to identify the input variable with the lowest predictive value is to find the variable that is used for the first time in the latest decision tree of the GBDT. Identifying the low-predictive features to be removed in this way does not require regenerating the earlier trees to produce the new GBDT: since the removed feature was never used in the earlier trees, those trees already ignore it.
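Extending the latest-tree check to the whole ensemble gives a simple ranking: for each feature, record the boosting round in which it first appears; features that only show up late are the low-predictive candidates under the abstract's heuristic. The per-round feature sets are assumed to come from a tree traversal such as the one sketched earlier:

```python
def first_appearance(feature_sets):
    """Map each feature to the boosting round in which it is first split on.

    'feature_sets' holds, per boosting round, the set of feature indices used by
    that round's tree. Features with a late first appearance are the candidates
    with the lowest predictive value under the abstract's heuristic.
    """
    seen = {}
    for round_index, used in enumerate(feature_sets):
        for feature in used:
            seen.setdefault(feature, round_index)
    return seen


if __name__ == "__main__":
    rounds = [{0, 2}, {0, 1}, {1, 2}, {4}]   # features used by each successive tree
    print(first_appearance(rounds))          # feature 4 only shows up in the last round
```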
-
Publication Number: US11494699B2
Publication Date: 2022-11-08
Application Number: US16868145
Application Date: 2020-05-06
Applicant: SAP SE
Inventor: Jacques Doan Huu
Abstract: Features are used to train one or more ML models in a modelling layer. In a feature selection layer, each generated ML model is analyzed to determine, for each input feature, the degree of importance of that feature to the results generated by the ML model. Features with low importance are identified, and this information is propagated backward to the data source and feature engineering layers. In response, those layers refrain from gathering or generating the unimportant features. Based on a confidence measure of the determination that each feature is important or unimportant, a number of periods between reevaluations of the feature importance is determined. After that number of periods has elapsed, a removed feature is restored to the pipeline.
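The reevaluation half of the abstract can be pictured as a per-feature countdown: the more confident the importance decision, the more periods pass before the removed feature is restored for re-scoring. The linear mapping from confidence to period length is an assumption made only for illustration:

```python
def reevaluation_period(confidence, min_periods=1, max_periods=12):
    """More confident importance decisions are revisited less often (linear mapping assumed)."""
    return round(min_periods + confidence * (max_periods - min_periods))


class RemovedFeatureTracker:
    """Tracks removed features and restores each one when its waiting period elapses."""

    def __init__(self):
        self.waiting = {}   # feature name -> periods left before reevaluation

    def remove(self, feature, confidence):
        self.waiting[feature] = reevaluation_period(confidence)

    def tick(self):
        """Advance one period; return the features now due to be restored to the pipeline."""
        due = []
        for feature in list(self.waiting):
            self.waiting[feature] -= 1
            if self.waiting[feature] <= 0:
                due.append(feature)
                del self.waiting[feature]
        return due


if __name__ == "__main__":
    tracker = RemovedFeatureTracker()
    tracker.remove("noise", confidence=0.95)     # confident call: long wait before re-checking
    tracker.remove("weekday", confidence=0.10)   # borderline call: re-checked after ~2 periods
    for period in range(1, 4):
        print(period, tracker.tick())            # 'weekday' is restored first, 'noise' much later
```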