Training gradient boosted decision trees with progressive maximum depth for parsimony and interpretability

    Publication No.: US10977737B2

    Publication date: 2021-04-13

    Application No.: US16147154

    Filing date: 2018-09-28

    Inventor: Brian Ironside

    IPC classification: G06Q40/08 G06N20/00 G06K9/62

    Abstract: An apparatus is provided for generating a generalized linear model structure definition by generating a gradient boosted tree model and separating each decision tree into a plurality of indicator variables upon which a dependent variable of the generalized linear model depends. A first number of decision tree structures, each having a maximum tree depth of one (1), is formed, where the first number is the number of decision tree structures necessary to exhaust all main effects of a plurality of predictor variables on a dependent variable. Successive pluralities of decision tree structures, each having a maximum tree depth increased by one (1) compared with the immediately preceding plurality, are then formed iteratively. Each successive plurality comprises a second number of decision tree structures necessary to exhaust all interactions between the plurality of predictor variables.
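
The progressive-depth idea in this abstract can be illustrated with a small pure-Python sketch. It is not the patented implementation: the squared-error loss, the greedy tree builder, the stopping rule (a depth counts as "exhausted" once one more tree improves the training error by no more than `tol`), and every function and parameter name below are assumptions made for illustration only.

```python
def fit_tree(rows, resid, depth):
    """Greedy least-squares regression tree; a leaf is a float, a node is a tuple."""
    mean = sum(resid) / len(resid)
    if depth == 0 or len(resid) < 2:
        return mean
    best = None
    for j in range(len(rows[0])):
        # Candidate thresholds that leave both sides non-empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            lo = [v for r, v in zip(rows, resid) if r[j] <= t]
            hi = [v for r, v in zip(rows, resid) if r[j] > t]
            m_lo, m_hi = sum(lo) / len(lo), sum(hi) / len(hi)
            sse = sum((v - m_lo) ** 2 for v in lo) + sum((v - m_hi) ** 2 for v in hi)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:  # every feature is constant in this node
        return mean
    _, j, t = best
    lo_idx = [i for i, r in enumerate(rows) if r[j] <= t]
    hi_idx = [i for i, r in enumerate(rows) if r[j] > t]
    return (j, t,
            fit_tree([rows[i] for i in lo_idx], [resid[i] for i in lo_idx], depth - 1),
            fit_tree([rows[i] for i in hi_idx], [resid[i] for i in hi_idx], depth - 1))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if row[j] <= t else hi
    return tree


def mse(y, pred):
    return sum((a - p) ** 2 for a, p in zip(y, pred)) / len(y)


def fit_progressive(rows, y, max_depth=3, learning_rate=1.0, tol=1e-9, max_trees=100):
    """Boost depth-1 trees until they stop helping, then depth-2 trees, and so on."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stages = []  # accepted trees as (depth, tree) pairs, in fitting order
    for depth in range(1, max_depth + 1):
        while len(stages) < max_trees:
            resid = [a - p for a, p in zip(y, pred)]
            tree = fit_tree(rows, resid, depth)
            new_pred = [p + learning_rate * predict_tree(tree, r)
                        for p, r in zip(pred, rows)]
            if mse(y, pred) - mse(y, new_pred) <= tol:
                break  # assumed stopping rule: this depth is exhausted
            pred = new_pred
            stages.append((depth, tree))
    return base, stages


def predict(base, stages, row, learning_rate=1.0):
    return base + learning_rate * sum(predict_tree(t, row) for _, t in stages)


def leaf_paths(tree, path=()):
    """Describe each leaf by its (feature, threshold, goes_right) conditions."""
    if not isinstance(tree, tuple):
        return [path]
    j, t, lo, hi = tree
    return (leaf_paths(lo, path + ((j, t, False),))
            + leaf_paths(hi, path + ((j, t, True),)))


def indicators(tree, row):
    """One 0/1 indicator per leaf: the candidate regressors for a GLM."""
    return [int(all((row[j] > t) == right for j, t, right in p))
            for p in leaf_paths(tree)]


rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [2 * a + 3 * a * b for a, b in rows]  # one main effect plus one interaction
base, stages = fit_progressive(rows, y)
print([depth for depth, _ in stages])
```

On this toy data the sketch accepts two depth-1 trees (the additive part) before any depth-2 tree is needed for the `a * b` interaction, so the printed depths are `[1, 1, 2]`. `indicators` then shows how a fitted tree could be split into leaf-membership variables; fitting the generalized linear model on those variables is left out.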

Predictor neutralization in predictive data analysis systems

    Publication No.: US11954603B1

    Publication date: 2024-04-09

    Application No.: US16850630

    Filing date: 2020-04-16

    IPC classification: G06N5/02 G06N20/00

    CPC classification: G06N5/02 G06N20/00

    Abstract: There is a need for more effective and efficient predictive data analysis. Various embodiments of the present invention address one or more of the noted technical challenges. In one example, a method for generating a neutralized prediction model includes accessing an initial prediction model generated using an initial training data object, performing a randomized shuffling of the initial training data object to generate a shuffled training data object, generating randomized predictions by processing the shuffled training data object using the initial prediction model, performing a neutralization of the initial training data object to generate a neutralized training data object, and generating the neutralized prediction model based at least in part on the neutralized training data object and the randomized predictions.
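
The abstract lists the steps but does not say what "shuffling" or "neutralization" operate on, so the following pure-Python sketch is only one possible reading. It assumes that shuffling permutes the values of the single predictor being neutralized, that neutralization drops that predictor's column, and that the neutralized model is a one-feature least-squares line fitted to the randomized predictions. The stand-in `initial_model`, its coefficients, and all helper names are hypothetical.

```python
import random


def initial_model(row):
    """Stand-in for an already-trained model; the coefficients are hypothetical."""
    return 2.0 * row[0] + 5.0 * row[1]


def shuffle_column(rows, col, rng):
    """Assumed reading of 'randomized shuffling': permute one predictor across rows."""
    values = [r[col] for r in rows]
    rng.shuffle(values)
    return [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, values)]


def neutralize(rows, col):
    """Assumed reading of 'neutralization': drop the predictor's column."""
    return [r[:col] + r[col + 1:] for r in rows]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


rng = random.Random(0)
rows = [(rng.random(), rng.random()) for _ in range(50)]  # synthetic training data

col = 1                                            # predictor to neutralize
shuffled = shuffle_column(rows, col, rng)          # shuffled training data
randomized = [initial_model(r) for r in shuffled]  # randomized predictions
neutral = neutralize(rows, col)                    # neutralized training data

# Neutralized model: trained on the neutralized data to reproduce the
# randomized predictions, so it never sees the neutralized predictor.
slope, intercept = fit_line([r[0] for r in neutral], randomized)


def neutral_model(row):
    return intercept + slope * row[0]
```

Because the shuffled column no longer lines up with the rest of each row, the randomized predictions carry the initial model's dependence on the remaining predictor while averaging out the neutralized one, which is the intuition this reading relies on.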