Automatic feature selection and model generation for linear models

    Publication number: US11699094B2

    Publication date: 2023-07-11

    Application number: US16177107

    Filing date: 2018-10-31

    CPC classification number: G06N20/00 G06F17/17 G06F18/2115 G06F18/285 G06N7/00

    Abstract: Methods, systems, and devices for automated feature selection and model generation are described. A device (e.g., a server, user device, database, etc.) may perform model generation for an underlying dataset and a specified outcome variable. The device may determine relevance measurements (e.g., stump R-squared values) for a set of identified features of the dataset and may reduce the set of features based on these relevance measurements (e.g., according to a double-box procedure). Using this reduced set of features, the device may perform a least absolute shrinkage and selection operator (LASSO) regression procedure to sort the features. The device may then determine a set of nested linear models, in which each successive model includes one additional feature from the sorted features, and may select a “best” linear model for model generation based on this set of models and a model quality criterion (e.g., an Akaike information criterion (AIC)).
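    The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names (stump_r2, lasso_cd, lasso_order, aic, select_model) are hypothetical. The abstract does not define the double-box procedure, so the sketch swaps in a plain "keep the top-k features by stump R-squared" rule. The LASSO step is a simple coordinate descent over a fixed penalty grid, with features sorted by the order in which their coefficients first become nonzero. The AIC is the Gaussian form, up to an additive constant.

```python
import numpy as np

def stump_r2(x, y):
    """R-squared of the best single-split (decision stump) fit of y on x."""
    sst = np.sum((y - y.mean()) ** 2)  # assumes y is not constant
    best_sse = sst
    xs = np.unique(x)
    for t in (xs[:-1] + xs[1:]) / 2:  # midpoints between distinct values
        left, right = y[x <= t], y[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        best_sse = min(best_sse, sse)
    return 1.0 - best_sse / sst

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumes columns of X have mean 0 and mean square 1, and y is centered."""
    n, p = X.shape
    b, r = np.zeros(p), y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # remove feature j from the current fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
            r -= X[:, j] * b[j]
    return b

def lasso_order(X, y, n_lams=50):
    """Order features by when they first become nonzero as the penalty shrinks."""
    lam_max = np.max(np.abs(X.T @ y)) / len(y)
    order = []
    for lam in lam_max * np.geomspace(1.0, 1e-3, n_lams):
        b = lasso_cd(X, y, lam)
        new = [j for j in np.argsort(-np.abs(b)) if b[j] != 0 and j not in order]
        order.extend(int(j) for j in new)
    order.extend(j for j in range(X.shape[1]) if j not in order)  # never entered
    return order

def aic(X, y):
    """Gaussian AIC (up to an additive constant) of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    rss = np.sum((y - A @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * A.shape[1]

def select_model(X, y, keep):
    """Return (selected feature indices in sorted order, AIC of each nested model)."""
    r2 = np.array([stump_r2(X[:, j], y) for j in range(X.shape[1])])
    kept = np.argsort(-r2)[:keep]  # assumption: top-k stand-in for double-box
    Z = (X[:, kept] - X[:, kept].mean(0)) / X[:, kept].std(0)
    order = kept[lasso_order(Z, y - y.mean())]
    # Nested models: the k-th model uses the first k sorted features.
    aics = [aic(X[:, order[:k]], y) for k in range(1, len(order) + 1)]
    return order[: int(np.argmin(aics)) + 1], aics

if __name__ == "__main__":
    # Synthetic data: only features 0 and 1 drive the outcome.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)
    selected, aics = select_model(X, y, keep=4)
    print("selected features:", selected)
```

    Because the nested models are built in the LASSO-sorted order, only as many ordinary least squares fits are needed as there are retained features, rather than one fit per feature subset. In practice a library routine for the LASSO path would likely replace the hand-written coordinate descent.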
