-
Publication No.: US20210141801A1
Publication date: 2021-05-13
Application No.: US17021770
Filing date: 2020-09-15
Applicant: Nicholas John Teague
Inventor: Nicholas John Teague
IPC: G06F16/25, G06N20/00, G06F16/22, G06F40/253
Abstract: A technique for automated preparation of tabular data categoric feature set encodings for machine learning, including options for variations on categoric encodings for bounded and unbounded categoric sets. String parsing may be performed to extract grammatical structure shared between the entries in a categoric feature set, such as string character subset overlaps, which may be returned in one or more columns of overlap activations or may be used to consolidate entries with shared overlaps. Numeric substring partitions may be extracted. Search terms may be applied to identify entries containing specific substring partitions. Sets of transformations may be aggregated by use of transformation primitives such as to return encodings in multiple configurations of varying information content. Additional data sets may be consistently prepared to training data sets based on properties of training data saved in a returned metadata database such as for use in inference from a trained machine learning system.
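The string-parsing ideas named in this abstract (shared character-subset overlaps, numeric substring extraction, and search-term activations) can be illustrated with a minimal Python sketch. The function names `shared_substrings`, `extract_numeric`, and `search_activations`, the `min_len` threshold, and the sample entries are hypothetical and are not taken from the patent; this is one plausible reading of the abstract, not the claimed method.

```python
import re

def shared_substrings(entries, min_len=3):
    """Substrings of at least min_len characters that occur in two or more distinct entries."""
    counts = {}
    for entry in set(entries):
        subs = {entry[i:j]
                for i in range(len(entry))
                for j in range(i + min_len, len(entry) + 1)}
        for s in subs:
            counts[s] = counts.get(s, 0) + 1
    return {s for s, c in counts.items() if c >= 2}

def extract_numeric(entry):
    """First unsigned numeric substring of an entry as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", entry)
    return float(match.group()) if match else None

def search_activations(entries, terms):
    """One 0/1 activation column per search term (assumed plain substring match)."""
    return {term: [int(term in e) for e in entries] for term in terms}

# Hypothetical categoric feature set
entries = ["unit 12 north", "unit 7 south", "depot"]
overlaps = shared_substrings(entries)
numbers = [extract_numeric(e) for e in entries]
columns = search_activations(entries, ["unit", "depot"])
```

Here each overlap or search term would become its own activation column, and entries with no numeric partition yield a missing value that a later infill step would have to handle.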
-
Publication No.: US11861462B2
Publication date: 2024-01-02
Application No.: US16552857
Filing date: 2019-08-27
Applicant: Nicholas John Teague
Inventor: Nicholas John Teague
IPC: G06N20/00, G06F16/90, G06F16/27, G06F18/21, G06F16/901, G06F18/214, G06F18/2135
CPC classification number: G06N20/00, G06F16/278, G06F16/9027, G06F18/214, G06F18/2135
Abstract: A technique for automated preparation of tabular data for machine learning, including options for machine learning derived infill, feature importance evaluations, and/or dimensionality reduction. Validation data sets may be consistently prepared to training data sets based on properties of the training data saved in a metadata database. Additional data sets may be consistently prepared to training data sets based on properties of the training data saved in a returned metadata database such as for use in generating predictions from the trained ML system. Returned data sets may be prepared for oversampling of labels with lower frequency occurrence. Columns of a training data set are evaluated for appropriate categories of transformations, with the composition of transformation function applications designated by a defined tree of transformation category assignments to transformation primitives. Composition of transformation trees and their associated transformation functions may optionally be custom defined by a user.
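The idea of preparing later data sets consistently with the training data, using properties saved in a metadata database, can be sketched as follows. The names `fit_prepare` and `apply_prepare`, the dictionary used as metadata, and the choice of z-score scaling are assumptions for illustration. Mean infill stands in for the machine-learning-derived infill that the abstract mentions, which is not reproduced here.

```python
from statistics import mean, pstdev

def apply_prepare(values, metadata):
    """Prepare any data set using only the properties saved from training."""
    mu, sigma = metadata["mean"], metadata["std"]
    # Missing entries are infilled with the training mean, which is 0.0 after
    # scaling (a simple stand-in for ML-derived infill).
    return [0.0 if v is None else (v - mu) / sigma for v in values]

def fit_prepare(train_values):
    """Derive properties from a training column, save them, and prepare the column."""
    present = [v for v in train_values if v is not None]
    metadata = {
        "mean": mean(present),
        "std": pstdev(present) or 1.0,  # guard against a constant column
    }
    return apply_prepare(train_values, metadata), metadata

# Hypothetical numeric column with one missing entry
train_prepared, metadata = fit_prepare([1.0, 2.0, 3.0, None])
# A validation or inference set reuses the saved metadata, never its own statistics
later_prepared = apply_prepare([2.0, None, 4.0], metadata)
```

Keeping the fitted properties in a returned structure is what lets validation and inference data pass through the same transformation as the training data.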
-
Publication No.: US20200349467A1
Publication date: 2020-11-05
Application No.: US16552857
Filing date: 2019-08-27
Applicant: Nicholas John Teague
Inventor: Nicholas John Teague
IPC: G06N20/00, G06K9/62, G06F16/27, G06F16/901
Abstract: A technique for automated preparation of tabular data for machine learning, including options for machine learning derived infill, feature importance evaluations, and/or dimensionality reduction. Validation data sets may be consistently prepared to training data sets based on properties of the training data saved in a metadata database. Additional data sets may be consistently prepared to training data sets based on properties of the training data saved in a returned metadata database such as for use in generating predictions from the trained ML system. Returned data sets may be prepared for oversampling of labels with lower frequency occurrence. Columns of a training data set are evaluated for appropriate categories of transformations, with the composition of transformation function applications designated by a defined tree of transformation category assignments to transformation primitives. Composition of transformation trees and their associated transformation functions may optionally be custom defined by a user.
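Oversampling of labels with lower frequency occurrence, as mentioned in this abstract, can be illustrated with a small sketch. The function name `oversample` and the deterministic repeat-in-order strategy are assumptions made for the example; the abstract does not say how rows are selected for duplication.

```python
from collections import Counter
from itertools import cycle, islice

def oversample(rows, labels):
    """Duplicate rows of less frequent labels until every label matches the most frequent one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        members = [r for r, l in zip(rows, labels) if l == label]
        # Repeat this label's rows in their original order to fill the shortfall.
        for row in islice(cycle(members), target - count):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical data set with an under-represented label 1
rows = ["a", "b", "c", "d"]
labels = [0, 0, 0, 1]
balanced_rows, balanced_labels = oversample(rows, labels)
```

A random draw with replacement would serve equally well; the deterministic version is used here only so the result is reproducible.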
-