Integrating Data Quality Analyses For Modeling Metrics

    Publication No.: US20250005456A1

    Publication Date: 2025-01-02

    Application No.: US18766438

    Filing Date: 2024-07-08

    Abstract: Techniques for generating a composite score for data quality are disclosed. Univariate analysis is performed on a plurality of data points corresponding to each of a first feature, a second feature, and a third feature of a data set. The univariate analysis includes at least a first type of analysis generating a first score having a first range of possible values, and a second type of analysis generating a second score having a second range of possible values. A first quality score is computed for the data values for the first, second, and third features based on a normalized first score and a normalized second score. Machine learning is performed on the data points corresponding to one or both of the first feature and the second feature having a first quality score above a threshold value to model the third feature.
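The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the analyses, the normalization, the way the normalized scores are combined, or the learning method, so everything specific here is an assumption made for illustration: the two univariate analyses (a completeness fraction in [0, 1] and an in-range percentage in [0, 100]), min-max normalization, averaging to form the quality score, the 0.8 threshold, the sample data, and the ordinary least-squares line standing in for "machine learning". All function and variable names are hypothetical.

```python
from statistics import mean


def min_max_normalize(score, low, high):
    """Map a score from its native range [low, high] onto [0, 1]."""
    if high == low:
        raise ValueError("score range must have nonzero width")
    return (score - low) / (high - low)


def completeness_score(values):
    """Assumed first analysis: fraction of non-missing values, range [0, 1]."""
    return sum(v is not None for v in values) / len(values)


def in_range_percent(values, lo, hi):
    """Assumed second analysis: percent of present values in [lo, hi], range [0, 100]."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100.0 * sum(lo <= v <= hi for v in present) / len(present)


def quality_score(values, lo, hi):
    """Assumed composite: mean of the two normalized univariate scores."""
    s1 = min_max_normalize(completeness_score(values), 0.0, 1.0)
    s2 = min_max_normalize(in_range_percent(values, lo, hi), 0.0, 100.0)
    return mean([s1, s2])


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept (stand-in model)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return slope, my - slope * mx


# Illustrative data only: f1 and f2 are candidate predictors, f3 is the target.
data = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [None, None, 500.0, 2.0],
    "f3": [2.0, 4.0, 6.0, 8.0],
}
bounds = {"f1": (0.0, 10.0), "f2": (0.0, 10.0), "f3": (0.0, 10.0)}
THRESHOLD = 0.8  # assumed cutoff

scores = {name: quality_score(vals, *bounds[name]) for name, vals in data.items()}

# Keep only predictors whose quality score clears the threshold.
selected = [name for name in ("f1", "f2") if scores[name] > THRESHOLD]

# Model the third feature from the first selected predictor.
slope, intercept = fit_line(data[selected[0]], data["f3"])
```

In this sketch the second feature is half missing and half out of range, so it falls below the threshold and only the first feature is used to model the third. Normalizing before combining is what lets a [0, 1] score and a [0, 100] score contribute equally to the composite.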
