-
Publication No.: US20250005456A1
Publication Date: 2025-01-02
Application No.: US18766438
Application Date: 2024-07-08
Applicant: Oracle International Corporation
Inventor: Amit Vaid, Vijayalakshmi Krishnamurthy
IPC: G06N20/00, G06F18/21, G06F18/2113
Abstract: Techniques for generating a composite score for data quality are disclosed. Univariate analysis is performed on a plurality of data points corresponding to each of a first feature, a second feature, and a third feature of a data set. The univariate analysis includes at least a first type of analysis generating a first score having a first range of possible values, and a second type of analysis generating a second score having a second range of possible values. A first quality score is computed for the data values for the first, second, and third features based on a normalized first score and a normalized second score. Machine learning is performed on the data points corresponding to one or both of the first feature and the second feature having a first quality score above a threshold value to model the third feature.
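The scoring step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the two types of univariate analysis, so the choices here (a completeness ratio with range [0, 1] and an outlier count with range [0, n]) are assumptions, as are min-max normalization, the simple average used to combine the normalized scores, and all function names. The sketch stops at selecting the features that pass the threshold; any learner could then model the third feature from them.

```python
from statistics import mean, pstdev

def completeness(values):
    """Assumed first analysis: fraction of non-missing values, range [0, 1]."""
    return sum(v is not None for v in values) / len(values)

def outlier_count(values, z_cut=2.0):
    """Assumed second analysis: count of points beyond z_cut std devs, range [0, n]."""
    present = [v for v in values if v is not None]
    if not present:
        return 0
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        return 0
    return sum(abs(v - mu) / sigma > z_cut for v in present)

def min_max(score, lo, hi):
    """Map a score from its own range [lo, hi] onto [0, 1]."""
    return (score - lo) / (hi - lo) if hi > lo else 0.0

def quality_score(values):
    """Composite score: average of the two normalized scores (assumed combination)."""
    n = len(values)
    s1 = min_max(completeness(values), 0.0, 1.0)
    # Inverted so that fewer outliers yields a higher score.
    s2 = 1.0 - min_max(outlier_count(values), 0, n)
    return (s1 + s2) / 2

def select_features(dataset, candidates, threshold):
    """Keep candidate features whose composite score is above the threshold."""
    return [name for name in candidates if quality_score(dataset[name]) > threshold]

# Hypothetical data: "age" and "income" are candidate predictors of "spend".
dataset = {
    "age": [34, 41, 29, 38, 45, 31],
    "income": [52.0, None, None, None, 61.0, None],
    "spend": [3.1, 4.0, 2.7, 3.6, 4.4, 2.9],
}
selected = select_features(dataset, ["age", "income"], threshold=0.8)
```

In this toy data set, "income" is mostly missing, so its composite score falls below the threshold and only "age" would be passed to the learning step.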
-
22.
Publication No.: US11568179B2
Publication Date: 2023-01-31
Application No.: US16438969
Application Date: 2019-06-12
Applicant: Oracle International Corporation
Inventor: Joseph Marc Posner, Sunil Kumar Kunisetty, Mohan Kamath, Nickolas Kavantzas, Sachin Bhatkar, Sergey Troshin, Sujay Sarkhel, Shivakumar Subramanian Govindarajapuram, Vijayalakshmi Krishnamurthy
IPC: G06N20/00, G06F16/00, G06K9/62, G06F16/9537, G06F16/957, G06F16/58, G06N5/04, G06N5/02
Abstract: A model analyzer may receive a representative data set as input and select one of a plurality of analytic models to perform the analysis. Before deciding which model to use, a model may be trained and the trained model evaluated for accuracy. However, some models are known to behave poorly when the training data is distributed in a particular way. Thus, the cost of training a model and evaluating the trained model can be avoided by first analyzing the distribution of the representative data. Identifying the representative data distribution allows ruling out use of models for which the distribution of the representative data is unsuitable. Only models that may be compatible with the distribution of the representative data may be trained and evaluated for accuracy. The most accurate trained model whose accuracy meets an accuracy threshold may be selected to analyze subsequently received data related to the representative data.
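The screening idea in this abstract can be sketched as follows. The abstract does not say how the distribution is identified or how compatibility is recorded, so this sketch assumes a simple skewness test that labels the data "symmetric" or "skewed", and a hypothetical candidate table that maps each model name to the shapes it tolerates plus a `train_and_score` callable. The two candidate models are stubs that return fixed accuracies purely for illustration.

```python
from statistics import mean, pstdev

def skewness(xs):
    """Population skewness; 0.0 for constant data."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return 0.0
    return sum(((x - mu) / sigma) ** 3 for x in xs) / len(xs)

def classify_distribution(xs, skew_cut=1.0):
    """Assumed distribution check: label data by absolute skewness."""
    return "skewed" if abs(skewness(xs)) > skew_cut else "symmetric"

def select_model(candidates, data, accuracy_threshold):
    """Train only distribution-compatible models; return the best one that
    meets the threshold, or None if no model qualifies."""
    shape = classify_distribution(data)
    scored = []
    for name, (suitable_shapes, train_and_score) in candidates.items():
        if shape not in suitable_shapes:
            continue  # ruled out before any training cost is paid
        scored.append((train_and_score(data), name))
    qualifying = [(acc, name) for acc, name in scored if acc >= accuracy_threshold]
    return max(qualifying)[1] if qualifying else None

trained = []  # records which hypothetical models were actually trained

def gaussian_stub(data):
    trained.append("gaussian_model")
    return 0.95  # fixed illustrative accuracy, not a real result

def robust_stub(data):
    trained.append("robust_model")
    return 0.90  # fixed illustrative accuracy, not a real result

candidates = {
    "gaussian_model": ({"symmetric"}, gaussian_stub),
    "robust_model": ({"symmetric", "skewed"}, robust_stub),
}
representative = [1, 1, 2, 2, 3, 50]  # strongly right-skewed sample
chosen = select_model(candidates, representative, accuracy_threshold=0.8)
```

Because the sample is skewed, the stub marked as suitable only for symmetric data is never trained, even though its nominal accuracy is higher; the cost saving comes from skipping that training call.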
-