Invention Publication
- Patent Title: PRACTICAL SUPERVISED CLASSIFICATION OF DATA SETS
- Application No.: US18412703
- Application Date: 2024-01-15
- Publication No.: US20240160652A1
- Publication Date: 2024-05-16
- Inventors: Arunav Mishra, Henning Schwabe, Lalita Shaki Uribe Ordonez
- Applicant: BASF SE
- Applicant Address: DE Ludwigshafen
- Assignee: BASF SE
- Current Assignee: BASF SE
- Current Assignee Address: DE Ludwigshafen
- Priority: EP 190061.0, 2020-08-07
- Divisional of: US17394994, 2021-08-05
- Main IPC: G06F16/35
- IPC: G06F16/35; G06F16/338; G06N20/00

Abstract:
The present invention relates to information retrieval. To facilitate searching for and identifying documents, a computer-implemented method is provided for training a classifier model that classifies data in response to a search query. The computer-implemented method comprises:
a) obtaining a dataset that comprises a seed set of labeled data representing a training dataset;
b) training the classifier model by using the training dataset to fit parameters of the classifier model;
c) evaluating a quality of the classifier model on a test dataset that comprises unlabeled data from the obtained dataset, to generate a classifier confidence score indicative of the probability that the classifier model classifies the test dataset correctly;
d) determining a global risk value of misclassification and a reward value based on the classifier confidence score on the test dataset;
e) iteratively updating the parameters of the classifier model and performing steps b) to d) until the global risk value falls within a predetermined risk limit value or an expected reward value is reached.
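The abstract does not say how the confidence score, the global risk, or the reward are computed, nor how the parameters are updated between rounds. The sketch below fills those gaps with explicit assumptions, purely to illustrate the control flow of steps a) to e): the classifier is a binary logistic regression trained by stochastic gradient descent; the confidence for each unlabeled test item is max(p, 1 - p); the global risk is the mean of (1 - confidence), i.e. the model's own estimate of its misclassification rate; the reward is the fraction of test items classified with confidence at or above a threshold; and "updating the parameters" means running further training epochs on the seed set. All names (`train_until_acceptable`, `risk_limit`, `expected_reward`, `threshold`, `max_rounds`, and so on) are hypothetical, and `max_rounds` is a safety cap that the abstract does not mention.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LogisticModel:
    """Hypothetical stand-in for the classifier model: binary logistic regression."""
    weights: list
    bias: float = 0.0

    def predict_proba(self, x) -> float:
        """Probability that x belongs to the positive (relevant) class."""
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))


def train_epochs(model, xs, ys, epochs, lr):
    """Steps b)/e) (assumed): fit parameters on the labeled seed set with SGD on log-loss."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = model.predict_proba(x) - y  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(x):
                model.weights[j] -= lr * err * xj
            model.bias -= lr * err


def confidence_scores(model, unlabeled):
    """Step c) (assumed): confidence of the predicted label for each unlabeled item."""
    return [max(p, 1.0 - p) for p in (model.predict_proba(x) for x in unlabeled)]


def risk_and_reward(conf, threshold):
    """Step d) (assumed definitions): global risk = mean estimated misclassification
    probability; reward = fraction of items classified with confidence >= threshold."""
    risk = sum(1.0 - c for c in conf) / len(conf)
    reward = sum(1 for c in conf if c >= threshold) / len(conf)
    return risk, reward


def train_until_acceptable(seed_x, seed_y, test_x, risk_limit=0.1,
                           expected_reward=0.9, threshold=0.9,
                           epochs_per_round=5, lr=0.5, max_rounds=100):
    """Step e): repeat b) to d) until risk <= risk_limit or reward >= expected_reward."""
    model = LogisticModel(weights=[0.0] * len(seed_x[0]))
    history = []
    for _ in range(max_rounds):  # hypothetical safety cap, not part of the abstract
        train_epochs(model, seed_x, seed_y, epochs_per_round, lr)             # b)
        conf = confidence_scores(model, test_x)                               # c)
        risk, reward = risk_and_reward(conf, threshold)                       # d)
        history.append((risk, reward))
        if risk <= risk_limit or reward >= expected_reward:                   # e)
            break
    return model, history


# Hypothetical toy data: two features per document; label 1 = relevant to the query.
SEED_X = [(0.0, 0.2), (0.3, 0.1), (2.0, 2.1), (2.2, 1.8)]
SEED_Y = [0, 0, 1, 1]
TEST_X = [(0.1, 0.0), (0.2, 0.3), (1.9, 2.0), (2.1, 2.2)]  # unlabeled

model, history = train_until_acceptable(SEED_X, SEED_Y, TEST_X)
print(f"rounds={len(history)} risk={history[-1][0]:.3f} reward={history[-1][1]:.2f}")
```

Because the stopping test only consumes the risk and reward values, a different classifier or different risk and reward definitions could be swapped in by replacing `LogisticModel`, `train_epochs`, or `risk_and_reward` without touching the loop in `train_until_acceptable`.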