PRACTICAL SUPERVISED CLASSIFICATION OF DATA SETS

Invention Publication

US20240160652A1 PRACTICAL SUPERVISED CLASSIFICATION OF DATA SETS Under examination (published)

Patent Title: PRACTICAL SUPERVISED CLASSIFICATION OF DATA SETS
Application No.: US18412703

Application Date: 2024-01-15
Publication No.: US20240160652A1

Publication Date: 2024-05-16
Inventor: Arunav Mishra, Henning Schwabe, Lalita Shaki Uribe Ordonez
Applicant: BASF SE
Applicant Address: DE Ludwigshafen
Assignee: BASF SE
Current Assignee: BASF SE
Current Assignee Address: DE Ludwigshafen
Priority: EP 190061.0 2020.08.07
Original application of this divisional: US17394994 2021.08.05
Main IPC: G06F16/35
IPC: G06F16/35; G06F16/338; G06N20/00

Abstract:

The present invention relates to information retrieval. In order to facilitate a search and identification of documents, there is provided a computer-implemented method for training a classifier model for data classification in response to a search query. The computer-implemented method comprises:

a) obtaining a dataset that comprises a seed set of labeled data representing a training dataset;
b) training the classifier model by using the training dataset to fit parameters of the classifier model;
c) evaluating a quality of the classifier model using a test dataset that comprises unlabeled data from the obtained dataset to generate a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;
d) determining a global risk value of misclassification and a reward value based on the classifier confidence score on the test dataset;
e) iteratively updating the parameters of the classifier model and performing steps b) to d) until the global risk value falls within a predetermined risk limit value or an expected reward value is reached.
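
Steps a) to e) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the one-dimensional logistic model, the toy data, and all function names are hypothetical. In particular, the abstract does not define how the global risk value or the reward value is computed, so the sketch assumes that risk is the mean of (1 - confidence) over the unlabeled test data and that reward is the fraction of test items whose confidence reaches a chosen threshold.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(params, labeled, epochs=5, lr=0.5):
    """Step b): update (w, b) of a 1-D logistic model by gradient descent."""
    w, b = params
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in labeled:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(labeled)
        b -= lr * gb / len(labeled)
    return w, b

def confidence(params, x):
    """Step c): confidence score = probability of the predicted class."""
    w, b = params
    p = sigmoid(w * x + b)
    return max(p, 1.0 - p)

def risk_and_reward(params, unlabeled, conf_threshold=0.9):
    """Step d): ASSUMED definitions, not taken from the source."""
    scores = [confidence(params, x) for x in unlabeled]
    risk = sum(1.0 - s for s in scores) / len(scores)
    reward = sum(s >= conf_threshold for s in scores) / len(scores)
    return risk, reward

def train_until_acceptable(labeled, unlabeled, risk_limit=0.05,
                           target_reward=0.95, max_rounds=200):
    """Step e): repeat b) to d) until the risk limit or reward target is met."""
    params = (0.0, 0.0)
    history = []
    for _ in range(max_rounds):
        params = fit(params, labeled)
        risk, reward = risk_and_reward(params, unlabeled)
        history.append((risk, reward))
        if risk <= risk_limit or reward >= target_reward:
            break
    return params, history

# Step a): a hypothetical seed set of (feature, label) pairs, plus unlabeled data.
seed = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (1.0, 1), (1.5, 1), (2.0, 1)]
pool = [-2.5, -1.8, -1.2, 1.2, 1.8, 2.5]

params, history = train_until_acceptable(seed, pool)
final_risk, final_reward = history[-1]
print(f"rounds={len(history)} risk={final_risk:.3f} reward={final_reward:.2f}")
```

The loop stops on whichever criterion is satisfied first, mirroring the "or" in step e). The `max_rounds` cap is an added safeguard so the sketch always terminates.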

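IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/30	. Unstructured textual data (document management systems: see G06F 16/93)
G06F16/35	.. Clustering; classification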