Practical supervised classification of data sets

    Publication No.: US11914629B2

    Publication date: 2024-02-27

    Application No.: US17394994

    Filing date: 2021-08-05

    Applicant: BASF SE

    Abstract: The present invention relates to information retrieval. In order to facilitate a search and identification of documents, there is provided a computer-implemented method for training a classifier model for data classification in response to a search query. The computer-implemented method comprises:



    a) obtaining a dataset that comprises a seed set of labeled data representing a training dataset;
    b) training the classifier model by using the training dataset to fit parameters of the classifier model;
    c) evaluating a quality of the classifier model using a test dataset that comprises unlabeled data from the obtained dataset to generate a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;
    d) determining a global risk value of misclassification and a reward value based on the classifier confidence score on the test dataset;
    e) iteratively updating the parameters of the classifier model and performing steps b) to d) until the global risk value falls within a predetermined risk limit value or an expected reward value is reached.
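
Steps a) to e) can be illustrated with a minimal sketch. It is not the patented method: the 1-D nearest-centroid classifier, the risk defined as the mean of (1 - confidence), the reward defined as the share of items classified above `min_confidence`, and the choice to update the model by labeling the least confident item are all assumptions made for illustration, as are all function and parameter names.

```python
import math


def train(labeled):
    """Step b): fit a 1-D nearest-centroid model; the parameters are the class means."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in labeled if y == label]
        means[label] = sum(values) / len(values)
    return means


def classify(means, x):
    """Step c): return (label, confidence) with confidence in [0.5, 1.0]."""
    d0, d1 = abs(x - means[0]), abs(x - means[1])
    p1 = 1.0 / (1.0 + math.exp(d1 - d0))  # softmax over negative distances
    return (1, p1) if p1 >= 0.5 else (0, 1.0 - p1)


def fit_until_acceptable(seed, pool, oracle, risk_limit=0.1,
                         reward_target=0.9, min_confidence=0.8, max_rounds=20):
    """Repeat steps b) to d) until the risk or reward criterion is met.

    Assumes a non-empty pool and a seed set containing both classes.
    """
    labeled, pool = list(seed), list(pool)          # a) seed set + unlabeled data
    for rounds in range(1, max_rounds + 1):
        means = train(labeled)                      # b) fit parameters
        confidences = [classify(means, x)[1] for x in pool]           # c) score
        risk = sum(1.0 - c for c in confidences) / len(pool)          # d) assumed risk
        reward = sum(c >= min_confidence for c in confidences) / len(pool)
        if risk <= risk_limit or reward >= reward_target or len(pool) == 1:
            break
        # e) assumed update rule: label the least confident item and retrain
        x = pool.pop(confidences.index(min(confidences)))
        labeled.append((x, oracle(x)))
    return means, risk, reward, rounds


# Hypothetical toy data: the oracle stands in for a human annotator.
seed = [(0.0, 0), (10.0, 1)]
pool = [1.0, 2.0, 8.0, 9.0, 4.9, 5.2]
means, risk, reward, rounds = fit_until_acceptable(
    seed, pool, oracle=lambda x: int(x > 5.0))
```

Note that a confidence-based risk of this kind only estimates misclassification: a model can be confident and still wrong, so in practice the estimate would need to be checked against labeled samples.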

    PRACTICAL SUPERVISED CLASSIFICATION OF DATA SETS

    Publication No.: US20240160652A1

    Publication date: 2024-05-16

    Application No.: US18412703

    Filing date: 2024-01-15

    Applicant: BASF SE

    Abstract: The present invention relates to information retrieval. In order to facilitate a search and identification of documents, there is provided a computer-implemented method for training a classifier model for data classification in response to a search query. The computer-implemented method comprises:



    a) obtaining a dataset that comprises a seed set of labeled data representing a training dataset;
    b) training the classifier model by using the training dataset to fit parameters of the classifier model;
    c) evaluating a quality of the classifier model using a test dataset that comprises unlabeled data from the obtained dataset to generate a classifier confidence score indicative of a probability of correctness of the classifier model working on the test dataset;
    d) determining a global risk value of misclassification and a reward value based on the classifier confidence score on the test dataset;
    e) iteratively updating the parameters of the classifier model and performing steps b) to d) until the global risk value falls within a predetermined risk limit value or an expected reward value is reached.

    Semantic-Temporal Visualization of Information

    Publication No.: US20230385311A1

    Publication date: 2023-11-30

    Application No.: US18030597

    Filing date: 2021-10-07

    Applicant: BASF SE

    Abstract: A computer-implemented method for generating digital information data in a subject area is proposed. The method comprises:



    providing, at a processing unit, digital information corpus data;
    extracting, via the processing unit, digital information seed data from the digital information corpus data;
    performing, via the processing unit, a search in at least one database comprising knowledge information, thereby extracting a plurality of text blocks related to the subject area from the at least one database, wherein the search is performed based upon the digital information seed data;
    indexing, via the processing unit, the text blocks in temporal sequence; and
    generating, via the processing unit, the digital information data using the temporally organized text blocks.
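
The pipeline above (seed extraction, seed-based search, temporal indexing, generation) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: seed data is taken to be the most frequent longer words in the corpus, the search is a plain substring match, and `TextBlock`, `extract_seed_terms`, `search`, and `build_timeline` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TextBlock:
    published: date
    text: str


def extract_seed_terms(corpus, top_n=1, min_length=5):
    """Assumed seed extraction: the most frequent words of at least min_length letters."""
    words = [w for doc in corpus for w in re.findall(r"[a-z]+", doc.lower())
             if len(w) >= min_length]
    return [word for word, _ in Counter(words).most_common(top_n)]


def search(database, seed_terms):
    """Keep the text blocks that mention at least one seed term."""
    return [b for b in database
            if any(term in b.text.lower() for term in seed_terms)]


def build_timeline(blocks):
    """Index the blocks in temporal sequence and render one line per block."""
    ordered = sorted(blocks, key=lambda b: b.published)
    return [f"{b.published.isoformat()}: {b.text}" for b in ordered]


# Hypothetical corpus and knowledge database.
corpus = ["catalyst design", "catalyst testing", "zeolite catalyst"]
database = [
    TextBlock(date(2021, 3, 10), "New catalyst patented"),
    TextBlock(date(2020, 7, 1), "Unrelated polymer news"),
    TextBlock(date(2019, 5, 1), "Catalyst discovered"),
]

seed_terms = extract_seed_terms(corpus)
timeline = build_timeline(search(database, seed_terms))
```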

    Usability in information retrieval systems

    Publication No.: US12032613B2

    Publication date: 2024-07-09

    Application No.: US17373294

    Filing date: 2021-07-12

    Applicant: BASF SE

    Abstract: In order to facilitate a search and identification of documents, an information retrieval system is provided for performing a search on a corpus of data objects. The information retrieval system comprises a device and a database. The database is configured to store at least one syntactic search index data structure and at least one semantic search index data structure. The syntactic search index data structure is configured to index and store in the database a plurality of terms from the corpus of data objects along with syntactic annotations indicating syntactic information. The at least one semantic search index data structure is configured to index and store in the database the plurality of terms from the corpus of data objects along with semantic annotations indicating semantic information. The device comprises an input unit, a processing unit, and an output unit. The input unit is configured to receive a syntactic query and a semantic query. The processing unit is configured to match the syntactic query against the syntactic search index data structure to obtain a first set of data objects, each of which has a set of terms that are syntactically related to the syntactic query. The processing unit is configured to match the semantic query against the at least one semantic search index data structure to obtain a second set of the data objects, each of which has a set of terms that are semantically related to the semantic query, wherein the second set of data objects is a sub-set of the first set of the data objects. The output unit is configured to output information of the second set of data objects.
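
The two-stage matching described in this abstract can be sketched with two inverted indexes. The corpus, the tag names, and the representation of a query as a (term, annotation) pair are assumptions for illustration; the patent does not specify them here.

```python
from collections import defaultdict

# Hypothetical annotated corpus: doc id -> (term, syntactic tag, semantic tag).
corpus = {
    1: [("coat", "VERB", "PROCESS"), ("steel", "NOUN", "MATERIAL")],
    2: [("coat", "VERB", "PROCESS"), ("polymer", "NOUN", "MATERIAL")],
    3: [("coat", "NOUN", "PRODUCT"), ("steel", "NOUN", "MATERIAL")],
}


def build_indexes(corpus):
    """Index every term twice: once with its syntactic and once with its semantic annotation."""
    syntactic, semantic = defaultdict(set), defaultdict(set)
    for doc_id, terms in corpus.items():
        for term, syn_tag, sem_tag in terms:
            syntactic[(term, syn_tag)].add(doc_id)
            semantic[(term, sem_tag)].add(doc_id)
    return syntactic, semantic


def retrieve(syntactic, semantic, syntactic_query, semantic_query):
    """Return (first set, second set); the second is always a subset of the first."""
    first = set(syntactic.get(syntactic_query, set()))
    second = first & semantic.get(semantic_query, set())
    return first, second


syntactic, semantic = build_indexes(corpus)
first, second = retrieve(syntactic, semantic,
                         syntactic_query=("coat", "VERB"),
                         semantic_query=("steel", "MATERIAL"))
```

Intersecting the semantic matches with the first set is one simple way to guarantee the subset relation; running the semantic query only over the first set would be an equivalent design.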

    Combining data driven models for classifying data

    Publication No.: US20220092478A1

    Publication date: 2022-03-24

    Application No.: US17477971

    Filing date: 2021-09-17

    Applicant: BASF SE

    IPC classification: G06N20/00

    Abstract: The present invention relates to classifying data (24). A first data driven model (50) is trained based on labeled historic data (44). A second data driven model (60) comprises a set of rules (42). Data (24) to be classified is obtained at the first data driven model (50) and the second data driven model (60). A first classification (52) is determined for the data (24) by the first data driven model (50) and a second classification (62) is determined for the data (24) by the second data driven model (60). A result signal (80) is provided based on the classifications (52, 62).
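
A minimal sketch of combining a trained model with a rule-based model follows. The midpoint-threshold classifier, the example rule, the labels, and the shape of the result signal (both classifications plus an agreement flag) are assumptions for illustration; the abstract does not say how the result signal is derived from the two classifications.

```python
def train_threshold_model(history):
    """First model: learn a decision threshold from labeled historic (value, label) pairs."""
    ok = [v for v, label in history if label == "ok"]
    alert = [v for v, label in history if label == "alert"]
    threshold = (sum(ok) / len(ok) + sum(alert) / len(alert)) / 2.0
    return lambda value: "alert" if value > threshold else "ok"


def make_rule_model(rules, default="ok"):
    """Second model: return the label of the first rule whose predicate matches."""
    def classify(value):
        for predicate, label in rules:
            if predicate(value):
                return label
        return default
    return classify


def result_signal(value, first_model, second_model):
    """Assumed combination: report both classifications and whether they agree."""
    first, second = first_model(value), second_model(value)
    return {"first": first, "second": second, "agree": first == second}


# Hypothetical historic data and rule set.
history = [(1.0, "ok"), (2.0, "ok"), (8.0, "alert"), (9.0, "alert")]
first_model = train_threshold_model(history)
second_model = make_rule_model([(lambda v: v > 7.0, "alert")])

clear_case = result_signal(9.0, first_model, second_model)
border_case = result_signal(6.0, first_model, second_model)
```

A disagreement such as the border case could, for example, be routed to manual review; that policy is a design choice outside what the abstract states.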

    USABILITY IN INFORMATION RETRIEVAL SYSTEMS

    Publication No.: US20220019608A1

    Publication date: 2022-01-20

    Application No.: US17373294

    Filing date: 2021-07-12

    Applicant: BASF SE

    Abstract: In order to facilitate a search and identification of documents, an information retrieval system is provided for performing a search on a corpus of data objects. The information retrieval system comprises a device and a database. The database is configured to store at least one syntactic search index data structure and at least one semantic search index data structure. The syntactic search index data structure is configured to index and store in the database a plurality of terms from the corpus of data objects along with syntactic annotations indicating syntactic information. The at least one semantic search index data structure is configured to index and store in the database the plurality of terms from the corpus of data objects along with semantic annotations indicating semantic information. The device comprises an input unit, a processing unit, and an output unit. The input unit is configured to receive a syntactic query and a semantic query. The processing unit is configured to match the syntactic query against the syntactic search index data structure to obtain a first set of data objects, each of which has a set of terms that are syntactically related to the syntactic query. The processing unit is configured to match the semantic query against the at least one semantic search index data structure to obtain a second set of the data objects, each of which has a set of terms that are semantically related to the semantic query, wherein the second set of data objects is a sub-set of the first set of the data objects. The output unit is configured to output information of the second set of data objects.