Causal Knowledge Identification and Extraction

    Publication No.: US20220405487A1

    Publication date: 2022-12-22

    Application No.: US17354171

    Filing date: 2021-06-22

    IPC classification: G06F40/40 G06N20/00 G06N5/04

    Abstract: A computer-implemented method is provided that includes accessing candidate text and a candidate pair including first and second phrases, and substituting the first and second phrases into cause-effect patterns to generate variant sentences. An artificial intelligence model is leveraged to determine respective probabilities that the variant sentences are inferred from the candidate text, calculate a statistical measure of the respective probabilities, and assess the calculated statistical measure to ascertain whether the first and second phrases possess a causal or non-causal relationship to one another. A knowledge base including one or more cause-effect phrase pairs is populated with the first and second phrases possessing the causal relationship. A computer system and a computer program product are also provided.
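
The flow in this abstract (pattern substitution, per-sentence inference probability, a statistical measure, a decision) can be sketched as follows. This is a minimal illustration and not the patented implementation: the three templates in `PATTERNS`, the use of the mean as the statistical measure, the 0.5 threshold, and every function name are assumptions. The real artificial intelligence model is replaced by `stub_entail_prob`, a crude word-overlap stand-in used only so the sketch runs.

```python
from statistics import mean
from typing import Callable, List, Set, Tuple

# Hypothetical cause-effect templates; the abstract does not list the real patterns.
PATTERNS = (
    "{a} causes {b}.",
    "{a} leads to {b}.",
    "{b} is a result of {a}.",
)


def variant_sentences(first: str, second: str, patterns=PATTERNS) -> List[str]:
    """Substitute the candidate pair into every cause-effect pattern."""
    return [p.format(a=first, b=second) for p in patterns]


def is_causal(
    candidate_text: str,
    first: str,
    second: str,
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the abstract
) -> Tuple[bool, float]:
    """Average the inference probabilities and compare them to the threshold."""
    probs = [entail_prob(candidate_text, s) for s in variant_sentences(first, second)]
    score = mean(probs)  # the mean is one possible statistical measure
    return score >= threshold, score


def add_if_causal(kb: Set[Tuple[str, str]], candidate_text, first, second, entail_prob):
    """Populate the knowledge base only with pairs judged causal."""
    causal, _ = is_causal(candidate_text, first, second, entail_prob)
    if causal:
        kb.add((first, second))
    return causal


def stub_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for a model: share of hypothesis words that occur in the premise."""
    words = lambda s: set(s.lower().replace(".", "").split())
    hyp = words(hypothesis)
    return len(hyp & words(premise)) / len(hyp)


if __name__ == "__main__":
    kb: Set[Tuple[str, str]] = set()
    text = "Heavy rain leads to flooding in the valley."
    add_if_causal(kb, text, "heavy rain", "flooding", stub_entail_prob)
    print(kb)
```

In practice `entail_prob` would wrap whatever inference model is available; passing it as a callable keeps the scoring logic independent of that choice.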

    Knowledge graph augmentation through schema extension

    Publication No.: US11204960B2

    Publication date: 2021-12-21

    Application No.: US16399535

    Filing date: 2019-04-30

    IPC classification: G06F16/901 G06F16/36

    Abstract: A method, system, and recording medium for knowledge graph augmentation using data based on a statistical analysis of attributes in the data. A ranking device is configured to rank semantically similar input data elements to create a ranked list of attributes to augment an input of structured data and populate with a data string corresponding to the instances. The ranking device further combines a set of filters to refine the ranked list of attributes, the set of filters including a first filter according to column ranges of columns, a second filter according to a column uniqueness of the columns, a third filter according to a type of data in a column of the columns, and a fourth filter according to a distribution of values in the columns.
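
One way to picture the four filters named in this abstract is as predicates over a per-column profile. The sketch below is an assumption throughout: the abstract does not say how range, uniqueness, data type, or value distribution are measured, so `column_profile`, `refine`, and all thresholds are hypothetical choices made for illustration.

```python
from collections import Counter


def column_profile(values):
    """Summarise one column with the four statistics the filters look at."""
    non_null = [v for v in values if v is not None]
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    counts = Counter(non_null)
    return {
        # Filter 1 input: (min, max) of numeric values, or None if there are none.
        "range": (min(numeric), max(numeric)) if numeric else None,
        # Filter 2 input: fraction of non-null values that are distinct.
        "uniqueness": len(counts) / len(non_null) if non_null else 0.0,
        # Filter 3 input: coarse data type of the column.
        "dtype": "numeric" if non_null and len(numeric) == len(non_null) else "text",
        # Filter 4 input: share of the most frequent value (a crude distribution check).
        "top_share": max(counts.values()) / len(non_null) if non_null else 1.0,
    }


def refine(ranked, columns, *, allowed_range, min_uniqueness, dtype, max_top_share):
    """Keep ranked attributes whose column passes all four filters, in rank order."""
    lo, hi = allowed_range
    kept = []
    for name in ranked:
        p = column_profile(columns[name])
        in_range = p["range"] is not None and lo <= p["range"][0] and p["range"][1] <= hi
        if (
            in_range
            and p["uniqueness"] >= min_uniqueness
            and p["dtype"] == dtype
            and p["top_share"] <= max_top_share
        ):
            kept.append(name)
    return kept


if __name__ == "__main__":
    columns = {
        "population": [100, 250, 400, 900],
        "flag": [1, 1, 1, 1],
        "name": ["a", "b", "c", "d"],
    }
    print(refine(
        ["flag", "population", "name"], columns,
        allowed_range=(0, 1000), min_uniqueness=0.5,
        dtype="numeric", max_top_share=0.5,
    ))
```

Combining the filters with a logical AND is only one option; a weighted score would serve equally well if some filters should be soft.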

    Knowledge Aided Feature Engineering

    Publication No.: US20210216904A1

    Publication date: 2021-07-15

    Application No.: US16741084

    Filing date: 2020-01-13

    IPC classification: G06N20/00 G06F11/34

    Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where each new feature is semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
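
The augmentation step in this abstract can be illustrated with a small sketch. Everything here is assumed for illustration: semantic relatedness is approximated by token overlap between column names (`name_similarity`), the 0.3 threshold is arbitrary, and the two datasets are joined on a shared key column, which the abstract does not mention. Training the selected ML model on the augmented rows is left out.

```python
def name_similarity(a: str, b: str) -> float:
    """Jaccard overlap of underscore-separated name tokens (a stand-in for semantics)."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def related_new_features(important, second_columns, threshold=0.3):
    """Columns of the second dataset that resemble at least one important feature."""
    return [
        c for c in second_columns
        if c not in important
        and any(name_similarity(c, f) >= threshold for f in important)
    ]


def augment(first_rows, second_rows, key, new_features):
    """Copy the new features onto first-dataset rows that share the same key value."""
    lookup = {row[key]: row for row in second_rows}
    augmented = []
    for row in first_rows:
        match = lookup.get(row[key], {})
        # Rows without a match receive None for every new feature.
        augmented.append({**row, **{f: match.get(f) for f in new_features}})
    return augmented


if __name__ == "__main__":
    first = [{"zip": "10001", "household_income": 50}]
    second = [{"zip": "10001", "median_income": 70, "tree_count": 3}]
    new = related_new_features(["household_income"], ["zip", "median_income", "tree_count"])
    print(augment(first, second, "zip", new))
```

A production system would more plausibly compare features through a knowledge graph or embeddings than through name tokens; the structure of the loop stays the same.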

    Methods and systems for discovery of linkage points between data sources

    Publication No.: US10599732B2

    Publication date: 2020-03-24

    Application No.: US15440372

    Filing date: 2017-02-23

    Abstract: Data records are linked across a plurality of datasets. Each dataset contains at least one data record, and each data record is associated with an entity and includes one or more attributes of that entity and a value for each attribute. Values associated with attributes are compared across datasets, and matching attributes having values that satisfy a predetermined similarity threshold are identified. In addition, linkage points between pairs of datasets are identified. Each linkage point links one or more pairs of data records. Each data record in the pair of data records is contained in one of a given pair of datasets, and each pair of data records is associated with a common entity having matching attributes in the given pair of datasets. Data records associated with the common entities are linked across datasets using the identified linkage points.
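
The linkage-point idea in this abstract can be sketched as a two-step procedure: score attribute pairs by how much their values overlap, then link the records that share a value under a qualifying pair. The sketch makes several assumptions that are not in the abstract: similarity is the Jaccard overlap of normalised value sets, the threshold is 0.5, records are plain dictionaries, and linking requires an exact match after normalisation.

```python
def _norm(value) -> str:
    """Normalise a value so trivially different spellings compare equal."""
    return str(value).strip().lower()


def value_set(records, attr):
    """All normalised, non-null values an attribute takes in a dataset."""
    return {_norm(r[attr]) for r in records if r.get(attr) is not None}


def linkage_points(ds_a, ds_b, threshold=0.5):
    """Attribute pairs (a, b, score) whose value sets overlap at or above the threshold."""
    attrs_a = sorted({k for r in ds_a for k in r})
    attrs_b = sorted({k for r in ds_b for k in r})
    points = []
    for a in attrs_a:
        va = value_set(ds_a, a)
        for b in attrs_b:
            vb = value_set(ds_b, b)
            union = va | vb
            score = len(va & vb) / len(union) if union else 0.0
            if score >= threshold:
                points.append((a, b, score))
    return points


def link_records(ds_a, ds_b, point):
    """Index pairs (i, j) of records whose values agree on the linkage point."""
    a, b, _ = point
    index = {}
    for j, rec in enumerate(ds_b):
        if rec.get(b) is not None:
            index.setdefault(_norm(rec[b]), []).append(j)
    return [
        (i, j)
        for i, rec in enumerate(ds_a)
        if rec.get(a) is not None
        for j in index.get(_norm(rec[a]), [])
    ]


if __name__ == "__main__":
    ds_a = [{"ticker": "IBM", "name": "Intl Business Machines"},
            {"ticker": "MSFT", "name": "Microsoft"}]
    ds_b = [{"symbol": "ibm ", "price": 130}, {"symbol": "MSFT", "price": 300}]
    for point in linkage_points(ds_a, ds_b):
        print(point, link_records(ds_a, ds_b, point))
```

Comparing every attribute pair is quadratic in the number of attributes, which is acceptable for a sketch but would need indexing or sampling on wide datasets.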