MAPPING OF HETEROGENEOUS DATA AS MATCHING FIELDS

    Publication No.: US20230029643A1

    Publication Date: 2023-02-02

    Application No.: US17443341

    Application Date: 2021-07-26

    IPC Classification: G06F16/22

    Abstract: A method, a structure, and a computer system for mapping data fields. The exemplary embodiments may include, based on determining that a first data set and a second data set contain homogeneous data, mapping at least one column of the first data set to at least one column of the second data set based on comparing at least one of relative column position and unique value sets. Based on determining that the first data set and the second data set contain heterogeneous data, the exemplary embodiments may include mapping the at least one column of the first data set to the at least one column of the second data set based on a difference between distribution signatures of unique value sets within each of the first data set and the second data set being less than a threshold.
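
The heterogeneous case described in the abstract can be illustrated with a minimal sketch. The abstract does not define a "distribution signature", so the definition used here (sorted relative frequencies of a column's unique values, compared by L1 distance) is an assumption, as are the function names, the signature size, and the threshold value.

```python
from collections import Counter


def distribution_signature(values, size=5):
    """Assumed signature: the `size` largest relative frequencies of the
    unique values, sorted in descending order and zero-padded."""
    counts = Counter(values)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)[:size]
    return freqs + [0.0] * (size - len(freqs))


def signature_distance(sig_a, sig_b):
    """L1 distance between two equally sized signatures."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def map_columns(first, second, threshold=0.2):
    """Map each column of `first` to the closest column of `second`
    whose signature difference is below `threshold`.

    `first` and `second` are dicts of column name -> list of values.
    """
    mapping = {}
    for name_a, col_a in first.items():
        sig_a = distribution_signature(col_a)
        best = None
        for name_b, col_b in second.items():
            dist = signature_distance(sig_a, distribution_signature(col_b))
            if dist < threshold and (best is None or dist < best[1]):
                best = (name_b, dist)
        if best is not None:
            mapping[name_a] = best[0]
    return mapping


# Hypothetical data sets: the values differ in type and encoding,
# but the shape of their value distributions is the same.
first = {
    "gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "status": ["a"] * 6 + ["b", "c"],
}
second = {
    "sex": [0, 1, 0, 1, 1, 0, 0, 1],
    "state": ["x"] * 6 + ["y", "z"],
}
mapping = map_columns(first, second)
```

Because only the shape of the value distribution is compared, columns can be matched even when their raw values share no common encoding. Two unrelated columns with similar distributions would also match under this simplified signature, so a real system would presumably combine it with other evidence.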

    AUTO-TUNING OF COMPARISON FUNCTIONS

    Publication No.: US20210110295A1

    Publication Date: 2021-04-15

    Application No.: US16599427

    Application Date: 2019-10-11

    IPC Classification: G06N20/00 G06N5/04 H04L9/30

    Abstract: A method for relating different types of records. The method may include providing comparison functions, wherein each comparison function corresponds to a semantical class, and wherein a computational cost is associated with each comparison function. The method may include determining one or more attribute pairs between the different types of records. The method may include sorting the comparison functions according to a determined accuracy. The method may include selecting a set of comparison functions associated with semantical classes according to a predefined rule. The method may include determining a total computational cost based on the computational cost of the selected set of comparison functions. The method may include determining whether two or more records are related using the selected set of comparison functions. The method may include relating the two or more records. The method may include determining a rate of false negative records.
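
The selection steps in the abstract can be sketched as follows. The abstract does not state the "predefined rule", so the rule used here (take the most accurate function per semantical class as long as the total cost stays within a budget) is an assumption. The comparison functions, accuracy figures, costs, and the averaging threshold are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ComparisonFunction:
    name: str
    semantic_class: str   # e.g. "name" or "phone"
    cost: float           # assumed computational cost per comparison
    accuracy: float       # assumed accuracy, determined beforehand
    compare: Callable[[str, str], float]


def exact(a, b):
    return 1.0 if a == b else 0.0


def case_insensitive(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0


def digits_only(a, b):
    def digits(s):
        return "".join(ch for ch in s if ch.isdigit())
    return 1.0 if digits(a) == digits(b) else 0.0


def select_functions(functions, budget):
    """Assumed rule: walk the functions in descending accuracy and keep the
    first affordable one for each semantical class."""
    selected, total_cost = {}, 0.0
    for f in sorted(functions, key=lambda f: f.accuracy, reverse=True):
        if f.semantic_class in selected or total_cost + f.cost > budget:
            continue
        selected[f.semantic_class] = f
        total_cost += f.cost
    return selected, total_cost


def are_related(rec_a, rec_b, attribute_pairs, selected, threshold=0.75):
    """Average the scores of the selected functions over the attribute pairs."""
    scores = []
    for attr_a, attr_b, semantic_class in attribute_pairs:
        f = selected.get(semantic_class)
        if f is not None:
            scores.append(f.compare(rec_a[attr_a], rec_b[attr_b]))
    return bool(scores) and sum(scores) / len(scores) >= threshold


functions = [
    ComparisonFunction("exact_name", "name", 1.0, 0.60, exact),
    ComparisonFunction("ci_name", "name", 2.0, 0.90, case_insensitive),
    ComparisonFunction("digits_phone", "phone", 3.0, 0.95, digits_only),
]
# Attribute pairs between two different record types (hypothetical schemas).
pairs = [("full_name", "customer", "name"), ("tel", "phone", "phone")]
rec_a = {"full_name": "Ada Lovelace", "tel": "+1 (555) 010-2000"}
rec_b = {"customer": "ada lovelace", "phone": "15550102000"}

selected, total_cost = select_functions(functions, budget=5.0)
related = are_related(rec_a, rec_b, pairs, selected)
```

With a tighter budget the cheaper but less accurate `exact_name` function is chosen instead of `ci_name`, and the same two records are no longer related. That is the kind of false negative whose rate the abstract says the method may determine.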

    METADATA INDEXING FOR INFORMATION MANAGEMENT

    Publication No.: US20220164396A1

    Publication Date: 2022-05-26

    Application No.: US17105425

    Application Date: 2020-11-25

    Abstract: A method, apparatus, computer system, and computer program product for managing information. A set of bucket hashes and comparison information for a data record are identified by a computer system. The set of bucket hashes is generated from the comparison information, wherein the set of bucket hashes and the comparison information form a metadata record. A number of candidate metadata records in a metadata database is identified by the computer system using the set of bucket hashes, wherein the number of candidate metadata records comprises a set of candidate bucket hashes and candidate comparison information. An entity membership is identified by the computer system for the data record from a comparison of the comparison information in the metadata record with the candidate comparison information in the number of candidate metadata records.
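
The lookup flow in the abstract resembles blocking in record matching, and a minimal in-memory sketch is given below. Everything concrete in it is an assumption: the choice of comparison information (normalized name and birth year), the blocking keys behind the bucket hashes, the dictionary standing in for the metadata database, and the simplification of one metadata record per entity.

```python
import hashlib
from collections import defaultdict


def comparison_info(record):
    """Assumed comparison information: normalized name plus birth year."""
    name = " ".join(record["name"].lower().split())
    return {"name": name, "birth_year": record["birth_year"]}


def bucket_hashes(info):
    """Generate the bucket hashes from the comparison information.
    Assumed blocking key: three-letter token prefix plus birth year."""
    keys = [f"tok:{tok[:3]}|{info['birth_year']}" for tok in info["name"].split()]
    return {hashlib.sha256(k.encode("utf-8")).hexdigest()[:16] for k in keys}


class MetadataIndex:
    """In-memory stand-in for the metadata database."""

    def __init__(self):
        self.records = {}                 # entity id -> comparison information
        self.buckets = defaultdict(set)   # bucket hash -> entity ids

    def add(self, entity_id, record):
        info = comparison_info(record)
        self.records[entity_id] = info
        for h in bucket_hashes(info):
            self.buckets[h].add(entity_id)

    def candidates(self, hashes):
        """Candidate metadata records sharing at least one bucket hash."""
        found = set()
        for h in hashes:
            found |= self.buckets.get(h, set())
        return found

    def entity_membership(self, record):
        """Compare comparison information against candidates only."""
        info = comparison_info(record)
        for entity_id in sorted(self.candidates(bucket_hashes(info))):
            cand = self.records[entity_id]
            same_name = set(info["name"].split()) == set(cand["name"].split())
            if same_name and info["birth_year"] == cand["birth_year"]:
                return entity_id
        return None


index = MetadataIndex()
index.add("E1", {"name": "Grace Hopper", "birth_year": 1906})
index.add("E2", {"name": "Alan Turing", "birth_year": 1912})

match = index.entity_membership({"name": "hopper  GRACE", "birth_year": 1906})
no_match = index.entity_membership({"name": "Grace Kelly", "birth_year": 1906})
```

The bucket hashes keep the detailed comparison limited to records that share a blocking key, so the second query retrieves `E1` as a candidate but the comparison of the comparison information rejects it.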