Feature vector generation for probabilistic matching

    Publication No.: US12039273B2

    Publication date: 2024-07-16

    Application No.: US16942925

    Application date: 2020-07-30

    Abstract: A computer-implemented method increases the efficiency of matching records from two sources. The method includes identifying a first source and a second source, wherein each source includes one or more records and each record includes one or more attributes. The method further includes determining, based on a corpus, the one or more attributes and generating, based on the attributes, a set of feature vectors that represent the one or more attributes. The method includes comparing each record in the first source against each record in the second source. The method further includes generating, in response to the comparing, a link confidence. The method also includes linking, in response to the link confidence being above a linking threshold, the associated records. The method includes determining a first feature vector of the set of feature vectors used in the linking, and outputting a set of results.
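
The compare, score, and link steps in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the per-attribute string similarity, the weighted average used as the link confidence, the function names, and the choice to report the attribute that contributed most to each link.

```python
from difflib import SequenceMatcher


def feature_vector(rec_a, rec_b, attributes):
    """Return one similarity score in [0, 1] per attribute (illustrative choice)."""
    return [
        SequenceMatcher(
            None, str(rec_a.get(attr, "")).lower(), str(rec_b.get(attr, "")).lower()
        ).ratio()
        for attr in attributes
    ]


def link_records(source_a, source_b, attributes, weights, threshold=0.85):
    """Compare every record pair and link those above the threshold."""
    total_weight = sum(weights)
    links = []
    for i, rec_a in enumerate(source_a):
        for j, rec_b in enumerate(source_b):
            vec = feature_vector(rec_a, rec_b, attributes)
            # Hypothetical link confidence: weighted mean of the similarities.
            confidence = sum(w * s for w, s in zip(weights, vec)) / total_weight
            if confidence > threshold:
                # Report the attribute with the largest weighted contribution.
                top = max(range(len(vec)), key=lambda k: weights[k] * vec[k])
                links.append((i, j, confidence, attributes[top]))
    return links


source_a = [{"name": "Jane Doe", "city": "Austin"}]
source_b = [
    {"name": "Jane Doe", "city": "Austin"},
    {"name": "Bob Smith", "city": "Boston"},
]
links = link_records(source_a, source_b, ["name", "city"], weights=[2, 1])
```

Here only the identical pair is linked. Comparing every record against every other record is quadratic in the number of records, which is one reason blocking schemes are commonly placed in front of this step.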

    Method for dynamic data blocking in a database system

    Publication No.: US11663275B2

    Publication date: 2023-05-30

    Application No.: US16841255

    Application date: 2020-04-06

    IPC classification: G06F16/906 G06F16/22

    CPC classification: G06F16/906 G06F16/22

    Abstract: A method is disclosed for a database system that includes a set of data blocks comprising records having attributes. The set of data blocks are instances of at least one block type. The block type is defined by a subset of one or more attributes of the attributes. An instance of the block type comprises records having one distinct group of values of the subset of attributes. The method includes detecting that a subset of one or more data blocks of the block type, of the set of data blocks, reached a first maximum number of records. The method includes determining an additional attribute of the attributes to define a new block type by the combination of the additional attribute and the subset of attributes that define the block type. The method includes creating one or more data blocks which are instances of the new block type.
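
The block-splitting idea in this abstract can be sketched as follows. The abstract does not say how the additional attribute is determined, so the heuristic used here (pick the remaining attribute with the most distinct values inside the oversized blocks) is an assumption, as are the function names and the decision to re-block only the records of the oversized blocks.

```python
from collections import defaultdict


def build_blocks(records, block_attrs):
    """Group records by the tuple of their values for the block-type attributes."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[tuple(rec[a] for a in block_attrs)].append(rec)
    return dict(blocks)


def split_oversized(records, block_attrs, candidate_attrs, max_records):
    """Define a new block type for blocks that reached max_records."""
    blocks = build_blocks(records, block_attrs)
    oversized = [recs for recs in blocks.values() if len(recs) >= max_records]
    remaining = [a for a in candidate_attrs if a not in block_attrs]
    if not oversized or not remaining:
        return list(block_attrs), {}

    def spread(attr):
        # Assumed heuristic: more distinct values means smaller new blocks.
        return sum(len({rec[attr] for rec in recs}) for recs in oversized)

    extra = max(remaining, key=spread)
    new_attrs = list(block_attrs) + [extra]
    affected = [rec for recs in oversized for rec in recs]
    return new_attrs, build_blocks(affected, new_attrs)


records = [
    {"last": "smith", "zip": "1"},
    {"last": "smith", "zip": "2"},
    {"last": "smith", "zip": "1"},
    {"last": "jones", "zip": "3"},
]
new_attrs, new_blocks = split_oversized(records, ["last"], ["last", "zip"], max_records=3)
```

In this example the `smith` block reaches the limit of three records, so it is split into two blocks keyed by `(last, zip)`, while the `jones` block is left alone.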

    MAPPING OF HETEROGENEOUS DATA AS MATCHING FIELDS

    Publication No.: US20230029643A1

    Publication date: 2023-02-02

    Application No.: US17443341

    Application date: 2021-07-26

    IPC classification: G06F16/22

    Abstract: A method, a structure, and a computer system for mapping data fields. The exemplary embodiments may include, based on determining that a first data set and a second data set contain homogeneous data, mapping at least one column of the first data set to at least one column of the second data set based on comparing at least one of relative column position and unique value sets. Based on determining that the first data set and the second data set contain heterogeneous data, the exemplary embodiments may include mapping the at least one column of the first data set to the at least one column of the second data set based on a difference between distribution signatures of unique value sets within each of the first data set and the second data set being less than a threshold.
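
For the heterogeneous case, a rough sketch of signature-based column mapping is shown below. The abstract does not define a "distribution signature", so this example assumes one: the relative frequencies of a column's unique values, sorted in descending order so that the signature does not depend on the values themselves. The distance measure, threshold, and function names are likewise hypothetical.

```python
from collections import Counter


def signature(values):
    """Sorted relative frequencies of the unique values in a column."""
    counts = Counter(values)
    total = len(values)
    return sorted((c / total for c in counts.values()), reverse=True)


def signature_distance(sig_a, sig_b):
    """Half the L1 distance between two zero-padded signatures (range 0 to 1)."""
    n = max(len(sig_a), len(sig_b))
    a = sig_a + [0.0] * (n - len(sig_a))
    b = sig_b + [0.0] * (n - len(sig_b))
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def map_columns(first, second, threshold=0.2):
    """Map each column of `first` to its closest column in `second`, if close enough."""
    sigs_b = {name: signature(col) for name, col in second.items()}
    mapping = {}
    if not sigs_b:
        return mapping
    for name_a, col_a in first.items():
        sig_a = signature(col_a)
        best, dist = min(
            ((name_b, signature_distance(sig_a, sig_b)) for name_b, sig_b in sigs_b.items()),
            key=lambda pair: pair[1],
        )
        if dist < threshold:
            mapping[name_a] = best
    return mapping


first = {"gender": ["M", "F", "M", "F"], "id": [1, 2, 3, 4]}
second = {"sex": ["male", "female", "female", "male"], "key": ["a", "b", "c", "d"]}
mapping = map_columns(first, second)
```

The two data sets share no literal values, yet `gender` maps to `sex` and `id` maps to `key` because their value distributions have the same shape. A real system would need a tie-breaking rule when several columns have similar signatures.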

    METHOD FOR ACCESSING DATA RECORDS OF A MASTER DATA MANAGEMENT SYSTEM

    Publication No.: US20200320153A1

    Publication date: 2020-10-08

    Application No.: US16801241

    Application date: 2020-02-26

    Abstract: An approach for accessing multi-attribute data records of a master data management system. The method comprises enhancing the master data management system with one or more search engines for enabling data record access. A request for data may be received at the master data management system. A set of one or more of the multiple attributes, referenced in the received request, may be identified. A combination of one or more of the search engines of the master data management system, whose performances for searching values of at least part of the set of attributes fulfil a current selection rule, may be selected. The request may then be processed using the combination of search engines. At least part of the results of the processing may be provided, and the selection rule may be updated based on user operations on the provided results, the updated selection rule becoming the current selection rule.
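
The select-then-update loop described here can be sketched in a few lines. The abstract leaves the selection rule and its update unspecified, so the scheme below is purely hypothetical: each engine keeps a per-attribute score, an engine is selected when its average score over the requested attributes meets a minimum, and user feedback moves the scores with an exponential moving average. The class name, engine names, and parameters are all invented for illustration.

```python
class EngineSelector:
    """Hypothetical selection rule: average per-attribute score >= min_score."""

    def __init__(self, engines, min_score=0.5, learning_rate=0.3):
        self.engines = list(engines)
        self.min_score = min_score
        self.learning_rate = learning_rate
        self.scores = {}  # (engine, attribute) -> score in [0, 1]

    def _score(self, engine, attribute):
        # Unseen pairs start exactly at the threshold, so they are selectable.
        return self.scores.get((engine, attribute), self.min_score)

    def select(self, attributes):
        """Return the engines that satisfy the current selection rule."""
        if not attributes:
            return list(self.engines)
        chosen = []
        for engine in self.engines:
            avg = sum(self._score(engine, a) for a in attributes) / len(attributes)
            if avg >= self.min_score:
                chosen.append(engine)
        return chosen

    def update(self, engine, attributes, accepted):
        """Move scores toward 1 if the user accepted the results, else toward 0."""
        target = 1.0 if accepted else 0.0
        for a in attributes:
            old = self._score(engine, a)
            self.scores[(engine, a)] = old + self.learning_rate * (target - old)


selector = EngineSelector(["fulltext", "fuzzy"])
before = selector.select(["name"])
selector.update("fuzzy", ["name"], accepted=False)
after = selector.select(["name"])
```

After one rejected result, the `fuzzy` engine drops below the threshold for `name` queries and is no longer selected, which mirrors the idea of the updated rule becoming the current rule.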

    Auto detection of matching fields in entity resolution systems

    Publication No.: US11726980B2

    Publication date: 2023-08-15

    Application No.: US16928361

    Application date: 2020-07-14

    IPC classification: G06F16/23 G06F16/28

    CPC classification: G06F16/2365 G06F16/288

    Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining payload attribute fields; determining potential matching fields from the payload attribute fields; determining a matching function for each of the potential matching fields; determining an attribute score for each of the potential matching fields based on the matching function; obtaining a score list for a reference data set; determining a correlation of the attribute score for each of the potential matching fields with the reference data set score list; selecting new matching fields from the potential matching fields based at least in part on the correlation; determining an optimal weight for each of the selected new matching fields; selecting attribute fields for matching from the selected new matching fields based on a threshold rate for false positives and false negatives; and providing the attribute fields for matching and the associated optimal weight for the attribute fields.
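
The correlation and selection steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: it uses the Pearson correlation, a fixed minimum correlation, and weights proportional to the correlation, and it omits the final filtering by false-positive and false-negative rates. All names and sample scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_matching_fields(field_scores, reference_scores, min_corr=0.6):
    """Keep fields whose scores track the reference scores; return normalized weights."""
    corr = {f: pearson(s, reference_scores) for f, s in field_scores.items()}
    selected = {f: c for f, c in corr.items() if c >= min_corr}
    total = sum(selected.values())
    if total == 0:
        return {}
    # Assumed weighting: proportional to correlation, normalized to sum to 1.
    return {f: c / total for f, c in selected.items()}


# Per-pair attribute scores for four candidate record pairs (made-up values).
field_scores = {
    "name": [0.9, 0.1, 0.8, 0.2],
    "notes": [0.5, 0.5, 0.5, 0.5],
    "phone": [1.0, 0.0, 1.0, 0.0],
}
reference_scores = [1.0, 0.0, 0.9, 0.1]
weights = select_matching_fields(field_scores, reference_scores)
```

The constant `notes` field carries no signal and is dropped, while `name` and `phone` are kept with roughly equal weights.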