Systems and methods of joining data records and detecting string similarity

    Publication No.: US12164539B2

    Publication Date: 2024-12-10

    Application No.: US18071294

    Filing Date: 2022-11-29

    Abstract: The disclosure relates to methods and systems of joining data structures based on a composite similarity score (CSS). For example, a computer system may use a plurality of similarity models to generate respective similarity sub-scores. Each similarity sub-score may be a metric that indicates a confidence that a first data value of a first data record is similar to a second data value of a second data record. The computer system may generate the CSS based on the plurality of similarity sub-scores. The CSS may indicate a confidence that the records being compared are similar. Thus, the CSS may be used to detect similar data records across different data structures. The disclosure also relates to a string similarity model that detects similarity among strings regardless of the order of words in each string and in a way that tolerates errors or omissions in one or both strings.
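
    Illustrative sketch (not taken from the patent): the abstract does not say how the sub-scores are computed or combined, so the following Python example only shows one plausible reading. It assumes a weighted mean as the combining rule and uses the standard-library difflib.SequenceMatcher as a stand-in for the string similarity model. The names token_similarity, exact_similarity, and composite_similarity, the field names, and the weights are all hypothetical.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Order-insensitive, typo-tolerant similarity in [0, 1].

    Stand-in for the string similarity model; the patent's actual
    algorithm is not given in the abstract.
    """
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    # Score each word of the shorter string against its best match in the
    # longer one, so word order is ignored and an omitted word is tolerated.
    short, long_ = sorted((tokens_a, tokens_b), key=len)
    best = [
        max(SequenceMatcher(None, s, t).ratio() for t in long_)
        for s in short
    ]
    return sum(best) / len(best)


def exact_similarity(a: str, b: str) -> float:
    """Trivial second model: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def composite_similarity(rec_a: dict, rec_b: dict, models: dict) -> float:
    """Combine per-field sub-scores into one composite score.

    `models` maps a field name to (similarity_function, weight).
    A weighted mean is an assumption made for this sketch.
    """
    total_weight = sum(weight for _, weight in models.values())
    weighted = sum(
        fn(rec_a[field], rec_b[field]) * weight
        for field, (fn, weight) in models.items()
    )
    return weighted / total_weight


# Hypothetical records from two different data structures.
record_a = {"name": "Acme Widget Corp", "zip": "94105"}
record_b = {"name": "corp acme wdget", "zip": "94105"}
models = {
    "name": (token_similarity, 0.7),
    "zip": (exact_similarity, 0.3),
}
css = composite_similarity(record_a, record_b, models)
joinable = css >= 0.9  # the join threshold is an arbitrary illustrative value
```

    In this sketch, two records would be joined when their composite score meets a chosen threshold. Scoring only the words of the shorter string is what lets an omitted word pass unpenalized, which is a design choice of the example rather than something the abstract specifies.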
