Adaptive recognition of entities
    1.
    Granted patent

    Publication number: US11755680B2

    Publication date: 2023-09-12

    Application number: US16749234

    Filing date: 2020-01-22

    Abstract: A system receives a record which includes a string and separates the string into a number of tokens, including a token and another token. The system identifies a pattern that includes an entity, another entity, and a number of entities that equals the number of tokens, and another pattern that includes the same number of entities as the number of tokens. The system determines a combined probability that combines a probability based on the number of entries in the entity's dictionary which stores the token, and another probability based on a number of character types in the other entity that match characters in the other token. If the combined probability associated with the pattern is greater than another combined probability associated with the other pattern, the system matches the record to a system record based on recognizing the token as the entity and the other token as the other entity.
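
The pattern-scoring idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionaries, the character-type template, the function names, and the choice to combine probabilities by multiplication are all assumptions made for the example.

```python
from math import prod

# Hypothetical dictionaries: for each entity, token -> number of entries storing it.
DICTIONARIES = {
    "first_name": {"john": 90, "smith": 5},
    "last_name": {"smith": 80, "john": 10},
}
# Hypothetical character-type templates: "d" = digit, "a" = letter.
CHAR_TEMPLATES = {"zip": "ddddd"}


def char_type(ch):
    """Map a character to a coarse character type."""
    if ch.isdigit():
        return "d"
    if ch.isalpha():
        return "a"
    return "o"


def entity_probability(entity, token):
    """Probability that a token is an instance of an entity (assumed scoring)."""
    if entity in DICTIONARIES:
        entries = DICTIONARIES[entity]
        total = sum(entries.values())
        return entries.get(token.lower(), 0) / total if total else 0.0
    template = CHAR_TEMPLATES.get(entity)
    if template is None or len(template) != len(token):
        return 0.0
    # Share of positions whose character type matches the template.
    matches = sum(1 for t, ch in zip(template, token) if t == char_type(ch))
    return matches / len(template)


def best_pattern(text, patterns):
    """Return (token/entity pairs, combined probability) for the best pattern."""
    tokens = text.split()
    # Only patterns with as many entities as there are tokens are candidates.
    candidates = [p for p in patterns if len(p) == len(tokens)]
    if not candidates:
        return None
    scored = [
        (prod(entity_probability(e, t) for e, t in zip(p, tokens)), p)
        for p in candidates
    ]
    score, pattern = max(scored, key=lambda item: item[0])
    return list(zip(tokens, pattern)), score
```

Calling `best_pattern("John Smith 94105", ...)` with the two candidate patterns `("first_name", "last_name", "zip")` and `("last_name", "first_name", "zip")` would pick the first, because "john" and "smith" are far more frequent in the first-name and last-name dictionaries respectively.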

    Trie-based normalization of field values for matching

    Publication number: US11016959B2

    Publication date: 2021-05-25

    Application number: US15884732

    Filing date: 2018-01-31

    Abstract: A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospective record. Beginning from the root of the trie, the system identifies each node corresponding to a token value sequence for the prospective record's tokenized value. Beginning from the most recently identified node for the prospective record's token value sequence, the system identifies each extending node which stores a count that satisfies a threshold, each identified extending node corresponding to another token value sequence. The system uses the other token value sequence to identify one of the multiple records that matches the prospective record.
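
A counted trie of this kind can be sketched as below. The whitespace tokenizer, the lowercasing, the class and function names, and the reading of "satisfies a threshold" as "count is at least the threshold" are assumptions for illustration only.

```python
class TrieNode:
    """A trie node keyed by token, with a count of records sharing the prefix."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_trie(values):
    """Build a token-level trie from field values stored by existing records."""
    root = TrieNode()
    for value in values:
        node = root
        for token in value.lower().split():
            node = node.children.setdefault(token, TrieNode())
            node.count += 1  # one more record shares this token sequence
    return root


def extensions(root, value, threshold):
    """Return extended token sequences whose node counts meet the threshold."""
    prefix = value.lower().split()
    node = root
    # Walk from the root along the prospective record's tokens.
    for token in prefix:
        node = node.children.get(token)
        if node is None:
            return []
    results = []
    stack = [(node, prefix)]
    # Explore nodes extending the last matched node, pruning low counts.
    while stack:
        current, sequence = stack.pop()
        for token, child in current.children.items():
            if child.count >= threshold:
                extended = sequence + [token]
                results.append(" ".join(extended))
                stack.append((child, extended))
    return sorted(results)
```

With the values "Acme Corp", "Acme Corp", "Acme Corporation Inc", and "Acme Corp Ltd", looking up "Acme" with a threshold of 2 yields only "acme corp", which can then be used as the normalized value when searching for matching records.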

    Adaptive match indexes
    3.
    Granted patent

    Publication number: US11372928B2

    Publication date: 2022-06-28

    Application number: US16775611

    Filing date: 2020-01-29

    Abstract: Determine first count of first records storing first value in first field, second count of second records storing second value in second field, third count of third records storing third value in third field. Determine count threshold using first, second and third counts, dispersion measure based on dispersion of values stored in second field by first records and other dispersion measure based on other dispersion of values stored in third field by first records. Train machine-learning model to determine dispersion measure threshold based on dispersion and other dispersion measures. If first count is greater than count threshold, and dispersion measure is greater than dispersion measure threshold, create match index based on first and second fields. Receive prospective record storing first value in first field, second value in second field. Use match index to identify record storing first value in first field, second value in second field as matching prospective record.
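
The decision rule in this abstract can be sketched as below. The abstract does not say how dispersion is measured or how the thresholds are derived, so this example assumes Shannon entropy as the dispersion measure and takes both thresholds as plain parameters in place of the computed count threshold and the trained model. All names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy in bits, used here as an assumed dispersion measure."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def should_index(records, first, second, value, count_threshold, dispersion_threshold):
    """Decide whether a (first, second) match index is worthwhile for a value."""
    matching = [r for r in records if r[first] == value]
    # The value must be frequent enough in the first field.
    if len(matching) <= count_threshold:
        return False
    # The second field must spread those records out enough to be selective.
    return entropy([r[second] for r in matching]) > dispersion_threshold


def build_match_index(records, first, second):
    """Map each (first value, second value) pair to the positions of its records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[(record[first], record[second])].append(position)
    return index
```

A prospective record is then matched by looking up its own `(first value, second value)` pair in the index instead of scanning every record that shares the frequent first value.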

    ADAPTIVE FIELD-LEVEL MATCHING
    4.
    Patent application

    Publication number: US20210342353A1

    Publication date: 2021-11-04

    Application number: US16862667

    Filing date: 2020-04-30

    Abstract: Adaptive field-level matching is described. A system identifies first elements in a field of a prospective record for a database, and second elements in the field of a candidate record, in the database, for matching the prospective record. The system identifies features corresponding to any of the first elements that are identical to any of the second elements, any of the first elements that are absent from the second elements, and any of the second elements that are absent from the first elements. A machine-learning model uses the features to determine a field match score for the candidate record's field. Another machine-learning model weighs the field match score and weighs another field match score for another field of the candidate record to determine a record match score for the candidate record. If the record match score satisfies a threshold, the system identifies the candidate record as matching the prospective record.
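
The feature extraction and two-stage scoring described here can be sketched as follows. The abstract uses machine-learning models at both stages; this example substitutes a simple overlap ratio for the field-level model and fixed weights for the record-level model, purely as stand-ins. Whitespace tokens serve as the "elements", and all names are hypothetical.

```python
def field_features(prospective_value, candidate_value):
    """Count shared elements and elements unique to each side of a field."""
    first = set(prospective_value.lower().split())
    second = set(candidate_value.lower().split())
    return {
        "shared": len(first & second),
        "only_prospective": len(first - second),
        "only_candidate": len(second - first),
    }


def field_score(features):
    """Stand-in for the field-level model: share of elements that are common."""
    total = sum(features.values())
    return features["shared"] / total if total else 0.0


def record_score(prospective, candidate, weights):
    """Stand-in for the record-level model: weighted mean of field scores."""
    weighted = sum(
        weight * field_score(field_features(prospective[field], candidate[field]))
        for field, weight in weights.items()
    )
    return weighted / sum(weights.values())


def is_match(prospective, candidate, weights, threshold):
    """Treat the candidate as a match when its record score meets the threshold."""
    return record_score(prospective, candidate, weights) >= threshold
```

Keeping the three counts separate, rather than collapsing them into one similarity value up front, is what would let a trained model treat a missing element differently from an extra one.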

    Adaptive field-level matching
    5.
    Granted patent

    Publication number: US11755582B2

    Publication date: 2023-09-12

    Application number: US16862667

    Filing date: 2020-04-30

    CPC classification number: G06F16/24558 G06F16/24564 G06N20/00

    Abstract: Adaptive field-level matching is described. A system identifies first elements in a field of a prospective record for a database, and second elements in the field of a candidate record, in the database, for matching the prospective record. The system identifies features corresponding to any of the first elements that are identical to any of the second elements, any of the first elements that are absent from the second elements, and any of the second elements that are absent from the first elements. A machine-learning model uses the features to determine a field match score for the candidate record's field. Another machine-learning model weighs the field match score and weighs another field match score for another field of the candidate record to determine a record match score for the candidate record. If the record match score satisfies a threshold, the system identifies the candidate record as matching the prospective record.

    ADAPTIVE MATCH INDEXES
    7.
    Patent application

    Publication number: US20210232637A1

    Publication date: 2021-07-29

    Application number: US16775611

    Filing date: 2020-01-29

    Abstract: Determine first count of first records storing first value in first field, second count of second records storing second value in second field, third count of third records storing third value in third field. Determine count threshold using first, second and third counts, dispersion measure based on dispersion of values stored in second field by first records and other dispersion measure based on other dispersion of values stored in third field by first records. Train machine-learning model to determine dispersion measure threshold based on dispersion and other dispersion measures. If first count is greater than count threshold, and dispersion measure is greater than dispersion measure threshold, create match index based on first and second fields. Receive prospective record storing first value in first field, second value in second field. Use match index to identify record storing first value in first field, second value in second field as matching prospective record.

    ADAPTIVE RECOGNITION OF ENTITIES
    8.
    Patent application

    Publication number: US20210224482A1

    Publication date: 2021-07-22

    Application number: US16749234

    Filing date: 2020-01-22

    Abstract: A system receives a record which includes a string and separates the string into a number of tokens, including a token and another token. The system identifies a pattern that includes an entity, another entity, and a number of entities that equals the number of tokens, and another pattern that includes the same number of entities as the number of tokens. The system determines a combined probability that combines a probability based on the number of entries in the entity's dictionary which stores the token, and another probability based on a number of character types in the other entity that match characters in the other token. If the combined probability associated with the pattern is greater than another combined probability associated with the other pattern, the system matches the record to a system record based on recognizing the token as the entity and the other token as the other entity.

    TRIE-BASED NORMALIZATION OF FIELD VALUES FOR MATCHING

    Publication number: US20190236178A1

    Publication date: 2019-08-01

    Application number: US15884732

    Filing date: 2018-01-31

    CPC classification number: G06F16/2365 G06F16/24575 G06F16/2468

    Abstract: A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospective record. Beginning from the root of the trie, the system identifies each node corresponding to a token value sequence for the prospective record's tokenized value. Beginning from the most recently identified node for the prospective record's token value sequence, the system identifies each extending node which stores a count that satisfies a threshold, each identified extending node corresponding to another token value sequence. The system uses the other token value sequence to identify one of the multiple records that matches the prospective record.
