Systems and methods for performant data matching

    Publication number: US12130777B2

    Publication date: 2024-10-29

    Application number: US18334965

    Filing date: 2023-06-14

    CPC classification number: G06F16/152 G06F16/137

    Abstract: The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled-up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
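
The roll-up described in this abstract can be illustrated with a minimal sketch. It assumes that a token record is a mapping from field name to token, that a token set rule is simply a set of field names that must all be present, and that SHA-256 is the hash; none of these details come from the patent, and the names `token_set_key` and `roll_up` are hypothetical.

```python
import hashlib
from collections import defaultdict


def token_set_key(token_record, rule):
    """Hash the tokens named by the rule, or return None if any are missing.

    Assumption: a rule is a set of field names that must all be present.
    """
    if not rule.issubset(token_record.keys()):
        return None
    joined = "|".join(f"{name}={token_record[name]}" for name in sorted(rule))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def roll_up(token_records, rule):
    """Group record ids by token-set key (the second tokenization)."""
    groups = defaultdict(list)
    for record_id, tokens in token_records.items():
        key = token_set_key(tokens, rule)
        if key is not None:
            groups[key].append(record_id)
    return dict(groups)


# Hypothetical token records: field name -> token value.
records = {
    "r1": {"name": "tok_a1", "dob": "tok_b7", "zip": "tok_c3"},
    "r2": {"name": "tok_a1", "dob": "tok_b7", "zip": "tok_c9"},
    "r3": {"name": "tok_a2", "dob": "tok_b7"},
    "r4": {"name": "tok_a1"},  # lacks "dob", so the rule is not satisfied
}
rule = frozenset({"name", "dob"})

token_sets = roll_up(records, rule)
# Records that share a token set are candidate matches.
matches = [ids for ids in token_sets.values() if len(ids) > 1]
```

In this sketch, matching compares one key per token set instead of comparing every pair of records field by field, which is the efficiency the abstract points to.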

    SYSTEMS AND METHODS FOR DATA TRACING

    Publication number: US20240061810A1

    Publication date: 2024-02-22

    Application number: US18498642

    Filing date: 2023-10-31

    CPC classification number: G06F16/152 G06F21/16 G06F16/137 G06F21/6263

    Abstract: The disclosed embodiments include systems and methods for tracing data. A disclosed tracing system can receive a request to access original data. The request can be associated with access request metadata. The tracing system can generate a tracer based at least in part on the access request metadata and can generate an altered version of the first data by inserting the tracer into the original data. The tracing system can update a database or log file associated with the first data to include association information based on the access request metadata. The tracing system can then provide the altered version of the first data in response to the request.
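
The request flow in this abstract can be sketched as follows. The sketch assumes the original data is a dictionary, that the tracer is a truncated HMAC of the access request metadata, and that the "database or log file" is an in-memory dictionary. The key, the `_tracer` field name, and the function names are all hypothetical, and the patent does not specify how a tracer is built or embedded.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def make_tracer(metadata):
    """Derive a tracer from the access request metadata (assumed: HMAC-SHA256)."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()[:16]


def serve_with_tracer(original, metadata, trace_log):
    """Return an altered copy of the data and record the association."""
    tracer = make_tracer(metadata)
    altered = dict(original)          # the original data is left untouched
    altered["_tracer"] = tracer       # insert the tracer into the copy
    trace_log[tracer] = metadata      # association info for later tracing
    return altered


trace_log = {}
original = {"customer": "example", "balance": 100}
request_meta = {"user": "analyst-7", "time": "2023-10-31T12:00:00Z"}

response = serve_with_tracer(original, request_meta, trace_log)
# If the altered copy leaks, its tracer points back to the request.
leak_source = trace_log[response["_tracer"]]
```

A real system would presumably embed the tracer less visibly than a named field; the visible field here only keeps the example easy to follow.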

    DIRECTORY MANAGEMENT METHOD AND SYSTEM FOR FILE SYSTEM BASED ON CUCKOO HASH AND STORAGE MEDIUM

    Publication number: US20240028560A1

    Publication date: 2024-01-25

    Application number: US18039967

    Filing date: 2021-04-25

    CPC classification number: G06F16/137 G06F16/119 G06F16/152

    Abstract: A directory management method and system for a file system based on Cuckoo hash are provided, including the following steps for reading metadata of a sub-directory or a sub-file, used as a target file, in a directory: receiving a request for reading the target file in the directory; for the target file, iteratively determining an i-th candidate data block in a hash table of the directory according to a hash calculation result obtained by performing hash calculation on the filename of the target file with the i-th hash function of the Cuckoo hash; if the filename of the target file exists in the i-th candidate data block, reading metadata of the target file, returning the metadata, and ending the process; otherwise, continuing to iterate until the iteration ends and returning a message that the target file does not exist.
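
The read path in this abstract can be sketched as below. The sketch assumes two hash functions derived by salting SHA-256 with the function index, a directory hash table of eight fixed-capacity data blocks held as dictionaries, and a simplified insert that does not perform Cuckoo eviction. All constants and function names are hypothetical; only the lookup loop mirrors the steps the abstract lists.

```python
import hashlib

NUM_HASHES = 2       # assumed number of Cuckoo hash functions
NUM_BLOCKS = 8       # assumed number of data blocks in the directory table
BLOCK_CAPACITY = 4   # assumed entries per data block


def candidate_block(filename, i):
    """Index of the i-th candidate data block for a filename."""
    digest = hashlib.sha256(f"{i}:{filename}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BLOCKS


def insert(table, filename, metadata):
    """Simplified insert without eviction (a real Cuckoo insert relocates entries)."""
    blocks = [table[candidate_block(filename, i)] for i in range(NUM_HASHES)]
    for block in blocks:
        if filename in block:            # update an existing entry in place
            block[filename] = metadata
            return True
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:  # first candidate block with free space
            block[filename] = metadata
            return True
    return False


def lookup(table, filename):
    """Check each candidate block in turn; None means the file does not exist."""
    for i in range(NUM_HASHES):
        block = table[candidate_block(filename, i)]
        if filename in block:
            return block[filename]
    return None


table = [dict() for _ in range(NUM_BLOCKS)]
insert(table, "notes.txt", {"inode": 11, "type": "file"})
insert(table, "src", {"inode": 12, "type": "dir"})

found = lookup(table, "notes.txt")
missing = lookup(table, "absent.txt")
```

A lookup touches at most `NUM_HASHES` data blocks regardless of directory size, which is the property that makes this layout attractive for large directories.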

    FILE DE-DUPLICATION FOR A DISTRIBUTED DATABASE

    Publication number: US20240012792A1

    Publication date: 2024-01-11

    Application number: US18465948

    Filing date: 2023-09-12

    CPC classification number: G06F16/1748 G06F16/162 G06F16/152

    Abstract: A device configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine the number of entries is greater than the redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
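
The comparison and threshold check in this abstract can be sketched as follows. The sketch assumes file instances are available as byte strings keyed by location, that block hash codes are SHA-256 digests of fixed 4096-byte blocks, and that the device keeps exactly the threshold number of instances when it deletes. The abstract only says "one or more instances" are deleted, so that policy, like the function names, is an assumption.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size


def block_hashes(data, block_size=BLOCK_SIZE):
    """Tuple of per-block hash codes for one instance of a file."""
    return tuple(
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    )


def deduplicate(instances, redundancy_threshold):
    """Delete instances beyond the threshold; return the deleted locations."""
    file_list = defaultdict(list)
    for location, data in instances.items():
        # Instances whose block hash codes all match share one file-list key.
        file_list[block_hashes(data)].append(location)

    deleted = []
    for locations in file_list.values():
        if len(locations) > redundancy_threshold:
            # Assumed policy: keep the first `redundancy_threshold` instances.
            for location in locations[redundancy_threshold:]:
                del instances[location]
                deleted.append(location)
    return deleted


# Hypothetical instances: location -> file contents.
instances = {
    "node1:/a": b"x" * 5000,
    "node2:/a": b"x" * 5000,
    "node3:/a": b"x" * 5000,
    "node1:/b": b"y" * 100,
}
deleted = deduplicate(instances, redundancy_threshold=2)
```

Comparing per-block hash codes rather than whole-file contents lets instances on different nodes be matched without moving file data between them.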

    Systems and methods for data tracing

    Publication number: US11836111B2

    Publication date: 2023-12-05

    Application number: US18060891

    Filing date: 2022-12-01

    CPC classification number: G06F16/152 G06F16/137 G06F21/16 G06F21/6263

    Abstract: The disclosed embodiments include systems and methods for tracing data. A disclosed tracing system can receive a request to access original data. The request can be associated with access request metadata. The tracing system can generate a tracer based at least in part on the access request metadata and can generate an altered version of the first data by inserting the tracer into the original data. The tracing system can update a database or log file associated with the first data to include association information based on the access request metadata. The tracing system can then provide the altered version of the first data in response to the request.
