System and method for codebook management based on data source grouping

    Publication No.: US12147667B2

    Publication date: 2024-11-19

    Application No.: US18593931

    Filing date: 2024-03-03

    Abstract: A system and method for codebook management is disclosed. Training datasets are obtained from various data sources. A similarity score is generated for each training dataset with reference to the other training datasets. In response to detecting a similarity score above a predetermined threshold for one or more of the other training datasets, a combined codebook is created based on training datasets that have a similarity score above a predetermined threshold. Based on the similarity score, multiple data sources are combined into a group, and the combined codebook is used for the data sources within the group. A mismatch performance metric can be computed for the combined codebook, and a revised combined codebook can be regenerated in response to the mismatch performance metric being above a predetermined threshold.
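
The grouping step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the similarity score is computed, so the byte-histogram cosine similarity below is an assumption, as are the names `byte_histogram`, `cosine_similarity`, and `group_sources`. Sources whose pairwise score exceeds the threshold are merged into one group with a union-find structure. Each group's training data would then feed a single combined codebook.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def byte_histogram(data: bytes) -> Counter:
    """Count how often each byte value occurs (assumed feature for similarity)."""
    return Counter(data)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two histograms; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_sources(datasets: dict[str, bytes], threshold: float) -> list[set[str]]:
    """Group data sources whose training datasets score above the threshold."""
    parent = {name: name for name in datasets}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    hists = {name: byte_histogram(d) for name, d in datasets.items()}
    for a, b in combinations(datasets, 2):
        if cosine_similarity(hists[a], hists[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in datasets:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Because merging is transitive here, two sources can land in the same group through a shared neighbor even if their own pairwise score is below the threshold. Whether the patented method behaves this way is not stated in the abstract.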

    SYSTEM AND METHOD FOR CODEBOOK MANAGEMENT BASED ON DATA SOURCE GROUPING

    Publication No.: US20240377949A1

    Publication date: 2024-11-14

    Application No.: US18773999

    Filing date: 2024-07-16

    Abstract: A system and method for codebook management is disclosed. Training datasets are obtained from various data sources. A similarity score is generated for each training dataset with reference to the other training datasets. In response to detecting a similarity score above a predetermined threshold for one or more of the other training datasets, a combined codebook is created based on training datasets that have a similarity score above a predetermined threshold. Based on the similarity score, multiple data sources are combined into a group, and the combined codebook is used for the data sources within the group. A mismatch performance metric can be computed for the combined codebook, and a revised combined codebook can be regenerated in response to the mismatch performance metric being above a predetermined threshold.

    SYSTEM AND METHOD FOR DATA COMPACTION AND ENCRYPTION OF ANONYMIZED DATA RECORDS

    Publication No.: US20240329837A1

    Publication date: 2024-10-03

    Application No.: US18737962

    Filing date: 2024-06-08

    Abstract: A system and method for data compaction and encryption of anonymized data records. A dataset may be pre-processed by dividing it into a plurality of sourceblocks at all reasonable sourceblock lengths, and then counting how many times each sourceblock occurs in the dataset, resulting in a tally record of tokens and their count values. This tally record may then be anonymized and transmitted to a data deconstruction engine which, combined with a library manager, creates a codebook and performs optimization techniques on the codebook. The received anonymized tally record may be parsed into individual tokens by identifying the tokens with the highest count value. The tokens may then be sent, in descending order of count value, to the library manager, where each token may be assigned a codeword. A half-baked codebook is then created using the tokens and each token's unique codeword, before sending the half-baked codebook to a system user.
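
The tally-and-assign flow described in this abstract might look roughly like the sketch below. Several details are assumptions: sourceblocks are taken as overlapping sliding windows, the "reasonable" lengths default to 1 through 3 bytes, and codewords are plain integers handed out in descending order of count. The function names `tally_sourceblocks` and `assign_codewords` are hypothetical, and the anonymization and encryption steps are omitted.

```python
from collections import Counter


def tally_sourceblocks(data: bytes, lengths=(1, 2, 3)) -> Counter:
    """Count every sourceblock of each given length (sliding window, assumed)."""
    tally: Counter = Counter()
    for n in lengths:
        for i in range(len(data) - n + 1):
            tally[data[i:i + n]] += 1
    return tally


def assign_codewords(tally: Counter) -> dict[bytes, int]:
    """Assign integer codewords to tokens in descending order of count.

    Ties are broken by the token bytes so the result is deterministic.
    """
    ordered = sorted(tally.items(), key=lambda kv: (-kv[1], kv[0]))
    return {token: code for code, (token, _count) in enumerate(ordered)}
```

Handing the smallest codewords to the most frequent tokens mirrors the descending-count ordering in the abstract. A real codebook would likely use variable-length bit codes rather than integers.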

    SYSTEM AND METHOD FOR MANIPULATION OF COMPACTED DATA FILES

    Publication No.: US20240086372A1

    Publication date: 2024-03-14

    Application No.: US18516924

    Filing date: 2023-11-21

    CPC classification number: G06F16/1752 G06F3/0608 G06F3/0641 G06F3/067

    Abstract: A system and method for manipulation of compacted data files, utilizing a reference codebook, a random-access engine, a data deconstruction engine, and a data reconstruction engine. The system may receive a data query pertaining to a data read or data write request, wherein the data file to be read from or written to is a compacted data file. A random-access engine may facilitate data manipulation processes by accessing a reference codebook associated with the compacted data file, a frequency table used to construct the reference codebook, and data query details. A data read request is supported by random-access search capabilities that may enable the locating and decoding of the bits corresponding to data query details. A random-access engine facilitates data write processes. The random-access engine may encode the data to be written, insert the encoded data into a compacted data file, and update the codebook as needed.
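
As a rough illustration of random-access reads and writes on a compacted file, the sketch below assumes fixed-width two-byte codewords, so the position of any block follows from simple offset arithmetic. The abstract does not specify the encoding or how the frequency table is used, so the `CompactedFile` class and its methods are hypothetical, and the frequency table is left out.

```python
class CompactedFile:
    """Toy compacted file: fixed-size blocks stored as fixed-width codewords."""

    CODE_BYTES = 2  # assumed codeword width

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.codebook: dict[bytes, int] = {}  # block -> codeword
        self.reverse: list[bytes] = []        # codeword -> block
        self.payload = bytearray()            # concatenated codewords

    def _encode(self, block: bytes) -> bytes:
        """Return the codeword for a block, extending the codebook as needed."""
        code = self.codebook.get(block)
        if code is None:
            code = len(self.reverse)
            self.codebook[block] = code
            self.reverse.append(block)
        return code.to_bytes(self.CODE_BYTES, "big")

    def append(self, data: bytes) -> None:
        """Split data into blocks and append their codewords."""
        for i in range(0, len(data), self.block_size):
            self.payload += self._encode(data[i:i + self.block_size])

    def read_block(self, index: int) -> bytes:
        """Locate and decode one block without decoding the rest of the file."""
        start = index * self.CODE_BYTES
        if not 0 <= start < len(self.payload):
            raise IndexError("block index out of range")
        code = int.from_bytes(self.payload[start:start + self.CODE_BYTES], "big")
        return self.reverse[code]

    def insert_block(self, index: int, block: bytes) -> None:
        """Encode a block and insert its codeword at the given block position."""
        start = index * self.CODE_BYTES
        self.payload[start:start] = self._encode(block)
```

With variable-length codewords, as a frequency-based codebook would normally produce, locating a block would need an index or a bit-level search instead of this multiplication.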
