SYSTEM AND METHOD FOR DATA COMPACTION UTILIZING MISMATCH PROBABILITY ESTIMATION

    Publication No.: US20230384933A1

    Publication date: 2023-11-30

    Application No.: US18295238

    Filing date: 2023-04-03

    Abstract: A system and method for compacting data that uses mismatch probability estimation to improve entropy encoding methods so that they account for, and efficiently handle, previously unseen data in the data to be compacted. Training data sets are analyzed to determine the frequency of occurrence of each sourceblock in the training data sets. A mismatch probability estimate is calculated comprising an estimated frequency at which any given data sourceblock received during encoding will not have a codeword in the codebook. Entropy encoding is used to generate codebooks comprising codewords for data sourceblocks based on the frequency of occurrence of each sourceblock. A “mismatch codeword” is inserted into the codebook based on the mismatch probability estimate to represent those cases when a block of data to be encoded does not have a codeword in the codebook. During encoding, if a mismatch occurs, a secondary encoding process is used to encode the mismatched sourceblock.
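The codebook construction described in this abstract can be illustrated with a minimal sketch. It assumes Huffman coding as the entropy coder, a mismatch probability supplied by the caller, and raw 8-bit output as the secondary encoding; the abstract specifies none of these, and the names `MISMATCH`, `build_codebook`, `encode`, and `decode` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

MISMATCH = object()  # hypothetical sentinel standing in for the "mismatch codeword" symbol


def build_codebook(training_blocks, mismatch_probability):
    """Build a Huffman codebook (assumed coder) that reserves one codeword for mismatches."""
    counts = Counter(training_blocks)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("training data must contain at least one sourceblock")
    # Scale observed frequencies so that all weights, mismatch included, sum to 1.
    weights = {blk: (1 - mismatch_probability) * n / total for blk, n in counts.items()}
    weights[MISMATCH] = mismatch_probability
    tiebreak = count()  # unique integers so the heap never compares the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(blocks, codebook):
    """Encode blocks as a bit string; unseen blocks get the mismatch codeword plus raw bits."""
    out = []
    for blk in blocks:
        if blk in codebook:
            out.append(codebook[blk])
        else:
            # Assumed secondary encoding: the block's bytes written out verbatim.
            out.append(codebook[MISMATCH] + "".join(f"{b:08b}" for b in blk))
    return "".join(out)


def decode(bits, codebook, block_size):
    """Invert encode(); block_size is the fixed sourceblock length in bytes."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, buf, i = [], "", 0
    while i < len(bits):
        buf += bits[i]
        i += 1
        sym = inverse.get(buf)
        if sym is None:
            continue
        if sym is MISMATCH:
            raw = bits[i:i + 8 * block_size]
            out.append(bytes(int(raw[j:j + 8], 2) for j in range(0, len(raw), 8)))
            i += 8 * block_size
        else:
            out.append(sym)
        buf = ""
    return out
```

In this sketch a larger mismatch probability shortens the mismatch codeword at the cost of slightly longer codewords for known sourceblocks, which is the trade-off the estimate is meant to balance.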

    System and method for error-resilient data reduction

    Publication No.: US11550756B2

    Publication date: 2023-01-10

    Application No.: US17233813

    Filing date: 2021-04-19

    Abstract: A system and method for error-resilient data reduction, utilizing a phase detector, a data requestor, a multi-phase trainer, a reconstruction engine, a deconstruction engine, and one or more reference codebooks. A multi-phase trainer may be used to train the reconstruction and deconstruction engines on various phase sourceblocks in order to recover quickly from corrupted data files that cause the sourceblocks to fall out of phase alignment. A phase detector may determine when the sourceblocks go out of phase and when they return to being in phase by checking whether a predetermined threshold probability of correct encoding is met. A data requestor may request retransmission of only the data that was received out of phase.
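One way to picture the phase detector's threshold check is the sketch below. It assumes the "probability of correct encoding" is approximated by the fraction of recently received sourceblocks that appear in the reference codebook; the abstract does not state how the probability is computed, and the function name, window size, and threshold value are hypothetical.

```python
from collections import deque


def out_of_phase_spans(blocks, known_blocks, window=2, threshold=0.75):
    """Return (start, end) index ranges of blocks judged to be out of phase.

    Assumption: a block "encodes correctly" if it is present in known_blocks,
    and the stream is in phase while the hit rate over the last `window`
    blocks stays at or above `threshold`.
    """
    recent = deque(maxlen=window)
    spans, start = [], None
    for i, blk in enumerate(blocks):
        recent.append(blk in known_blocks)
        in_phase = sum(recent) / len(recent) >= threshold
        if not in_phase and start is None:
            start = i                      # phase alignment just lost
        elif in_phase and start is not None:
            spans.append((start, i))       # phase alignment regained
            start = None
    if start is not None:
        spans.append((start, len(blocks)))
    return spans
```

The returned spans are what a data requestor could ask to have retransmitted. Because the check uses a sliding window, a span may extend slightly past the last corrupted block, so this sketch errs on the side of requesting a little more data than strictly necessary.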

    SYSTEM AND METHOD FOR COMPUTER DATA TYPE IDENTIFICATION

    Publication No.: US20220327098A1

    Publication date: 2022-10-13

    Application No.: US17727919

    Filing date: 2022-04-25

    Abstract: A system and method for file type identification involving extraction of a file-print of a file, the file-print being a unique or practically-unique representation of statistical characteristics associated with the distribution of bits in the binary contents of the file, similar to a fingerprint. The file-print is then passed to a machine learning algorithm that has been trained to recognize file types from their file-prints. The machine learning algorithm returns a predicted file type and, in some cases, a probability of correctness of the prediction. The file may then be encoded using an encoding algorithm chosen based on the predicted file type.
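As a rough illustration of the file-print idea, the sketch below uses a normalized byte-value histogram as the statistical representation and a nearest-reference lookup as a stand-in for the trained machine learning algorithm. Both choices are assumptions, since the abstract does not specify the statistics or the model, and the names `file_print` and `predict_type` are hypothetical.

```python
from collections import Counter


def file_print(data: bytes) -> list[float]:
    """Assumed file-print: relative frequency of each of the 256 byte values."""
    counts = Counter(data)
    n = len(data) or 1  # avoid division by zero for empty input
    return [counts.get(value, 0) / n for value in range(256)]


def predict_type(data: bytes, labelled_prints: dict[str, list[float]]) -> str:
    """Return the label whose reference file-print is closest in L1 distance.

    This nearest-reference rule is only a placeholder for a trained classifier.
    """
    fp = file_print(data)

    def distance(reference: list[float]) -> float:
        return sum(abs(a - b) for a, b in zip(fp, reference))

    return min(labelled_prints, key=lambda label: distance(labelled_prints[label]))
```

A caller could then choose an encoder from the predicted label, for example a text-oriented codebook for files predicted to be text.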
