System and methods for secure storage for data deduplication

    Publication number: US11741051B2

    Publication date: 2023-08-29

    Application number: US17578476

    Application date: 2022-01-19

    CPC classification number: G06F16/1752 G06F3/067 G06F3/0608 G06F3/0641

    Abstract: A system and methods for secure storage for data deduplication comprising a data deconstruction engine, a data reconstruction engine, a library manager, a reference codebook, and a codeword storage which performs simultaneous compaction and deduplication of data sets. A data set may be comprised of one or more sourcepackets which may be optimally deconstructed into a plurality of sourceblocks and wherein each sourceblock may be compared against a reference codebook that contains key-value pairs of a sourceblock and its associated reference code in order to determine if a received sourceblock is a duplicate of data already stored within the reference codebook. Non-duplicate sourceblocks can have a reference code algorithmically created and stored in the reference codebook, thereby ensuring that when a duplicate sourceblock is received, it will not be stored as duplicated data.
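The codebook lookup described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the fixed block size, the use of sequential integers as reference codes, and the names `Codebook`, `deconstruct`, and `reconstruct` are all assumptions made for illustration.

```python
from typing import Dict, List


class Codebook:
    """Hypothetical reference codebook: sourceblock <-> reference code."""

    def __init__(self) -> None:
        self.code_for: Dict[bytes, int] = {}   # sourceblock -> reference code
        self.block_for: List[bytes] = []       # reference code -> sourceblock

    def encode_block(self, block: bytes) -> int:
        """Return the existing code for a duplicate block, or create a new one."""
        code = self.code_for.get(block)
        if code is None:
            # Assumption: the reference code is simply the next unused integer.
            code = len(self.block_for)
            self.code_for[block] = code
            self.block_for.append(block)
        return code


def deconstruct(data: bytes, codebook: Codebook, block_size: int = 4) -> List[int]:
    """Split data into fixed-size sourceblocks and replace each with its code."""
    return [
        codebook.encode_block(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def reconstruct(codes: List[int], codebook: Codebook) -> bytes:
    """Rebuild the original data by looking each code up in the codebook."""
    return b"".join(codebook.block_for[code] for code in codes)


book = Codebook()
codes = deconstruct(b"ABCDABCDEFGHABCD", book)
restored = reconstruct(codes, book)
```

In this sketch the block `ABCD` appears three times in the input but is stored only once in the codebook; the stored codeword list refers to it by its reference code each time.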

    SYSTEM AND METHOD FOR RANDOM-ACCESS MANIPULATION OF COMPACTED DATA FILES

    Publication number: US20230177014A1

    Publication date: 2023-06-08

    Application number: US18078909

    Application date: 2022-12-09

    CPC classification number: G06F16/1752 G06F3/0608 G06F3/0641 G06F3/067

    Abstract: A system and method for random-access manipulation of compacted data files, utilizing a reference codebook, a random-access engine, a data deconstruction engine, and a data reconstruction engine. The system may receive a data query pertaining to a data read or data write request, wherein the data file to be read from or written to is a compacted data file. A random-access engine may facilitate data manipulation processes by accessing a reference codebook associated with the compacted data file, a frequency table used to construct the reference codebook, and data query details. A data read request is supported by random-access search capabilities that may enable the locating and decoding of the bits corresponding to data query details. A random-access engine facilitates data write processes. The random-access engine may encode the data to be written, insert the encoded data into a compacted data file, and update the codebook as needed.
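The read and write paths described in this abstract can be sketched under a strong simplifying assumption: every sourceblock has the same size and the compacted file is held as a list of integer codes, so a byte offset maps directly to a code index. The abstract itself refers to locating bits with the help of a frequency table, which suggests variable-length codewords; that bit-level search is not modeled here. The names `read_range` and `write_block` are hypothetical.

```python
from typing import Dict, List


def read_range(codes: List[int], block_for: List[bytes], block_size: int,
               start: int, length: int) -> bytes:
    """Decode only the codes that cover bytes [start, start + length)."""
    if length <= 0:
        return b""
    first = start // block_size
    last = (start + length - 1) // block_size
    chunk = b"".join(block_for[code] for code in codes[first:last + 1])
    offset = start - first * block_size
    return chunk[offset:offset + length]


def write_block(codes: List[int], code_for: Dict[bytes, int],
                block_for: List[bytes], index: int, block: bytes) -> None:
    """Encode a new block, update the codebook if needed, and overwrite one slot."""
    code = code_for.get(block)
    if code is None:
        # Previously unseen block: extend the codebook (assumed policy).
        code = len(block_for)
        code_for[block] = code
        block_for.append(block)
    codes[index] = code


block_for = [b"ABCD", b"EFGH"]
code_for = {b"ABCD": 0, b"EFGH": 1}
codes = [0, 1, 0]                      # represents b"ABCDEFGHABCD"

before = read_range(codes, block_for, 4, start=2, length=5)
write_block(codes, code_for, block_for, index=1, block=b"WXYZ")
after = read_range(codes, block_for, 4, start=4, length=4)
```

Only the blocks overlapping the requested byte range are decoded, so the cost of a read depends on the size of the request rather than the size of the file.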

    SYSTEM AND METHOD FOR DATA COMPACTION UTILIZING MISMATCH PROBABILITY ESTIMATION

    Publication number: US20230043546A1

    Publication date: 2023-02-09

    Application number: US17974230

    Application date: 2022-10-26

    Abstract: A system and method for compacting data that uses mismatch probability estimation to improve entropy encoding methods to account for, and efficiently handle, previously-unseen data in data to be compacted. Training data sets are analyzed to determine the frequency of occurrence of each sourceblock in the training data sets. A mismatch probability estimate is calculated comprising an estimated frequency at which any given data sourceblock received during encoding will not have a codeword in the codebook. Entropy encoding is used to generate codebooks comprising codewords for data sourceblocks based on the frequency of occurrence of each sourceblock. A “mismatch codeword” is inserted into the codebook based on the mismatch probability estimate to represent those cases when a block of data to be encoded does not have a codeword in the codebook. During encoding, if a mismatch occurs, a secondary encoding process is used to encode the mismatched sourceblock.
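The mismatch-codeword idea can be sketched with a small Huffman coder. Several choices here are assumptions rather than details from the abstract: Huffman coding as the entropy encoder, a caller-supplied `mismatch_prob` instead of an estimation procedure, raw 8-bit bytes as the secondary encoding, fixed-size blocks, a non-empty training set, and the names `build_codebook`, `encode`, `decode`, and `MISMATCH`.

```python
import heapq
from collections import Counter
from itertools import count

MISMATCH = object()  # sentinel standing for "block not in codebook"


def build_codebook(training_blocks, mismatch_prob):
    """Huffman codebook over observed blocks plus one mismatch symbol."""
    freq = Counter(training_blocks)
    total = sum(freq.values())
    # Scale observed frequencies so that all weights sum to 1.
    weights = {b: n / total * (1 - mismatch_prob) for b, n in freq.items()}
    weights[MISMATCH] = mismatch_prob
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def encode(blocks, codebook):
    """Emit codewords; unseen blocks get the mismatch codeword plus raw bits."""
    out = []
    for block in blocks:
        if block in codebook:
            out.append(codebook[block])
        else:
            raw = "".join(f"{byte:08b}" for byte in block)
            out.append(codebook[MISMATCH] + raw)
    return "".join(out)


def decode(bits, codebook, block_size):
    """Invert encode(); block_size tells how many raw bytes follow a mismatch."""
    inverse = {code: sym for sym, code in codebook.items()}
    blocks, current, i = [], "", 0
    while i < len(bits):
        current += bits[i]
        i += 1
        sym = inverse.get(current)
        if sym is None:
            continue
        if sym is MISMATCH:
            raw = bits[i:i + 8 * block_size]
            blocks.append(bytes(int(raw[j:j + 8], 2) for j in range(0, len(raw), 8)))
            i += 8 * block_size
        else:
            blocks.append(sym)
        current = ""
    return blocks


training = [b"aa"] * 5 + [b"bb"] * 3 + [b"cc"] * 2
codebook = build_codebook(training, mismatch_prob=0.1)
message = [b"aa", b"zz", b"bb"]        # b"zz" never appeared in training
decoded = decode(encode(message, codebook), codebook, block_size=2)
```

Because the mismatch symbol takes part in the Huffman construction like any other symbol, its codeword length reflects how often unseen blocks are expected: a higher estimate yields a shorter escape code.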

    DATA COMPRESSION WITH INTRUSION DETECTION
    Document type: Invention publication

    Publication number: US20240364359A1

    Publication date: 2024-10-31

    Application number: US18651671

    Application date: 2024-04-30

    CPC classification number: H03M7/3059 G06F21/554 G06N20/00 H03M7/6005

    Abstract: Data compression with intrusion detection, that measures in real-time the probability distribution of an encoded data stream, compares the probability distribution to a reference probability distribution, and uses one or more statistical algorithms to determine the divergence between the two sets of probability distributions to determine if an unusual distribution is the result of a data intrusion. The system comprises both encoding and decoding machines, an intrusion detection module, a codebook training module, and various databases which perform various analyses on encoded data streams.
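The distribution comparison in this abstract can be sketched as follows. The abstract only says "one or more statistical algorithms"; Kullback-Leibler divergence is chosen here as one plausible option, and the add-one smoothing, the threshold value, and the names `distribution`, `kl_divergence`, and `is_suspicious` are assumptions for illustration.

```python
import math
from collections import Counter


def distribution(symbols, alphabet):
    """Add-one smoothed probability of each codeword in the alphabet."""
    counts = Counter(symbols)
    total = sum(counts[s] for s in alphabet) + len(alphabet)
    return {s: (counts[s] + 1) / total for s in alphabet}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; smoothing guarantees q[s] > 0."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0)


def is_suspicious(stream, reference, alphabet, threshold=0.1):
    """Flag a stream whose codeword distribution diverges from the reference."""
    observed = distribution(stream, alphabet)
    return kl_divergence(observed, reference) > threshold


alphabet = ["0", "10", "11"]
reference = distribution(["0"] * 50 + ["10"] * 30 + ["11"] * 20, alphabet)

normal_stream = ["0"] * 50 + ["10"] * 30 + ["11"] * 20
odd_stream = ["11"] * 90 + ["0"] * 10

normal_flag = is_suspicious(normal_stream, reference, alphabet)
odd_flag = is_suspicious(odd_stream, reference, alphabet)
```

In practice the threshold would have to be calibrated against the normal variation of legitimate traffic; the value used here is arbitrary.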
