Abstract:
Systems and methods for performing genomic information compression, transmission, and decompression are provided. A system for compression, transmission, and decompression of genomic information includes a first computer associated with a first index and a second computer associated with a second index, each index containing reference permutations of nucleic acid sequence portions, each permutation associated with a reference number. The first computer uses input genomic information and the first index to produce a compressed representation of the genomic information, and transmits the compressed representation to the second computer. The second computer uses the compressed representation and the second index to assemble a data representation of the genomic information. The compressed representation comprises references to permutations, indications of locations of each permutation in the input information, indications of variations to permutations, and/or indications of sequence length.
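The reference-index scheme can be illustrated with a minimal sketch. It assumes (these are not from the abstract) that each indexed portion has a fixed hypothetical length `K`, that both computers hold identical copies of a small index `INDEX`, and that the closest reference permutation is chosen by Hamming distance. Each record carries a location, a reference number, and a list of variations, and the overall sequence length is sent once.

```python
K = 4  # hypothetical length of each indexed sequence portion

# Hypothetical shared index: reference number -> reference permutation.
# The first and second computers are assumed to hold identical copies.
INDEX = {0: "ACGT", 1: "AAAA", 2: "CCCC", 3: "GGGG", 4: "TTTT"}


def compress(seq, index):
    """Return (sequence length, [(location, reference number, variations)])."""
    records = []
    for loc in range(0, len(seq), K):
        chunk = seq[loc:loc + K]

        def mismatches(item):
            # Hamming distance between the chunk and a reference permutation
            # (zip truncates, so a short final chunk is compared to a prefix).
            return sum(a != b for a, b in zip(chunk, item[1]))

        ref_no, ref = min(index.items(), key=mismatches)
        # Variations: (offset within chunk, actual base) wherever they differ.
        variations = [(i, base) for i, (base, r) in enumerate(zip(chunk, ref)) if base != r]
        records.append((loc, ref_no, variations))
    return len(seq), records


def decompress(compressed, index):
    """Rebuild the sequence from references, variations, and the total length."""
    length, records = compressed
    out = []
    for loc, ref_no, variations in records:
        # The location and total length determine how long this portion is.
        chunk = list(index[ref_no][:min(K, length - loc)])
        for i, base in variations:
            chunk[i] = base
        out.append("".join(chunk))
    return "".join(out)
```

For example, `compress("ACGTAAAACCGCTT", INDEX)` encodes the portion `CCGC` as reference 2 (`CCCC`) with a single variation `(2, "G")`. A real system would use a much larger index and a compact binary encoding of the records.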
Abstract:
Systems, devices and methods are provided for dictionary-based data compression using history search. The systems, devices and methods may use parallel processing techniques for data compression and encoding, and may provide memory search techniques for hardware.
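History search for dictionary-based compression can be sketched as a greedy LZ77-style matcher. This is a serial illustration only: it does not model the parallel processing or hardware memory search mentioned above, and the hash table keyed on 3-byte prefixes, `MIN_MATCH`, `MAX_MATCH`, and `WINDOW` are hypothetical choices.

```python
MIN_MATCH = 3   # hypothetical minimum match length
MAX_MATCH = 258  # hypothetical maximum match length
WINDOW = 4096   # hypothetical history window in bytes


def compress(data: bytes):
    """Greedy LZ77: emit ("lit", byte) or ("match", distance, length) tokens."""
    tokens = []
    history = {}  # 3-byte prefix -> most recent position where it started
    i = 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        cand = history.get(key) if len(key) == MIN_MATCH else None
        best_len = 0
        if cand is not None and i - cand <= WINDOW:
            # Extend the match; it may overlap the current position.
            while (best_len < MAX_MATCH and i + best_len < len(data)
                   and data[cand + best_len] == data[i + best_len]):
                best_len += 1
        if best_len >= MIN_MATCH:
            tokens.append(("match", i - cand, best_len))
            step = best_len
        else:
            tokens.append(("lit", data[i]))
            step = 1
        # Record every position just consumed in the history table.
        for j in range(i, min(i + step, len(data) - MIN_MATCH + 1)):
            history[data[j:j + MIN_MATCH]] = j
        i += step
    return tokens


def decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):  # byte-by-byte copy handles overlapping matches
                out.append(out[-dist])
    return bytes(out)
```

Keeping only the most recent position per prefix makes the search cheap but may miss longer matches; a hash chain or several parallel comparators would be the natural next step.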
Abstract:
Exemplary method, system, and computer program product embodiments for efficient one-pass cache-aware compression are provided. In one embodiment, by way of example only, an output of a fast compressor is passed to Huffman encoding, which achieves the one-pass cache-aware compression by using a predetermined Huffman tree once the fast compressor has determined a final representation of each data byte.
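The benefit of a predetermined Huffman tree is that no statistics pass over the data is needed: each byte produced by the fast compressor can be coded immediately. In the minimal sketch below, the frequency table `ASSUMED_FREQS` and the set `COMMON` are hypothetical stand-ins for whatever fixed statistics a real system would ship with, and the fast compressor itself is not modeled; its output is taken as plain bytes.

```python
import heapq
from itertools import count

# Hypothetical fixed byte statistics; the tree is never rebuilt from the input.
COMMON = set(b"etaoin shrdlu")
ASSUMED_FREQS = {b: (51 if b in COMMON else 1) for b in range(256)}


def build_code(freqs):
    """Build a Huffman code table (symbol -> bit string) from fixed frequencies."""
    tiebreak = count()  # avoids comparing dicts when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Predetermined tree: built once, then reused for every input.
CODE = build_code(ASSUMED_FREQS)
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(compressor_output: bytes) -> str:
    # Single pass: each final byte is emitted as soon as it is known.
    return "".join(CODE[b] for b in compressor_output)


def decode(bits: str) -> bytes:
    out, prefix = bytearray(), ""
    for bit in bits:
        prefix += bit
        if prefix in DECODE:  # prefix-free code: the first match is the symbol
            out.append(DECODE[prefix])
            prefix = ""
    return bytes(out)
```

Bit strings keep the sketch readable; a cache-aware implementation would instead pack codes into machine words from a small lookup table.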
Abstract:
A data encoding method for encoding successive input data values comprises the steps of: selecting one of a plurality of complementary sub-ranges of a set of code values according to the value of a current input data value, the proportions of the sub-ranges relative to the set of code values being defined by a context variable associated with that input data value; assigning the current input data value to a code value within the selected sub-range; modifying the set of code values in dependence upon the assigned code value and the size of the selected sub-range; detecting whether the set of code values is less than a predetermined minimum size and, if so, successively increasing the size of the set of code values until it has at least the predetermined minimum size, and outputting an encoded data bit in response to each such size-increasing operation; modifying the context variable, for use in respect of a next input data value, so as to increase the proportion of the set of code values in the sub-range which was selected for the current data value; and, after encoding a group of input data values, terminating the output data by: setting a value defining an end of the set of code values to a value having a plurality of least significant bits equal to zero; increasing the size of the set of code values; and writing the value defining the end of the set of code values to the output data.
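The interval subdivision, renormalization, and context adaptation steps above are those of adaptive binary arithmetic coding. The following is a minimal sketch, not the claimed method: it uses the classic Witten-Neal-Cleary integer coder, whose termination (emitting pending bits plus one more) differs from the zero-LSB termination described above. The count-based `Context` class, `PRECISION`, and `MAX_TOTAL` are hypothetical choices.

```python
# Adaptive binary arithmetic coder in the classic Witten-Neal-Cleary style.
PRECISION = 16
TOP = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)
MAX_TOTAL = 1 << 10  # hypothetical adaptation limit for the context counts


class Context:
    """Context variable: adaptive counts of 0s and 1s seen so far."""

    def __init__(self):
        self.counts = [1, 1]

    def split(self, rng):
        # Width of the sub-range for bit 0, proportional to its count.
        return rng * self.counts[0] // (self.counts[0] + self.counts[1])

    def update(self, bit):
        # Enlarge the proportion of the sub-range that was just selected.
        self.counts[bit] += 1
        if sum(self.counts) > MAX_TOTAL:
            self.counts = [max(1, c // 2) for c in self.counts]


def encode(bits):
    ctx = Context()
    low, high, pending = 0, TOP, 0
    out = []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)  # resolve deferred (straddling) bits
        pending = 0

    for bit in bits:
        r0 = ctx.split(high - low + 1)
        if bit == 0:
            high = low + r0 - 1
        else:
            low += r0
        ctx.update(bit)
        # Renormalize: each doubling of the interval outputs (or defers) a bit.
        while True:
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    # Terminate: emit the pending bits plus one more so the value lies in [low, high].
    pending += 1
    emit(0 if low < QUARTER else 1)
    return out


def decode(code, n):
    ctx = Context()
    stream = iter(code)
    low, high, value = 0, TOP, 0
    for _ in range(PRECISION):
        value = 2 * value + next(stream, 0)  # pad with zeros past the end
    out = []
    for _ in range(n):
        r0 = ctx.split(high - low + 1)
        bit = 0 if value < low + r0 else 1
        if bit == 0:
            high = low + r0 - 1
        else:
            low += r0
        ctx.update(bit)
        out.append(bit)
        # Mirror the encoder's renormalization, shifting in new code bits.
        while True:
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next(stream, 0)
    return out
```

Because the context counts grow toward the more frequent bit, a long run of identical bits costs only a few output bits in total.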
Abstract:
In column domain dictionary compression, column values in one or more columns are tokenized using a single dictionary. The domain of the dictionary is the entire set of columns. A dictionary may map a token not only to a tokenized value but also to a count ("token count") of the number of occurrences of the token and corresponding tokenized value in the dictionary's domain. Such token counts may be used to compute queries on the base table.
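A minimal sketch of a column domain dictionary, assuming string values and tokens assigned in sorted value order (both hypothetical choices). Two columns, such as hypothetical `origin` and `destination` city columns, share one dictionary, and a count query is answered from the token counts alone.

```python
from collections import Counter


def build_domain_dictionary(columns):
    """Tokenize several columns with one shared dictionary.

    Returns (value_to_token, token_info, encoded_columns), where token_info
    maps token -> (tokenized value, token count over the whole domain).
    """
    counts = Counter(v for col in columns for v in col)
    value_to_token = {v: t for t, v in enumerate(sorted(counts))}
    token_info = {t: (v, counts[v]) for v, t in value_to_token.items()}
    encoded = [[value_to_token[v] for v in col] for col in columns]
    return value_to_token, token_info, encoded


def count_value(value, value_to_token, token_info):
    # Answered from the dictionary alone, without scanning the encoded columns.
    token = value_to_token.get(value)
    return 0 if token is None else token_info[token][1]
```

With `origin = ["NY", "SF", "NY"]` and `destination = ["LA", "NY"]`, the shared dictionary maps `NY` to one token whose count is 3, covering both columns.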
Abstract:
An encoder for encoding a sequence of symbols is described which comprises an assigner configured to assign a number of parameters to each symbol of the sequence of symbols based on information contained within previous symbols of the sequence of symbols; a plurality of entropy encoders each of which is configured to convert the symbols forwarded to the respective entropy encoder into a respective bitstream; and a selector configured to forward each symbol to a selected one of the plurality of entropy encoders, the selection depending on the number of parameters assigned to the respective symbol.
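The assigner/selector structure can be sketched for binary symbols as follows. Everything here is a hypothetical stand-in: the parameter is simply the number of 1s among the last `WINDOW` symbols, the selector buckets that parameter into `NUM_ENCODERS` ranges, and `zlib` substitutes for each partial entropy encoder purely to keep the example short. The decoder recomputes the same parameter from already-decoded symbols, so it knows which bitstream holds each next symbol.

```python
import zlib

WINDOW = 8        # hypothetical: how many previous symbols the assigner sees
NUM_ENCODERS = 3  # hypothetical number of partial entropy encoders


def assign_parameter(history):
    # Assigner: parameter = number of 1s among the most recent symbols.
    return sum(history[-WINDOW:])


def select_encoder(param):
    # Selector: map the parameter range 0..WINDOW onto the encoders.
    return min(param * NUM_ENCODERS // (WINDOW + 1), NUM_ENCODERS - 1)


def encode(symbols):
    buffers = [bytearray() for _ in range(NUM_ENCODERS)]
    for i, s in enumerate(symbols):
        k = select_encoder(assign_parameter(symbols[max(0, i - WINDOW):i]))
        buffers[k].append(s)
    # zlib stands in for each partial entropy encoder; one bitstream per encoder.
    return [zlib.compress(bytes(b)) for b in buffers]


def decode(streams, n):
    queues = [iter(zlib.decompress(s)) for s in streams]
    out = []
    for _ in range(n):
        # Same parameter, computed from previously decoded symbols.
        k = select_encoder(assign_parameter(out))
        out.append(next(queues[k]))
    return out
```

Grouping symbols with similar statistics into the same encoder is what lets each partial encoder be specialized (for example, to a fixed probability interval) instead of adapting to every symbol.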
Abstract:
MIFARE applications (MIA) are organized in at least one sector comprising sector data arranged in data blocks and a sector trailer. A compression method for MIFARE applications comprises: searching for consecutive occurrences of the same data value in the sector data and replacing each detected run of consecutive sector data having the same data value by a sequence comprising said data value and a number indicating how many consecutive sector data have that data value; and/or searching for all distinct sector trailer values and replacing all sector trailers by references to the respective distinct sector trailer values.
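Both techniques can be sketched briefly, assuming each sector is given as a hypothetical `(sector data, sector trailer)` pair of byte strings. Sector data is run-length encoded into `(value, count)` pairs, and each trailer is replaced by an index into a table of distinct trailer values.

```python
def rle(data: bytes):
    """Replace runs of equal bytes by (value, run length) pairs."""
    pairs = []
    for b in data:
        if pairs and pairs[-1][0] == b:
            pairs[-1][1] += 1
        else:
            pairs.append([b, 1])
    return [tuple(p) for p in pairs]


def unrle(pairs):
    return bytes(b for b, n in pairs for _ in range(n))


def compress_sectors(sectors):
    """sectors: list of (sector data, sector trailer) byte strings."""
    trailers, ref_of, out = [], {}, []
    for data, trailer in sectors:
        if trailer not in ref_of:  # first time this trailer value is seen
            ref_of[trailer] = len(trailers)
            trailers.append(trailer)
        out.append((rle(data), ref_of[trailer]))
    return trailers, out


def decompress_sectors(trailers, compressed):
    return [(unrle(pairs), trailers[ref]) for pairs, ref in compressed]
```

Since many sectors in an application typically share the same trailer, the trailer table stays small while each sector keeps only a short reference.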
Abstract:
An apparatus for processing a signal and a method thereof are disclosed. Data coding and entropy coding are performed in an interconnected manner, and grouping is used to enhance coding efficiency. The present invention includes the step of hierarchically extracting identification information indicating three or more data coding schemes. The identification information indicating the two coding schemes having high frequencies of use is extracted from different layers.
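One possible reading of hierarchical extraction is sketched below, under stated assumptions: the scheme names are hypothetical and ordered by frequency of use, layer 1 is a single flag for the most frequent scheme, layer 2 is a flag for the second most frequent, and only the remaining schemes need a fixed-length index in a third layer. This is an illustration of layered signaling, not the disclosed syntax.

```python
# Hypothetical scheme names, ordered from most to least frequently used.
SCHEMES = ["scheme_a", "scheme_b", "scheme_c", "scheme_d"]
REST = SCHEMES[2:]
REST_BITS = max(1, (len(REST) - 1).bit_length())


def write_id(scheme):
    """Layer 1 flags the most frequent scheme, layer 2 the second, layer 3 the rest."""
    if scheme == SCHEMES[0]:
        return [1]
    if scheme == SCHEMES[1]:
        return [0, 1]
    idx = REST.index(scheme)
    return [0, 0] + [(idx >> k) & 1 for k in reversed(range(REST_BITS))]


def read_id(bits, pos=0):
    """Extract one identifier layer by layer; returns (scheme, next position)."""
    if bits[pos] == 1:
        return SCHEMES[0], pos + 1
    if bits[pos + 1] == 1:
        return SCHEMES[1], pos + 2
    idx = 0
    for b in bits[pos + 2:pos + 2 + REST_BITS]:
        idx = (idx << 1) | b
    return REST[idx], pos + 2 + REST_BITS
```

Placing the two most frequent schemes in separate early layers means they cost one and two bits respectively, while rarer schemes pay for the longer third layer.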