Abstract:
An approach for fingerprinting large data objects at wire speed is disclosed. The techniques include Fresh/Shift pipelining, split Fresh, optimization, online channel sampling, and pipelined selection. The architecture can also be replicated to work in parallel for higher system throughput. Fingerprinting may provide an efficient mechanism for identifying duplication in a data stream, and deduplication based on the identified fingerprints may reduce storage costs, network bandwidth consumption, and processing time, among other benefits. In some embodiments, fingerprinting may be used to ensure or verify data integrity and may facilitate detection of corruption or tampering. An efficient manner of generating fingerprints (via hardware, software, or a combination of the two) may reduce the computational load and/or the time required to generate fingerprints.
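The abstract does not define Fresh and Shift, so the following is only a minimal sketch under an assumption: Fresh computes the fingerprint of a full window from scratch, and Shift updates it incrementally as the window slides by one byte. A polynomial rolling hash stands in for the fingerprint function, which the abstract does not specify. The names `BASE`, `MOD`, `fresh`, `shift`, and `fingerprints` are hypothetical.

```python
BASE = 257                 # assumed polynomial base (not from the abstract)
MOD = (1 << 61) - 1        # assumed modulus, a Mersenne prime


def fresh(window: bytes) -> int:
    """Compute the fingerprint of a whole window from scratch."""
    fp = 0
    for b in window:
        fp = (fp * BASE + b) % MOD
    return fp


def shift(fp: int, out_byte: int, in_byte: int, top: int) -> int:
    """Slide the window one byte: drop out_byte, append in_byte."""
    fp = (fp - out_byte * top) % MOD
    return (fp * BASE + in_byte) % MOD


def fingerprints(data: bytes, w: int) -> list[int]:
    """Return the fingerprint of every w-byte window in data (w >= 1)."""
    if len(data) < w:
        return []
    top = pow(BASE, w - 1, MOD)   # weight of the oldest byte in the window
    fp = fresh(data[:w])          # one Fresh step for the first window
    out = [fp]
    for i in range(w, len(data)):
        fp = shift(fp, data[i - w], data[i], top)   # Shift for the rest
        out.append(fp)
    return out
```

Equal windows produce equal fingerprints, which is what makes the values usable as duplicate detectors; in this sketch the Shift result always matches what Fresh would compute on the same window.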
Abstract:
An encoding device (100a) generates static-encoded data from input text data using a static dictionary that associates character strings with respective static codes, the static-encoded data including a plurality of static codes corresponding to a plurality of character strings registered in the static dictionary. The device generates dynamic-encoded data from the static-encoded data by encoding each character string or static code that occurs more than once in the static-encoded data into a dynamic code, creates a dynamic dictionary associating those character strings or static codes with their corresponding dynamic codes, and creates a Huffman tree and leaf data for the Huffman tree based on the occurrence frequency of the dynamic codes and the static codes in the input text data.
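The two dictionary stages and the frequency-based tree can be sketched as follows. This is an illustration under stated assumptions, not the device's actual procedure: the input is treated as a list of words, the static dictionary entries and the `S`/`D` code names are hypothetical, and frequencies are counted over the final token stream. Only Huffman code lengths are computed, since the abstract does not describe the leaf data format.

```python
import heapq
from collections import Counter

# Hypothetical static dictionary: character string -> static code.
STATIC_DICT = {"the": "S0", "data": "S1", "code": "S2"}


def static_encode(words):
    """Replace words registered in the static dictionary by static codes."""
    return [STATIC_DICT.get(w, w) for w in words]


def dynamic_encode(tokens):
    """Assign a dynamic code to every token that occurs more than once."""
    counts = Counter(tokens)
    dynamic_dict = {}
    for tok in tokens:
        if counts[tok] > 1 and tok not in dynamic_dict:
            dynamic_dict[tok] = f"D{len(dynamic_dict)}"
    return [dynamic_dict.get(t, t) for t in tokens], dynamic_dict


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman tree built over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap items are (frequency, tie-breaker, symbols under this node).
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:       # every merge adds one bit to these symbols
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths
```

For example, `dynamic_encode(static_encode("the data in the code and the data".split()))` gives dynamic codes to `S0` and `S1` only, because they are the only tokens that repeat.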
Abstract:
Compression and decompression of numerical data utilizing single instruction, multiple data (SIMD) instructions are described. The numerical data includes integer and floating-point samples. Compression supports three encoding modes: lossless, fixed-rate, and fixed-quality. SIMD instructions for compression operations may include attenuation, derivative calculations, bit packing to form compressed packets, header generation for the packets, and packed array output operations. SIMD instructions for decompression may include packed array input operations, header recovery, decoder control, bit unpacking, integration, and amplification. Compression and decompression may be implemented in a microprocessor, digital signal processor, field-programmable gate array, application-specific integrated circuit, system-on-chip, or graphics processor, using SIMD instructions. Compression and decompression of numerical data can reduce memory, networking, and storage bottlenecks. This abstract does not limit the scope of the invention as described in the claims.
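A scalar reference for the derivative, bit-packing, bit-unpacking, and integration steps might look like the sketch below. It covers only integer samples in lossless mode, so attenuation and amplification are omitted, and it uses plain Python rather than SIMD instructions. The packet layout (count, bit width, packed bits) and the zigzag mapping of signed values are assumptions made for illustration; the abstract does not describe the header format.

```python
def zigzag(n: int) -> int:
    """Map a signed integer to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return (u >> 1) if u % 2 == 0 else -((u + 1) >> 1)


def compress(samples: list[int]) -> tuple[int, int, int]:
    """Return a toy packet: (sample count, bit width, packed bits)."""
    prev = 0
    deltas = []
    for s in samples:
        deltas.append(zigzag(s - prev))    # first derivative
        prev = s
    width = max((d.bit_length() for d in deltas), default=0) or 1
    packed = 0
    for i, d in enumerate(deltas):
        packed |= d << (i * width)         # fixed-width bit packing
    return len(samples), width, packed


def decompress(packet: tuple[int, int, int]) -> list[int]:
    """Unpack the fixed-width fields and integrate them back to samples."""
    count, width, packed = packet
    mask = (1 << width) - 1
    out, acc = [], 0
    for i in range(count):
        acc += unzigzag((packed >> (i * width)) & mask)   # unpack, integrate
        out.append(acc)
    return out
```

In this sketch the first sample is stored as a difference from zero, so it tends to set the bit width for the whole packet; a real format would more likely carry it separately in the header.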
Abstract:
The present disclosure relates to an apparatus for compressing time series data, comprising: a difference detection unit 30 configured to determine difference data from reference data and measurement data; a data compression unit 40 configured to compress the difference data determined by the difference detection unit 30; and a data export unit 50 configured to provide the compressed difference data to a data storage device.
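The difference-then-compress flow can be sketched in a few lines. The abstract names neither a sample type nor a compression algorithm, so this sketch assumes 32-bit signed integer samples and uses `zlib` as a stand-in compressor; the function names are hypothetical, and the export step to a storage device is left out.

```python
import struct
import zlib


def compress_against_reference(reference, measurement):
    """Difference detection followed by compression of the difference data."""
    if len(reference) != len(measurement):
        raise ValueError("reference and measurement must have equal length")
    diff = [m - r for m, r in zip(measurement, reference)]
    raw = struct.pack(f"<{len(diff)}i", *diff)   # little-endian int32 samples
    return zlib.compress(raw)


def decompress_against_reference(reference, blob):
    """Recover the measurement data from the reference and the stored blob."""
    raw = zlib.decompress(blob)
    diff = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [r + d for r, d in zip(reference, diff)]
```

When the measurement stays close to the reference, the difference data is mostly zeros or small values, which is the case a general-purpose compressor handles well.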
Abstract:
A method for compressing digital data, characterised in that it comprises the steps of: extrapolating (E11) the value of each sample of data to be compressed (EC_n) as a function of the value of at least one preceding sample, in order to produce an extrapolated sample (EE_n); differentiating (E12) between each extrapolated sample and the corresponding sample of data to be compressed, in order to produce a differentiated sample (ED_n); and deleting (E13) redundancy between successive differentiated samples produced by the differentiating step.
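The three steps can be sketched as below, with several assumptions the abstract leaves open: the extrapolation is linear over the last two samples, the differentiated sample is taken as ED_n = EC_n - EE_n, and redundancy between successive differentiated samples is removed by run-length encoding. All function names are hypothetical.

```python
def extrapolate(history):
    """Predict the next sample from up to two preceding samples (assumed linear)."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if history:
        return history[-1]
    return 0


def encode(samples):
    """Extrapolate (E11), difference (E12), then run-length encode (E13)."""
    residuals = []
    for n, s in enumerate(samples):
        predicted = extrapolate(samples[max(0, n - 2):n])
        residuals.append(s - predicted)
    runs = []
    for r in residuals:
        if runs and runs[-1][0] == r:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([r, 1])       # start a new run
    return [(value, count) for value, count in runs]


def decode(runs):
    """Rebuild the samples by replaying the same extrapolation."""
    samples = []
    for value, count in runs:
        for _ in range(count):
            samples.append(extrapolate(samples[-2:]) + value)
    return samples
```

For the input `[10, 12, 14, 16, 18, 18, 18]` the residuals are `10, 2, 0, 0, 0, -2, 0`, so the straight-line segment collapses into a single run of zeros.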
Abstract:
In PIPE coding, where alphabet symbols are distributed among a plurality of specialized entropy encoders/decoders according to their probability distribution estimate, a categorizing stage is provided in which source symbols to be encoded are sub-divided into a first substream, which is subject to VLC coding, and a second substream, which is subject to PIPE coding. By this measure, source symbols having an appropriate symbol probability distribution, i.e., a probability distribution suitable for being efficiently coded by means of VLC coding without the deficiencies outlined in the introductory portion of the specification of the present application, may be categorized as VLC coded symbols, whereas other symbols may be treated as PIPE coded symbols and subjected to PIPE coding, whose coding complexity is higher than that of VLC coding but which offers better compression efficiency.
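Only the categorizing stage is sketched here, under assumptions: each source symbol carries a kind label, a fixed set of kinds is routed to the VLC substream, and order-0 Exp-Golomb codes stand in for the VLC code, which the abstract does not specify. The PIPE coder itself is out of scope, so its substream is returned uncoded. The names `exp_golomb`, `categorize`, and the kind labels are hypothetical.

```python
def exp_golomb(n: int) -> str:
    """Order-0 Exp-Golomb code word for n >= 0, as a bit string."""
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b


def categorize(symbols, vlc_kinds):
    """Split (kind, value) symbols into a VLC bit string and a PIPE substream."""
    vlc_bits, pipe_symbols = [], []
    for kind, value in symbols:
        if kind in vlc_kinds:
            vlc_bits.append(exp_golomb(value))     # first substream: VLC coded
        else:
            pipe_symbols.append((kind, value))     # second substream: for PIPE
    return "".join(vlc_bits), pipe_symbols
```

A decoder would need to know the sequence of symbol kinds, for example from the syntax of the coded format, in order to interleave the two substreams back into the original order.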