Abstract:
Data may be efficiently analyzed and compressed as part of a data compression service. A data compression request may be received from a client indicating data to be compressed. An analysis of the data or metadata associated with the data may be performed. In at least some embodiments, this analysis may be a rules-based analysis. Some embodiments may apply one or more machine learning techniques to historical compression data to update the rules-based analysis. One or more compression techniques may be selected out of a plurality of compression techniques to be applied to the data. Data compression candidates may then be generated according to the selected compression techniques. In some embodiments, a compression service restriction may be enforced. One of the data compression candidates may be selected and sent in a response.
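The request flow described above can be sketched roughly as follows. This is a minimal illustration, not the claimed service: the codec registry, the rules in `select_techniques`, the `content_type` metadata key, and the `max_candidates` restriction are all assumptions chosen for the example.

```python
import bz2
import lzma
import zlib

# Hypothetical registry of available techniques; the abstract names no codecs.
TECHNIQUES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def select_techniques(data: bytes, metadata: dict) -> list[str]:
    """Rules-based analysis of the data and its metadata (illustrative rules)."""
    if metadata.get("content_type") in {"image/jpeg", "application/zip"}:
        return ["zlib"]  # input is likely already compressed: try one cheap codec
    if len(data) < 1024:
        return ["zlib", "bz2"]  # small input: skip the slowest codec
    return list(TECHNIQUES)


def compress_request(data: bytes, metadata: dict, max_candidates: int = 3):
    """Generate one candidate per selected technique and return the smallest."""
    # Assumed service restriction: cap how many candidates are generated.
    names = select_techniques(data, metadata)[:max_candidates]
    candidates = {name: TECHNIQUES[name](data) for name in names}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]
```

A call such as `compress_request(payload, {"content_type": "text/plain"})` returns the name of the winning technique together with its output, which the service would send back in the response.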
Abstract:
A computer-implemented method of performing lossless compression of a digital data set uses an iterative compression process in which the number of symbols N and bit length per symbol n may vary on successive iterations. The process includes analyzing at least a part of the data set to establish a partition thereof into N symbols of symbol length n, and to determine whether the N symbols can be further compressed, and, if so, a model to be used in encoding the N symbols.
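The analysis step of a single iteration might look like the sketch below. It is only an assumption-laden illustration: the abstract does not say how the partition or the model is chosen, so this example tries a few hypothetical symbol lengths, estimates the encoded size with an order-0 entropy model plus a crude model-table cost, and reports whether the best partition is expected to shrink the data.

```python
import math
from collections import Counter


def partition(bits: str, n: int) -> list[str]:
    """Split a bit string into symbols of n bits (any trailing remainder is ignored)."""
    return [bits[i:i + n] for i in range(0, len(bits) - n + 1, n)]


def estimated_cost(symbols: list[str], n: int) -> float:
    """Estimated encoded size in bits: order-0 entropy plus a naive model cost."""
    counts = Counter(symbols)
    total = len(symbols)
    payload = -sum(c * math.log2(c / total) for c in counts.values())
    model = len(counts) * n  # assumed cost of listing each distinct symbol once
    return payload + model


def analyze(bits: str, candidate_lengths=(2, 4, 8)):
    """Return (n, N, compressible) for the cheapest candidate partition."""
    best = None
    for n in candidate_lengths:
        symbols = partition(bits, n)
        if not symbols:
            continue
        cost = estimated_cost(symbols, n)
        if best is None or cost < best[2]:
            best = (n, len(symbols), cost)
    if best is None:
        return None
    n, count, cost = best
    return n, count, cost < count * n
```

An iterative compressor would call `analyze` on the output of the previous pass and stop once `compressible` comes back false, which is how N and n can differ from one iteration to the next.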
Abstract:
Disclosed herein are representative embodiments for performing entropy coding or decoding using a counter-based scheme. In one exemplary embodiment disclosed herein, a first codeword is received from compressed digital media data. The first codeword is decoded into a first digital media data value by referencing a codeword table that associates the first codeword with the first digital media data value and a second codeword with a second digital media data value. A first counter that counts occurrences of the first digital media data value is incremented. The value of the first counter is compared with the value of a second counter that counts occurrences of the second digital media data value. If the value of the first counter is equal to (or greater than or equal to) the value of the second counter, the codeword table is updated to swap codewords between the first and second digital media data values.
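A small decoder sketch may make the counter-and-swap step concrete. The class name, the ranked codeword list, and the choice of the next-better codeword's value as the "second" value are assumptions made for this example; the abstract does not specify how the second value is picked.

```python
class CounterDecoder:
    """Adaptive codeword table driven by per-value occurrence counters (sketch)."""

    def __init__(self, codewords, values):
        # codewords are listed from most preferred (typically shortest) to least.
        self.rank = list(codewords)
        self.table = dict(zip(codewords, values))  # codeword -> value
        self.counts = {value: 0 for value in values}

    def decode(self, codeword):
        value = self.table[codeword]
        self.counts[value] += 1
        i = self.rank.index(codeword)
        if i > 0:
            better = self.rank[i - 1]          # next more-preferred codeword
            neighbour = self.table[better]     # the "second" value in this sketch
            # Swap when the first counter has caught up with the second one.
            if self.counts[value] >= self.counts[neighbour]:
                self.table[codeword] = neighbour
                self.table[better] = value
        return value
```

With `CounterDecoder(["0", "10", "11"], ["a", "b", "c"])`, decoding `"10"` yields `"b"` and immediately promotes `"b"` to the codeword `"0"`. A matching encoder would have to apply the same update after each symbol so that both sides keep identical tables.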
Abstract:
Disclosed herein are systems and methods for compressing structured or semi-structured data in a horizontal manner, achieving compression ratios similar to vertical compression. Collections of structured or semi-structured data include a number of fields and are described using a schema. Fields include information having semantic similarity and are compressed using methods suitable for compressing the type of data. Data of a collection is compressed after fragmentation or may be normalized prior to compression. Data with semantic similarity is compressed using token tables and/or n-gram tables, where values with a higher weight, defined as the product of frequency and length, may be stored in the lower-numbered indices of the table. Records include record descriptor bytes, field descriptor bytes, zero or more array descriptor bytes, zero or more object descriptor bytes, or bytes representing the data associated with the record. Data is indexed or compressed by a suitable module.
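The weighting rule for token tables can be illustrated with a short sketch. It assumes whole field values are used as tokens and that ties are broken alphabetically; both are choices made for this example, not details from the abstract.

```python
from collections import Counter


def build_token_table(values):
    """Order tokens by weight = frequency * length, heaviest at the lowest index."""
    freq = Counter(values)
    # Ties are broken alphabetically so the table is deterministic (assumption).
    return sorted(freq, key=lambda tok: (-freq[tok] * len(tok), tok))


def encode_field(values, table):
    """Replace each value with its index in the token table."""
    index = {tok: i for i, tok in enumerate(table)}
    return [index[v] for v in values]


def decode_field(indices, table):
    """Recover the original values from their table indices."""
    return [table[i] for i in indices]
```

Placing heavy tokens at low indices means the values that save the most bytes are the ones referenced by the smallest numbers, which helps if the indices are later written with a variable-length integer encoding.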
Abstract:
Exemplary methods, computer systems, and computer program products for processing a previously compressed data stream in a computer environment are provided. In one embodiment, the computer environment is configured for separating a previously compressed data stream into input data blocks, including a header input block having a previously compressed header. Sequences of bits are included with each input data block. Compression scheme information is derived from the previously compressed header. The input data blocks following the header input block in the previously compressed data stream are accessed and recompressed one at a time using block-image synchronization information. Access to the block-image synchronization information is initialized by the compression scheme information to generate an output data block. The block-image synchronization information is used to provide decompression information to facilitate decompression of the results of the output data block.
Abstract:
A data processing apparatus and a data processing method thereof are provided. The data processing apparatus includes a register and a processor electrically connected to the register. The register stores a plurality of data. Each of the plurality of data includes a first sub-datum and a second sub-datum. The plurality of first sub-data corresponds to a first column and the plurality of second sub-data corresponds to a second column. The processor compresses the first sub-data by a first compression algorithm according to a first characteristic of the plurality of first sub-data and compresses the second sub-data by a second compression algorithm according to a second characteristic of the plurality of second sub-data.
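As a rough illustration of choosing a compression algorithm per column, the sketch below splits records into columns and picks delta encoding for sorted integer columns and run-length encoding otherwise. The two algorithms and the selection rule are assumptions for the example; the abstract does not name the characteristics or the algorithms.

```python
def delta_encode(col):
    """Store the first value, then the differences between neighbours."""
    if not col:
        return []
    return [col[0]] + [b - a for a, b in zip(col, col[1:])]


def run_length_encode(col):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in col:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def choose_algorithm(col):
    """Pick an algorithm from a characteristic of the column (illustrative rule)."""
    if all(isinstance(v, int) for v in col) and col == sorted(col):
        return "delta", delta_encode
    return "rle", run_length_encode


def compress_records(records):
    """Split records into columns and compress each with its chosen algorithm."""
    columns = [list(c) for c in zip(*records)]
    out = []
    for col in columns:
        name, fn = choose_algorithm(col)
        out.append((name, fn(col)))
    return out
```

For records such as `(100, "ok")`, `(101, "ok")`, `(103, "fail")`, the first column is sorted integers and is delta encoded, while the second column repeats a few values and is run-length encoded.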
Abstract:
An apparatus for compressing and decompressing data is disclosed. The apparatus for compressing data includes a block setting unit that divides data of at least one original file into two or more blocks, a compression unit that generates block compression data by applying a compression algorithm to data corresponding to at least one block among the blocks divided by the block setting unit, and a compression file generation unit that generates a block header and a block body for each block divided by the block setting unit, in which the block body includes the block compression data if the block is compressed by the compression unit or includes the original data of the block if the block is not compressed by the compression unit.
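The block header and block body arrangement can be sketched as a simple container format. The header layout (`flag`, stored length, original length), the use of zlib, and the rule of storing a block raw when compression does not shrink it are assumptions made for this example.

```python
import struct
import zlib

# Hypothetical block header: 1-byte compressed flag, stored length, original length.
HEADER = struct.Struct("<BII")


def pack(data: bytes, block_size: int = 4096) -> bytes:
    """Divide data into blocks and emit a header plus body for each block."""
    out = bytearray()
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        packed = zlib.compress(block)
        if len(packed) < len(block):
            flag, body = 1, packed   # body holds the block compression data
        else:
            flag, body = 0, block    # body holds the original data of the block
        out += HEADER.pack(flag, len(body), len(block)) + body
    return bytes(out)


def unpack(blob: bytes) -> bytes:
    """Walk the headers and restore each block from its body."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        flag, stored, _original = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        body = blob[pos:pos + stored]
        pos += stored
        out += zlib.decompress(body) if flag else body
    return bytes(out)
```

Because every block carries its own header, a reader can skip over or decompress blocks independently, and incompressible blocks cost only the header overhead.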
Abstract:
A message within a message queue can be identified. The message queue can be within a software entity of a computing device. The message can be analyzed to determine an encoding scheme to apply to the message. The message can be encoded using the encoding scheme to create an encoded message. The encoding scheme can be a word level encoding scheme, a language-based encoding scheme, or a grammar encoding scheme.