Abstract:
Technologies for high-performance single-stream data compression include a computing device that updates an index data structure based on an input data stream. The input data stream is divided into multiple chunks. Each chunk has a predetermined length, such as 136 bytes, and overlaps the previous chunk by a predetermined amount, such as eight bytes. The computing device processes multiple chunks in parallel using the index data structure to generate multiple token streams. The tokens include literal tokens and reference tokens that refer to matching data from earlier in the input data stream. The computing device thus searches for matching data in parallel. The computing device merges the token streams to generate a single output token stream. The computing device may merge a pair of tokens from two different chunks to generate one or more synchronized tokens that are output to the output token stream. Other embodiments are described and claimed.
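
A minimal sketch of the chunking described above, assuming nothing beyond the abstract: the 136-byte chunk length and 8-byte overlap are the example values given, while the token layout and the 1000-byte stream length are illustrative, and the per-chunk tokenization and final merge are only indicated by comments:

/* Split an input buffer into overlapping chunks (136 bytes, 8-byte overlap)
 * so each chunk can be tokenized against a shared index independently and
 * the resulting token streams merged in order afterwards. */
#include <stdio.h>
#include <stddef.h>

#define CHUNK_LEN 136   /* predetermined chunk length from the abstract */
#define OVERLAP     8   /* predetermined overlap from the abstract      */

typedef struct {
    int is_reference;   /* 0 = literal token, 1 = reference token       */
    size_t offset;      /* distance back to the matching data           */
    size_t length;      /* literal byte count or match length           */
} token_t;

int main(void)
{
    size_t input_len = 1000;                 /* example stream length   */
    size_t stride = CHUNK_LEN - OVERLAP;     /* 128 new bytes per chunk */

    for (size_t start = 0; start < input_len; start += stride) {
        size_t end = start + CHUNK_LEN;
        if (end > input_len)
            end = input_len;
        /* each [start, end) range would be tokenized in parallel, then
         * the per-chunk token streams merged into one output stream    */
        printf("chunk [%zu, %zu)\n", start, end);
    }
    return 0;
}
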
Abstract:
A processing system is provided that includes a memory for storing an input bit stream and processing logic, operatively coupled to the memory, to generate a first score based on a first set of matching data related to a match between a first bit subsequence and a candidate bit subsequence within the input bit stream, and on a first distance of the candidate bit subsequence from the first set of matching data. A second score is generated based on a second set of matching data related to a match between a second bit subsequence and the candidate bit subsequence, and on a second distance of the candidate bit subsequence from the second set of matching data. A code to replace the first or second bit subsequence in an output bit stream is identified. Selection of which of the two bit subsequences to replace is based on a comparison of the first and second scores.
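
A minimal sketch of the score comparison, assuming an illustrative cost model (the abstract does not give the scoring formula): longer matches score higher, and larger back-reference distances cost more to encode:

/* Illustrative score: bytes saved by the match minus a rough bit cost
 * for encoding the distance; the weighting is an assumption.          */
#include <stdio.h>

static int score(int match_len, int distance)
{
    int bits = 0;                    /* approx. bits to encode distance */
    while (distance > 1) { distance >>= 1; bits++; }
    return match_len * 8 - bits;
}

int main(void)
{
    int s1 = score(6, 1024);  /* first bit subsequence: match len, distance  */
    int s2 = score(5, 16);    /* second bit subsequence: match len, distance */
    printf("replace the %s bit subsequence\n", s1 >= s2 ? "first" : "second");
    return 0;
}
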
Abstract:
In one embodiment, an apparatus comprises a first compression engine to receive a first compressed data block from a second compression engine that is to generate the first compressed data block by compressing a first plurality of repeated instances of data that each have a length greater than or equal to a first length. The first compression engine is further to compress a second plurality of repeated instances of data of the first compressed data block that each have a length greater than or equal to a second length, the second length being shorter than the first length, wherein each compressed repeated instance of the first and second pluralities of repeated instances comprises a location and length of a data instance that is repeated. The apparatus further comprises a memory buffer to store the compressed first and second pluralities of repeated instances of data.
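
A simplified stand-in for the two-stage idea, assuming illustrative thresholds of 8 and 4 bytes: a single greedy matcher parameterized by minimum match length is run twice, whereas in the apparatus the second engine operates on the first engine's compressed block rather than on the raw buffer:

/* Emit (location, length) for each repeated instance >= min_len, greedily. */
#include <stdio.h>
#include <stddef.h>

static void emit_matches(const char *buf, size_t len, size_t min_len,
                         const char *stage)
{
    for (size_t pos = 1; pos < len; ) {
        size_t best_len = 0, best_loc = 0;
        for (size_t cand = 0; cand < pos; cand++) {
            size_t m = 0;
            while (pos + m < len && buf[cand + m] == buf[pos + m])
                m++;
            if (m > best_len) { best_len = m; best_loc = cand; }
        }
        if (best_len >= min_len) {
            printf("%s: loc=%zu len=%zu at pos=%zu\n",
                   stage, best_loc, best_len, pos);
            pos += best_len;
        } else {
            pos++;               /* too short for this stage; leave as literal */
        }
    }
}

int main(void)
{
    const char buf[] = "abcdefgh_abcdefgh_abcd_wxyz_wxyz";
    emit_matches(buf, sizeof buf - 1, 8, "stage1");  /* long matches only   */
    emit_matches(buf, sizeof buf - 1, 4, "stage2");  /* catches "wxyz" too  */
    return 0;
}
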
Abstract:
A flexible AES instruction for a general purpose processor is provided that performs AES encryption or decryption using n rounds, where n includes the standard AES set of rounds {10, 12, 14}. A parameter is provided to allow the type of AES round to be selected, that is, whether it is a “last round”. In addition to standard AES, the flexible AES instruction allows an AES-like cipher with 20 rounds to be specified, or a “one round” pass.
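
For comparison, the existing AES-NI intrinsics already expose a per-round primitive and a distinct last-round primitive, which is one way to picture the round-type selection described above; this is a sketch using those public intrinsics, not the flexible instruction itself, with round keys assumed to be precomputed (compile with -maes):

#include <wmmintrin.h>
#include <emmintrin.h>

static __m128i aes_encrypt_block(__m128i block, const __m128i *rk, int nrounds)
{
    block = _mm_xor_si128(block, rk[0]);              /* initial key whitening */
    for (int r = 1; r < nrounds; r++)
        block = _mm_aesenc_si128(block, rk[r]);       /* standard round        */
    return _mm_aesenclast_si128(block, rk[nrounds]);  /* "last round" variant  */
}

int main(void)
{
    __m128i rk[15];
    for (int i = 0; i < 15; i++)
        rk[i] = _mm_setzero_si128();                  /* dummy round keys      */
    __m128i out = aes_encrypt_block(_mm_setzero_si128(), rk, 14);
    (void)out;
    return 0;
}
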
Abstract:
Methods and apparatus to parallelize data decompression are disclosed. An example method includes adjusting a first one of the initial starting positions to determine a first adjusted starting position by decoding the compressed data bitstream starting at a training position in the bitstream, the decoding including traversing the bitstream from the training position as though first data located at the training position is a valid token; and merging, by executing an instruction with the processor, first decoded data generated by decoding a first segment of the compressed data bitstream starting from the first adjusted starting position with second decoded data generated by decoding a second segment of the compressed data bitstream, the decoding of the second segment starting from a second position in the compressed data bitstream and being performed in parallel with the decoding of the first segment, and the second segment preceding the first segment in the compressed data bitstream.
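
A toy illustration of the starting-position adjustment, assuming an invented token format (a tag byte 'L' followed by one literal byte, or 'R' followed by two reference bytes) rather than any real compressed format, and with the training position placed on a known token boundary for simplicity: walking whole tokens from the training position past the nominal split yields a token-aligned adjusted starting position for the parallel decode.

#include <stdio.h>
#include <stddef.h>

static size_t token_len(unsigned char tag)
{
    return tag == 'R' ? 3 : 2;    /* 'R' offset length  vs  'L' byte */
}

int main(void)
{
    /* toy token stream: L a | R 4 2 | L b | L c | R 7 3 */
    unsigned char stream[] = { 'L','a', 'R',4,2, 'L','b', 'L','c', 'R',7,3 };
    size_t initial_split = 6;   /* initial start of the later segment (mid-token) */
    size_t training_pos  = 0;   /* earlier position the adjusting decode starts at */

    size_t pos = training_pos;
    while (pos < initial_split)           /* traverse token by token         */
        pos += token_len(stream[pos]);
    printf("adjusted starting position: %zu\n", pos);   /* prints 7          */
    return 0;
}
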
Abstract:
A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate the round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
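
For the key-generation side, the existing AES-NI key-assist instruction takes the round constant as an immediate, which loosely parallels the immediate usage described above; this sketch shows one standard AES-128 key-expansion step built on that public intrinsic, with a dummy all-zero cipher key (compile with -maes):

#include <wmmintrin.h>
#include <emmintrin.h>

static __m128i expand_128(__m128i key, __m128i assist)
{
    assist = _mm_shuffle_epi32(assist, 0xFF);   /* broadcast rotated/sub word */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

int main(void)
{
    __m128i k0 = _mm_setzero_si128();                        /* dummy key     */
    __m128i k1 = expand_128(k0, _mm_aeskeygenassist_si128(k0, 0x01));
    (void)k1;
    return 0;
}
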
Abstract:
In an embodiment, a processor includes hardware processing cores, a cache memory, and a compression accelerator comprising a hash table memory. The compression accelerator is to: determine a hash value for input data to be compressed; read a first plurality of N location values stored in a hash table entry indexed by the hash value; perform a first set of string searches in parallel from a history buffer using the first plurality of N location values stored in the hash table entry; read a second plurality of N location values stored in a first overflow table entry indexed by a first overflow pointer included in the hash table entry; and perform a second set of string searches in parallel from the history buffer using the second plurality of N location values stored in the first overflow table entry. Other embodiments are described and claimed.
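
A minimal sketch of the hash-entry layout and the multi-location search, assuming N = 4 locations per entry and a simple linked overflow entry (the abstract fixes neither); in hardware the N string searches run in parallel, here they run sequentially:

#include <stdint.h>
#include <stdio.h>

#define N_LOCS 4

struct overflow_entry {
    uint32_t loc[N_LOCS];            /* older candidate match locations */
    struct overflow_entry *next;     /* further overflow, if any        */
};

struct hash_entry {
    uint32_t loc[N_LOCS];            /* most recent candidate locations */
    struct overflow_entry *overflow; /* first overflow entry, if any    */
};

/* Compare the input at `pos` against the history at each candidate
 * location and return the best match length found.                     */
static size_t best_match(const uint8_t *buf, size_t pos, size_t len,
                         const uint32_t *locs, int n)
{
    size_t best = 0;
    for (int i = 0; i < n; i++) {
        size_t m = 0;
        while (pos + m < len && buf[locs[i] + m] == buf[pos + m])
            m++;
        if (m > best)
            best = m;
    }
    return best;
}

int main(void)
{
    const uint8_t buf[] = "abcabcabxabcabc";
    struct hash_entry e = { .loc = {0, 3}, .overflow = NULL };
    printf("best match length: %zu\n",
           best_match(buf, 9, sizeof buf - 1, e.loc, 2));
    return 0;
}
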
Abstract:
Erasure code syndrome computation based on Reed-Solomon (RS) operations in a Galois field permits reconstruction of data from more than two failed storage units. Syndrome computation may be performed with coefficient exponents that consist of −1, 0, and 1. A product xD of a syndrome is computed as a left-shift of data byte D, with selective compensation based on the most significant bit of D. A product x⁻¹D of a syndrome is computed as a right-shift of data byte D, with selective compensation based on the least significant bit of D. Compensation may include bit-wise XORing the shift result with a constant derived from an irreducible polynomial associated with the Galois field. A set of erasure code syndromes may be computed for each of multiple nested arrays of independent storage units. Data reconstruction includes solving for the coefficients of the syndromes, arranged as a Vandermonde matrix.
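
A minimal sketch of the xD and x⁻¹D products over GF(2^8), assuming the common irreducible polynomial x^8+x^4+x^3+x^2+1 (0x11D) as an example; the compensation constants 0x1D and 0x8E below are both derived from that polynomial:

#include <stdint.h>
#include <stdio.h>

#define POLY_LOW 0x1D   /* low 8 bits of the irreducible polynomial 0x11D    */
#define X_INV    0x8E   /* representation of x^-1 under the same polynomial  */

/* product xD: left-shift D, compensate if its most significant bit was set  */
static uint8_t gf_mul_x(uint8_t d)
{
    uint8_t s = (uint8_t)(d << 1);
    return (d & 0x80) ? (uint8_t)(s ^ POLY_LOW) : s;
}

/* product x^-1 D: right-shift D, compensate if its least significant bit was set */
static uint8_t gf_div_x(uint8_t d)
{
    uint8_t s = d >> 1;
    return (d & 0x01) ? (uint8_t)(s ^ X_INV) : s;
}

int main(void)
{
    int ok = 1;
    for (int d = 0; d < 256; d++)       /* x * (x^-1 * D) must give D back   */
        if (gf_mul_x(gf_div_x((uint8_t)d)) != (uint8_t)d)
            ok = 0;
    printf("round trip %s\n", ok ? "ok" : "failed");
    return 0;
}
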
Abstract:
Vector instructions for performing ZUC stream cipher operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first vector instruction to perform an update to a linear feedback shift register (LFSR), and receives a second vector instruction to perform an update to a state of a finite state machine (FSM), where the FSM receives inputs from re-ordered bits of the LFSR. The execution circuitry executes the first vector instruction and the second vector instruction in a single-instruction multiple-data (SIMD) pipeline.
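
A scalar sketch of the LFSR update that the first vector instruction performs, following the published ZUC specification's working-mode step (arithmetic mod 2^31 − 1, with multiplication by 2^k realized as a 31-bit rotation); vectorization is omitted here:

#include <stdint.h>
#include <stdio.h>

#define ZUC_MOD ((1u << 31) - 1)

static uint32_t add31(uint32_t a, uint32_t b)        /* addition mod 2^31 - 1 */
{
    uint32_t c = a + b;
    return (c & ZUC_MOD) + (c >> 31);
}

static uint32_t rot31(uint32_t a, int k)             /* rotate within 31 bits */
{
    return ((a << k) | (a >> (31 - k))) & ZUC_MOD;
}

/* One working-mode LFSR step: s[0..15] hold 31-bit cells. */
static void lfsr_step(uint32_t s[16])
{
    uint32_t f = s[0];
    f = add31(f, rot31(s[0], 8));
    f = add31(f, rot31(s[4], 20));
    f = add31(f, rot31(s[10], 21));
    f = add31(f, rot31(s[13], 17));
    f = add31(f, rot31(s[15], 15));
    for (int i = 0; i < 15; i++)
        s[i] = s[i + 1];
    s[15] = f ? f : ZUC_MOD;           /* the spec maps 0 to 2^31 - 1 */
}

int main(void)
{
    uint32_t s[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    lfsr_step(s);
    printf("new s15 = 0x%08x\n", (unsigned)s[15]);
    return 0;
}
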
Abstract:
According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of the third value into the accumulated value based on the fourth operand.
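
A behavioral sketch of the described operation, assuming illustrative widths (two 32-bit values packed in the second operand, a 64-bit accumulator) and assuming the fourth operand selects which half of the third value is also accumulated; none of these specifics are fixed by the abstract:

#include <stdint.h>
#include <stdio.h>

static uint64_t mul_acc(uint64_t acc, uint64_t packed_ab, uint64_t c, int imm)
{
    uint32_t a = (uint32_t)(packed_ab >> 32);              /* first value     */
    uint32_t b = (uint32_t)packed_ab;                      /* second value    */
    uint64_t part = imm ? (c >> 32) : (c & 0xFFFFFFFFu);   /* portion of c    */
    return acc + (uint64_t)a * b + part;                   /* accumulate      */
}

int main(void)
{
    uint64_t packed = ((uint64_t)3 << 32) | 5;             /* a = 3, b = 5    */
    printf("%llu\n", (unsigned long long)mul_acc(100, packed, 7, 0));  /* 122 */
    return 0;
}
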