Abstract:
Methods and apparatus related to an efficient Solid State Drive (SSD) data compression scheme and layout are described. In one embodiment, logic, coupled to non-volatile memory, receives data (e.g., from a host) and compresses the data to generate compressed data prior to storage of the compressed data in the non-volatile memory. The compressed data includes a compressed version of the data, the size of the compressed data, common meta information, and final meta information. Other embodiments are also disclosed and claimed.
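The four components named in this abstract can be illustrated with a small record format. The following is a minimal sketch only: the field order, the field widths, the use of zlib, and the names `pack_record` and `unpack_record` are all assumptions made for illustration, since the abstract does not specify the actual layout.

```python
import struct
import zlib

# Hypothetical header: compressed size (4 bytes), common-meta length (2 bytes),
# final-meta length (2 bytes). The abstract does not define these widths.
HEADER = "<IHH"


def pack_record(data: bytes, common_meta: bytes, final_meta: bytes) -> bytes:
    """Compress data and bundle it with its size and both meta fields."""
    compressed = zlib.compress(data)
    header = struct.pack(HEADER, len(compressed), len(common_meta), len(final_meta))
    # Assumed order: header, common meta, compressed payload, final meta.
    return header + common_meta + compressed + final_meta


def unpack_record(record: bytes):
    """Reverse pack_record and return (data, common_meta, final_meta)."""
    size, cm_len, fm_len = struct.unpack_from(HEADER, record, 0)
    off = struct.calcsize(HEADER)
    common_meta = record[off:off + cm_len]
    off += cm_len
    compressed = record[off:off + size]
    off += size
    final_meta = record[off:off + fm_len]
    return zlib.decompress(compressed), common_meta, final_meta
```

Storing the compressed size ahead of the payload lets a reader locate the final meta information without decompressing anything first.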
Abstract:
One embodiment provides an apparatus. The apparatus includes a single instruction multiple data (SIMD) hash module, a padding module, and a non-SIMD hash module. The SIMD hash module is configured to apportion at least a first portion of a message of length L to a number (S) of segments. The message includes a plurality of sequences of data elements, each sequence including S data elements, with a respective data element in each sequence apportioned to a respective segment and each segment including a number (N) of blocks of data elements. The SIMD hash module is further configured to hash the S segments in parallel, resulting in S segment digests that are based, at least in part, on an initial value, and to store the S segment digests. The padding module is configured to pad a remainder, which corresponds to a second portion of the message; the second portion is related to the length L of the message, the number of segments, and a block size. The non-SIMD hash module is configured to hash the padded remainder, resulting in an additional hash digest, and to store the additional hash digest.
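The apportioning step can be sketched in a few lines. This is an illustration under stated assumptions, not the claimed apparatus: SHA-256 from `hashlib` stands in for the unspecified hash, the segments are hashed one after another rather than in SIMD lanes, `hashlib` applies its own padding to the remainder, and the function name and the `elem` and `block` parameters are hypothetical.

```python
import hashlib


def interleaved_hash(message: bytes, s: int = 4, elem: int = 4, block: int = 64):
    """Split a message into s interleaved segments plus a remainder.

    Assumes block is a multiple of elem. Element i of the first portion
    goes to segment i mod s, so each sequence of s consecutive elements
    contributes exactly one element to every segment.
    """
    stride = s * block                 # bytes consumed per round of full blocks
    n_blocks = len(message) // stride  # N: whole blocks per segment
    first_len = n_blocks * stride
    first, remainder = message[:first_len], message[first_len:]

    segments = [bytearray() for _ in range(s)]
    for i in range(0, first_len, elem):
        segments[(i // elem) % s] += first[i:i + elem]

    # Stand-in for the parallel SIMD hash: one digest per segment.
    segment_digests = [hashlib.sha256(bytes(seg)).digest() for seg in segments]
    # Stand-in for the padded, non-SIMD hash of the remainder.
    remainder_digest = hashlib.sha256(remainder).digest()
    return segment_digests, remainder_digest
```

The remainder is whatever is left once the message no longer fills a whole block in every segment, which is why its size depends on L, S, and the block size.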
Abstract:
Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key to provide for the SMS4 hardware to execute a round of decryption.
Abstract:
Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
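The combined operation can be modeled in software as follows. This is a minimal sketch: the abstract does not state the shift direction, operand width, or operand order, so the 64-bit width, the logical shift, and the name `shift_xor` are assumptions.

```python
def shift_xor(value: int, shift: int, operand: int,
              width: int = 64, left: bool = False) -> int:
    """Shift value by `shift` bits, then XOR the result with operand.

    The result is truncated to `width` bits, mimicking a fixed-size register.
    """
    mask = (1 << width) - 1
    value &= mask
    shifted = (value << shift) if left else (value >> shift)
    return (shifted ^ operand) & mask
```

Fusing the two steps into one instruction saves a separate shift in code that repeats this pattern, such as some hash functions.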
Abstract:
A processor is described having an instruction execution pipeline with a functional unit to execute an instruction that compares vector elements against an input value. Each of the vector elements and the input value has a first respective section identifying a location within data and a second respective section having a byte sequence of the data. The functional unit has comparison circuitry to compare the respective byte sequences of the input vector's elements against the input value's byte sequence to identify a number of matching bytes for each comparison. The functional unit also has difference circuitry to determine respective distances between the input vector's elements' byte sequences and the input value's byte sequence within the data.
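A scalar model of the comparison may make the two outputs concrete. In this sketch each element is a hypothetical `(location, byte sequence)` pair, "matching bytes" is assumed to mean the length of the common prefix, and the distance is assumed to be the difference between the two locations; the abstract does not fix these details.

```python
from typing import List, Tuple

Entry = Tuple[int, bytes]  # (location within the data, byte sequence at that location)


def match_lengths_and_distances(candidates: List[Entry],
                                current: Entry) -> List[Tuple[int, int]]:
    """Return (matching_bytes, distance) for each candidate versus current."""
    cur_pos, cur_bytes = current
    results = []
    for pos, seq in candidates:
        matched = 0
        for a, b in zip(seq, cur_bytes):
            if a != b:
                break
            matched += 1
        # Distance from the candidate's location to the current location.
        results.append((matched, cur_pos - pos))
    return results
```

A match length paired with a distance is the kind of result an LZ77-style compressor uses when searching for earlier occurrences of the current bytes.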
Abstract:
Methods and apparatus to parallelize data decompression are disclosed. A method selects initial starting positions in a compressed data bitstream. A first one of the initial starting positions is adjusted to determine a first adjusted starting position by decoding the bitstream starting at a training position in the bitstream. The decoding includes traversing the bitstream from the training position as though first data located at the training position is a valid token. First decoded data, generated by decoding a first segment of the bitstream starting from the first adjusted starting position, is output. The first decoded data is merged with second decoded data generated by decoding a second segment of the bitstream. The decoding of the second segment, starting from a second position in the bitstream, is performed in parallel with the decoding of the first segment. The second segment precedes the first segment in the bitstream.
Abstract:
A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256-bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
Abstract:
A processor of an aspect includes a plurality of packed data registers and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SMS4 rounds. The one or more source operands are also to have a 32-bit value. An execution unit is coupled with the decode unit and the plurality of the packed data registers. The execution unit, in response to the instruction, is to store a 32-bit result of a current SMS4 round in a destination storage location that is to be indicated by the instruction.