Abstract:
A method of an aspect includes receiving an instruction. The instruction indicates a first source of a first packed data including state data elements ai, bi, ei, and fi for a current round (i) of a secure hash algorithm 2 (SHA2) hash algorithm. The instruction indicates a second source of a second packed data. The first packed data has a width in bits that is less than a combined width in bits of eight state data elements ai, bi, ci, di, ei, fi, gi, hi of the SHA2 hash algorithm. The method also includes storing a result in a destination indicated by the instruction in response to the instruction. The result includes updated state data elements ai+, bi+, ei+, and fi+ that have been updated from the corresponding state data elements ai, bi, ei, and fi by at least one round of the SHA2 hash algorithm.
Abstract:
A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
Abstract:
Methods and apparatuses relating to memory compression and decompression are described, including a memory controller and methods for memory compression utilizing a hardware compression engine and a dictionary to indicate a zero value, full match, partial match, or no match. When indices for multiple sections are the same, an entry in the dictionary may be updated with the value of the section that is most recent, in the same order as in the block of data. In one embodiment, a hardware compression engine is to determine when each section of a plurality of sections of a block of data is a zero value, a full match or a partial match to an entry in a dictionary, or a no match to any entry in the dictionary, encode a tag for each section to indicate the one of the zero value, the full match, the partial match, and the no match, encode a literal when the section is the no match, an index to the entry in the dictionary when the section is the full match, and an index to the entry in the dictionary and non-matching bits when the section is the partial match, and update an entry in the dictionary with a value of a section when the section is the no match, wherein tags for the plurality of sections are to be output from the hardware compression engine in a single field, literals for the plurality of sections are to be output from the hardware compression engine in a single field, indexes for the plurality of sections are to be output from the hardware compression engine in a single field, and non-matching bits for the plurality of sections are to be output from the hardware compression engine in a single field. A hash value may be generated for each of a plurality of sections of a block of data to use as an index in a dictionary.
Abstract:
Technologies for executing a serial data processing algorithm on a single variable length data buffer includes streaming segments of the buffer into a data register, executing the algorithm on each of the segments in parallel, and combining the results of executing the algorithm on each of the segments to form the output of the serial data processing algorithm.
Abstract:
Technologies for high-performance single-stream data compression include a computing device that updates an index data structure based on an input data stream. The input data stream is divided into multiple chunks. Each chunk has a predetermined length, such as 136 bytes, and overlaps the previous chunk by a predetermine amount, such as eight bytes. The computing device processes multiple chunks in parallel using the index data to generate multiple token streams. The tokens include literal tokens and reference tokens that refer to matching data from earlier in the input data stream. The computing device thus searches for matching data in parallel. The computing device merges the token streams to generate a single output token stream. The computing device may merge a pair of tokens from two different chunks to generate one or more synchronized tokens that are output to the output token stream. Other embodiments are described and claimed.
Abstract:
Technologies for performing speculative decompression include a managed node to decode a variable size code at a present position in compressed data with a deterministic decoder and concurrently perform speculative decodes over a range of subsequent positions in the compressed data, determine the position of the next code, determine whether the position of the next code is within the range, and output, in response to a determination that the position of the next code is within the range, a symbol associated with the deterministically decoded code and another symbol associated with a speculatively decoded code at the position of the next code.
Abstract:
Technologies for performing speculative decompression include a managed node to decode a variable size code at a present position in compressed data with a deterministic decoder and concurrently perform speculative decodes over a range of subsequent positions in the compressed data, determine the position of the next code, determine whether the position of the next code is within the range, and output, in response to a determination that the position of the next code is within the range, a symbol associated with the deterministically decoded code and another symbol associated with a speculatively decoded code at the position of the next code.
Abstract:
A processor of an aspect includes a plurality of packed data registers and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SMS4 rounds. The one or more source operands are also to have a 32-bit value. An execution unit is coupled with the decode unit and the plurality of the packed data registers. The execution unit, in response to the instruction, is to store a 32-bit result of a current SMS4 round in a destination storage location that is to be indicated by the instruction.
Abstract:
Instructions and logic provide for a Single Instruction Multiple Data (SIMD) SM4 round slice operation. Embodiments of an instruction specify a first and a second source data operand set, and substitution function indicators, e.g. in an immediate operand. Embodiments of a processor may include encryption units, responsive to the first instruction, to: perform a slice of SM4-round exchanges on a portion of the first source data operand set with a corresponding keys from the second source data operand set in response to a substitution function indicator that indicates a first substitution function, perform a slice of SM4 key generations using another portion of the first source data operand set with corresponding constants from the second source data operand set in response to a substitution function indicator that indicates a second substitution function, and store a set of result elements of the first instruction in a SIMD destination register.
Abstract:
Instructions and logic provide for a Single Instruction Multiple Data (SIMD) SM4 round slice operation. Embodiments of an instruction specify a first and a second source data operand set, and substitution function indicators, e.g. in an immediate operand. Embodiments of a processor may include encryption units, responsive to the first instruction, to: perform a slice of SM4-round exchanges on a portion of the first source data operand set with a corresponding keys from the second source data operand set in response to a substitution function indicator that indicates a first substitution function, perform a slice of SM4 key generations using another portion of the first source data operand set with corresponding constants from the second source data operand set in response to a substitution function indicator that indicates a second substitution function, and store a set of result elements of the first instruction in a SIMD destination register.