Abstract:
Instructions and logic provide SIMD SM3 cryptographic hashing functionality. Some embodiments include a processor comprising: a decoder to decode instructions for a SIMD SM3 message expansion, specifying first and second source data operand sets, and an expansion extent. Processor execution units, responsive to the instruction, perform a number of SM3 message expansions, from the first and second source data operand sets, determined by the specified expansion extent and store the result into a SIMD destination register. Some embodiments also execute instructions for a SIMD SM3 hash round-slice portion of the hashing algorithm, from an intermediate hash value input, a source data set, and a round constant set. Processor execution units perform a set of SM3 hashing round iterations upon the source data set, applying the intermediate hash value input and the round constant set, and store a new hash value result in a SIMD destination register.
Abstract:
An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections.
Abstract:
A method is described that includes performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of the summing creates more bits than circuitry designed to transport the summation is able to transport.
Abstract:
Methods and apparatus for processing bitstreams and byte streams. According to one aspect, bitstream data is compressed using coalesced string match tokens with delayed matching. A matcher is employed to perform search string match operations using a shortened maximum string length search criteria, resulting in generation of a token stream having data and literal data. A distance match operation is performed on sequentially adjacent tokens to determine if they contain the same distance data. If they do, the len values of the tokens are added through use of a coalesce buffer. Upon detection of a distance non-match, a final coalesced length of a matching string is calculated and output along with the prior matching distance as a coalesced token. Also disclosed is a scheme for writing variable-length tokens into a bitstream under which token data is input into a bit accumulator and written to memory (or cache to be subsequently written to memory) as each token is processed in a manner that eliminates branch mispredict operations associated with detecting whether the bit accumulator is full or close to full.
Abstract:
Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
Abstract:
In general, in one aspect, the disclosure describes a multiplier that includes a set of multiple multipliers configured in parallel where the set of multiple multipliers have access to a first operand and a second operand to multiply, the first operand having multiple segments and the second operand having multiple segments. The multiplier also includes logic to repeatedly supply a single segment of the second operand to each multiplier of the set of multiple multipliers and to supply multiple respective segments of the first operand to the respective ones of the set of multiple multipliers until each segment of the second operand has been supplied with each segment of the first operand. The logic shifts the output of different ones of the set of multiple multipliers based, at least in part, on the position of the respective segments within the first operand. The multiplier also includes an accumulator coupled to the logic.
Abstract:
A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
Abstract:
An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections.
Abstract:
A method of an aspect includes receiving an instruction indicating a first source having at least one set of four state matrix data elements, which represent a complete set of four inputs to a G function of a cryptographic hashing algorithm. The algorithm uses a sixteen data element state matrix, and alternates between updating data elements in columns and diagonals. The instruction also indicates a second source having data elements that represent message and constant data. In response to the instruction, a result is stored in a destination indicated by the instruction. The result includes updated state matrix data elements including at least one set of four updated state matrix data elements. Each of the four updated state matrix data elements represents a corresponding one of the four state matrix data elements of the first source, which has been updated by the G function.
Abstract:
A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.