摘要:
A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
摘要:
In one embodiment, the present disclosure provides a method that includes segmenting an n-bit exponent e into a first segment et and a number t of k-bit segments ei in response to a request to determine a modular exponentiation result R, wherein R is a modular exponentiation of a generator base g for the exponent e and a q-bit modulus m, wherein the generator base g equals two and k is based at least in part on a processor configured to determine the result R; iteratively determining a respective intermediate modular exponentiation result for each segment ei, wherein the determining comprises multiplication, exponentiation and a modular reduction of at least one of a multiplication result and an exponentiation result; and generating the modular exponentiation result R=ge mod m based on, at least in part, at least one respective intermediate modular exponentiation result.
摘要翻译:在一个实施例中,本公开提供了一种方法,其包括响应于确定模幂运算结果R的请求,将n位指数e分割成第一段et和数目t的k比特段ei,其中R是 指数e的发生器基数g和q位模数m的模幂运算,其中发生器基g等于2,并且k至少部分地基于被配置为确定结果R的处理器; 迭代地确定每个段ei的相应的中间模幂运算结果,其中所述确定包括相乘结果和求幂结果中的至少一个的乘法,乘法和模块化减少; 并且至少部分地基于至少一个相应的中间模幂运算结果来产生模幂运算结果R = ge mod m。
摘要:
A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register.
摘要:
A method is described. The method includes iteratively performing for each position in a result matrix stored in a third register, multiplying a value at a matrix position stored in a first register with a value at a matrix position stored in a second register to obtain a first multiplicative value, where the positions in the first register and the second register are determined by the position in the result matrix and performing an exclusive or (XOR) operation with the first multiplicative value and a value stored at a result matrix position stored in the third register to obtain a result value.
摘要:
A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner.
摘要:
A time-invariant method and apparatus for performing modular reduction that is protected against cache-based and branch-based attacks is provided. The modular reduction technique adds no performance penalty and is side-channel resistant. The side-channel resistance is provided through the use of lazy evaluation of carry bits, elimination of data-dependent branches and use of even cache accesses for all memory references.
摘要:
A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction.
摘要:
A method is described. The method includes executing an instruction to perform one or more Galois Field (GF) multiply by 2 operations on a state matrix and executing an instruction to combine results of the one or more GF multiply by 2 operations with exclusive or (XOR) functions to generate a result matrix.
摘要:
A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_P instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed.
摘要:
A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register.