Abstract:
An execution unit adapted to perform at least a portion of the Data Encryption Standard. The execution unit includes a Left Half input; a Key input; and a Table input. The execution unit also includes a first group of transistors configured to receive the Table input, perform a table look-up, and output data. The execution unit further includes a first exclusive-or operator having two inputs and an output. The first exclusive-or operator is configured to receive the Left Half input and the Key input. The execution unit also includes a second exclusive-or operator having two inputs and an output. The second exclusive-or operator is configured to receive the data output by the first group of transistors and to receive the output of the first exclusive-or operator. The execution unit also includes a third exclusive-or operator having two inputs and an output. The third exclusive-or operator is configured to receive the Left Half input and the data output by the first group of transistors.
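The dataflow described in this abstract can be sketched in software as follows. This is only a minimal model under stated assumptions, not the claimed circuit: the `TABLE` contents, the 4-bit operand width, and the name `des_unit` are hypothetical and chosen purely for illustration.

```python
# Hypothetical 16-entry look-up table standing in for the "first group of
# transistors"; real DES S-box contents are NOT modelled here.
TABLE = [(7 * i + 3) & 0xF for i in range(16)]


def des_unit(left_half, key, table_input):
    """Model the three exclusive-or operators named in the abstract."""
    data = TABLE[table_input & 0xF]   # table look-up on the Table input
    xor1 = left_half ^ key            # first XOR: Left Half with Key
    xor2 = data ^ xor1                # second XOR: table data with first XOR output
    xor3 = left_half ^ data           # third XOR: Left Half with table data
    return xor1, xor2, xor3


if __name__ == "__main__":
    print(des_unit(0b1010, 0b0110, 2))
```

The function returns all three XOR outputs because the abstract does not say which of them leave the unit.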
Abstract:
During a method, a modulus circuit determines a modulus base p of a first number and a modulus base p of a second number. The modulus circuit then performs an operation using the modulus base p of the first number and the modulus base p of the second number, and calculates a modulus base p of the result of the same operation performed on the first number and the second number. Next, the modulus circuit compares the result of the operation carried out on the modulus base p of the first number and the modulus base p of the second number with the modulus base p of the result of the operation performed on the first number and the second number to identify potential errors associated with the operation. Moreover, the modulus circuit repeats the method to identify additional potential errors associated with the operation, where the determining and calculating operations are repeated using a modulus base q.
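The residue check described above can be illustrated with a short sketch. The function names and the default moduli p = 3 and q = 5 are assumptions made for this example; the abstract does not specify them.

```python
import operator


def residue_check(a, b, result, op=operator.mul, p=3):
    """Return True when `result` is consistent with op(a, b) modulo p."""
    # Operate on the residues only, then reduce again modulo p.
    expected = op(a % p, b % p) % p
    return expected == result % p


def check_with_two_moduli(a, b, result, op=operator.mul, p=3, q=5):
    """Repeat the check with a second modulus q to catch more errors."""
    return residue_check(a, b, result, op, p) and residue_check(a, b, result, op, q)


if __name__ == "__main__":
    a, b = 12345, 6789
    good = a * b
    bad = good + 3  # an error that is a multiple of p = 3
    print(residue_check(a, b, bad, p=3))      # slips past the base-3 check
    print(check_with_two_moduli(a, b, bad))   # caught by the base-5 check
```

An error that is a multiple of p passes the first check unnoticed, which is why repeating the check with a second modulus identifies additional errors. An error that is a multiple of both moduli would still go undetected.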
Abstract:
A computer processor including a single fused-unfused floating point multiply-add (FMA) module computes the result of the operation A*B+C for floating point numbers for fused multiply-add rounding operations and unfused multiply-add rounding operations. In one embodiment, a fused multiply-add rounding implementation is augmented with additional hardware which calculates an unfused multiply-add rounding result without adding additional pipeline stages. In one embodiment, a computation by the fused-unfused floating point multiply-add (FMA) module is initiated using a single opcode which determines whether a fused multiply-add rounding result or unfused multiply-add rounding result is generated.
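The difference between the two rounding behaviours can be shown numerically. The sketch below is a software illustration only: the function `multiply_add` and its `fused` flag are hypothetical stand-ins for the single opcode, and exact rational arithmetic from Python's `fractions` module is used to emulate a single final rounding.

```python
from fractions import Fraction


def multiply_add(a, b, c, fused):
    """Compute a*b + c with fused (one rounding) or unfused (two roundings) semantics."""
    if fused:
        # Exact product and sum, rounded once when converted back to float.
        return float(Fraction(a) * Fraction(b) + Fraction(c))
    # The product is rounded to a double first, then the sum is rounded again.
    return a * b + c


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -27
    b = 1.0 - 2.0 ** -27
    c = -1.0
    print(multiply_add(a, b, c, fused=True))
    print(multiply_add(a, b, c, fused=False))
```

Here the exact product is 1 - 2^-54, which rounds to 1.0 as a double, so the unfused result is 0.0 while the fused result keeps the small residual -2^-54.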
Abstract:
One embodiment of the present invention provides a system that uses the Newton-Raphson technique to perform a division operation. During operation, the system receives a numerator a and a denominator b. The system then divides a by b by first using the Newton-Raphson technique to calculate 1/b, and then multiplying 1/b by a to produce the result a/b. While using the Newton-Raphson technique to find 1/b, the system first obtains an initial estimate $x_0$ for 1/b and then iteratively solves the equation $x_{i+1} = x_i(2 - b x_i)$. Each iteration involves: (1) using a multiplier circuit to multiply b by $x_i$ to compute $b x_i$; (2) performing a bit-wise complement operation on $b x_i$ to compute $2 - b x_i$, whereby an additional pass through an adder circuit or a multiply/add circuit is not required to perform the subtraction operation; and (3) using the multiplier circuit to multiply $x_i$ by $2 - b x_i$ to compute $x_i(2 - b x_i)$.
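The iteration can be sketched in fixed-point arithmetic to show why a bit-wise complement can stand in for the subtraction. Several details here are assumptions not taken from the abstract: the 32 fractional bits, the requirement that b lies in [1, 2), and the linear initial estimate 1.5 - 0.5b. With n fractional bits, complementing a value v in [0, 2) yields 2 - v - 2^-n, which differs from 2 - v by one unit in the last place.

```python
FRAC_BITS = 32                       # assumed fixed-point precision
ONE = 1 << FRAC_BITS                 # fixed-point representation of 1.0
MASK = (1 << (FRAC_BITS + 1)) - 1    # values in [0, 2) fit in FRAC_BITS + 1 bits


def fixed_mul(x, y):
    """Multiply two fixed-point numbers, truncating the extra fraction bits."""
    return (x * y) >> FRAC_BITS


def reciprocal(b_fixed, iterations=5):
    """Approximate 1/b for ONE <= b_fixed < 2 * ONE."""
    x = (3 * ONE - b_fixed) >> 1         # hypothetical estimate x0 = 1.5 - 0.5*b
    for _ in range(iterations):
        bx = fixed_mul(b_fixed, x)       # step (1): b * x_i
        two_minus_bx = ~bx & MASK        # step (2): complement gives 2 - b*x_i - 2**-FRAC_BITS
        x = fixed_mul(x, two_minus_bx)   # step (3): x_i * (2 - b*x_i)
    return x


def divide(a, b, iterations=5):
    """Compute a / b as a * (1/b), for b in [1, 2)."""
    return fixed_mul(int(a * ONE), reciprocal(int(b * ONE), iterations)) / ONE


if __name__ == "__main__":
    print(divide(3.0, 1.5))
```

The one-unit bias from the complement and the truncating multiplies leave the quotient a few units in the last place away from the exact value, so the result is approximate rather than correctly rounded.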
Abstract:
A multiply execution unit that can generate the integer product of a multiplicand and a multiplier and is also operable to generate the XOR product of the multiplicand and the multiplier. The multiply execution unit includes a summing circuit for summing a plurality of partial products. The summing circuit includes a plurality of rows. The summing circuit can generate an integer sum of the plurality of partial products and can generate an XOR sum of the plurality of partial products. The summing circuit includes a plurality of compressors in the first row of the summing circuit. The plurality of compressors each has more than three inputs that receive data, a carry output, and a sum output.
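The relationship between the integer product and the XOR product can be shown with a small sketch. Both are built from the same partial products; only the way the rows are combined differs. The function names and the `xor_mode` flag are hypothetical, and the compressor tree itself is not modelled.

```python
def partial_products(multiplicand, multiplier):
    """One shifted copy of the multiplicand per set bit of the multiplier."""
    return [multiplicand << i
            for i in range(multiplier.bit_length())
            if (multiplier >> i) & 1]


def multiply(multiplicand, multiplier, xor_mode=False):
    """Sum the partial products with carries (integer) or without (XOR)."""
    acc = 0
    for pp in partial_products(multiplicand, multiplier):
        acc = acc ^ pp if xor_mode else acc + pp
    return acc


if __name__ == "__main__":
    print(multiply(3, 3))                 # integer product
    print(multiply(3, 3, xor_mode=True))  # XOR (carry-less) product
```

For 3 times 3, the partial products are 0b011 and 0b110: adding them gives 9, while XOR-ing them gives 0b101, which is 5.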
Abstract:
A computer system for computing a binary operation involving a first term multiplied by a second term resulting in a product, where the product is conditionally added to a third term in a central processing unit. The central processing unit includes a carry save adder configured to add a plurality of partial products obtained from the product of the first term and the second term to obtain a first partial result and a second partial result, a multiplexer configured to output one selected from the group consisting of the second term, the third term, and zero, and an alignment shifter configured to shift an output of the multiplexer to align the output of the multiplexer with the first partial result and the second partial result to obtain a shifted term. The shifted term, the first partial result and the second partial result are added together to obtain a result of the binary operation.
Abstract:
One embodiment of the present invention provides a system that uses the Newton-Raphson technique to compute a square root. During operation, the system receives a radicand b. Next, the system calculates the square root of b, $\sqrt{b}$, by first using the Newton-Raphson technique to find $1/\sqrt{b}$, and then multiplying $1/\sqrt{b}$ by b to produce $\sqrt{b}$. While using the Newton-Raphson technique to find $1/\sqrt{b}$, the system first obtains an initial estimate $x_0$ for $1/\sqrt{b}$ and then iteratively solves the equation $x_{i+1} = x_i \left( \frac{3 - b x_i^2}{2} \right)$. Each iteration involves: (1) using a multiplier circuit twice to compute $b x_i^2$; (2) performing a bit-wise complement operation on $b x_i^2$, shifting the result, and modifying the first two bits of the result to compute $\frac{3 - b x_i^2}{2}$, whereby an additional pass through an adder circuit or a multiply/add circuit is not required to perform the subtraction operation; and finally (3) using the multiplier circuit to multiply $x_i$ by $\frac{3 - b x_i^2}{2}$ to compute $x_i \left( \frac{3 - b x_i^2}{2} \right)$.
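The recurrence itself can be checked with a short floating-point sketch. This illustrates only the arithmetic of the iteration; the complement, shift, and bit-modification trick from step (2) is not modelled, and the function name, the iteration count, and the caller-supplied initial estimate `x0` are assumptions made for the example.

```python
import math


def newton_sqrt(b, x0, iterations=6):
    """Approximate sqrt(b) via Newton-Raphson on 1/sqrt(b), then multiply by b."""
    x = x0
    for _ in range(iterations):
        bx2 = b * x * x              # step (1): two multiplications give b * x_i^2
        factor = (3.0 - bx2) / 2.0   # step (2): done with ordinary subtraction here
        x = x * factor               # step (3): x_i * (3 - b*x_i^2) / 2
    return b * x                     # sqrt(b) = b * (1 / sqrt(b))


if __name__ == "__main__":
    print(newton_sqrt(2.0, 0.7), math.sqrt(2.0))
```

The iteration converges quadratically once the estimate is close, so the quality of `x0` mainly determines how many iterations are needed.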
Abstract:
A system and method are provided for minimizing read-only data retrieval time and/or area through the use of combinatorial logic. In one embodiment of the present invention, two address bits are provided to a binary logic function device. The binary logic function device uses the two address bits and predetermined logic functions (i.e., functions that represent a plurality of read-only data values) to produce a binary value, which is the requested read-only data. In another embodiment, the binary values produced by the binary logic function device are provided to at least one multiplexer. The at least one multiplexer uses at least a portion of the remaining bits (i.e., the address bits not provided to the binary logic function device) to select (or narrow down) which binary values may be the requested read-only data. If the output of the at least one multiplexer contains more than one binary value, then those values are provided to at least one other multiplexer. The at least one other multiplexer uses the remainder of the remaining bits to select which binary value is the requested read-only data.
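The idea can be modelled in software for a small one-bit-wide ROM. In this sketch, every group of four consecutive entries is treated as one of the 16 possible Boolean functions of the two low address bits, and the remaining address bits drive a multiplexer. The `ROM` contents and all function names are hypothetical, and truth-table indexing stands in for actual gates.

```python
# Hypothetical 16-entry, 1-bit-wide read-only data.
ROM = [0, 1, 1, 0,  1, 1, 1, 0,  0, 0, 0, 1,  1, 0, 1, 1]


def two_input_functions(a1, a0):
    """Evaluate all 16 Boolean functions of two bits; entry k has truth table k."""
    minterm = a1 * 2 + a0
    return [(k >> minterm) & 1 for k in range(16)]


def build_selectors(rom):
    """For each group of four entries, record which function reproduces it."""
    return [sum(rom[g + m] << m for m in range(4))
            for g in range(0, len(rom), 4)]


def read(selectors, address):
    """Look up one bit: logic functions on the low bits, a mux on the high bits."""
    low, high = address & 0b11, address >> 2
    outputs = two_input_functions((low >> 1) & 1, low & 1)
    candidates = [outputs[k] for k in selectors]   # one candidate per group
    return candidates[high]                        # multiplexer select


if __name__ == "__main__":
    selectors = build_selectors(ROM)
    print([read(selectors, a) for a in range(16)] == ROM)
```

Because only 16 distinct functions of two bits exist, the function outputs can be shared across every group in the ROM, however large it is.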