摘要:
In general, in one aspect, the disclosure describes a system including multiple programmable processing units, a dedicated hardware multiplier, and at least one bus connecting the multiple processing units and multiplier.
摘要:
In general, in one aspect, the disclosure describes a processing unit that includes a memory, an arithmetic logic unit, and control logic having access to program instructions of a control store. The control logic includes logic to access multiple sets of variables, variables in the different sets of variables being identically referenced by instructions, associate a one of the sets of variables as the current set of variables to be used in instructions that are executed by the arithmetic logic unit, change the set of variables associated with the current set of variables in response to a procedure call or exit, and alter the value of a variable of a set of the variables other than the set of variables associated with the current set of variables in response to an instruction.
摘要:
The disclosure includes description of a processor component that includes a set of register bits to perform a shift register operation. The component window detection logic can detect a window of bits in the set of register bits and, in response to detecting the window, output the window of bits.
摘要:
In general, in one aspect, the disclosure describes a processing unit that includes a datapath having an input buffer, at least one memory, and an arithmetic logic unit, and control logic having access to a program instruction control store. The control logic controls operation of the datapath and may concurrently cause the datapath to operate in response to different instructions that use different sections of the datapath, wherein the different sections of the datapath comprise a first section transferring data from an input buffer to the memory and a second section transferring data from the memory to the arithmetic logic unit.
摘要:
An area efficient multiplier having high performance at modest clock speeds is presented. The performance of the multiplier is based on optimal choice of a number of levels of Karatsuba decomposition. The multiplier may be used to perform efficient modular reduction of large numbers greater than the size of the multiplier.
摘要:
In general, in one aspect, the disclosure describes a processing unit that includes an input buffer to store data received by the processing unit, a memory, an arithmetic logic unit coupled to the input buffer and to the memory, an output buffer; and control logic having access to a control store of program instructions, the control logic to process instructions including an instruction to transfer data from the input buffer to the memory and an instruction to cause the arithmetic logic unit to perform an operation on operands provided by at least one of the memory and the input buffer, the instruction to output results of the operation to at least one of the memory and the output buffer.
摘要:
In general, in aspect, the disclosure describes a system integrated on a single die that includes a first processor core to receive commands from at least one other processor core to perform at least one specified transformative operation on specified data, multiple processing units to perform transformative operations on data, a shared memory, and logic to transfer data between a one of the set of multiple processing units and the shared memory.
摘要:
The present disclosure provides a system and method for performing modular exponentiation. The method includes loading a first word of a vector from memory into a first register and subsequently loading the first word from the first register to a second register. The method may also include loading a second word into the first register and loading at least one bit from the second register into an arithmetic logic unit. The method may further include performing modular exponentiation on the at least one bit to generate a result and generating a public key based upon, at least in part, the result. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
摘要:
An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, a processor according to one embodiment of the invention comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.
摘要:
In one embodiment, circuitry is provided to generate a residue based at least in part upon operations and a data stream generated based at least in part upon a packet. The operations may include at least one iteration of at least one reduction operation including (a) multiplying a first value with at least one portion of the data stream, and (b) producing a reduction by adding at least one other portion of the data stream to a result of the multiplying. The operations may include at least one other reduction operation including (c) producing another result by multiplying with a second value at least one portion of another stream based at least in part upon the reduction, (d) producing a third value by adding at least one other portion of the another stream to the another result, and (e) producing the residue by performing a Barrett reduction based at least in part upon the third value.