摘要:
Based on an input index sequence (702) composed of four indices (each having a bit width of 8 bits), a shift-copier generates an index sequence (902) by shifting each index leftward by 1 bit and making two copies of each index, and outputs the generated index sequence (902). An adder generates a shuffle pattern (703) by adding 1, 0, 1, 0, 1, 0, 1 and 0 to the indices in the index sequence (902) from left to right, and outputs the generated shuffle pattern (703).
摘要:
A dedicated hardware block is provided for implementing crossbars and/or barrel shifters in programmable logic resources. Crossbar and/or barrel shifter circuitry may replace one or more rows, one or more columns, one or more rectangles, or any combination thereof of programmable logic regions on a programmable logic resource. The functionality of the crossbar and/or barrel shifter circuitry can further be improved by implementing time-multiplexing.
摘要:
A crossbar is implemented within multimedia facilities of a processor to perform vector permute operations, in which the bytes of a source operand are reordered in the target output. The crossbar is then reused for other instructions requiring multiplexing or shifting operations, particularly those in which the size of additional multiplexers or the size and delay of a barrel shifter is significant. A wide shift operation, for example, may be performed with one cycle latency by the crossbar and one additional layer of multiplexers or a small barrel shifter. The crossbar facility thus gets reused with improved performance of the instructions now sharing the crossbar and a reduction in the total area required by a multimedia facility within a processor.
摘要:
A symmetric filter arithmetic apparatus includes a first data shuffling unit which reads a first data string that is a plurality of consecutive pieces of data from a register file and extract, from the first data string, a left-side data string that is a plurality of consecutive pieces of data to be multiplied by a left-side filter coefficient that is a filter coefficient on a left side of a center of the coefficients, and a second data shuffling unit which reads a second data string that is a plurality of consecutive pieces of data from the register file and extract, from the second data string, a right-side data string that is a plurality of consecutive pieces of data to be multiplied by a right-side filter coefficient that is a filter coefficient on a right side of the center and is the same value as the left-side filter coefficient.
摘要:
A method of providing wiring efficiency in a permute unit. Multiple selectors receive input data and shared control signals from multiple register files. The permute unit includes multiple multiplexors (MUXs) coupled to multiple logical AND gates. The multiple logical AND gates are coupled to multiple logical OR gates. The logical AND gates are physically separated from the logical OR gates. The logical AND gates receive input from one or more output data signals from the selectors. The logical OR gates combine the one or more output signals from the logical AND gates and provide output data from the permute unit.
摘要:
Accommodating a processor to process a number of different data formats includes loading a data word in a first format from a first storage device; reordering, before it reaches the arithmetic unit, the first format of the data word to a second format compatible with the native order of the arithmetic unit; and vector processing the data word in the arithmetic unit.
摘要:
A crossbar (20) circuit with multiplexer (22A, 22B) circuits implemented in a polygonal form on a chip. The crossbar can be used for implementing a permutation of input bits (24A, 24B) controlled by a bit vector (25). Horizontal and vertical wiring lengths in the crossbar (20) are reduced by stacking the operand latches (24A, 24B, 25) and horizontal or vertical multiplexers (22A, 22B). This implementation decreases the latency of the crossbar and avoids latches to store intermediated results, thus reducing area and power consumption.
摘要:
A crossbar is implemented within multimedia facilities of a processor to perform vector permute operations, in which the bytes of a source operand are reordered in the target output. The crossbar is then reused for other instructions requiring multiplexing or shifting operations, particularly those in which the size of additional multiplexers or the size and delay of a barrel shifter is significant. A wide shift operation, for example, may be performed with one cycle latency by the crossbar and one additional layer of multiplexers or a small barrel shifter. The crossbar facility thus gets reused with improved performance of the instructions now sharing the crossbar and a reduction in the total area required by a multimedia facility within a processor.
摘要:
A processor-implemented method with homomorphic encryption includes: receiving a first ciphertext corresponding to a first modulus; generating a second ciphertext corresponding to a second modulus by performing modulus raising on the first ciphertext; and performing bootstrapping by encoding the second ciphertext using a commutative property and an associative property of operations included in a rotation operation.
摘要:
Generating a random permutation by arranging a sequence N numbers in a matrix, performing random arrangement operations on the rows of the matrix to generate an intermediary matrix, performing random arrangement operations on the columns of the intermediary matrix to generate a second intermediary matrix, and arranging the N numbers of the second intermediary matrix as a rearranged sequence of the N numbers.