摘要:
Reduction of the complexity of a Viterbi-type sequence detector is disclosed. It was based on elimination of less probably taken branches in the trellis. The method is applied to the design of the E2PR4 channel with 8/9 rate sliding block trellis code. Coding, by itself eliminates two states by coding constraints, and the disclosed method reduces the number of required ACS units from 14 to 11, while reducing their complexity as well. For the implementation of E2PR4 detection, 4 4-way, 3 3-way, 3 2-way and one 1-way ACSs are needed. System simulations show no BER performance drop at common SNRs when compared with a full 16-state E2PR4 implementation in magnetic disk drives.
摘要:
Methods and apparatus for an encryption processor for performing accelerated computations to establish secure network sessions. The encryption processor includes an execution unit and a decode unit. The execution unit is configured to execute Montgomery operations and including at least one adder and at least two multipliers. The decode unit is configured to determine if a square operation or a product operation needs to be performed and to issue the appropriate instructions so that certain multiply and/or addition operations are performed in parallel in the execution unit while performing either the Montgomery square or Montgomery product operation.
摘要:
The present invention relates to a method of context switching from a first task to a second task in a data processing unit having a register file with a plurality of general purpose registers and a context switch register, a memory comprising a previous context save area and an unused context save area. The memory is coupled with the register file and an instruction control unit with a program counter register and a program status word register coupled with the memory and the register file. The method comprises the steps of acquiring a new save area from said unused save area, storing the context of the first task in said new area, linking the new area with said previous context save area.
摘要:
A floating point instruction control mechanism which processes loads and stores in parallel with arithmetic instructions. This results from register renaming, which removes output dependencies in the instruction control mechanism and allows computations aliased to the same register to proceed in parallel.
摘要:
A register selection mechanism for an instruction prefetch buffer which allows instructions having different lengths to be accessed on the instruction boundaries. The instruction prefetch buffer comprises a one-port-write, two-port-read array (10). Address generation and control logic (16) is responsive to a read pointer (15) for controlling access to odd and oven addresses in the array. Additional logic may be provided to provide an indication that the instruction prefetch buffer is empty.
摘要:
A precharge circuit for a cascode voltage switch in which at the beginning of the precharge phase the output state is memorized and the output is isolated from the precharging points. Both the positive and negative ends of the discharge paths are precharged with the gates of the switches in all paths held in their memorized states. Towards the end of precharging, the output is reconnected to the normal precharging point so that it goes low. Then the positive and negative precharging points are reconnected for their evaluation configuration.
摘要:
A pipelined data path architecture for use, in one embodiment, in a multimedia processor. The data path architecture requires a maximum of two execution pipestages to perform all instructions including wide data format multiply instructions and specially adapted multimedia instructions, such as the sum of absolute differences (SABD) instruction and other multiply with add (MADD) instructions. The data path architecture includes two wide data format input registers that feed four partitioned 32×32 multiplier circuits. Within two pipestages, the multiply circuit can perform one 128×128 multiply operation, four 32×32 multiply operations, eight 16×16 multiply operations or sixteen 8×8 multiply operations in parallel. The multiply circuit contains a compressor tree which generates a 256-bit sum and a 256-bit carry vector. These vectors are supplied to four 64-bit carry propagate adder circuits which generate the multiply results. When the data path architecture is performing specially adapted multimedia instructions the input registers are supplied to a pipelined logic unit containing adders, subtractors, shifters, average/round/absolute value circuits, and other logic operation circuits, compressor circuits and multiplexers. The output of the pipelined logic unit is then fed to the four 64-bit carry propagate adder circuits. In this way, the adder circuits of the multiply operation can be effectively used to also process the specially adapted multimedia instructions thereby saving IC area. Multiply circuitry is disabled to save power when the data path architecture is not processing a multiplication instruction.
摘要:
A partitioned shift right logic circuit that is programmable and contains rounding support. The circuit of the present invention accepts a 32-bit value and a shift amount and then performs a right shift operation on the 32-bits and automatically rounds the result(s). Signed or unsigned values can be accepted. The right shift circuit is partitioned so that the 32-bit value can represent: (1) a single 32-bit number; or (2) two 16-bit values. A 1 bit selection input indicates the particular partition format. In operation, if the input value is not negative, then one (“1”) is added at the guard bit position and a right shift with truncate is performed. If the input is negative and the guard bit is zero, then no addition is done and a right shift with truncate is performed. If the input is negative and the guard bit is one and the sticky bit is zero, then no addition is done and a right shift with truncate is performed. If the input is negative and the guard bit is one and the sticky bit is one, then one is added at the guard bit position and a right shift with truncate is performed. The shift circuitry used by the present invention is fully partitioned to accept word or half-word input and contains multiple cascaded multiplexer stages for performing partitioned right shifting and supports signed shifting. Each multiplexer stage can be programmed to perform a selected shift amount (including 0 shift). The right shift circuit of the present invention can be used in multi-media applications and can also be used for general purpose and VLIW (very long instruction word) processor without performance degradation.
摘要:
An instruction prefetch buffer control (20) is provided for an instruction prefetch buffer array (10) which stores the code for a number of instructions that have already been executed as well as the code for a number of instructions yet to be executed. The instruction prefetch buffer control includes a register (201) for storing an instruction fetch pointer, this pointer being supplied to the buffer array (10) as a write pointer which points to the location in the array where a new word is to be written from main memory. A second register (205) stores an instruction execution pointer which is supplied to the buffer array (10) as a read pointer. A first adder (203) increments the first register to increment the instruction fetch pointer for sequential instructions and calculates a new instruction fetch pointer for branch instructions. A second adder (215) increments the second register to increment the instruction execution pointer for sequential instructions and calculates a new instruction execution pointer for branch instructions. Incrementing of the second register is variable depending on the length of the instruction. A third adder ( 221) is responsive to the output of the first adder and a branch target address to calculate whether the target instruction is contained in the array (10) and, if it is, causes the new instruction execution pointer calculated by the second adder (215) to be loaded into the second register (205).
摘要:
A partitioned multiplier circuit which is designed for high speed operations. The multiplier of the present invention can perform one 32×32 bit multiplication, two 16×16 bit multiplications (simultaneously) or four 8×8 bit multiplications (simultaneously) depending on input partitioning signals. The time required to perform either the 32×32 bit or the 16×16 bit or the 8×8 bit multiplications is constant. Therefore, multiplication results are available with a constant latency regardless of operand bit-size. In one embodiment, the latency is two clock cycles but the multiplier circuit has a throughput of one clock cycle due to pipelining. The input operands can be signed or unsigned. The hardware is partitioned without any significant increase in the delay or area and the multiplier can provide six different modes of operation. In one embodiment, Booth encoding is used for the generation of 17 partial products which are compressed using a compression tree into two 64-bit values. This is performed in the first pipeline stage to generate a sum and a carry vector. These values are then added, in the second pipestage, using a carry propagate adder circuit to provide a single 64-bit result. In the case of 16×16 bit multiplication, the 64-bit result contains two 32-bit results. In the case of 8×8 bit multiplication, the 64-bit result contains four 16-bit results. Due to its high operating speed, the multiplier circuit is advantageous for use in multi-media applications, such as audio/visual rendering and playback.