Abstract:
A threaded interpreter (916) is suitable for executing a program comprising a series of program instructions stored in a memory (904). For the execution of a program instruction, the threaded interpreter includes: a preparatory unit (918) for executing a plurality of preparatory steps that make the program instruction available in the threaded interpreter, and an execution unit (920) with one or more machine instructions emulating the program instruction. According to the invention, the threaded interpreter is designed such that, during the execution of the series of program instructions on an instruction-level parallel processor, machine instructions implementing a first one of the preparatory steps are executed in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions.
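As a rough illustration only, not the patented design, the following C sketch shows a direct-threaded interpreter loop in which the preparatory work for the next program instruction (fetching its opcode and looking up its handler) is interleaved with the emulation of the current one, which is the kind of independent work an instruction-level parallel processor can schedule in parallel. The bytecode, opcode names, and handlers are hypothetical, and the dispatch relies on the GCC/Clang labels-as-values extension.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical bytecode: one opcode per cell. */
    enum { OP_INC, OP_DBL, OP_PRINT, OP_HALT };

    int main(void) {
        static const uint8_t program[] = { OP_INC, OP_DBL, OP_PRINT, OP_HALT };

        /* Dispatch table: one handler label per opcode (labels-as-values). */
        static void *handlers[] = { &&do_inc, &&do_dbl, &&do_print, &&do_halt };

        const uint8_t *pc = program;
        long acc = 0;

        /* Preparatory steps for the first instruction: fetch opcode, look up handler. */
        void *next = handlers[*pc++];
        goto *next;

        /* Each handler starts the preparatory steps for the NEXT instruction
           (fetch and table lookup) alongside its own emulation work, giving an
           instruction-level parallel processor independent operations to overlap. */
    do_inc:
        next = handlers[*pc++];   /* prepare the next program instruction */
        acc += 1;                 /* emulate the current one              */
        goto *next;
    do_dbl:
        next = handlers[*pc++];
        acc *= 2;
        goto *next;
    do_print:
        next = handlers[*pc++];
        printf("%ld\n", acc);
        goto *next;
    do_halt:
        return 0;
    }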
Abstract:
A microcontroller apparatus is provided with an instruction set for manipulating the behavior of the microcontroller. An apparatus and system are provided that enable a linearized address space, making modular emulation possible. Direct or indirect addressing is possible through register files or data memory. Special function registers, including the Program Counter (PC) and Working Register (W), are mapped in the data memory. An orthogonal (symmetrical) instruction set makes it possible to perform any operation on any register using any addressing mode. Consequently, two file registers can be used in some two-operand instructions. This allows data to be moved directly between two registers without going through the W register, thus increasing performance and decreasing program memory usage.
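A minimal C model, purely illustrative, of the two-operand file-to-file move described above, compared with the conventional sequence that stages the value in the working register W. The register addresses and the name movff are assumptions, not taken from the abstract.

    #include <stdint.h>

    /* Toy model of a linear data memory holding both file registers and
       special function registers; the addresses below are illustrative only. */
    enum { REG_PC = 0x02, REG_W = 0x09, FILE_SIZE = 256 };
    static uint8_t data_mem[FILE_SIZE];

    /* Two-operand move: copies file register fs directly to fd without
       going through the working register W.                                 */
    static void movff(uint8_t fs, uint8_t fd) {
        data_mem[fd] = data_mem[fs];
    }

    /* Conventional two-instruction sequence for comparison: the value
       passes through W and costs an extra program memory word.              */
    static void movf_movwf(uint8_t fs, uint8_t fd) {
        data_mem[REG_W] = data_mem[fs];      /* MOVF  fs, W */
        data_mem[fd]    = data_mem[REG_W];   /* MOVWF fd    */
    }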
Abstract:
A branch operation is processed using a branch predict instruction (300) and an associated branch instruction. The branch predict instruction (300) indicates a predicted direction (310), a target address (320), and an instruction address (330) for the associated branch instruction. When the branch predict instruction (300) is detected, the target address (320) is stored at an entry indicated by the associated branch instruction address and a prefetch request is triggered to the target address (320). The branch predict instruction (300) may also include hint information (340) for managing the storage and use of the branch prediction information.
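The following C sketch models, under assumed field widths and an assumed table size, how the fields carried by the branch predict instruction (300) might be recorded at the entry selected by the associated branch address and used to trigger a prefetch. The struct layout and function names are hypothetical.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical record of the fields carried by the branch predict instruction. */
    typedef struct {
        bool     predicted_taken;   /* predicted direction (310)              */
        uint64_t target_addr;       /* target address (320)                   */
        uint64_t branch_addr;       /* address (330) of the associated branch */
        uint8_t  hint;              /* hint information (340)                 */
    } branch_predict_t;

    #define BPT_ENTRIES 256         /* assumed prediction-structure size */
    static struct { uint64_t target; uint8_t hint; bool valid; } bpt[BPT_ENTRIES];

    static void prefetch_line(uint64_t addr) { (void)addr; /* models an instruction prefetch */ }

    /* On detecting the branch predict instruction: store the target at the entry
       indicated by the associated branch address and trigger a prefetch to it.   */
    static void on_branch_predict(const branch_predict_t *bp) {
        unsigned idx    = (unsigned)(bp->branch_addr % BPT_ENTRIES);
        bpt[idx].target = bp->target_addr;
        bpt[idx].hint   = bp->hint;
        bpt[idx].valid  = true;
        prefetch_line(bp->target_addr);
    }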
Abstract:
An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g., upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction. For such an embodiment, reassigning the rename register as described above may thereby provide a valid value for the stack pointer register, and hence may allow the generation of lookahead stack pointer values for instructions subsequent to the move instruction to proceed prior to execution of the move instruction. The present embodiment of the register rename unit may also assign the destination rename register selected for the move instruction to the source register of the move instruction (i.e., the rename tags for the source and destination are "swapped").
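A simplified C sketch of the rename-tag swap and the lookahead stack-pointer computation described above. The register names, the reset of the cumulative adjustment after a move, and the data structures are assumptions made for illustration only.

    #include <stdint.h>

    /* Hypothetical rename map for the two architectural registers involved. */
    enum { REG_SP, REG_BP, NUM_AREGS };
    static int     rename_tag[NUM_AREGS];  /* physical (rename) register per arch reg */
    static int32_t cum_adjust;             /* cumulative SP increments/decrements of
                                              previously dispatched instructions      */

    /* Lookahead stack pointer for a particular instruction: a previously
       generated SP value plus the cumulative adjustments dispatched before it. */
    static int32_t lookahead_sp(int32_t prev_sp_value) {
        return prev_sp_value + cum_adjust;
    }

    /* On detecting "mov sp, bp" (or vice versa) at dispatch, before execution:
       the destination is given the source's rename register and the tags are
       swapped; the cumulative adjustment restarts from the new base (an
       assumption of this model).                                               */
    static void accelerate_move(int src, int dst) {
        int tmp         = rename_tag[dst];
        rename_tag[dst] = rename_tag[src];
        rename_tag[src] = tmp;
        cum_adjust      = 0;
    }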
Abstract:
Methods and apparatus for implementing and using a sign(x) function are described. In accordance with the present invention, the sign(x) function is implemented in hardware, e.g., by incorporating a simple circuit of the present invention into a central processing unit (CPU). The hardware required to implement the sign(x) function in accordance with the present invention is relatively simple and allows the sign(x) function to be determined in a single processor clock cycle. A processor sign(x) command is supported in embodiments where the hardware for performing the sign(x) function is incorporated into a processor. By incorporating a single sign(x) circuit into a processor, a SISD sign(x) function can be supported. By duplicating the basic sign(x) hardware within a processor, in accordance with the present invention, a SIMD sign(x) function can be implemented.
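The circuit itself is not given in the abstract, but its intended semantics can be sketched in C: a scalar (SISD) sign(x) and a lane-replicated (SIMD-style) variant, here modeled with four 32-bit lanes as an assumption.

    #include <stdint.h>

    /* Scalar (SISD) sign(x): returns -1, 0, or +1.  A hardware implementation
       can derive this from the operand's sign bit and a zero detect, which is
       why a single-cycle circuit is feasible.                                  */
    static int32_t sign32(int32_t x) {
        return (x > 0) - (x < 0);
    }

    /* SIMD-style variant: the same logic replicated per lane, here modeled
       with four 32-bit lanes.                                                  */
    static void sign32x4(const int32_t in[4], int32_t out[4]) {
        for (int i = 0; i < 4; i++)
            out[i] = (in[i] > 0) - (in[i] < 0);
    }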
Abstract:
A micro-architectural method increases the performance of microprocessor and digital circuit designs by increasing the usable instruction-level parallelism during execution. The method can be applied to substantially increase the performance of processors across a broad range of instruction set architectures, including CISC, RISC, and EPIC designs. Code blocks of instructions are transformed from the original instruction set architecture to a new instruction set architecture by an instruction stream transformation unit (102). The transformed code blocks are then cached in an instruction cache (104). The process increases processor performance by substantially increasing the instruction-level parallelism available during execution by an execution unit (100).
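A schematic C sketch, with invented data structures, of the fetch path implied above: a code block is transformed from the original instruction set into an internal one by a transformation unit, and the result is cached so later fetches reuse it. The internal operation format, cache geometry, and the stubbed transformation are assumptions.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical internal operation into which original-ISA instructions are
       cracked and rescheduled to expose more instruction-level parallelism.    */
    typedef struct { uint8_t op; uint8_t dst, src1, src2; } internal_op_t;

    typedef struct {
        uint64_t      tag;        /* address of the original code block */
        size_t        count;      /* number of transformed operations   */
        internal_op_t ops[64];
        int           valid;
    } icache_line_t;

    #define ICACHE_LINES 128
    static icache_line_t icache[ICACHE_LINES];

    /* Stub for the instruction stream transformation unit (102): a real unit
       would decode the original-ISA block at block_addr and emit a rescheduled
       internal-ISA sequence.                                                   */
    static size_t transform_block(uint64_t block_addr, internal_op_t *out, size_t max) {
        (void)block_addr; (void)out; (void)max;
        return 0;
    }

    /* Fetch path: a hit in the instruction cache (104) returns an already
       transformed block; a miss runs the transformation and fills the cache.   */
    static const icache_line_t *fetch_block(uint64_t block_addr) {
        icache_line_t *line = &icache[(block_addr >> 6) % ICACHE_LINES];
        if (!line->valid || line->tag != block_addr) {
            line->count = transform_block(block_addr, line->ops, 64);
            line->tag   = block_addr;
            line->valid = 1;
        }
        return line;
    }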
Abstract:
A controller for a digital processor includes a random access memory, e.g., an instruction memory, that consumes significant power when operating. To reduce the power consumption when repetitive instructions, i.e., loops, are being performed, the loop instructions are stored in and accessed from a shift register rather than from the random access memory, without any special instructions defining the loop. A memory control includes a state tracking machine that monitors the execution of the program instructions and determines therefrom when a loop has been entered, whereupon it enables the shift register to produce the loop instructions stored therein and disables the instruction memory from producing instructions until the loop is exited. The foregoing process is automatically initiated for each loop, whether the loop is a new loop, a loop within a loop, or a multiple loop. The present controller does not require special instructions either preceding or following a loop to specify the start or end points of the loop, the number of instructions in the loop, or the number of times the loop is to be performed; rather, it determines the presence of a loop automatically from the executable micro-code instructions that execute the loop.
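A behavioral C sketch of the automatic loop handling described above, assuming a small replay buffer in place of the shift register. The buffer size, the backward-jump heuristic, and the fetch interface are illustrative assumptions, not the patented state tracking machine.

    #include <stdint.h>
    #include <stdbool.h>

    #define LOOP_BUF_SIZE 32

    /* Instructions issued from the instruction RAM are also copied into a small
       replay buffer (standing in for the shift register).  When the tracking
       logic sees the program counter jump backwards to an address still held in
       the buffer, it stops reading the RAM and replays the loop body from the
       buffer until the loop is exited.                                          */
    static uint32_t instr_ram[1024];          /* stand-in for the instruction memory   */
    static uint32_t loop_buf[LOOP_BUF_SIZE];
    static uint32_t buf_base_pc;              /* PC of the oldest buffered instruction */
    static unsigned buf_count;
    static bool     use_buffer;               /* true while the RAM is disabled        */

    static bool in_buffer(uint32_t pc) {
        return buf_count > 0 && pc >= buf_base_pc && pc < buf_base_pc + buf_count;
    }

    uint32_t fetch(uint32_t pc, uint32_t prev_pc) {
        if (use_buffer) {
            if (in_buffer(pc))
                return loop_buf[pc - buf_base_pc];   /* loop body: RAM stays off    */
            use_buffer = false;                      /* loop exited: re-enable RAM  */
        } else if (pc < prev_pc && in_buffer(pc)) {
            use_buffer = true;                       /* loop detected automatically */
            return loop_buf[pc - buf_base_pc];
        }

        /* Normal path: read the RAM and keep a copy for possible replay. */
        if (buf_count == 0 || pc != buf_base_pc + buf_count || buf_count == LOOP_BUF_SIZE) {
            buf_base_pc = pc;                        /* restart the capture window  */
            buf_count   = 0;
        }
        uint32_t insn = instr_ram[pc];
        loop_buf[buf_count++] = insn;
        return insn;
    }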
Abstract:
A circuit for digital signal processing employs a variable-length instruction set. An exemplary DSP includes a set of three data buses (108, 110, 112) over which data may be exchanged with a register bank (120) and three data memories (102, 103, 104). The register bank (120) may have registers accessible by at least two processing units (128, 130). An instruction fetch unit (156) may be included that receives instructions of variable length stored in an instruction memory (152). The instruction memory (152) may be separate from the set of three data memories (102, 103, 104).
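As an illustration of variable-length instruction fetch from a separate instruction memory, the following C sketch assumes a hypothetical encoding in which the first word of each instruction carries its length; the encoding and function names are not taken from the abstract.

    #include <stdint.h>
    #include <stddef.h>

    /* Assumed encoding: the top two bits of the first 16-bit word give the
       instruction length in words (1..4), so short operations stay compact
       while wider ones can name registers in the bank and several memories.  */
    static size_t insn_length_words(uint16_t first_word) {
        return ((first_word >> 14) & 0x3u) + 1u;
    }

    /* Fetch loop over a separate instruction memory, independent of the
       three data memories.                                                   */
    static void fetch_all(const uint16_t *instr_mem, size_t n_words,
                          void (*dispatch)(const uint16_t *insn, size_t len)) {
        size_t pc = 0;
        while (pc < n_words) {
            size_t len = insn_length_words(instr_mem[pc]);
            if (pc + len > n_words)     /* truncated final instruction: stop */
                break;
            dispatch(&instr_mem[pc], len);
            pc += len;
        }
    }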
Abstract:
In a computer system, the instruction decoding unit for translating program instructions into microcode instructions operates dynamically. The unit receives state signals indicating the state of the computer, such as a trace enabling signal (63), which influence the translation process in the instruction decoding unit. These state signals (63) are added to the operation code (65) of the program instruction to be decoded; the operation code is thus extended and used as input to a translating table (55), the extended operation code being taken as the address of a field in the table. The addresses, and thus the contents of the fields addressed for the same operation code of a program instruction, can then differ for different values of the state signals. In general, the state signals cause the instruction decoder to change its translating algorithm, so that the decoder can decode an operation code differently depending on the state that the signals adopt. For a trace enabling signal, the dynamic decoding can be used to switch a trace function on and off. In the normal case, when tracing is not desired, no microinstructions supporting the trace function have to be executed, thereby increasing the performance, and in particular the speed, of the computer system.
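A minimal C sketch of the extended-opcode lookup described above, assuming an 8-bit operation code and a single trace enable bit; the widths and table contents are illustrative only.

    #include <stdint.h>

    /* Assumed widths: an 8-bit operation code (65) extended by a one-bit trace
       enabling signal (63) yields a 9-bit index into the translating table (55),
       so the same opcode can select different microcode entry points depending
       on whether tracing is switched on.                                         */
    #define OPCODE_BITS 8
    #define TABLE_SIZE  (1u << (OPCODE_BITS + 1))

    static uint16_t translate_table[TABLE_SIZE];   /* microcode entry addresses */

    static uint16_t decode(uint8_t opcode, unsigned trace_enable) {
        unsigned extended = ((trace_enable & 1u) << OPCODE_BITS) | opcode;
        return translate_table[extended];          /* a different field per state */
    }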