Abstract:
The data processing circuit has a self-timed instruction execution unit that operates asynchronously, signalling the completion of processes and starting subsequent processes in response to such signalling. To satisfy real-time constraints on program execution, ready signals generated after completion of selected instructions are gated with a timer signal before they are used to start a next instruction. In an embodiment, the amount of time left between the ready signal and the timer signal that is used to start a next instruction is measured and used to regulate a power supply voltage of the instruction execution unit, so that the voltage is just high enough to make the instruction execution unit sufficiently fast to meet the real-time constraints.
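The gating and voltage regulation described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the function names, the target slack, the step size, and the millivolt limits are all assumptions introduced for the example.

```python
def gated_start(ready_time, timer_time):
    """The next instruction starts only once both the ready signal and
    the timer signal have occurred, i.e. at the later of the two."""
    return max(ready_time, timer_time)


def slack(ready_time, timer_time):
    """Time left between the ready signal and the timer signal.
    A negative value means the unit finished after the timer fired."""
    return timer_time - ready_time


def adjust_voltage_mv(voltage_mv, slack_value, target_slack=1,
                      step_mv=50, v_min_mv=800, v_max_mv=1200):
    """Hypothetical regulator: raise the supply when there is too little
    slack, lower it when there is more slack than needed."""
    if slack_value < target_slack:
        voltage_mv += step_mv      # too slow: speed the unit up
    elif slack_value > target_slack:
        voltage_mv -= step_mv      # faster than needed: save power
    return min(v_max_mv, max(v_min_mv, voltage_mv))
```

In this sketch the supply settles where the measured slack equals the target, which corresponds to a voltage just high enough to meet the timing constraint.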
Abstract:
A DSP including a register file connected to data memories and functional units is provided. Functional units read operands from the register file and store results into the register file. Various register storage locations form communicative links between the functional units and the memories, in accordance with a particular code sequence being executed by the DSP. Because each functional unit has an independent path to the register file, each functional unit may provide results to the register file concurrently. Additionally, having multiple register storage locations that are accessible to any functional unit permits flexibility in the operation of the DSP. Multiple register storage locations may be used by the same functional unit, allowing program code to be better optimized by storing values for later use in one of the register storage locations, as opposed to storing values in the data memories. The register file essentially provides a buffer between the functional units, and between the functional units and memory.
Abstract:
In manufacturing a mobile communication terminal, reducing cost, power consumption, and size is very important. This is a major problem for the conventional technique, in which two independent sets of DSPs and CPUs are used, because two systems of external memories are required. Further, since two systems of peripheral devices for data input/output are necessary for the DSPs and CPUs, there is wasted overhead between the DSPs and CPUs. A mobile communication terminal system is realized by using an integrated DSP/CPU chip having a DSP/CPU core (500) integrated as a single bus master, an integrated external bus interface (606) and an integrated peripheral circuit interface. Because the memory systems and peripheral circuits of the DSPs and CPUs are integrated, an inexpensive, low-power, small-sized mobile communication terminal system is provided.
Abstract:
Variable-length instructions are prepared for simultaneous decoding and execution of a plurality of instructions in parallel by reading multiple variable-length instructions from an instruction source and determining the starting point of each instruction so that multiple instructions are presented to a decoder simultaneously for decoding in parallel. Immediately upon accessing the multiple variable-length instructions from an instruction memory, a predecoder derives predecode information for each byte of the variable-length instructions by determining an instruction length indication for that byte, assuming each byte to be an opcode byte since the actual opcode byte is not identified. The predecoder associates an instruction length to each instruction byte. The instructions and predecode information are applied to an instruction buffer circuit in a memory-aligned format. The instruction buffer circuit prepares the variable-length instructions for decoding by converting the instruction alignment from a memory alignment to an instruction alignment on the basis of the instruction length indication. The instruction buffer circuit also assists the preparation of variable-length instructions for decoding of multiple instructions in parallel by facilitating a conversion of the instruction length indication to an instruction pointer.
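The predecode step and the conversion of length indications into instruction pointers can be sketched as below. The encoding is a toy assumption made for illustration only: the top two bits of a byte, treated as an opcode, give the instruction length minus one. The function names are likewise hypothetical.

```python
def predecode(code):
    """Derive a length indication for every byte, treating each byte as
    if it were an opcode byte (toy rule: top two bits give length - 1)."""
    return [(b >> 6) + 1 for b in code]


def instruction_starts(lengths, first=0):
    """Turn per-byte length indications into instruction start pointers
    by following the chain of lengths from a known starting byte."""
    starts = []
    pos = first
    while pos < len(lengths):
        starts.append(pos)
        pos += lengths[pos]
    return starts


code = bytes([0x01, 0x80, 0xFF, 0x00, 0x40, 0x12])
lengths = predecode(code)              # [1, 3, 4, 1, 2, 1]
starts = instruction_starts(lengths)   # [0, 1, 4]
```

Length indications computed for bytes that turn out not to be opcode bytes (offsets 2, 3, and 5 here) are simply never consulted once the true starting points are known.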
Abstract:
A microcontroller or processor architecture that performs word aligned multibyte fetches but allows byte aligned instructions. Jump target addresses are word aligned, resulting in a word aligned fetch of the jump-to instruction. An assembler or compiler loads code into an instruction memory with branch instruction target addresses aligned on word boundaries. Returns from interrupts load the program counter with the complete return address which is byte aligned.
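A minimal sketch of how an assembler might place byte-aligned instructions while keeping jump targets on word boundaries is shown below. The 4-byte word size, the function names, and the tuple layout of the input are assumptions made for the example and are not taken from the abstract.

```python
WORD_BYTES = 4  # assumed word size for this sketch


def align_up(addr, word=WORD_BYTES):
    """Round an address up to the next word boundary."""
    return (addr + word - 1) & ~(word - 1)


def fetch_address(pc, word=WORD_BYTES):
    """Address of the word-aligned fetch that contains byte address pc."""
    return pc & ~(word - 1)


def layout(instructions):
    """Assign addresses to (name, length_in_bytes, is_jump_target) tuples.
    Ordinary instructions are packed on byte boundaries; jump targets are
    padded forward to a word boundary."""
    addr = 0
    placed = {}
    for name, length, is_target in instructions:
        if is_target:
            addr = align_up(addr)
        placed[name] = addr
        addr += length
    return placed


program = [("a", 1, False), ("b", 2, False), ("loop", 3, True), ("c", 1, False)]
addresses = layout(program)  # {'a': 0, 'b': 1, 'loop': 4, 'c': 7}
```

Here `c` sits at byte address 7, so a return from interrupt to `c` needs the complete byte-aligned address, while a jump to `loop` needs only a word address.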
Abstract:
A management system for memory-resident computer programs is disclosed that provides compatibility between memory-resident programs (TSRs) written for the DOS operating system and the Windows graphical user interface. Graphical images are generated and displayed to the user running the Windows graphical user interface under the command of a DOS TSR, and the user's input data may be communicated back to the DOS TSR in response to the image displayed. The present invention comprises a DOS TSR and a Windows TSR Manager, which allocates memory addressable by both the DOS TSR and the Windows TSR Manager so that a communication channel independent of the DOS and Windows user interfaces is established. The Windows TSR Manager further includes a Windows TSR Library Handler and one or more Windows TSR Libraries, one for each DOS TSR supported by the present management system, which generate graphical images compatible with the Windows graphical user interface.
Abstract:
Single-instruction, multiple-data is a new class of integrated video signal processors especially suited for real-time processing of two-dimensional images. The single-instruction, multiple-data architecture is adopted to exploit the high degree of parallelism inherent in many video signal processing algorithms. Features have been added to the architecture to support conditional execution and sequencing, an inherent limitation of traditional single-instruction, multiple-data machines. A separate transfer engine offloads transaction processing from the execution core, allowing input/output and compute resources to be balanced, a critical factor in optimizing performance for video processing. These features, coupled with a scalable architecture, allow a united programming model and application-driven performance.
Abstract:
A data flow computer and method of computing is disclosed which utilizes a data-driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information from an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories comprise four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory, and one status bit indicates whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
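The firing rule implemented by the tag memory can be modelled roughly as follows. This is a simplified software sketch under stated assumptions: the class and method names, the two-entry opcode set, and the choice to return the result directly (rather than route it through the FIFO registers) are all hypothetical.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}  # hypothetical opcode set


class DataFlowNode:
    """Four commonly addressed memories: parameter, opcode, target, tag."""

    def __init__(self):
        self.param = {}    # addr -> {"A": value, "B": value}
        self.opcode = {}   # addr -> opcode name
        self.target = {}   # addr -> output address
        self.present = {}  # tag bit: is the parameter stored?
        self.reuse = {}    # tag bit: keep the parameter after firing?

    def load_instruction(self, addr, opcode, target,
                         reuse_a=False, reuse_b=False):
        self.opcode[addr] = opcode
        self.target[addr] = target
        self.param[addr] = {"A": None, "B": None}
        self.present[addr] = {"A": False, "B": False}
        self.reuse[addr] = {"A": reuse_a, "B": reuse_b}

    def store(self, addr, slot, value):
        """Store one parameter. Returns (target, result) when the
        instruction fires, or None while a parameter is still missing."""
        self.param[addr][slot] = value
        self.present[addr][slot] = True
        if not all(self.present[addr].values()):
            return None  # no fire signal yet
        a, b = self.param[addr]["A"], self.param[addr]["B"]
        result = OPS[self.opcode[addr]](a, b)
        for s in ("A", "B"):
            if not self.reuse[addr][s]:
                self.present[addr][s] = False  # consumed by the firing
        return self.target[addr], result
```

For example, with `reuse_b=True` a constant stored once in the B slot stays present, so each newly arriving A parameter fires the instruction again.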
Abstract:
A data processing system contains both a scalar processor (102) and a vector processor (104). The vector processor (104) contains a plurality of functional units (108), each of which contains a plurality of parallel pipelines. Each of the pipelines contains a plurality of arithmetic and logic units (ALUs) connected via a plurality of data paths, such that data can be communicated between the ALUs during the execution of a vector instruction by the vector functional unit containing the pipeline. The operation performed by each of the cascaded ALUs and the paths through which data is to be communicated between the ALUs during the execution of a vector instruction can be controlled by configuration values held in a scalar register (105) named by the vector instruction. Through the use of this technique, multiple operations upon sets of vector data may be specified in a single short vector instruction, and further, the configuration of the pipelines can be determined dynamically in response to program input.
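The idea of a scalar configuration value selecting the operation of each cascaded ALU can be sketched as below. The encoding (one 2-bit field per ALU, lowest bits first), the four-entry operation table, and the fixed chain in which each ALU combines the previous result with one further operand are assumptions chosen for illustration; the configurable data paths are not modelled.

```python
import operator

# Hypothetical 2-bit operation codes: 0 add, 1 subtract, 2 multiply, 3 pass
OPS = [operator.add, operator.sub, operator.mul, lambda a, b: a]


def decode_config(config, stages):
    """Extract one 2-bit operation field per cascaded ALU from the
    configuration value held in a scalar register."""
    return [OPS[(config >> (2 * i)) & 0b11] for i in range(stages)]


def run_pipeline(config, vectors):
    """Apply the configured ALU chain element-wise. vectors[0] feeds the
    first ALU; each further vector supplies the second operand of one stage."""
    ops = decode_config(config, len(vectors) - 1)
    out = []
    for elems in zip(*vectors):
        acc = elems[0]
        for op, x in zip(ops, elems[1:]):
            acc = op(acc, x)  # result is passed on to the next ALU
        out.append(acc)
    return out


# 0b0010: stage 0 multiplies, stage 1 adds, giving a multiply-add
result = run_pipeline(0b0010, [[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # [11, 18, 27]
```

Because the configuration is an ordinary scalar value, the same vector instruction can perform a different chain of operations simply by naming a register that holds a different value.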
Abstract:
A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to dispatch the variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. With the information conveyed by the functional bits, the decode units can detect the exact locations of the opcode, displacement, immediate, register, and scale-index bytes. Accordingly, no serial scan by the decode units through the instruction bytes is needed. In addition, the functional bits allow the decode units to calculate linear addresses (via adder circuits) expeditiously for use by other subunits within the superscalar microprocessor. Accordingly, relatively fast decoding may be attained, and high performance may be accommodated.
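How predecode tags let an alignment unit route instructions without a serial scan of the bytes can be sketched as follows. The abstract does not spell out the tag layout, so the two bits used here (a start bit and an end bit per byte), the function names, and the default of three issue positions are assumptions for illustration only.

```python
def make_tags(lengths):
    """Build a (start_bit, end_bit) predecode tag for every instruction
    byte, given the instruction lengths found at predecode time."""
    tags = []
    for n in lengths:
        for i in range(n):
            tags.append((int(i == 0), int(i == n - 1)))
    return tags


def dispatch(code, tags, issue_positions=3):
    """Slice instructions out of the byte stream using only the tags and
    hand them to a fixed number of issue positions."""
    slots = []
    start = None
    for i, (s, e) in enumerate(tags):
        if s:
            start = i
        if e and start is not None:
            slots.append(bytes(code[start:i + 1]))
            start = None
            if len(slots) == issue_positions:
                break
    return slots


code = b"\x01\x02\x03\x04\x05\x06"
tags = make_tags([1, 3, 2])
slots = dispatch(code, tags)  # [b'\x01', b'\x02\x03\x04', b'\x05\x06']
```

The loop in `dispatch` stands in for what hardware would evaluate in parallel across all tag bits; it never inspects the instruction bytes themselves.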