Abstract:
In a multi-tasking computing system environment, one program is halted and context switched out so that a processor may context switch in a subsequent program for execution. Processor state information exists which reflects the state of the program being context switched out. Storing this processor state information permits successful resumption of the context switched out program. When the context switched out program is subsequently context switched in, the stored processor state information is loaded in preparation for resuming the program at the point at which execution was previously halted. Although large areas of memory can be allocated to processor state information storage, only a portion of this information may need to be preserved across a context switch to save and resume the context switched out program successfully. Unnecessarily saving and loading all available processor state information can be noticeably inefficient, particularly where relatively large amounts of processor state information exist. In one embodiment, a processor requests a co-processor to context switch out the currently executing program. At a predetermined appropriate point in the executing program, the co-processor responds by halting program execution and saving only the minimal amount of processor state information necessary for successful restoration of the program. The appropriate point is chosen by the application programmer at a location in the executing program that requires preserving a minimal portion of the processor state information across a context switch. By saving only a minimal amount of processor state information, processor time savings accumulate across context save and restoration operations.
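The minimal-state save described above can be illustrated with a small sketch. It is not the patented mechanism: the register names, the dictionary used as a register file, and the `live_names` list (standing in for what the programmer declares live at the chosen switch point) are all assumptions made for illustration.

```python
def save_context(registers, live_names):
    """Copy only the registers marked live at the switch point (hypothetical API)."""
    return {name: registers[name] for name in live_names}


def restore_context(registers, saved):
    """Load the previously saved subset back into the register file."""
    registers.update(saved)


def make_register_file():
    """Hypothetical co-processor state: 32 vector registers plus a program counter."""
    registers = {f"VR{i}": i * 10 for i in range(32)}
    registers["PC"] = 0x100
    return registers
```

Saving two entries instead of 33 is the whole point: the cost of each save and restore scales with the size of the live set, not with the size of the register file.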
Abstract:
A vector processor includes two banks of vector registers, where each vector register can store multiple data elements, and a control register with a field indicating a default bank. An instruction set for the vector processor includes instructions which use a register number to identify a vector register in the default bank, instructions which use a register number to identify a double-size vector register including a register from the first bank and a register from the second bank, and instructions which include a bank bit and a register number to access a vector register from either bank.
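The three addressing modes can be modeled roughly as follows. This is a sketch under stated assumptions: the bank size (32 registers), the element count (8), the method names, and the ordering of the two halves of a double-size register are all hypothetical and not taken from the abstract.

```python
class VectorRegisterFile:
    """Toy model of two register banks with a default-bank control field."""

    def __init__(self, regs_per_bank=32, elements=8):
        # banks[b][r] is the list of data elements in register r of bank b.
        self.banks = [
            [[0] * elements for _ in range(regs_per_bank)] for _ in range(2)
        ]
        self.default_bank = 0  # models the control-register field

    def read_default(self, reg):
        """Register number only: the bank comes from the control register."""
        return self.banks[self.default_bank][reg]

    def read_double(self, reg):
        """Double-size register: same register number from both banks.

        Placing bank 0 first is an assumption of this sketch.
        """
        return self.banks[0][reg] + self.banks[1][reg]

    def read_banked(self, bank_bit, reg):
        """Explicit bank bit plus register number: either bank is reachable."""
        return self.banks[bank_bit][reg]
```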
Abstract:
A cache system supports a re-sizable software-managed fast scratch pad that is implemented as a cache-slice. A processor register indicates the size and base address of the scratch pad. Instructions which facilitate use of the scratch pad include a prefetch instruction which loads multiple lines of data from external memory into the scratch pad and a writeback instruction which writes multiple lines of data from the scratch pad to external memory. The prefetch and writeback instructions are non-blocking instructions to allow instructions following in the program order to be executed while a prefetch or writeback operation is pending.
Abstract:
An integrated multiprocessor architecture simplifies synchronization of multiple processing units. The multiple processing units comprise a general-purpose or control processor and a vector processor which has a single-instruction-multiple-data (SIMD) architecture, so that multiple parallel processing units in the vector processor all complete an instruction simultaneously and do not require software synchronization. The control processor controls the vector processor and creates a fork in a program flow by starting the vector processor. An instruction set for the control processor includes special instructions that enable the control processor to access registers of the vector processor, start or halt execution by the vector processor, and test flags written by the vector processor to indicate completion of tasks. The two processors then execute separate program threads in parallel until the control processor stops the vector processor, an exception is encountered, or the vector processor completes its program thread and enters an idle state. An instruction set for the vector processor includes special instructions that interrupt the control processor to indicate that a task is complete. A register coupled to and accessible by both processors stores a state bit indicating whether the vector processor is running or idle. The control processor can synchronize the separate program threads by executing a loop which polls the state bit. When the state bit indicates the vector processor is idle, the control processor can process results from the vector processor and restart the vector processor.
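The fork-then-poll synchronization can be sketched as a single-threaded simulation. Everything here is an assumption for illustration: the class and method names are hypothetical, the "program thread" is just a running sum, and `step()` stands in for the vector processor making progress in parallel with the polling loop.

```python
class VectorProcessor:
    """Toy vector processor exposing a shared running/idle state bit."""

    def __init__(self):
        self.running = False  # shared state bit: True = running, False = idle
        self.result = None
        self._work = None

    def start(self, data):
        """Fork: begin a separate program thread (here, summing `data`)."""
        self._work = iter(data)
        self.result = 0
        self.running = True

    def step(self):
        """Advance one simulated cycle; go idle when the thread completes."""
        if not self.running:
            return
        try:
            self.result += next(self._work)
        except StopIteration:
            self.running = False


def run_and_wait(vp, data):
    """Control-processor side: start the vector processor, then poll the bit."""
    vp.start(data)
    while vp.running:  # polling loop on the state bit
        vp.step()      # stands in for hardware running in parallel
    return vp.result   # idle: results are now safe to consume
```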
Abstract:
The present invention generally relates to multiply-accumulate units for use in digital signal processors. Each multiply-accumulate unit includes a multiply unit which is coupled with two or more dedicated accumulators. Because of the coupling configuration, when an instruction specifies which accumulator should be used in executing an operation, the instruction need not specify which multiply unit should be utilized. A scheduler containing a digital signal processor's coupling configuration may then identify the multiply unit associated with the accumulator and may then forward the instruction to the identified multiply unit. Multiply-accumulate units can be configured to execute both scalar and vector operations. For executing vector operations, multiply units and their coupled accumulators are configured such that each may be easily grouped with other multiply units and accumulators.
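Because each accumulator is dedicated to one multiply unit, an instruction that names an accumulator implicitly names its multiply unit. A minimal sketch of that lookup follows; the unit and accumulator names, the two-accumulators-per-unit layout, and the `MacScheduler` class are hypothetical.

```python
# Hypothetical coupling configuration: each accumulator belongs to one multiplier.
COUPLING = {"A0": "MUL0", "A1": "MUL0", "A2": "MUL1", "A3": "MUL1"}


class MacScheduler:
    """Routes multiply-accumulate instructions by accumulator name alone."""

    def __init__(self, coupling):
        self.coupling = dict(coupling)
        self.accumulators = {name: 0 for name in coupling}

    def issue_mac(self, acc, a, b):
        """Execute acc += a * b and return the multiply unit that was selected."""
        unit = self.coupling[acc]  # the instruction never names the unit
        self.accumulators[acc] += a * b
        return unit
```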
Abstract:
A multiprocessor architectural definition provides that a program executing on a first processor interrupts a second processor by executing a software interrupt instruction. The software interrupt instruction includes an argument field for passing information from a program requesting the software interrupt. The argument, along with the opcode, is saved in a register designated for holding the argument. The information communicated via the argument is used in one embodiment to indicate a cause of the interrupt. In an embodiment, the information communicated via the argument designates an interrupt service routine to be activated in the interrupted processor.
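The argument-passing scheme might look like the following sketch, which covers the embodiment where the argument selects an interrupt service routine. The opcode value, the 8-bit argument width, and all names are assumptions; the abstract does not specify an encoding.

```python
SWI_OPCODE = 0x3F  # hypothetical opcode value
ARG_BITS = 8       # hypothetical width of the argument field


def encode_swi(arg):
    """Build a software-interrupt instruction word: opcode followed by argument."""
    if not 0 <= arg < (1 << ARG_BITS):
        raise ValueError("argument does not fit in the argument field")
    return (SWI_OPCODE << ARG_BITS) | arg


class InterruptedProcessor:
    """Toy model of the processor that receives the software interrupt."""

    def __init__(self, handlers):
        self.arg_register = 0     # designated register holding opcode + argument
        self.handlers = handlers  # maps argument value -> service routine

    def receive(self, instruction_word):
        """Save the word, then dispatch on its argument field."""
        self.arg_register = instruction_word
        arg = self.arg_register & ((1 << ARG_BITS) - 1)
        return self.handlers[arg]()
```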
Abstract:
The present invention generally relates to a hybrid VLIW-SIMD programming model for a digital signal processor. The hybrid programming model broadcasts a packet of information to a plurality of functional units or processing elements. Each packet contains several instructions having certain characteristics, such as instruction type and instruction length, among others. The hybrid programming model includes functional units which are reconfigurable based upon the instructions within an instruction packet and the availability of the functional units. The model groups the functional units such that the operations specified in the instructions can be efficiently executed and selects which functional units should be utilized for a given operation.
Abstract:
The present invention provides an efficient method of forwarding and sharing information between functional units and register files in an effort to execute instructions. A digital signal processor includes a plurality of register blocks for storing data operands coupled to a plurality of data path units for executing instructions. Preferably, each register block is coupled to at least two data path units. In addition, the processor preferably has a plurality of forwarding paths which forward information from one data path unit to another. A scheduler efficiently forwards instructions to data path units based on information regarding the configuration of the processor and any restrictions which might be imposed on the scheduler.
Abstract:
In one exemplary embodiment, the disclosed VLIW processor comprises a number of threads where each thread includes a processing unit. For example, there can be two threads, where each of the two threads has its own processing unit. According to this exemplary embodiment, a number of VLIW packets are divided into a number of issue groups. As an example, two VLIW packets are divided into two issue groups each. The first issue group in the first VLIW packet is provided to a first thread for execution in the first thread processing unit during a first clock cycle. Concurrently, the first issue group in the second VLIW packet is provided to a second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the first clock cycle. Moreover, the second issue group in the first VLIW packet is provided to the first thread for execution in the first thread processing unit during a second clock cycle. Concurrently, the second issue group in the second VLIW packet is provided to the second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the second clock cycle. In this manner, various resources of the VLIW processor are efficiently utilized and two VLIW packets are executed during two clock cycles. As such, the processing speed of the VLIW processor is doubled without a significant increase in the power consumed by the VLIW processor.
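The interleaving of issue groups across threads can be sketched as a simple timeline builder. The function name and the string labels for issue groups are hypothetical; the sketch only shows which group each thread receives in each clock cycle.

```python
def build_timeline(packets):
    """Return, per clock cycle, the issue group handed to each thread.

    packets[t] is the list of issue groups of the VLIW packet assigned to
    thread t. A thread with no group left in a cycle gets None.
    """
    cycles = max(len(groups) for groups in packets)
    timeline = []
    for cycle in range(cycles):
        timeline.append(
            [groups[cycle] if cycle < len(groups) else None for groups in packets]
        )
    return timeline
```

With two packets of two issue groups each, the timeline has two cycles, and both threads are busy in both cycles.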
Abstract:
In one disclosed embodiment, an instruction loop having at least one instruction is identified. For example, each instruction can be a VLIW packet comprised of several individual instructions. The instructions of the instruction loop are fetched from a program memory. The instructions are then stored in a register queue. For example, the register queue can be implemented with a head pointer which is adjusted to select a register in which to write each instruction that is fetched. It is then determined whether the processor requires execution of the instruction loop, for example, by checking a program counter (PC) value corresponding to each instruction. When the processor requires execution of the instruction loop, the instructions are output from the register queue. For example, the register queue can be implemented with an access pointer which is adjusted to select a register from which to output each instruction that is required.
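A register queue with a head pointer for writes and an access pointer for reads could be modeled as below. This is a sketch, not the disclosed circuit: the class name, the wrap-around behavior of the head pointer, and the linear PC search used to position the access pointer are assumptions.

```python
class LoopBuffer:
    """Toy register queue holding the instructions of a loop."""

    def __init__(self, size):
        self.regs = [None] * size  # buffered instructions
        self.pcs = [None] * size   # PC value recorded for each register
        self.head = 0              # head pointer: next register to write
        self.access = 0            # access pointer: last register read

    def store(self, pc, instruction):
        """Write a fetched instruction at the head pointer, then advance it."""
        self.regs[self.head] = instruction
        self.pcs[self.head] = pc
        self.head = (self.head + 1) % len(self.regs)

    def fetch(self, pc):
        """Output the buffered instruction for `pc`, or None on a miss."""
        for index, stored_pc in enumerate(self.pcs):
            if stored_pc is not None and stored_pc == pc:
                self.access = index  # adjust the access pointer
                return self.regs[index]
        return None  # not in the loop: fall back to program memory
```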