摘要:
A processor (100) is provided that is a programmable fixed point digital signal processor (DSP) with variable instruction length, offering both high code density and easy programming. Architecture and instruction set are optimized for low power consumption and high efficiency execution of DSP algorithms, such as for wireless telephones, as well as pure control tasks. The processor includes an instruction buffer unit (106), a program flow control unit (108), an address/data flow unit (110), a data computation unit (112), and multiple interconnecting busses. Dual multiply-accumulate blocks improve processing performance. A memory interface unit (104) provides parallel access to data and instruction memories. The instruction buffer is operable to buffer single and compound instructions pending execution thereof. A decode mechanism is configured to decode instructions from the instruction buffer. The use of compound instructions enables effective use of the bandwidth available within the processor. A soft dual memory instruction can be compiled from separate first and second programmed memory instructions. Instructions can be conditionally executed or repeatedly executed. Bit field processing and various addressing modes, such as circular buffer addressing, further support execution of DSP algorithms. The processor includes a multistage execution pipeline with pipeline protection features. Various functional modules can be separately powered down to conserve power. The processor includes emulation and code debugging facilities with support for cache analysis.
摘要:
A processing engine 10 includes an instruction buffer 502 operable to buffer single and compound instructions pending execution. A decode mechanism is configured to decode instructions from the instruction buffer. The decode mechanism is arranged to respond to a predetermined tag in a tag field of an instruction, which predetermined tag is representative of the instruction being a compound instruction formed from separate programmed memory instructions. The decode mechanism is operable in response to the predetermined tag to decode at least first data flow control for a first programmed instruction and second data flow control for a second programmed instruction. The use of compound instructions enables effective use of the bandwidth available within the processing engine. A soft dual memory instruction can be compiled from separate first and second programmed memory instructions. A compound address field of the predetermined compound instruction can be arranged at the same bit positions as the address field for a hard compound memory instruction, that is a compound instruction which is programmed. In this case the decoding of the addresses can be started before the operation code of the instructions have been decoded. To reduce the number of bits in the compound instruction, addressing can be restricted to indirect addressing and the operation codes for at least the first instruction can be reduced in size. In this way, the compound instruction can be arranged to have the same number of bits in total as the sum of the bits of the separate programmed instructions.
摘要:
Data processing apparatus 10 supporting circular buffers CB includes address storage ARx for holding a virtual buffer index and offset storage BOFxx for holding an offset address. Circular buffer management logic 802 is configured to be operable to apply a modifier to a virtual buffer index held in the address storage to derive a modified virtual buffer index and to apply a buffer offset held in the offset storage to the modified virtual buffer index to derive a physical address for addressing a circular buffer. By employing virtual addressing to a buffer index for a circular buffer management, it is possible to make efficient use of memory resources. One or more circular buffers can be located contiguously with respect to each other and/or other data in memory, avoiding fragmentation of the memory. The buffer index forms a pointer for the circular buffer. The apparatus enables circular buffers to be implemented without alignment constraints, while maintaining compatibility with prior circular buffer implementations with alignment constraints.
摘要:
A processing engine 10 for executing instructions in parallel comprises an instruction buffer 600 for holding at least two instructions, with the first instruction 602 in a first position and the second instruction 604 in a second position. A first decoder 612 provides decoding of the first instruction and generates first control signals. The first control signals include first resource control signals, first address generation control signals, and a first validity signal indicative of the validity of the first instruction in the first position. A second decoder 614 provides decoding of the second instruction and generates second control signals. The second control signals include second resource control signals, second address generation control signals, and a second validity signal indicative of the validity of the second instruction in the second position. Arbitration and merge logic 628, 630 is provided for arbitrating between the first and second control signals and for merging the first and second control signals for controlling power of execution of the instructions in accordance with a set of parallelism rules. A conditional execution unit 634 is responsive to false condition signals from the arbitration and merge logic to inhibit or modify the effect of the control signals. The parallelism rules provide for efficient instruction execution, and the avoidance of resource conflicts.
摘要:
A processor is provided that is a programmable digital signal processor (DSP) with variable instruction length, offering both high code density and easy programming. Architecture and instruction set are optimized for low power consumption and high efficiency execution of DSP algorithms, such as for wireless telephones, as well as pure control tasks. A coefficient data pointer is provided for accessing coefficient data for use in a multiply-accumulate (MAC) unit. Monitoring circuitry determines when the coefficient data pointer is modified (step 1104). When an instruction is executed (step 1102) that requires a coefficient datum from memory in accordance with the coefficient data pointer, a memory access is inhibited (step 1108) if the coefficient data pointer has not been modified since the last time a memory fetch was made in accordance with the coefficient data pointer and the previously fetched coefficient datum is reused. However, if the coefficient data pointer was modified since the last time a memory fetch was made in accordance with the coefficient data pointer, then the required coefficient datum is fetched from memory (step 1106). A shadow register within the MAC unit execution pipeline temporarily saves coefficient data for possible reuse.
摘要:
A processing engine 10 provides computation of an output vector as a linear combination of N input vectors with N coefficients in an efficient manner. The processing engine includes a coefficient register 940 for holding a representation of each of N coefficients of a first input vector. A test unit 950 is provided for testing selected parts (e.g. bits) of the coefficient register for respective coefficient representations. An arithmetic unit 970 computes respective coordinates of an output vector by selective addition/subtraction of coordinates of a second input vector dependent on results of the coefficient representation tests. Power consumption can be kept low due to the use of a coefficient test operation in parallel with an ALU operation. Each coordinate of an output vector {right arrow over (Y)} can be computed with a N+1 step algorithm, the computation being done with bit test unit operating in parallel with an ALU according to the following equation: ∀ 1 ≤ j ≤ M Y j = ∑ 1 ≤ i ≤ N ( ( - 1 ) C i * X ij ) . At a step (i+1)1≦i≦N of the computation, a bit Ci+1 of the CPU register is addressed, this bit is tested in a temporary register and a conditional addition/subtraction of a coordinate of the second input vector Xij is performed.
摘要:
In a method for tracing data within an integrated circuit, a default time stamp granularity is selected for a sequence of time stamps, wherein each time stamp has a resolution of 2**N. A sequence of trace events is captured and an elapsed time is determined between each time sequential pair of trace events in the sequence of trace events. A time stamp is formed to associate with each trace event of the sequence of trace events, wherein each time stamp has an associated time stamp granularity, wherein the time stamp has the default time stamp granularity if the elapsed time between a current trace event and a sequentially prior trace event is less than 2**N time slots, otherwise the time stamp granularity is slid to a larger value such that the elapsed time can be represented by N bits, whereby a small number N of bits can accurately represent a large range of elapsed times.
摘要:
In a method for monitoring power consumption by a system within an integrated circuit, one or more software programs are executed on the system on a chip (SOC). While the program executes, power control settings of a plurality of functional units within the SOC may be adjusted in response to executing the one or more software programs, whereby power consumption within the SOC varies over time. The power control settings may be changed in response to explicit directions from the executing software, or may occur autonomously in response to load monitoring control modules within the SOC. A sequence of power states is reported for the plurality of functional units within the SOC. Each of the sequence of power states may include clock frequencies from multiple clock domains, voltage levels for multiple voltage domains, initiator activity, target activity, memory module power enablement, or power enablement of each of the plurality of functional units.
摘要:
A system comprising a system under test (SUT) having a control logic. The SUT further comprises testing logic coupled to the SUT and adapted to provide to the SUT a clock signal to facilitate communications between the testing logic and the SUT. The control logic monitors a number of activated processors in a scan chain coupled to the control logic. If the number of activated processors is reduced, the control logic dynamically decreases a frequency of the clock signal.
摘要:
A microprocessor and a method of operating the microprocessor are provided in which a portion of the microprocessor is partitioned into a plurality of partitions. A sequence of instructions is executed within an instruction pipeline of the microprocessor. A block of instructions within the sequence of instructions is repetitively executed in response to a local repeat instruction. Either prior to executing the block of instructions, or during the first iteration of the loop, a determination is made that at least one of the plurality of partitions is not needed to execute the block of instructions. Operation of the at least one identified partition is inhibited during the repetitive execution of the block of instructions in order to reduce power dissipation.