摘要:
A data processing circuit comprises an instruction execution circuit (14) and a plurality of memory banks. The instruction execution circuit (14) is capable of processing blocks of data values (e.g. pixel values for a two-dimensional block of pixels) in parallel. The data values are stored (preferably cached) in the memory banks and supplied in parallel. A plurality of translation circuits (22) is coupled between block addressing outputs of the instruction execution circuits and address inputs of the memory banks. The translation circuits provide for the possibilty of addressing more than one block in parallel from different memory banks. The data is routed to the execution circuit from the selected memory banks by routing circuits. In an embodiment each translation circuit is able to address all memory of the banks. In another embodiment the translation circuits support a plurality of ways of distributing a data of a pixel image over the memory banks, using only a few banks for example for data that is accessed in small blocks and more banks for data that accessed with higher parallelism.
摘要:
An array of data values, such as an image of pixel values, is stored in a main memory (12). A processing operation is performed using the pixel values. The processing operation defines time points of movement of a multidimensional region (20, 22) of locations in the image. Pixel values from inside and around the region are cached for processing. At least when a cache miss occurs for a pixel value from outside the region, cache replacement of data in cache locations (142) is performed. Locations that store pixel data for locations in the image outside the region (20, 22) are selected for replacement, selectively exempting from replacement cache locations (142) that store pixel data locations in the image inside the region. In embodiments, different types of cache structure are used for caching data values inside and outside the region. In an embodiment the cache locations for pixel data inside the regions support a higher level of output parallelism than the cache locations for pixel data around the region. In a further embodiment the cache for locations inside the region contains sets of banks, each set for a respective line from the image, data from the lines being distributed in a cyclically repeating fashion over the banks.
摘要:
An instruction processing device has a of pipe-line stage with a functional unit for executing a command from an instruction. A first register unit is coupled to the functional unit for storing a result of execution of the command when the command has reached a first one of the pipeline stages, and for supplying bypass operand data to the functional unit. A register file is coupled to the functional unit for storing the result when the command has reached a second one of the pipeline stages, downstream from the first one of the pipeline stages, and for supplying operand data to the functional unit. A disable circuit is coupled to selectively disable storing of the results in the register file under control of the instructions.
摘要:
A programmable data processing circuit has a memory for storing pixel values, or more generally data values as a function of position in a signal. The programmable data processing circuit supports instructions that include an indication of a selected parameter value set that indicates how a plurality of data values must be arranged for parallel output from a memory. Instructions that indicate different parameter value sets can be executed intermixed with one another. The programmable data processing circuit responds to instructions of this type by retrieving the selected parameter value sets from a parameter storage circuit (246), and controlling a switching circuit (22) between a memory port (21) of a memory circuit (20) and a data port (26) at least partly dependent on the selected parameter value set.
摘要:
A programmable data processing circuit has a memory for storing pixel values, or more generally data values as a function of position in a signal. The programmable data processing circuit supports instructions that include an indication of a selected parameter value set that indicates how a plurality of data values must be arranged for parallel output from a memory. Instructions that indicate different parameter value sets can be executed intermixed with one another. The programmable data processing circuit responds to instructions of this type by retrieving the selected parameter value sets from a parameter storage circuit (246), and controlling a switching circuit (22) between a memory port (21) of a memory circuit (20) and a data port (26) at least partly dependent on the selected parameter value set.
摘要:
A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.
摘要:
The data processing device has a plurality of functional units and issues instructions in successive instruction cycles. Instructions of a first type are each intended for one functional unit at a time. An instruction of a second type causes a combination of functional units to respond in the same instruction execution cycle, a result from one functional unit being used by another as part of the execution of the same instruction. Preferably, the device supports alternative operation at a number of different instruction cycle rates, dependent on whether an executed program segment contains instructions of the second type. The fastest instruction cycle rate does not allow execution of the instruction of the second type, because operation by different functional units does not fit within the instruction execution cycle. When possible, the device saves power by switching to a slower clock rate, in which case instructions of the second type are executed to save additional power, by reducing the number of instructions that have to be issued.
摘要:
A data processing apparatus has an instruction memory system arranged to output an instruction word addressed by an instruction address. An instruction execution unit, processes a plurality of instructions from the instruction word in parallel. A detection unit, detects in which of a plurality of ranges the instruction address lies. The detection unit is coupled to the instruction execution unit and/or the instruction memory system, to control a way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on a detected range. In an embodiment the instruction execution unit and/or the instruction memory system adjusts a width of the instruction word that determines a number of instructions from the instruction word that is processed in parallel, dependent on the detected range.
摘要:
The data processing device has a plurality of functional units and issues instructions in successive instruction cycles. Instructions of a first type are each intended for one functional unit at a time. An instruction of a second type causes a combination of functional units to respond in the same instruction execution cycle, a result from one functional unit being used by another as part of the execution of the same instruction. Preferably, the device supports alternative operation at a number of different instruction cycle rates, dependent on whether an executed program segment contains instructions of the second type. The fastest instruction cycle rate does not allow execution of the instruction of the second type, because operation by different functional units does not fit within the instruction execution cycle. When possible, the device saves power by switching to a slower clock rate, in which case instructions of the second type are executed to save additional power, by reducing the number of instructions that have to be issued.
摘要:
A data processing apparatus has an instruction memory system arranged to output an instruction word, capable of containing a plurality of instructions, respective instruction words being output in response to respective instruction addresses. An instruction execution unit contains a plurality of functional units, each capable of executing a respective instruction from the instruction word in parallel with execution of other instructions from the instruction word by other ones of the functional units. A power saving circuit is provided to switch a selectable subset of the functional units and/or parts of the instruction memory to a power saving state, while other functional units and parts of the instruction memory continue processing instructions in a normal power consuming state. The power saving circuit selects the functional units and/or parts of the instruction memory dependent on program execution.