Abstract:
A data processing apparatus has an instruction memory system arranged to output an instruction word addressed by an instruction address. An instruction execution unit processes a plurality of instructions from the instruction word in parallel. A detection unit detects in which of a plurality of ranges the instruction address lies. The detection unit is coupled to the instruction execution unit and/or the instruction memory system to control the way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on the detected range. In an embodiment, the instruction execution unit and/or the instruction memory system adjusts a width of the instruction word, which determines the number of instructions from the instruction word that are processed in parallel, dependent on the detected range.
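The range detection described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the range boundaries, the widths, and the names `RANGES`, `issue_width`, and `split_word` are hypothetical and do not come from the abstract, which describes hardware rather than code.

```python
from bisect import bisect_right
from typing import List

# Hypothetical address ranges: (start address, issue width in instructions).
# Sorted by start address; each range extends up to the next start address.
RANGES = [(0x0000, 1), (0x1000, 4), (0x8000, 2)]
STARTS = [start for start, _ in RANGES]


def issue_width(address: int) -> int:
    """Detect the range containing the address and return its issue width."""
    if address < STARTS[0]:
        raise ValueError("address lies below the first range")
    index = bisect_right(STARTS, address) - 1
    return RANGES[index][1]


def split_word(word: List[str], address: int) -> List[str]:
    """Keep only as many instruction slots as the detected range allows."""
    return word[:issue_width(address)]
```

For instance, `split_word(["add", "mul", "ld", "st"], 0x8000)` keeps the first two slots, because the assumed range starting at `0x8000` has a width of 2.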
Abstract:
A data processing apparatus has an instruction memory system arranged to output an instruction word, capable of containing a plurality of instructions, respective instruction words being output in response to respective instruction addresses. An instruction execution unit contains a plurality of functional units, each capable of executing a respective instruction from the instruction word in parallel with execution of other instructions from the instruction word by other ones of the functional units. A power saving circuit is provided to switch a selectable subset of the functional units and/or parts of the instruction memory to a power saving state, while other functional units and parts of the instruction memory continue processing instructions in a normal power consuming state. The power saving circuit selects the functional units and/or parts of the instruction memory dependent on program execution.
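One conceivable way to choose the subset of functional units to put into the power saving state is to look at which issue slots stay empty over a window of instruction words. The sketch below models that idea only; the window-based policy and the name `idle_units` are assumptions, since the abstract states no more than that the selection depends on program execution.

```python
from typing import List, Optional, Set


def idle_units(window: List[List[Optional[str]]], num_units: int) -> Set[int]:
    """Return the indices of functional units that receive no instruction
    in any instruction word of the window (None marks an empty slot)."""
    busy = {
        slot
        for word in window
        for slot, op in enumerate(word)
        if op is not None
    }
    return set(range(num_units)) - busy
```

Units returned by `idle_units` are candidates for the power saving state, while the remaining units continue in the normal power consuming state.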
Abstract:
A program of instruction words is executed with a VLIW data processing apparatus. The apparatus comprises a plurality of functional units capable of executing a plurality of instructions from each instruction word in parallel. The instructions from each of at least some of the instruction words are fetched from respective memory units in parallel, addressed with an instruction address that is common for the functional units. Translation of the instruction address into a physical address can be modified for one or more particular ones of the memory units. Modification is controlled by modification update instructions in the program. Thus, it can be selected dependent on program execution which instructions from the memory units will be combined into the instruction word in response to the instruction address.
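The per-memory-unit address translation can be modeled roughly as follows. The sketch assumes that the translation is a simple per-unit offset added to the common instruction address, with wrap-around; the class `VliwFetch` and its methods are hypothetical names, and the real translation mechanism may differ.

```python
class VliwFetch:
    """Toy model: one instruction memory per functional unit, common address."""

    def __init__(self, memories):
        self.memories = memories            # one list of instructions per memory unit
        self.offsets = [0] * len(memories)  # assumed translation: per-unit offset

    def update_translation(self, unit, offset):
        """Stands in for a 'modification update instruction' in the program."""
        self.offsets[unit] = offset

    def fetch(self, address):
        """Combine one instruction from each memory unit into an instruction word."""
        return [
            mem[(address + off) % len(mem)]
            for mem, off in zip(self.memories, self.offsets)
        ]
```

After `update_translation(1, 1)`, the same instruction address selects a different instruction from memory unit 1 while the other units are unaffected, which is the selection effect the abstract describes.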
Abstract:
An array of data values, such as an image of pixel values, is stored in a main memory (12). A processing operation is performed using the pixel values. The processing operation defines time points of movement of a multidimensional region (20, 22) of locations in the image. Pixel values from inside and around the region are cached for processing. At least when a cache miss occurs for a pixel value from outside the region, cache replacement of data in cache locations (142) is performed. Cache locations that store pixel data for image locations outside the region (20, 22) are selected for replacement, while cache locations (142) that store pixel data for image locations inside the region are selectively exempted from replacement. In embodiments, different types of cache structure are used for caching data values inside and outside the region. In an embodiment, the cache locations for pixel data inside the region support a higher level of output parallelism than the cache locations for pixel data around the region. In a further embodiment, the cache for locations inside the region contains sets of banks, each set for a respective line from the image, with data from the lines distributed over the banks in a cyclically repeating fashion.
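The replacement policy can be sketched in software as a cache that evicts only entries for locations outside the current region. This is a simplified illustration: the rectangular region, the least-recently-used ordering among outside entries, the fallback when every entry is inside the region, and the class name `RegionCache` are all assumptions not stated in the abstract.

```python
from collections import OrderedDict


class RegionCache:
    """Toy pixel cache that exempts in-region entries from replacement."""

    def __init__(self, capacity, image):
        self.capacity = capacity
        self.image = image            # stands in for main memory: image[y][x]
        self.lines = OrderedDict()    # (x, y) -> pixel value, oldest first
        self.region = (0, 0, 0, 0)    # x0, y0, x1, y1 (half-open rectangle)

    def move_region(self, x0, y0, x1, y1):
        self.region = (x0, y0, x1, y1)

    def _inside(self, x, y):
        x0, y0, x1, y1 = self.region
        return x0 <= x < x1 and y0 <= y < y1

    def _evict(self):
        # Select the oldest entry whose image location is outside the region.
        victim = next((k for k in self.lines if not self._inside(*k)), None)
        if victim is None:
            # Assumption: if everything cached is inside the region, fall back to plain LRU.
            self.lines.popitem(last=False)
        else:
            del self.lines[victim]

    def read(self, x, y):
        key = (x, y)
        if key in self.lines:
            self.lines.move_to_end(key)   # hit: mark as most recently used
            return self.lines[key]
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[key] = self.image[y][x]  # miss: fetch from main memory
        return self.lines[key]
```

Calling `move_region` at the time points defined by the processing operation changes which entries are protected; entries that fall outside the new region become eligible for replacement again.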
Abstract:
The present invention relates to a data processing device (10) comprising a processing unit (12) and a memory unit (14), and to a method for controlling operation of a memory unit (14) of a data processing device. The memory unit (14) comprises a main memory (16); a low-level cache memory (20.2), which is directly connected to the processing unit (12) and adapted to hold all pixels of a currently active sliding search area for reading access by the processing unit (12); a high-level cache memory (18), which is connected between the low-level cache memory and the frame memory; and a first pre-fetch buffer (20.1), which is connected between the high-level cache memory and the low-level cache memory and which is adapted to hold one search-area column or one search-area line of pixel blocks, depending on the scan direction and scan order followed by the processing unit. Reading and fetching functionalities are decoupled in the memory unit (14). The fetching functionality is concentrated on the higher cache level, while the reading functionality is concentrated on the lower cache level. In this way, concurrent reading and fetching can be achieved, thus enhancing the performance of the data processing device.
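The interplay between the low-level cache and the pre-fetch buffer can be sketched as a sliding window of columns. The model below is sequential, so it only shows the data movement, not the concurrency of reading and fetching; the column-wise scan, the use of a plain list for the higher memory levels, and the class name `SlidingSearchArea` are assumptions for illustration.

```python
from collections import deque


class SlidingSearchArea:
    """Toy model: low-level cache holds the search area, one column is pre-fetched."""

    def __init__(self, frame, width):
        self.frame = frame    # list of columns; stands in for the higher memory levels
        # Low-level cache: the columns of the currently active search area.
        self.window = deque(frame[:width], maxlen=width)
        self.next_col = width                        # column held in the pre-fetch buffer
        self.prefetch = self._fetch(self.next_col)

    def _fetch(self, col):
        """Fetch one column from the higher level, or None past the frame edge."""
        return self.frame[col] if col < len(self.frame) else None

    def read(self, col_in_window, row):
        """Reading is served only from the low-level cache."""
        return self.window[col_in_window][row]

    def advance(self):
        """Slide the search area by one column using the pre-fetched column."""
        if self.prefetch is None:
            return False
        self.window.append(self.prefetch)   # maxlen drops the oldest column
        self.next_col += 1
        self.prefetch = self._fetch(self.next_col)
        return True
```

In this model, `read` never touches the frame directly and `_fetch` never touches the window except through `advance`, which mirrors the decoupling of reading and fetching across the two cache levels.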