Abstract:
A data processing apparatus comprising a processor for executing a data processing process and a processor for executing a tuning process is disclosed. The data processing apparatus is arranged such that the tuning process which is a different process to the data processing process can access the parameters of speculative mechanisms of the data processing process and tune the parameters so that the mechanisms speculate differently and in this way the performance of this data processing process can be improved.
Abstract:
A parallel processing technique is described for performing parallel processing operations upon N-dimensional arrays of data elements for which a corresponding N- dimensional Scoreboard of status data is held. Hazard checking for data dependencies upon data elements within the N-dimensional array of data elements is performed by looking up the corresponding status value within the Scoreboard. The status data for a given data element within the Scoreboard is located at a position which can be derived from the position of the data elements within its N-dimensional array. Thus, a two- dimensional array of video macroblocks can have a corresponding two-dimensional Scoreboard of status data indicating whether individual macroblocks have, for example, either already been deblocked or have not already been deblocked.
Abstract:
A diagnostic method is described for generating diagnostic data relating to processing of an instruction stream, wherein said instruction stream has been compiled from a source instruction stream to include multiple threads, said method comprising the steps of: (i) initiating a diagnostic procedure in which at least a portion of said instruction stream is executed; (ii) controlling a scheduling order for executing instructions within said at least a portion of said instruction stream to cause execution of a sequence of thread portions, said sequence being determined in response to one or more rules, at least one of said rules defining an order of execution of said thread portions to follow an order of said source instruction stream. In this way, the diagnostic method can generate a debug view of a parallelised program which is the same as, or at least similar to, a debug view which would be provided when debugging the original non-parallelised program.
Abstract:
A data processing system (2) is provided including a local memory (4) and a main memory (6). The local memory (4) is accessed by a data engine (8) using local- memory physical addresses. The main memory (6) is accessed by a microprocessor (10) using main-memory addresses. A translation store (16) serves to store physical address TAGs indicating the mapping between data stored within the local memory (4) and corresponding data stored within the main memory (6). A coherency management mechanism (18) serves to use MESI coherency control data to manage the coherency between data values stored both in the local memory (4) and the main memory (6).
Abstract:
A data processing apparatus and method of handling conditional instructions in such a data processing apparatus are provided. The data processing apparatus has a pipelined processing unit for executing instructions including at least one conditional instruction from a set of conditional instructions, and a register file having a plurality of registers operable to store data values for access by the pipelined processing unit when executing the instructions. A register specified by an instruction may be either a source register holding a source data value for that instruction or a destination register into which is stored a result data value generated by execution of that instruction. The register file has a predetermined number of read ports via which data values can be read from registers of the register file. The pipelined processing unit is operable when executing the at least one conditional instruction to produce a result data value which, dependent on the existence of the condition specified by that conditional instruction, represents either the result of the computation specified by that conditional instruction or a current data value stored in the destination register for that conditional instruction. Further, each conditional instruction in the set is constrained to specify a register that is both a source register and a destination register for that conditional instruction, thereby reducing the minimum number of read ports required to support execution of that conditional instruction by the pipelined processing unit.
Abstract:
A data processing apparatus and methods are disclosed. The data processing apparatus comprises: data processing elements operable to process data; an. energy management unit operable to generate energy management information indicative of an energy state of at least one of the data processing elements when processing said data; and logic operable to receive said energy management information and to generate energy management information items associating said energy state with the processing of said data. The information items can provide visibility of how the Energy State of the data processing elements vary in response to the processing of data. Providing this visibility of the Energy State can advantageously enable more detailed the energy management to be performed and the Energy State of the data processing elements to be optimized.
Abstract:
A data processing apparatus comprising a processor for executing a data processing process and a processor for executing a tuning process is disclosed. The data processing apparatus is arranged such that the tuning process which is a different process to the data processing process can access the parameters of speculative mechanisms of the data processing process and tune the parameters so that the mechanisms speculate differently and in this way the performance of this data processing process can be improved.
Abstract:
An asymmetric multiprocessor apparatus (2) is provided in which respective slave diagnostic units (20, 22, 24) are associated with corresponding execution mechanisms (6, 8, 10). A master, diagnostic unit (26) tracks the migration of thread execution between the different execution mechanisms (6, 8, 10) so that the execution of a given thread can be followed by the diagnostic mechanisms (20, 22, 24, 26) and this information provided to the programmer. The execution mechanisms (6, 8, 10) can be diverse such as a general purpose processor (6), a DMA unit (12), a coprocessor, an VLIW processor, a digital signal processor (8) and a hardware accelerator (10). The asymmetric multiprocessor apparatus (2) will also typically include an asymmetric memory hierarchy such as including two or more of a global memory, a shared memory (16), a private memory (18) and a cache memory (14).
Abstract:
A data processing apparatus is provided having processing logic for performing a sequence of operations, and a cache having a plurality of segments for storing data values for access by the processing logic. The processing logic is arranged, when access to a data value is required, to issue an access request specifying an address in memory associated with that data value, and the cache is responsive to the address to perform a lookup procedure during which it is determined whether the data value is stored in the cache. Indication logic is provided which, in response to an address portion of the address, provides for each of at least a subject of the segments an indication as to whether the data value is stored in that segment. The indication logic has guardian storage for storing guarding data, and hash logic for performing a hash operation on the address portion in order to reference the guarding data to determine each indication. Each indication indicates whether the data value is either definitely not stored in the associated segment or is potentially stored with the associated segment, and the cache is then operable to use the indications produced by the indication logic to affect the lookup procedure performed in respect of any segment whose associated indication indicates that the data value is definitely not stored in that segment. This technique has been found to provide a particularly power efficient mechanism for accessing the cache.
Abstract:
An integrated circuit, and method of reviewing values of one or more signals occurring within that integrated circuit, are provided. The integrated circuit comprises processing logic for executing a program, and monitoring logic for reviewing values of one or more signals occurring within the integrated circuit as a result of execution of the program. The monitoring logic stores configuration data, which can be software programmed in relation to the signals to be monitored. Further, the monitoring logic makes use of a Bloom filter which, for a value to be reviewed, performs a hash operation on that value in order to reference the configuration data to determine whether that value is either definitely not a value within the range or is potentially a value within the range of values. If the value is determined to be within the set of values, then a trigger signal is generated which can be used to trigger a further monitoring process.