Abstract:
A plurality of processing units for performing data processing operations require access to data in shared memory. Each has an associated cache storing a subset of the data for access by that processing unit. A cache coherency protocol ensures data accessed by each unit is up-to-date. Each unit issues a write access request when outputting a data value for storing in shared memory. When the write access request requires both the associated cache and the shared memory to be updated, a coherency operation is initiated within the cache coherency logic. The coherency operation is performed for all of the caches including the cache associated with the processing unit that issued the write access request in order to ensure that the data in those caches is kept coherent.
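A minimal behavioral sketch of this write-handling flow in Python; the class and method names (Cache, CoherencyManager, write_access) are illustrative rather than taken from the abstract, and the write-through policy shown is just one way the dual update of cache and shared memory could be realised:

```python
class Cache:
    """Per-processor cache holding a subset of the shared data."""
    def __init__(self):
        self.lines = {}                  # address -> locally cached value

    def coherency_update(self, addr, data):
        # Coherency operation: refresh the line if this cache holds a copy.
        if addr in self.lines:
            self.lines[addr] = data

class CoherencyManager:
    """Stands in for the cache coherency logic shared by all units."""
    def __init__(self, caches, shared_memory):
        self.caches = caches
        self.shared_memory = shared_memory

    def write_access(self, requester_id, addr, data):
        # The write access request updates shared memory...
        self.shared_memory[addr] = data
        # ...and the coherency operation is performed over ALL caches,
        # including the cache of the unit that issued the request.
        for cache in self.caches:
            cache.coherency_update(addr, data)

# Usage: two units share memory; a write by unit 0 refreshes unit 1's copy.
memory = {0x100: 1}
caches = [Cache(), Cache()]
caches[1].lines[0x100] = 1               # unit 1 holds a copy about to go stale
CoherencyManager(caches, memory).write_access(0, 0x100, 2)
assert memory[0x100] == 2 and caches[1].lines[0x100] == 2
```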
Abstract:
A data processing apparatus has processing circuitry for performing processing operations, including high priority operations and low priority operations, with events occurring during performance of those processing operations. Prediction circuitry generates prediction data in response to a received event. It includes a history storage having a plurality of counter entries for storing count values, and index circuitry for identifying, dependent on the received event, at least one counter entry and for causing the history storage to output the count value stored in that at least one counter entry; the prediction data is derived from the output count value. Update control circuitry modifies at least one count value stored in the history storage in response to update data generated by the processing circuitry, and has a priority dependent modification mechanism such that the modification is dependent on the priority of the processing operation with which that update data is associated.
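One way to read the priority dependent modification mechanism is as a weighting of counter updates by operation priority. The sketch below assumes a simple policy in which high priority update data moves a counter twice as far as low priority update data; the abstract does not fix a particular policy, and all names here are illustrative:

```python
class PriorityPredictor:
    def __init__(self, entries=256, ceiling=7):
        self.counters = [0] * entries    # history storage of count values
        self.ceiling = ceiling

    def _index(self, event):
        # Index circuitry: map the received event onto a counter entry.
        return hash(event) % len(self.counters)

    def predict(self, event):
        # Prediction data is derived from the output count value.
        return self.counters[self._index(event)]

    def update(self, event, delta, high_priority):
        # Priority dependent modification: update data from high priority
        # operations is applied at double weight relative to update data
        # from low priority operations (an assumed policy, for illustration).
        step = delta * (2 if high_priority else 1)
        idx = self._index(event)
        self.counters[idx] = max(0, min(self.ceiling, self.counters[idx] + step))
```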
Abstract:
A multi-threaded in-order superscalar processor 2 is described having a fetch stage 8 within which thread interleaving circuitry 36 interleaves instructions taken from different program threads to form an interleaved stream of instructions which is then decoded and subject to issue. Hint generation circuitry 62 within the fetch stage 8 adds hint data to the threads indicating that parallel issue of an associated instruction is permitted with one or more other instructions.
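A sketch of the interleave-and-hint idea in Python; `may_issue_in_parallel` is a hypothetical predicate standing in for the dependency check the hint generation circuitry 62 performs in hardware:

```python
def interleave_with_hints(threads, may_issue_in_parallel):
    """Round-robin interleave of several program threads, tagging each
    instruction with hint data saying whether parallel issue with the
    following instruction of the same thread is permitted."""
    pos = [0] * len(threads)
    remaining = sum(len(t) for t in threads)
    stream, tid = [], 0
    while remaining:
        t, i = threads[tid], pos[tid]
        if i < len(t):
            nxt = t[i + 1] if i + 1 < len(t) else None
            hint = nxt is not None and may_issue_in_parallel(t[i], nxt)
            stream.append((tid, t[i], hint))   # hint travels with the instruction
            pos[tid] += 1
            remaining -= 1
        tid = (tid + 1) % len(threads)         # simple round-robin interleaving
    return stream
```

Here the hint marks an instruction that may issue alongside the next instruction of its own thread; the abstract's "one or more other instructions" leaves the exact pairing open.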
Abstract:
A data processing apparatus and method are provided for detecting cache misses. The data processing apparatus has processing logic for executing a plurality of program threads, and a cache for storing data values for access by the processing logic. When access to a data value is required while executing a first program thread, the processing logic issues an access request specifying an address in memory associated with that data value, and the cache is responsive to the address to perform a lookup procedure to determine whether the data value is stored in the cache. Indication logic is provided which, in response to an address portion of the address, provides an indication as to whether the data value is stored in the cache. This indication is produced before a result of the lookup procedure is available, and the indication logic only issues an indication that the data value is not stored in the cache if that indication is guaranteed to be correct. Control logic is then provided which, if the indication indicates that the data value is not stored in the cache, uses that indication to control a process having an effect on a program thread other than the first program thread.
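The guarantee that a "not stored" indication is never wrong is exactly the one-sided error of a Bloom-filter-style structure, so a counting Bloom filter makes a reasonable sketch of the indication logic; the abstract does not mandate this structure, and the sizes and names below are illustrative:

```python
class MissIndicator:
    def __init__(self, bits=1024, num_hashes=3):
        self.counts = [0] * bits
        self.num_hashes = num_hashes

    def _positions(self, addr_portion):
        return [hash((addr_portion, k)) % len(self.counts)
                for k in range(self.num_hashes)]

    def on_allocate(self, addr_portion):      # a line enters the cache
        for p in self._positions(addr_portion):
            self.counts[p] += 1

    def on_evict(self, addr_portion):         # a line leaves the cache
        for p in self._positions(addr_portion):
            self.counts[p] -= 1

    def definitely_miss(self, addr_portion):
        # True only when some hashed position is empty, so a "miss"
        # indication can never be wrong; False means only "maybe present",
        # and the normal lookup procedure decides.
        return any(self.counts[p] == 0 for p in self._positions(addr_portion))
```

Because the indication arrives before the full lookup completes, the control logic could, for example, switch fetch to a different program thread as soon as `definitely_miss` returns True.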
Abstract:
A data processing apparatus is provided having processing logic for performing a sequence of operations, and a cache having a plurality of segments for storing data values for access by the processing logic. The processing logic is arranged, when access to a data value is required, to issue an access request specifying an address in memory associated with that data value, and the cache is responsive to the address to perform a lookup procedure during which it is determined whether the data value is stored in the cache. Indication logic is provided which, in response to an address portion of the address, provides for each of at least a subset of the segments an indication as to whether the data value is stored in that segment. The indication logic has guardian storage for storing guarding data, and hash logic for performing a hash operation on the address portion in order to reference the guarding data to determine each indication. Each indication indicates whether the data value is either definitely not stored in the associated segment or is potentially stored within the associated segment, and the cache is then operable to use the indications produced by the indication logic to affect the lookup procedure performed in respect of any segment whose associated indication indicates that the data value is definitely not stored in that segment. This technique has been found to provide a particularly power efficient mechanism for accessing the cache.
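Extending the same one-sided-error idea per segment gives a sketch of the guardian storage and hash logic; segments whose indication is "definitely not stored here" can simply be skipped (left unpowered) during the lookup. Names and sizes are again illustrative, and a plain bit vector is assumed for simplicity:

```python
class SegmentedCacheGuardian:
    def __init__(self, num_segments, bits_per_segment=256):
        # Guardian storage: one guarding-data bit vector per segment.
        # (A real design would need counters or periodic clearing to
        # handle evictions; a set bit here is never cleared.)
        self.guard = [[0] * bits_per_segment for _ in range(num_segments)]

    def _hash(self, addr_portion, segment):
        # Hash logic referencing the guarding data of one segment.
        return hash(addr_portion) % len(self.guard[segment])

    def note_allocation(self, addr_portion, segment):
        self.guard[segment][self._hash(addr_portion, segment)] = 1

    def segments_to_look_up(self, addr_portion):
        # A clear guard bit means "definitely not stored in this segment",
        # so that segment's lookup can be suppressed to save power; a set
        # bit means only "potentially stored", and the lookup proceeds.
        return [s for s in range(len(self.guard))
                if self.guard[s][self._hash(addr_portion, s)] == 1]
```

The power saving comes from the lookup touching only the segments returned by `segments_to_look_up` rather than all of them.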
Abstract:
A data processing apparatus and method are provided for managing multiple program threads executed by processing circuitry. The multiple program threads include at least one high priority program thread and at least one lower priority program thread. At least one storage unit is shared between the multiple program threads and has multiple entries for storing information for reference by the processing circuitry when executing the program threads. Thread control circuitry is used to detect a condition indicating an adverse effect caused by a lower priority program thread being executed by the processing circuitry and resulting from sharing of the at least one storage unit between the multiple program threads. On detection of such a condition, the thread control circuitry issues an alert signal, and a scheduler is then responsive to the alert signal to cause execution of the lower priority program thread causing the adverse effect to be temporarily halted, for example by causing that lower priority program thread to be de-allocated and an alternative lower priority program thread allocated in its place. This has been found to provide a particularly efficient mechanism for allowing any high priority program thread to progress as much as possible, whilst at the same time improving the overall processor throughput by seeking to find co-operative lower priority program threads.
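A sketch of the alert-and-swap flow, assuming the adverse effect is measured as evictions a lower priority thread inflicts on the high priority thread's entries in the shared storage unit; the abstract leaves the exact condition open, and the threshold and names are illustrative:

```python
class Scheduler:
    """Swaps lower priority threads in response to alert signals."""
    def __init__(self, running, waiting):
        self.running = running       # lower priority thread ids allocated
        self.waiting = waiting       # alternative lower priority thread ids

    def alert(self, low_tid):
        # Temporarily halt the offending lower priority thread by
        # de-allocating it and allocating an alternative in its place.
        if low_tid in self.running and self.waiting:
            self.running.remove(low_tid)
            self.running.append(self.waiting.pop(0))
            self.waiting.append(low_tid)

class ThreadControl:
    """Detects the adverse-effect condition and raises the alert signal."""
    def __init__(self, scheduler, eviction_limit=64):
        self.scheduler = scheduler
        self.eviction_limit = eviction_limit
        self.evictions = {}          # low-priority tid -> evictions caused

    def record_eviction(self, low_tid):
        # Assumed condition: the lower priority thread has evicted too many
        # of the high priority thread's entries from the shared storage.
        self.evictions[low_tid] = self.evictions.get(low_tid, 0) + 1
        if self.evictions[low_tid] > self.eviction_limit:
            self.evictions[low_tid] = 0
            self.scheduler.alert(low_tid)   # alert signal to the scheduler
```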
Abstract:
There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
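A sketch of the dispatch decision in Python, with the subgraph suggestion information modeled as a table mapping a program counter to the length of the identified instruction sequence; that representation, and the `execute_one` and `accelerate` callables, are assumptions for illustration:

```python
def run(program, subgraph_table, execute_one, accelerate):
    """program: list of instructions; subgraph_table: pc -> sequence length;
    execute_one / accelerate stand in for the processing logic and the
    accelerator logic respectively."""
    pc = 0
    while pc < len(program):
        if pc in subgraph_table:
            # Execution has reached a point associated with a subgraph
            # suggestion: run the whole identified sequence as a single
            # accelerated operation instead of as separate instructions.
            length = subgraph_table[pc]
            accelerate(program[pc:pc + length])
            pc += length
        else:
            execute_one(program[pc])   # ordinary separate-instruction path
            pc += 1
```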
Abstract:
A data processing system includes an instruction fetching circuit 2, an instruction queue 4 and further processing circuits 6. A branch target cache, which may be a branch target address cache 8, a branch target instruction cache 10 or both, is used to store branch target addresses or blocks of instructions starting at the branch target respectively. A control circuit 12 is responsive to the contents of the instruction queue 4 when a branch instruction is encountered to determine whether or not storage resources within the branch target cache 8, 10 should be allocated to that branch instruction. Storage resources within the branch target cache 8, 10 will be allocated when the number of program instructions within the instruction queue is below a threshold number and/or the estimated execution time of the program instructions is below a threshold time.
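A sketch of the allocation decision made by the control circuit 12; the two thresholds are illustrative values, and `cycles` is an assumed per-instruction execution-time estimate:

```python
def should_allocate_btc_entry(instruction_queue, max_queued=4, max_cycles=12):
    """Allocate branch target cache storage for a newly encountered branch
    only when the instruction queue is close to running dry, i.e. when
    caching the branch target would actually hide fetch latency."""
    estimated_time = sum(getattr(instr, "cycles", 1) for instr in instruction_queue)
    return len(instruction_queue) < max_queued or estimated_time < max_cycles
```

The intuition: a well-filled queue already hides the branch's redirect latency, so spending scarce branch target cache storage on that branch buys little.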
Abstract:
A branch prediction mechanism 16, 18 within a multithreaded processor having hardware scheduling logic 6, 8, 10, 12 uses a shared global history table 18 which is indexed by respective branch history registers 20, 22 for each program thread. Different mappings are used between preceding branch behaviour and the prediction value stored within respective branch history registers 20, 22. These different mappings may be provided by inverters placed into the shift-in paths for the branch history registers 20, 22 or by adders 40, 42 or in some other way. The different mappings help to equalise the probability of use of the particular storage locations within the global history table 18 such that the plurality of program threads are not competing excessively for the same storage locations corresponding to the more commonly occurring patterns of preceding branch behaviour.
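A sketch of the shared global history table with thread-specific mappings; XOR-ing each thread's history with a different constant is used here as a stand-in for the inverters or adders 40, 42 in the shift-in paths, and the table geometry is illustrative:

```python
class SharedGshare:
    def __init__(self, index_bits=12, num_threads=2):
        self.mask = (1 << index_bits) - 1
        self.table = [2] * (1 << index_bits)   # shared 2-bit counters
        self.history = [0] * num_threads       # per-thread history registers
        # A different mapping per thread spreads the commonly occurring
        # history patterns over different table entries, so the threads
        # do not all compete for the same storage locations.
        self.mapping = [(0x555 * t) & self.mask for t in range(num_threads)]

    def _index(self, tid, pc):
        return (self.history[tid] ^ self.mapping[tid] ^ pc) & self.mask

    def predict(self, tid, pc):
        return self.table[self._index(tid, pc)] >= 2   # predict taken?

    def update(self, tid, pc, taken):
        i = self._index(tid, pc)
        self.table[i] = min(3, self.table[i] + 1) if taken \
            else max(0, self.table[i] - 1)
        self.history[tid] = ((self.history[tid] << 1) | int(taken)) & self.mask
```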
Abstract:
Within a coherent multi-processing system, multiple processor cores 4, 6 are coupled via respective memory buses to a memory access control unit 16. Each memory bus is formed of a uni-processing portion containing signals that specify a memory access request in accordance with a uni-processing protocol. This uni-processing bus is augmented by a multi-processing bus containing signals that give additional information concerning memory access requests, which the memory access control unit may use to service those requests and manage coherency within the system.
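A sketch of the two-part request, modeling the uni-processing signals and the augmenting multi-processing signals as separate records; the field names and the servicing decisions are illustrative, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UniProcessingSignals:
    # The ordinary memory access request, per the uni-processing protocol.
    addr: int
    is_write: bool
    data: Optional[int] = None

@dataclass
class MultiProcessingSignals:
    # Extra information carried on the augmenting multi-processing bus.
    shared: bool = False           # the access targets coherent (shared) data
    needs_exclusive: bool = False  # the requester wants sole ownership

def service_request(uni: UniProcessingSignals,
                    multi: Optional[MultiProcessingSignals]) -> str:
    # The memory access control unit 16 can fall back to plain servicing
    # when no multi-processing information accompanies the request.
    if multi is None or not multi.shared:
        return "access memory directly"
    if uni.is_write or multi.needs_exclusive:
        return "invalidate copies in the other core's cache, then access memory"
    return "snoop the other core's cache before accessing memory"
```

Keeping the uni-processing portion self-sufficient is what lets a core designed for the uni-processing protocol participate, with the multi-processing bus layered on as sideband information.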