摘要:
There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
摘要:
There is provided an apparatus for processing data under control of a program having program instructions and subgraph suggestion information identifying respective sequences of program instructions corresponding to computational subgraphs identified within said program, said apparatus comprising: a memory operable to store a program formed of separate program instructions; processing logic operable to execute respective separate program instructions from said program; and accelerator logic operable in response to reaching an execution point within said program associated with a subgraph suggestion to execute a sequence of program instructions corresponding to said subgraph suggestion as an accelerated operation instead of executing said sequence of program instructions as respective separate program instructions with said processing logic.
摘要:
An accelerator 120 is tightly coupled to the normal execution unit 110. The operand store, which could be a register file 130, a stack based operand store or other operand store is shared by the execution unit and the accelerator unit. Operands may also be accessed as immediate values within the instructions themselves. The sequences of individual program instructions corresponding to computational subgraphs remain within a program but can be recognized by the accelerator as suitable for acceleration and when encountered are executed by the accelerator instead of by the normal execution unit. Within such tightly coupled arrangement problems can arise due to a lack of register resources within the system. The present technique provides that at least some intermediate operand values which are generated within the accelerator, but are determined not to be referenced outside of the computational subgraph concerned, are not written to the operand store.
摘要:
There is provided an information processor for executing a program comprising a plurality of separate program instructions: processing logic operable to individually execute said separate program instructions of said program; an operand store operable to store operand values; and an accelerator having an array comprising a plurality of functional units, said accelerator being operable to execute a combined operation corresponding to a computational subgraph of said separate program instructions by configuring individual ones of said plurality of functional units to perform particular processing operations associated with one or more processing stages of said combined operation; wherein said accelerator executes said combined operation in dependence upon operand mapping data providing a mapping between operands of said combined operation and storage locations within said operand store and in dependence upon separately specified configuration data providing a mapping between said plurality of functional units and said particular processing operations such that said configuration data can be re-used for different operand mappings.
摘要:
A branch prediction mechanism within a pipelined processing apparatus uses a history value HV which records preceding branch outcomes in either a first mode or a second mode. In the first mode respective bits within the history value represent a mixture of branch taken and branch not taken outcomes. In the second mode a count value within the history value indicates a count of a contiguous sequence of branch taken outcomes.
摘要:
An apparatus and method for loading data values from a memory system are provided. The data processing apparatus comprises a data processing unit operable to execute instructions, and a register file having a plurality of registers operable to store data values accessible by the data processing unit when executing the instructions. Further, a holding register is provided which does not form one of a working set of registers of the register file, and is operable to temporarily store a data value, the holding register having a data portion for storing the data value, and an identifier portion operable to store identifier data associated with the data value. The data processing unit is then responsive to a preload instruction to issue a preload memory access request to a memory system to cause a data value identified by the preload instruction to be located in the memory system, and dependent on predetermined criteria to cause a copy of that data value along with associated identifier data to be loaded from the memory system into the holding register. Furthermore, the data processing unit is responsive to a load instruction to cause a comparison operation to be performed to determine whether identifier data derived from the load instruction matches the identifier data in the identifier portion of the holding register. If it does, the data value stored in the holding register is made available to the data processing unit without requiring a memory access request to be issued to the memory system. Only in the event of there being no match does the memory access request get issued to the memory system to cause a data value identified by the load instruction to be made available to the data processing unit from the memory system.
摘要:
A data processing apparatus and method for generating access requests is provided. A bus master is provided which can operate either in a secure domain or a non-secure domain of the data processing apparatus, according to a signal received from external to the bus master. The signal is generated to be fixed during normal operation of the bus master. Control logic is provided which, when the bus master device is operating in a secure domain, is operable to generate a domain specifying signal associated with an access request generated by the bus master core indicating either secure or non-secure access, in dependence on either a default memory map or securely defined memory region descriptors. Thus, the bus master operating in a secure domain can generate both secure and non-secure accesses, without itself being able to switch between secure and non-secure operation.
摘要:
The present invention provides a data processing apparatus and method for handling corrupted data values. The method comprises the steps of: a) accessing a data value in a memory within a data processing apparatus; b) initiating processing of the data value within the data processing apparatus; c) whilst at least one of the steps a) and b) are being performed, determining whether the data value accessed is corrupted; and d) when it is determined that the data value is corrupted, disabling an interface used to propagate data values between the data processing apparatus and a device coupled to the data processing apparatus to prevent propagation of a corrupted data value to the device. When a data value is accessed, the data processing apparatus can begin processing of that data value and, hence, the performance of the data processing apparatus is not reduced. If it is determined that the data value which was accessed was corrupted or contains an error then the interface which couples the data processing apparatus with the device is disabled. Disabling the interface effectively quarantines any corrupted data values by preventing them from being propagated to the device. Preventing corrupted data values from being propagated to the device ensures that no change in state can occur in the device as a result of the corrupted data values.
摘要:
Within a coherent multi-processing system multiple processor cores 4, 6 are coupled via respective memory buses to a memory access control unit 16. The memory buses are formed of a uni-processing portion containing signals specifying a memory access request in accordance with a uni-processing protocol. This uni-processing bus is augmented by a multi-processing bus containing signals giving additional information concerning memory access requests which may be used by the memory access control unit to service those requests and manage coherency within the system.
摘要:
A data processing apparatus and method are provided for handling write access requests to shared memory. The data processing apparatus has a plurality of processing units for performing data processing operations requiring access to data in shared memory, with each processing unit having a cache associated therewith for storing a subset of the data for access by that processing unit. Cache coherency logic is provided that employs a cache coherency protocol to ensure data accessed by each processing unit is up-to-date. Each processing unit will issue a write access request when outputting a data value for storing in the shared memory, and when the write access request is of a type requiring both the associated cache and the shared memory to be updated, a coherency operation is initiated within the cache coherency logic. The coherency operation is then performed in respect of all of the caches associated with the plurality of processing units, including the cache associated with the processing unit that issued the write access request, in order to ensure that the data in those caches is kept coherent. The cache coherency logic is further operable to issue an update request to the shared memory in respect of the data value the subject of the write access request. Such a technique provides a particularly simple and efficient mechanism for ensuring the correct behaviour of such write access requests, without impacting the complexity and access timing of the originating processing unit and its associated cache.