摘要:
A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to a instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
摘要:
A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to a instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
摘要:
A data processing apparatus and method are provided for processing data under control of a program having program instructions including sequences of individual program instructions corresponding to computational subgraphs identified within the program. Each computational subgraph has a number of input operands and produces one or more output operands. The apparatus comprises an operand store for storing the input and output operands, and processing logic for executing individual program instructions from the program. Also provided is configurable accelerator logic which, in response to reaching an execution point within the program corresponding to a sequence of individual program instructions corresponding to a computational subgraph, evaluates one or more output functions associated with the computational subgraph. The evaluation of each output function generates an output operand for storing in the operand store, and each output operand corresponds to an output that would have been generated had the sequence of individual program instructions corresponding to the computational subgraph have been executed by the processing logic. Configuration storage stores a single look-up table (LUT) configuration for each output function, and for each output function to be evaluated, the accelerator logic is configured dependent on the associated single LUT configuration from the configuration storage, such that on receipt of the input operands of the computational subgraph, the accelerator logic will generate the output operand. This technique has been found to provide a particularly efficient accelerator logic for evaluating output functions associated with computational subgraphs.
摘要:
A data processing apparatus and method are provided for processing data under control of a program having program instructions including sequences of individual program instructions corresponding to computational subgraphs identified within the program. Each computational subgraph has a number of input operands and produces one or more output operands. The apparatus comprises an operand store for storing the input and output operands, and processing logic for executing individual program instructions from the program. Also provided is configurable accelerator logic which, in response to reaching an execution point within the program corresponding to a sequence of individual program instructions corresponding to a computational subgraph, evaluates one or more output functions associated with the computational subgraph. The evaluation of each output function generates an output operand for storing in the operand store, and each output operand corresponds to an output that would have been generated had the sequence of individual program instructions corresponding to the computational subgraph have been executed by the processing logic. Configuration storage stores a single look-up table (LUT) configuration for each output function, and for each output function to be evaluated, the accelerator logic is configured dependent on the associated single LUT configuration from the configuration storage, such that on receipt of the input operands of the computational subgraph, the accelerator logic will generate the output operand. This technique has been found to provide a particularly efficient accelerator logic for evaluating output functions associated with computational subgraphs.
摘要:
An integrated circuit 2 includes a configurable accelerator 14. An instruction identifier 22 identifies subgraphs of program instructions which are capable of being performed as combined complex operations by the configurable accelerator 14. The subgraph identifier 22 reorders the sequence of fetched instructions to enable larger subgraphs of program instructions to be formed for acceleration and uses a postpone buffer 24 to store any postponed instructions which have been pushed later in the instruction stream by the reordering action of the subgraph identifier 22.
摘要:
A method and system for selectively applying one of a plurality of different memory coherence protocols are described. When an application is executed to generate a memory access transaction, a table can be evaluated to determine whether the transaction should be processed in accordance with a first memory coherence protocol or a second memory coherence protocol. Then, the transaction can be processed according to the selected memory coherence protocol. Alternatively, or in conjunction therewith, the application can be modified to execute more efficiently on a particular memory coherence protocol.
摘要:
The invention relates to a multiprocessor system on an electronic chip (300) comprising at least two computing tiles, each of the computing tiles comprising a generalist processor, and means for access to a communication network (320), the said computing tiles being connected together via the said communication network, the said multiprocessor system being characterized in that: a generalist processor using an instruction set which defines all the operations to be executed by the said processor, the generalist processors have one and the same instruction set; at least one of the computing tiles also comprises an accelerator coupled to the generalist processor accelerating computing tasks of the said generalist processor.
摘要:
A data processing system 2 includes a data store 14 having storage locations storing entries which can be used for a variety of purposes, such as operand value prediction, branch prediction, etc. An entry profile store 16 stores profile data in respect of more candidate entries than there are storage locations within the data store 14. The profile data is used to determine replacement policy for entries within the data store 14. The profile data 16 can include hash values used to determine whether or not predictions associated with candidate entries were or were not correct without having to store the full predictions within the profile data.
摘要:
A data processing system includes a data store having storage locations storing entries which can be used for a variety of purposes, such as operand value prediction, branch prediction, etc. An entry profile store stores profile data for more candidate entries than there are storage locations within the data store. The profile data is used to determine replacement policy for entries within the data store. The profile data can include hash values used to determine whether predictions associated with candidate entries were correct without having to store the full predictions within the profile data.
摘要:
An apparatus with an ultra low power architecture is described herein. The apparatus includes a first power supply rail, wherein a plurality of subsystems are to be powered by the first power supply rail. The apparatus also includes a second power supply rail, wherein a plurality of autonomous subsystems are to be powered by the power supply rail, wherein the second power supply rail is to be always on, always available, and low power.