摘要:
An apparatus and method are provided for speculatively vectorising program code. The apparatus includes processing circuitry for executing program code, the program code including an identified code region comprising at least a plurality of speculative vector memory access instructions. Execution of each speculative vector memory access instruction is employed to perform speculative vectorisation of a series of scalar memory access operations using a plurality of lanes of processing. Tracking storage is used to maintain, for each speculative vector memory access instruction, tracking information providing an indication of a memory address being accessed within each lane. Checking circuitry then references the tracking information during execution of the identified code region by the processing circuitry, in order to detect any inter lane memory hazard resulting from the execution of the plurality of speculative vector memory access instructions.
摘要:
An apparatus, method and computer program, the apparatus comprising processing circuitry to execute instructions, issue circuitry to issue the instructions for execution by the processing circuitry, and candidate instruction storage circuitry to store a plurality of condition-dependent instructions, each specifying at least one condition. The issue circuitry is configured to issue a given condition-dependent instruction in response to a determination or a prediction of the at least one condition specified by the given condition-dependent instruction being met, and when the given condition-dependent instruction is a sequence-start instruction, the issue circuitry is responsive to the determination or prediction to issue a sequence of instructions comprising the sequence-start instruction and at least one subsequent instruction.
摘要:
Processing circuitry supports a first type of vector arithmetic instruction specifying at least a first input vector. When at least one exceptional condition is detected for an arithmetic operation performed for a first active data element of the first input vector in a predetermined sequence, the processing circuitry performs at least one response action. When the at least one exceptional condition is detected for a given active data element other than the first active data element in the predetermined sequence, the processing circuitry suppresses the at least one response action and stores elements identifying information identifying which data element is the given active data element which triggered the exceptional condition. This can be useful for reducing the amount of hardware resource for tracking the occurrence of the exceptional conditions and/or supporting speculative execution of vector instructions.
摘要:
A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand. The control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation. The processing circuitry is configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements. The partitioned scan operation approach of the present invention enables a balance to be achieved between energy consumption and performance.
摘要:
Circuitry comprises processing circuitry configured to execute program instructions in dependence upon respective trigger conditions matching a current trigger state and to set a next trigger state in response to program instruction execution; the processing circuitry comprising: instruction storage configured to selectively provide a group of two or more program instructions for execution in parallel; and trigger circuitry responsive to the generation of a trigger state by execution of program instructions and to a trigger condition associated with a given group of program instructions, to control the instruction storage to provide program instructions of the given group of program instructions for execution.
摘要:
A data processing apparatus has processing circuitry for processing vector operands from a vector register store in response to vector micro-operations, some of which have control information identifying which data elements of the vector operands are selected for processing. Control circuitry detects vector micro-operations for which the control information specifies that a portion of the vector operand to be processed has no selected elements. If this is the case, then the control circuitry controls the processing circuitry to process a lower latency replacement micro-operation instead of the original micro-operation. This provides better performance than if a branch instruction is used to bypass the micro-operation if there are no selected elements.
摘要:
There is provided a data processing apparatus and method. The data processing apparatus comprises a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture. Each processing element comprising processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations and to issue data transfer requests for requested data to the network. The memory control circuitry is configured to monitor the network to retrieve the requested data from the network. Each processing element is further provided with local storage circuitry comprising a plurality of local storage sectors to store data associated with the processing operations, and auxiliary memory control circuitry to monitor the network to detect stalled data (S60). The auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer (S66) dynamically selected from amongst the plurality of local storage sectors (S64).
摘要:
One or more triggered-instruction processing elements are provided, a given triggered-instruction processing element comprising execution circuitry to execute processing operations in response to instructions according to a triggered instruction architecture. Input channel processing circuitry receives a given tagged data item (comprising a data value and a tag value) for a given input channel, and in response controls enqueuing of the data value of the given tagged data item to a selected buffer structure selected from among at least two buffer structures mapped onto register storage accessible to one or more of the triggered-instruction processing elements in response to a computation instruction for controlling performance of a computation operation. The selected buffer structure is selected based at least on the tag value, so data values of tagged data items specifying different tag values for the given input channel are allocatable to different buffer structures.
摘要:
A data processing apparatus and method are provided for performing segmented operations. The data processing apparatus comprises a vector register store for storing vector operands, and vector processing circuitry providing N lanes of parallel processing, and arranged to perform a segmented operation on up to N data elements provided by a specified vector operand, each data element being allocated to one of the N lanes. The up to N data elements forms a plurality of segments, and performance of the segmented operation comprises performing a separate operation on the data elements of each segment, the separate operation involving interaction between the lanes containing the data elements of the associated segment. Predicate generation circuitry is responsive to a compute descriptor instruction specifying an input vector operand comprising a plurality of segment descriptors, to generate per lane predicate information used by the vector processing circuitry when performing the segmented operation to maintain a boundary between each of the plurality of segments. As a result, interaction between lanes containing data elements from different segments is prevented. This allows very effective utilisation of the lanes of parallel processing within the vector processing circuitry to be achieved.
摘要:
There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.