Abstract:
A data processing apparatus and a method for processing data are disclosed. The data processing apparatus comprises: multithreaded processing circuitry to perform processing operations of a plurality of micro-threads, each micro-thread operating in a corresponding execution context defining an architectural state. Thread control circuitry collects runtime data indicative of a performance metric relating to the processing operations. Decoder circuitry is responsive to a detach instruction in a first micro-thread of instructions executed in a first execution context defining a first architectural state, the detach instruction specifying an address, to provide detach control signals to the thread control circuitry. When the runtime data meet a parallelisation criterion, the thread control circuitry is responsive to the detach control signals to spawn a second micro-thread of instructions executed in a second execution context defining a second architectural state based on the first architectural state, the second micro-thread of instructions comprising a subset of instructions of the first micro-thread of instructions starting at the address.
Abstract:
A data processing apparatus and method are provided for performing segmented operations. The data processing apparatus comprises a vector register store for storing vector operands, and vector processing circuitry providing N lanes of parallel processing, and arranged to perform a segmented operation on up to N data elements provided by a specified vector operand, each data element being allocated to one of the N lanes. The up to N data elements forms a plurality of segments, and performance of the segmented operation comprises performing a separate operation on the data elements of each segment, the separate operation involving interaction between the lanes containing the data elements of the associated segment. Predicate generation circuitry is responsive to a compute descriptor instruction specifying an input vector operand comprising a plurality of segment descriptors, to generate per lane predicate information used by the vector processing circuitry when performing the segmented operation to maintain a boundary between each of the plurality of segments. As a result, interaction between lanes containing data elements from different segments is prevented. This allows very effective utilisation of the lanes of parallel processing within the vector processing circuitry to be achieved.
Abstract:
There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.
Abstract:
An apparatus and method are provided for monitoring events in a data processing system. The apparatus has first event monitoring circuitry for monitoring occurrences of a first event within a data processing system, and for asserting a first signal to indicate every m-th occurrence of the first event, where m is an integer of 1 or more. In addition second event monitoring circuitry is used to monitor occurrences of a second event within the data processing system, and to assert a second signal to indicate every n-th occurrence of the second event, where n is an integer of 1 or more. History maintenance circuitry then maintains event history information which is updated in dependence on the asserted first and second signals. Further, history analysis circuitry is responsive to an analysis trigger to analyse the event history information in order to detect a reporting condition when the event history information indicates that a ratio between occurrences of the first event and the occurrences of the second event is outside an acceptable range. The history analysis circuitry is then responsive to detection of the reporting condition to assert a report signal. This provides a particularly efficient and effective mechanism for monitoring ratios of events within a data processing system.
Abstract:
A vector scan operation is performed to generate M data elements of a result vector, where each result data element corresponds to a combination of an additional data element with at least some of the data elements of a source vector operand V. The vector scan operation is performed using a plurality of steps, each step comprising one or more combination operations for combining data elements. At least one of the steps includes two or more combination operations performed in parallel. At least two of the steps comprise a combination operation for combining a data element with the additional data element S. This approach enables the vector scan operation to be performed in fewer steps in the case where fewer than M data elements are active, so that the vector scan operation can be performed more quickly.
Abstract:
One or more triggered-instruction processing elements are provided, a given triggered-instruction processing element comprising execution circuitry to execute processing operations in response to instructions according to a triggered instruction architecture. Input channel processing circuitry receives a given tagged data item (comprising a data value and a tag value) for a given input channel, and in response controls enqueuing of the data value of the given tagged data item to a selected buffer structure selected from among at least two buffer structures mapped onto register storage accessible to one or more of the triggered-instruction processing elements in response to a computation instruction for controlling performance of a computation operation. The selected buffer structure is selected based at least on the tag value, so data values of tagged data items specifying different tag values for the given input channel are allocatable to different buffer structures.
Abstract:
An apparatus comprises processing circuitry to perform data processing and instruction decoding circuitry to decode instructions to control the processing circuitry to perform the data processing. The instruction decoding circuitry is responsive to a speculation barrier instruction to control the processing circuitry to prevent a subsequent operation, appearing in program order after the speculation barrier instruction, that has an address dependency on an earlier instruction preceding the speculation barrier instruction in the program order, from speculatively influencing allocations of entries in a cache. This provides protection against speculative cache-timing side-channel attacks.
Abstract:
Processing circuitry supports a first type of vector arithmetic instruction specifying at least a first input vector. When at least one exceptional condition is detected for an arithmetic operation performed for a first active data element of the first input vector in a predetermined sequence, the processing circuitry performs at least one response action. When the at least one exceptional condition is detected for a given active data element other than the first active data element in the predetermined sequence, the processing circuitry suppresses the at least one response action and stores elements identifying information identifying which data element is the given active data element which triggered the exceptional condition. This can be useful for reducing the amount of hardware resource for tracking the occurrence of the exceptional conditions and/or supporting speculative execution of vector instructions.
Abstract:
A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand. The control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation. The processing circuitry is configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements. The partitioned scan operation approach of the present invention enables a balance to be achieved between energy consumption and performance.
Abstract:
Provided is a data stream processor comprising a streamed data transceiver interface, a structure of processing units configurable to transform data received from a data source over the streamed data transceiver interface according to a specified output requirement, and a configuration unit operable in electronic communication with a data consumer to receive an output requirement and to configure the operation and linkage of a processing unit in the structure of processing units to transform input data to output data according to the specified output requirement; wherein the structure of processing units is further operable to provide the output data for output over the streamed data transceiver interface.