Apparatus and method for speculatively vectorising program code

    Publication No.: US12131155B2

    Publication date: 2024-10-29

    Application No.: US17597134

    Filing date: 2020-03-25

    Abstract: An apparatus and method are provided for speculatively vectorising program code. The apparatus includes processing circuitry for executing program code, the program code including an identified code region comprising at least a plurality of speculative vector memory access instructions. Execution of each speculative vector memory access instruction is employed to perform speculative vectorisation of a series of scalar memory access operations using a plurality of lanes of processing. Tracking storage is used to maintain, for each speculative vector memory access instruction, tracking information providing an indication of a memory address being accessed within each lane. Checking circuitry then references the tracking information during execution of the identified code region by the processing circuitry, in order to detect any inter-lane memory hazard resulting from the execution of the plurality of speculative vector memory access instructions.

    Handling exceptional conditions for vector arithmetic instruction

    Publication No.: US10776124B2

    Publication date: 2020-09-15

    Application No.: US15769558

    Filing date: 2016-09-14

    Applicant: ARM LIMITED

    IPC classification: G06F9/38 G06F9/30

    Abstract: Processing circuitry supports a first type of vector arithmetic instruction specifying at least a first input vector. When at least one exceptional condition is detected for an arithmetic operation performed for a first active data element of the first input vector in a predetermined sequence, the processing circuitry performs at least one response action. When the at least one exceptional condition is detected for a given active data element other than the first active data element in the predetermined sequence, the processing circuitry suppresses the at least one response action and stores element-identifying information identifying which data element is the given active data element that triggered the exceptional condition. This can be useful for reducing the amount of hardware resources needed to track the occurrence of the exceptional conditions and/or for supporting speculative execution of vector instructions.
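
    The behaviour described above resembles first-faulting semantics. The Python sketch below models it under stated assumptions: division by zero stands in for the exceptional condition, raising `VectorFault` stands in for the response action, and the returned lane index plays the role of the element-identifying information. This model also stops at the faulting lane and leaves later results as `None`, which is only one possible choice. None of these names come from the patent.

```python
class VectorFault(Exception):
    """Models the response action (e.g. a trap) for the first active element."""


def reciprocal_first_faulting(values, active):
    """Element-wise 1/x over the active lanes, processed in ascending lane order.

    Returns (results, faulting_lane); faulting_lane is None if nothing faulted.
    """
    results = [None] * len(values)
    seen_active = False
    for lane, (x, is_active) in enumerate(zip(values, active)):
        if not is_active:
            continue
        if x == 0:  # the assumed exceptional condition
            if not seen_active:
                # First active element: perform the response action.
                raise VectorFault(f"exceptional condition in lane {lane}")
            # Later element: suppress the response and only record the lane.
            return results, lane
        results[lane] = 1.0 / x
        seen_active = True
    return results, None


results, fault_lane = reciprocal_first_faulting([2.0, 4.0, 0.0, 5.0], [True] * 4)
print(results, fault_lane)  # [0.5, 0.25, None, None] 2
```

    Software could then re-execute the operation with the lanes before `fault_lane` made inactive; the faulting element is then the first active one, so the response action is taken for it.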

    Data processing apparatus and method for performing scan operations

    Publication No.: US09355061B2

    Publication date: 2016-05-31

    Application No.: US14165967

    Filing date: 2014-01-28

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand. The control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation. The processing circuitry is configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements. The partitioned scan operation approach of the present invention enables a balance to be achieved between energy consumption and performance.
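
    A minimal Python sketch of the partitioned scan idea, using an inclusive prefix scan with an associative operator (addition by default). The even group split, the function name `partitioned_scan`, and the sequential combine loop are illustrative assumptions; in hardware the per-group scans would run in parallel, and the abstract leaves the group sizing open.

```python
import operator
from itertools import accumulate


def partitioned_scan(elements, p, op=operator.add):
    """Inclusive scan computed as P independent group scans plus a combine step.

    Requires 2 <= p <= len(elements) // 2 (the range given in the abstract)
    and an associative `op`.
    """
    n = len(elements)
    if not 2 <= p <= n // 2:
        raise ValueError("p must satisfy 2 <= p <= N/2")
    # Split into P groups of adjacent elements (sizes differ by at most one).
    groups = [elements[g * n // p:(g + 1) * n // p] for g in range(p)]
    # Phase 1: separate scans inside each group (intermediate results).
    partial = [list(accumulate(group, op)) for group in groups]
    # Phase 2: combine, folding the running total of earlier groups into each group.
    result = list(partial[0])
    for local in partial[1:]:
        carry = result[-1]
        result.extend(op(carry, x) for x in local)
    return result


print(partitioned_scan([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

    Passing `max` instead of the default operator produces a running maximum, which shows that the combine step relies only on the operator being associative.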

    CIRCUITRY AND METHOD FOR INSTRUCTION EXECUTION IN DEPENDENCE UPON TRIGGER CONDITIONS

    Publication No.: US20240220269A1

    Publication date: 2024-07-04

    Application No.: US18261966

    Filing date: 2022-01-19

    Applicant: Arm Limited

    IPC classification: G06F9/38

    CPC classification: G06F9/3853

    Abstract: Circuitry comprises processing circuitry configured to execute program instructions in dependence upon respective trigger conditions matching a current trigger state and to set a next trigger state in response to program instruction execution; the processing circuitry comprising: instruction storage configured to selectively provide a group of two or more program instructions for execution in parallel; and trigger circuitry responsive to the generation of a trigger state by execution of program instructions and to a trigger condition associated with a given group of program instructions, to control the instruction storage to provide program instructions of the given group of program instructions for execution.
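
    The triggered-execution model can be sketched in Python as follows. The trigger state is reduced to a single integer, a trigger condition is an equality test against it, and each instruction is a function returning register updates; these representations and the names `InstructionGroup` and `run_triggered` are assumptions for illustration only. All instructions in a selected group read the same input snapshot, which models their parallel execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Regs = Dict[str, int]


@dataclass
class InstructionGroup:
    """Hypothetical group of instructions sharing one trigger condition."""
    name: str
    trigger: int                        # trigger state this group waits for
    ops: List[Callable[[Regs], Regs]]   # each op returns its register updates
    next_state: int                     # trigger state set after the group executes


def run_triggered(groups, state, regs, max_steps=100):
    """Repeatedly issue the group whose trigger condition matches the current state."""
    trace = []
    for _ in range(max_steps):
        ready = [g for g in groups if g.trigger == state]
        if not ready:
            break                       # nothing triggered: the element idles
        group = ready[0]
        inputs = dict(regs)             # every op in the group reads the same inputs,
        updates = {}                    # as if the ops executed in parallel
        for op in group.ops:
            updates.update(op(inputs))
        regs.update(updates)
        state = group.next_state
        trace.append(group.name)
    return trace, state


groups = [
    InstructionGroup("A", 0, [lambda r: {"x": r["a"] + r["b"]},
                              lambda r: {"y": r["a"] * r["b"]}], next_state=1),
    InstructionGroup("B", 1, [lambda r: {"z": r["x"] - r["y"]}], next_state=2),
]
regs = {"a": 3, "b": 4}
trace, final_state = run_triggered(groups, 0, regs)
print(trace, final_state, regs["z"])  # ['A', 'B'] 2 -5
```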

    Data processing apparatus and method for processing vector operands

    Publication No.: US10514919B2

    Publication date: 2019-12-24

    Application No.: US14601598

    Filing date: 2015-01-21

    Applicant: ARM Limited

    IPC classification: G06F9/30

    Abstract: A data processing apparatus has processing circuitry for processing vector operands from a vector register store in response to vector micro-operations, some of which have control information identifying which data elements of the vector operands are selected for processing. Control circuitry detects vector micro-operations for which the control information specifies that a portion of the vector operand to be processed has no selected elements. In that case, the control circuitry controls the processing circuitry to process a lower-latency replacement micro-operation instead of the original micro-operation. This provides better performance than using a branch instruction to bypass the micro-operation when there are no selected elements.
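
    The Python sketch below illustrates the substitution. It assumes a vector operation is cracked into micro-operations that each cover `lanes_per_uop` lanes, and it uses made-up latency figures; `issue_micro_ops` and the `"replacement"` micro-op are hypothetical. A real design must still give the unselected elements whatever value the architecture defines for them (for example, the old destination contents).

```python
def issue_micro_ops(op_name, predicate, lanes_per_uop, latency):
    """Split one vector operation into micro-ops, substituting a replacement
    for any micro-op whose slice of the predicate selects no elements.

    `latency` maps micro-op names to assumed cycle counts.
    """
    issued = []
    for start in range(0, len(predicate), lanes_per_uop):
        portion = predicate[start:start + lanes_per_uop]
        name = op_name if any(portion) else "replacement"
        issued.append((name, latency[name]))
    return issued


LATENCY = {"fdiv": 12, "replacement": 1}  # illustrative numbers only
pred = [True, False, False, False, False, False, False, False]
uops = issue_micro_ops("fdiv", pred, 4, LATENCY)
print(uops, sum(c for _, c in uops))  # [('fdiv', 12), ('replacement', 1)] 13
```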

    A DATA PROCESSING APPARATUS AND METHOD FOR HANDLING STALLED DATA

    Publication No.: US20240296132A1

    Publication date: 2024-09-05

    Application No.: US18574277

    Filing date: 2022-06-21

    Applicant: Arm Limited

    IPC classification: G06F13/16 G06F13/26

    Abstract: There is provided a data processing apparatus and method. The data processing apparatus comprises a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture. Each processing element comprises processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations and to issue data transfer requests for requested data to the network. The memory control circuitry is configured to monitor the network to retrieve the requested data from the network. Each processing element is further provided with local storage circuitry comprising a plurality of local storage sectors to store data associated with the processing operations, and auxiliary memory control circuitry to monitor the network to detect stalled data (S60). The auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer (S66) dynamically selected from amongst the plurality of local storage sectors (S64).
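
    A rough Python model of the stall-handling step is shown below. Treating a packet as stalled once it has waited `stall_threshold` cycles, and choosing the sector with the most free space as the auxiliary buffer, are both assumptions; the abstract only says that the buffer is selected dynamically from the local storage sectors. `pick_aux_sector` and `drain_stalled` are hypothetical names.

```python
def pick_aux_sector(sectors, capacity):
    """Choose the local storage sector with the most free space (assumed policy).

    Returns None if every sector is full.
    """
    best = max(range(len(sectors)), key=lambda i: capacity - len(sectors[i]))
    return best if len(sectors[best]) < capacity else None


def drain_stalled(network, sectors, capacity, stall_threshold):
    """Move packets that have waited at least `stall_threshold` cycles off the
    network into an auxiliary buffer taken from the local storage sectors."""
    moved, still_waiting = [], []
    for packet in network:
        if packet["waited"] >= stall_threshold:
            idx = pick_aux_sector(sectors, capacity)
            if idx is not None:
                sectors[idx].append(packet["data"])
                moved.append((packet["data"], idx))
                continue
        still_waiting.append(packet)  # not stalled yet, or no free sector
    network[:] = still_waiting
    return moved


sectors = [["p0", "p1"], [], ["p2"]]
network = [{"data": "a", "waited": 5}, {"data": "b", "waited": 0}, {"data": "c", "waited": 7}]
moved = drain_stalled(network, sectors, capacity=4, stall_threshold=4)
print(moved)  # [('a', 1), ('c', 1)]
```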

    Input channel processing for triggered-instruction processing element

    Publication No.: US12045622B2

    Publication date: 2024-07-23

    Application No.: US17941404

    Filing date: 2022-09-09

    Applicant: Arm Limited

    IPC classification: G06F9/38

    CPC classification: G06F9/3856 G06F9/3802

    Abstract: One or more triggered-instruction processing elements are provided, a given triggered-instruction processing element comprising execution circuitry to execute processing operations in response to instructions according to a triggered instruction architecture. Input channel processing circuitry receives a given tagged data item (comprising a data value and a tag value) for a given input channel, and in response controls enqueuing of the data value of the given tagged data item to a selected buffer structure selected from among at least two buffer structures mapped onto register storage accessible to one or more of the triggered-instruction processing elements in response to a computation instruction for controlling performance of a computation operation. The selected buffer structure is selected based at least on the tag value, so data values of tagged data items specifying different tag values for the given input channel are allocatable to different buffer structures.
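
    The following Python sketch models tag-based buffer selection for one input channel. The buffers are plain `deque` objects rather than structures mapped onto register storage, and the tag-to-buffer mapping (`tag % 2` in the usage) is an arbitrary assumption; `TaggedInputChannel` is a hypothetical name.

```python
from collections import deque


class TaggedInputChannel:
    """Hypothetical model of one input channel feeding several buffer structures."""

    def __init__(self, num_buffers, buffer_for_tag):
        self.buffers = [deque() for _ in range(num_buffers)]
        self.buffer_for_tag = buffer_for_tag  # maps a tag value to a buffer index

    def receive(self, value, tag):
        """Enqueue only the data value, in the buffer selected by its tag."""
        idx = self.buffer_for_tag(tag)
        self.buffers[idx].append(value)
        return idx

    def dequeue(self, idx):
        """What a computation instruction reading buffer `idx` would consume."""
        return self.buffers[idx].popleft()


channel = TaggedInputChannel(2, buffer_for_tag=lambda tag: tag % 2)
for value, tag in [(10, 0), (11, 1), (12, 2), (13, 3)]:
    channel.receive(value, tag)
print([list(b) for b in channel.buffers])  # [[10, 12], [11, 13]]
```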

    Data processing apparatus and method for performing segmented operations

    Publication No.: US09557995B2

    Publication date: 2017-01-31

    Application No.: US14175268

    Filing date: 2014-02-07

    Applicant: ARM LIMITED

    IPC classification: G06F9/00 G06F9/30 G06F9/38

    Abstract: A data processing apparatus and method are provided for performing segmented operations. The data processing apparatus comprises a vector register store for storing vector operands, and vector processing circuitry providing N lanes of parallel processing, and arranged to perform a segmented operation on up to N data elements provided by a specified vector operand, each data element being allocated to one of the N lanes. The up to N data elements form a plurality of segments, and performance of the segmented operation comprises performing a separate operation on the data elements of each segment, the separate operation involving interaction between the lanes containing the data elements of the associated segment. Predicate generation circuitry is responsive to a compute descriptor instruction specifying an input vector operand comprising a plurality of segment descriptors, to generate per-lane predicate information used by the vector processing circuitry when performing the segmented operation to maintain a boundary between each of the plurality of segments. As a result, interaction between lanes containing data elements from different segments is prevented. This allows very effective utilisation of the lanes of parallel processing within the vector processing circuitry to be achieved.
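
    As an illustration, the sketch below computes a segmented prefix sum in Python. It assumes the segment descriptors are segment lengths and turns them into a per-lane segment-start predicate, then runs a log-step (Hillis-Steele) scan, a standard technique swapped in here for illustration, in which a lane stops accumulating once its partial sum reaches back to its segment start. Both function names are hypothetical.

```python
def segment_start_predicate(lengths, n_lanes):
    """Per-lane predicate: True in the first lane of every segment.

    Segment descriptors are modelled as segment lengths (an assumption).
    """
    if sum(lengths) != n_lanes or any(length <= 0 for length in lengths):
        raise ValueError("segment lengths must be positive and cover every lane")
    starts = [False] * n_lanes
    lane = 0
    for length in lengths:
        starts[lane] = True
        lane += length
    return starts


def segmented_scan(values, starts):
    """Log-step inclusive add-scan that never crosses a segment boundary."""
    v, f = list(values), list(starts)
    offset = 1
    while offset < len(v):
        new_v, new_f = v[:], f[:]
        for i in range(offset, len(v)):
            if not f[i]:
                # Lane i's partial sum does not yet reach its segment start,
                # so it may still take a contribution from lane i - offset.
                new_v[i] = v[i - offset] + v[i]
            new_f[i] = f[i] or f[i - offset]
        v, f = new_v, new_f
        offset *= 2
    return v


starts = segment_start_predicate([3, 2, 3], 8)
print(segmented_scan([1, 2, 3, 4, 5, 6, 7, 8], starts))  # [1, 3, 6, 4, 9, 6, 13, 21]
```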

    Processing of issued instructions

    Publication No.: US11966739B2

    Publication date: 2024-04-23

    Application No.: US17941387

    Filing date: 2022-09-09

    Applicant: Arm Limited

    IPC classification: G06F9/30 G06F9/38 G06F9/48

    Abstract: There is provided an apparatus, method and medium for data processing. The apparatus comprises a register file comprising a plurality of data registers, and frontend circuitry responsive to an issued instruction, to control processing circuitry to perform a processing operation to process an input data item to generate an output data item. The processing circuitry is responsive to a first encoding of the issued instruction specifying a data register, to read the input data item from the data register, and/or write the output data item to the data register. The processing circuitry is responsive to a second encoding of the issued instruction specifying a buffer-region of the register file for storing a queue of data items, to perform the processing operation and to perform a dequeue operation to dequeue the input data item from the queue, and/or perform an enqueue operation to enqueue the output data item to the queue.
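
    The two encodings can be modelled in Python as below: an operand is either a register index or the marker `QUEUE`, and the buffer region is a fixed range of registers used as a circular FIFO. The region bounds, the circular-buffer policy, and the names `RegisterFile`, `execute`, and `QUEUE` are assumptions made for this sketch.

```python
QUEUE = "queue"  # hypothetical marker for the second (buffer-region) encoding


class RegisterFile:
    """Register file in which registers [buf_start, buf_start + buf_len) act as a FIFO."""

    def __init__(self, num_regs, buf_start, buf_len):
        self.regs = [0] * num_regs
        self.buf_start, self.buf_len = buf_start, buf_len
        self.head = 0    # offset of the oldest queued item within the region
        self.count = 0   # number of queued items

    def enqueue(self, value):
        if self.count == self.buf_len:
            raise OverflowError("buffer region full")
        self.regs[self.buf_start + (self.head + self.count) % self.buf_len] = value
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("buffer region empty")
        value = self.regs[self.buf_start + self.head]
        self.head = (self.head + 1) % self.buf_len
        self.count -= 1
        return value


def execute(rf, operation, src, dst):
    """src/dst are a register index (first encoding) or QUEUE (second encoding)."""
    x = rf.dequeue() if src == QUEUE else rf.regs[src]
    y = operation(x)
    if dst == QUEUE:
        rf.enqueue(y)
    else:
        rf.regs[dst] = y
    return y


rf = RegisterFile(num_regs=8, buf_start=4, buf_len=4)
rf.regs[0] = 5
execute(rf, lambda x: x * 2, 0, QUEUE)   # register -> queue: enqueues 10
execute(rf, lambda x: x + 1, QUEUE, 1)   # queue -> register: r1 = 11
print(rf.regs[1], rf.count)  # 11 0
```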