Predicate indicator generation for vector processing operations

    Publication number: US11036503B2

    Publication date: 2021-06-15

    Application number: US15236769

    Application date: 2016-08-15

    Applicant: ARM LIMITED

    Abstract: Processing circuitry selectively applies vector processing operations to one or more data items of one or more data vectors, according to the state of respective predicate indicators associated with the vector positions. Each data vector comprises a plurality of data items at respective vector positions in the data vector. Predicate generation circuitry applies a processing operation to generate a set of predicate indicators, each associated with a respective one of the vector positions, generates a count value indicative of the number of predicate indicators in the set having a given state, and stores the generated set of predicate indicators and the count value in a predicate store.
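
The mechanism in this abstract can be sketched in a few lines of Python, for illustration only. The names `PredicateEntry`, `generate_predicates`, and `predicated_add` are hypothetical and do not come from the patent, and the sketch assumes the "given state" being counted is the active (true) state.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PredicateEntry:
    """Hypothetical predicate-store entry: the flags plus their count."""
    flags: List[bool]
    active_count: int


def generate_predicates(vector: Sequence[int],
                        condition: Callable[[int], bool]) -> PredicateEntry:
    """Generate one predicate per vector position and count the active ones."""
    flags = [bool(condition(item)) for item in vector]
    # The count is produced at generation time and stored with the flags.
    return PredicateEntry(flags=flags, active_count=sum(flags))


def predicated_add(a: Sequence[int], b: Sequence[int],
                   entry: PredicateEntry) -> List[int]:
    """Apply the addition only at positions whose predicate is active."""
    return [x + y if p else x for x, y, p in zip(a, b, entry.flags)]


entry = generate_predicates([3, -1, 4, -5], lambda item: item > 0)
summed = predicated_add([1, 2, 3, 4], [10, 10, 10, 10], entry)
```

Because the count sits next to the flags, a later consumer could read the number of active positions without rescanning the predicate set.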

    Apparatus and method for performing multiply-and-accumulate-products operations

    Publication number: US10409604B2

    Publication date: 2019-09-10

    Application number: US15859931

    Application date: 2018-01-02

    Applicant: Arm Limited

    Abstract: An apparatus and method are provided for performing multiply-and-accumulate-products (MAP) operations. The apparatus has processing circuitry for performing data processing, the processing circuitry including an adder array having a plurality of adders for accumulating partial products produced from input operands. An instruction decoder is provided that is responsive to a MAP instruction specifying a first J-bit operand and a second K-bit operand, to control the processing circuitry to enable performance of a number of MAP operations, where the number is dependent on a parameter. For each performed MAP operation, the processing circuitry is arranged to generate a corresponding result element representing a sum of respective E×F products of E-bit portions within an X-bit segment of the first operand with F-bit portions within a Y-bit segment of the second operand, where E
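
The abstract is cut off before it states the constraint relating E and F, so the following Python sketch illustrates a single MAP operation only under an assumed pairing: the X-bit segment and the Y-bit segment are taken to contain the same number of portions, and portion i of one is multiplied by portion i of the other. The helper names, the unsigned treatment of the portions, and the example widths are all assumptions.

```python
def split_portions(value: int, total_bits: int, portion_bits: int) -> list:
    """Split an unsigned value into equal portions, least significant first."""
    if total_bits % portion_bits:
        raise ValueError("segment width must be a multiple of the portion width")
    mask = (1 << portion_bits) - 1
    return [(value >> shift) & mask
            for shift in range(0, total_bits, portion_bits)]


def map_operation(seg_a: int, seg_b: int,
                  x_bits: int, y_bits: int,
                  e_bits: int, f_bits: int) -> int:
    """Sum the products of paired E-bit and F-bit portions (assumed pairing)."""
    a_parts = split_portions(seg_a, x_bits, e_bits)
    b_parts = split_portions(seg_b, y_bits, f_bits)
    if len(a_parts) != len(b_parts):
        raise ValueError("this sketch assumes equal portion counts")
    return sum(p * q for p, q in zip(a_parts, b_parts))


# Portions of 0x04030201 are 1, 2, 3, 4; portions of 0x08070605 are 5, 6, 7, 8.
result_element = map_operation(0x04030201, 0x08070605, 32, 32, 8, 8)
```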

    Apparatus and method for performing arithmetic operations to accumulate floating-point numbers

    Publication number: US10216479B2

    Publication date: 2019-02-26

    Application number: US15370660

    Application date: 2016-12-06

    Applicant: ARM Limited

    Abstract: An apparatus and method are provided for performing arithmetic operations to accumulate floating-point numbers. The apparatus comprises execution circuitry to perform arithmetic operations, and decoder circuitry to decode a sequence of instructions. A convert and accumulate instruction is provided, and the decoder circuitry is responsive to decoding the convert and accumulate instruction to generate one or more control signals to control the execution circuitry to convert at least one floating-point operand identified by the convert and accumulate instruction into a corresponding N-bit fixed-point operand having M fraction bits, where M is less than N and M is dependent on a format of the floating-point operand. The execution circuitry accumulates each corresponding N-bit fixed-point operand and a P-bit fixed-point operand identified by the convert and accumulate instruction in order to generate a P-bit fixed-point result value, where P is greater than N and also has M fraction bits.
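
A rough Python model of the convert-then-accumulate step follows. The widths N = 32, M = 16, and P = 64 are illustrative assumptions (the abstract only says that M depends on the floating-point format), and the function names and the round-to-nearest conversion are likewise assumptions rather than details from the patent.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def float_to_fixed(value: float, n_bits: int, m_frac: int) -> int:
    """Convert to an N-bit fixed-point integer with M fraction bits."""
    scaled = round(value * (1 << m_frac))  # rounding mode is an assumption
    low, high = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not low <= scaled <= high:
        raise OverflowError("operand does not fit in N bits")
    return scaled


def convert_and_accumulate(acc: int, operands, n_bits: int = 32,
                           m_frac: int = 16, p_bits: int = 64) -> int:
    """Add each converted operand to a P-bit accumulator with M fraction bits."""
    for operand in operands:
        acc = wrap_signed(acc + float_to_fixed(operand, n_bits, m_frac), p_bits)
    return acc


values = [1.5, 2.25, -0.75]
forward = convert_and_accumulate(0, values)
backward = convert_and_accumulate(0, reversed(values))
```

Since the accumulation is integer addition, the result here does not depend on the order of the operands, which is not generally true of floating-point addition.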

    Lane position information for processing of vector

    Publication number: US09733899B2

    Publication date: 2017-08-15

    Application number: US14939371

    Application date: 2015-11-12

    Applicant: ARM LIMITED

    Abstract: Processing circuitry performs a plurality of lanes of processing on respective data elements of at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry identifies lane position information for each lane of processing, the lane position information for a given lane identifying a relative position of the corresponding result data element to be generated by the given lane within a corresponding result data value spanning one or more result data elements of the result vector. The processing circuitry is configured to perform each lane of processing in dependence on the lane position information identified for that lane. This enables generation of results which are wider or narrower than the vector size supported in hardware.
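
One way to picture lane position information is the Python sketch below, in which each lane produces a different slice of a result value that is wider than a single lane. The 16-bit lane width, the function names, and the choice of "extract a slice" as the per-lane operation are assumptions made for illustration; the patent abstract does not prescribe them.

```python
LANE_BITS = 16  # assumed hardware lane width


def lane_process(operand: int, position: int) -> int:
    """Produce the result element for the lane at `position` (0 = lowest)."""
    return (operand >> (position * LANE_BITS)) & ((1 << LANE_BITS) - 1)


def process_vector(operands, lanes_per_result: int) -> list:
    """Run every lane, giving each one its position within its result value."""
    total_lanes = len(operands) * lanes_per_result
    result = []
    for lane_index in range(total_lanes):
        operand = operands[lane_index // lanes_per_result]
        position = lane_index % lanes_per_result  # lane position information
        result.append(lane_process(operand, position))
    return result


# Two 32-bit result values, each spanning two 16-bit lanes.
result_vector = process_vector([0x12345678, 0xABCD], lanes_per_result=2)
```

Changing `lanes_per_result` lets the same lanes build results of a different width, which is the flexibility the abstract's last sentence refers to.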

    Apparatus and method for performing floating-point square root operation

    Publication number: US09710229B2

    Publication date: 2017-07-18

    Application number: US14728085

    Application date: 2015-06-02

    Applicant: ARM LIMITED

    CPC classification number: G06F7/5525 G06F7/483

    Abstract: A data processing apparatus has processing circuitry for performing a floating-point square root operation on a radicand value R to generate a result value. The processing circuitry has first square root processing circuitry for processing radicand values R which are not an exact power of two and second square root processing circuitry for processing radicand values which are an exact power of two. Power-of-two detection circuitry detects whether the radicand value is an exact power of two and selects the output of the first or second square root processing circuitry as appropriate. This allows the result to be generated in fewer processing cycles when the radicand is a power of two.
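
The selection between the two paths can be modeled in Python as below. The abstract does not say how the second circuitry computes its result, so the exponent-halving shortcut shown here is an assumption, as are the function names; `math.sqrt` stands in for the general-purpose first circuitry.

```python
import math

SQRT2 = math.sqrt(2.0)


def is_power_of_two(radicand: float) -> bool:
    """True when the radicand is exactly 2**k for some integer k."""
    mantissa, _ = math.frexp(radicand)  # radicand == mantissa * 2**exponent
    return mantissa == 0.5


def sqrt_with_fast_path(radicand: float):
    """Return (result, used_fast_path)."""
    if is_power_of_two(radicand):
        _, exponent = math.frexp(radicand)
        k = exponent - 1  # radicand == 2**k
        if k % 2 == 0:
            # Even exponent: halve it, no iteration needed.
            return math.ldexp(1.0, k // 2), True
        # Odd exponent: sqrt(2) scaled by 2**((k - 1) / 2).
        return math.ldexp(SQRT2, (k - 1) // 2), True
    return math.sqrt(radicand), False


fast = sqrt_with_fast_path(4.0)
slow = sqrt_with_fast_path(3.0)
```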

    Multiplication of first and second operands using redundant representation

    Publication number: US09703531B2

    Publication date: 2017-07-11

    Application number: US14939469

    Application date: 2015-11-12

    Applicant: ARM LIMITED

    CPC classification number: G06F7/5443 G06F7/50 G06F7/5324 G06F7/5336

    Abstract: A method is provided for multiplying a first operand comprising at least two X-bit portions and a second operand comprising at least one Y-bit portion. At least two partial products are generated, each partial product comprising a product of a selected X-bit portion of the first operand and a selected Y-bit portion of the second operand. Each partial product is converted to a redundant representation in dependence on significance indicating information indicative of a significance of the partial product. In the redundant representation, the partial product is represented using a number of N-bit portions, and in a group of at least two adjacent N-bit portions, a number of overlap bits of a lower N-bit portion of the group have a same significance as some least significant bits of at least one upper N-bit portion of the group. The partial products are added while represented in the redundant representation.
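
The Python sketch below illustrates the method with unsigned values only. It assumes N = 16-bit portions with 4 overlap bits each and X = Y = 16; these widths and all function names are illustrative choices, not values from the patent. Each portion carries 12 bits of new significance, and its top 4 bits overlap the low bits of the next portion, which lets portions be added independently without carries rippling between them.

```python
LANE_BITS = 16                            # N: width of each portion
OVERLAP_BITS = 4                          # overlap bits per portion (assumed)
DATA_BITS = LANE_BITS - OVERLAP_BITS      # non-overlapping bits per portion


def to_redundant(value: int, significance: int, num_lanes: int) -> list:
    """Place a partial product at its significance, 12 data bits per portion."""
    shifted = value << significance
    mask = (1 << DATA_BITS) - 1
    return [(shifted >> (i * DATA_BITS)) & mask for i in range(num_lanes)]


def add_redundant(a: list, b: list) -> list:
    """Add portion by portion; sums grow into the overlap bits, no rippling."""
    lanes = [x + y for x, y in zip(a, b)]
    if any(lane >= (1 << LANE_BITS) for lane in lanes):
        raise OverflowError("overlap bits exhausted")
    return lanes


def from_redundant(lanes: list) -> int:
    """Resolve the overlaps to recover an ordinary integer."""
    return sum(lane << (i * DATA_BITS) for i, lane in enumerate(lanes))


def multiply(first: int, second: int, x_bits: int = 16) -> int:
    """Multiply a two-portion first operand by a one-portion second operand."""
    low = first & ((1 << x_bits) - 1)
    high = first >> x_bits
    partials = [to_redundant(low * second, 0, 4),
                to_redundant(high * second, x_bits, 4)]
    return from_redundant(add_redundant(*partials))


product = multiply(0xFFFFFFFF, 0xFFFF)
```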

    Predicting saturation in a shift operation

    Publication number: US09208839B2

    Publication date: 2015-12-08

    Application number: US14220490

    Application date: 2014-03-20

    Applicant: ARM LIMITED

    Abstract: Apparatus for data processing and a method of data processing are provided. Shift circuitry performs a shift operation in response to a shift instruction, shifting bits of an input data value in a direction specified by the shift instruction. Bit location indicator generation circuitry and comparison circuitry operate in parallel with the shift circuitry. The bit location indicator indicates at least one bit location in the input data value which must not have a bit set if the shifted data value is not to saturate. Comparison circuitry compares the bit location indicator with the input data value and indicates a saturation condition if any bits are indicated by the bit location indicator for bit locations which hold set bits in the input data value. A faster indication of the saturation condition thus results.
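
For an unsigned left shift, the comparison described above reduces to a mask test, sketched here in Python. The 32-bit width, the restriction to unsigned left shifts, and the function names are assumptions for illustration; in hardware the mask test would run alongside the shifter, whereas this model simply computes both.

```python
WIDTH = 32
ALL_ONES = (1 << WIDTH) - 1


def bit_location_indicator(shift: int) -> int:
    """Mask of the top `shift` bits: locations that must be clear to avoid saturation."""
    if not 0 <= shift <= WIDTH:
        raise ValueError("shift amount out of range")
    return ((1 << shift) - 1) << (WIDTH - shift)


def saturating_shift_left(value: int, shift: int):
    """Return (result, saturated) for an unsigned saturating left shift."""
    indicator = bit_location_indicator(shift)
    # Depends only on the inputs, not on the shifter output.
    saturated = (value & indicator) != 0
    shifted = (value << shift) & ALL_ONES
    return (ALL_ONES if saturated else shifted), saturated


in_range = saturating_shift_left(0x00FF0000, 8)
too_big = saturating_shift_left(0x01000000, 8)
```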

    Data processing apparatus and method for generating a status flag using predicate indicators

    Publication number: US11080054B2

    Publication date: 2021-08-03

    Application number: US15236728

    Application date: 2016-08-15

    Applicant: ARM LIMITED

    Abstract: Data processing apparatus comprises processing circuitry to selectively apply vector processing operations to one or more data items of one or more data vectors each comprising an ordered plurality of data items at respective vector positions in the data vector, according to the state of respective predicate indicators associated with the vector positions; predicate generation circuitry to apply a processing operation to generate an ordered set of predicate indicators, each associated with a respective one of the vector positions, the ordered set of predicate indicators being associated with an ordered set of active indicators each having an active or an inactive state; and a detector to detect a status flag indicative of whether a predicate indicator at a position, in the ordered set of predicate indicators, corresponding to the position of an outermost active indicator having the active state, has a given state; in which the detector comprises: first and second circuitry to combine the ordered set of predicate indicators and the ordered set of active indicators using first and second respective logical bit-wise combinations to generate first and second ordered sets of intermediate data; and arithmetic circuitry to combine the first and second ordered sets of intermediate data using an arithmetic combination generating a carry bit, the detector generating the status flag in dependence upon the carry bit.
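
The abstract does not name the two bit-wise combinations, so the Python sketch below uses one pair that happens to work and should be read as an assumption, not as the patented circuit: the first combination is `predicates AND active`, the second is `predicates OR NOT active`, and "outermost" is taken to mean the most significant active position. Adding the two produces a carry out exactly when the predicate at that position is set. The function names are hypothetical.

```python
def last_active_flag(predicates: int, active: int, width: int) -> int:
    """Status flag from the carry out of a single addition."""
    mask = (1 << width) - 1
    first = predicates & active & mask        # active positions with predicate set
    second = (predicates | ~active) & mask    # NOT (active positions with predicate clear)
    total = first + second
    return (total >> width) & 1               # carry bit out of the top position


def reference_flag(predicates: int, active: int, width: int) -> int:
    """Straightforward scan from the top, used only to check the trick."""
    for position in reversed(range(width)):
        if (active >> position) & 1:
            return (predicates >> position) & 1
    return 0  # no active positions


flag_set = last_active_flag(0b0100, 0b0110, width=4)
flag_clear = last_active_flag(0b0010, 0b0110, width=4)
```

The carry appears because the sum equals `first - (active AND NOT predicates) + 2**width - 1`, and the two subtracted-apart values are disjoint subsets of the active mask, so whichever one holds the top active bit is the larger.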
