-
Publication number: US20200073660A1
Publication date: 2020-03-05
Application number: US16118528
Filing date: 2018-08-31
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, Cedric Denis Robert AIRAUD, Luca NASSI, Damien Robin MARTIN
Abstract: Apparatus comprises counter and bit-shift circuitry to provide a succession of processing stages each comprising a count operation stage and a corresponding bit-shift stage, each processing stage operating with respect to a set of contiguous n-bit groups of bit positions, where n is 1 for a first processing stage and n doubles from one processing stage in the succession of processing stages to a next processing stage in the succession of processing stages; each count operation stage being configured to generate, for a first set of alternate instances of the n-bit groups of bit positions, count values indicating a respective number of bits of a predetermined bit value in a mask data word; and each bit-shift stage being configured to generate a bit-shifted data word by bit-shifting bits of a data word to be processed, for a second set of alternate instances of the n-bit groups of bit positions complementary to the first set, by respective numbers of bit positions dependent upon the count values generated by the respective count operation stage, in which the bit-shifted data word for one bit-shift stage in the succession of processing stages is used as the data word to be processed by the next bit-shift stage in the succession of processing stages.
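The staged count/shift structure reads like a log-depth bit-compress, which gathers the data bits selected by a mask into contiguous low-order positions. The sketch below assumes that interpretation, takes the "predetermined bit value" to be 0 in the mask, and uses the count of unselected bits in each lower n-bit group as the shift distance for the upper group of the same pair. The function name `staged_compress` is hypothetical. The inner loop runs serially here, whereas hardware would process all groups of a stage in parallel.

```python
def staged_compress(data, mask, width=32):
    """Gather the bits of `data` selected by 1s in `mask` into the low end of
    the result, using log2(width) count/shift stages (illustrative sketch)."""
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"
    x = data & mask  # discard unselected bits so every group starts "packed"
    n = 1            # group size: 1, 2, 4, ... doubling each stage
    while n < width:
        out = 0
        for base in range(0, width, 2 * n):
            lo_field = ((1 << n) - 1) << base  # lower n-bit group (count side)
            hi_field = lo_field << n           # upper n-bit group (shift side)
            # Count stage: number of 0 bits in the mask within the lower group.
            zeros = n - bin(mask & lo_field).count("1")
            # Shift stage: slide the upper group's packed bits down over the gap.
            out |= (x & lo_field) | ((x & hi_field) >> zeros)
        x = out  # this stage's output is the next stage's input word
        n *= 2
    return x
```

For example, `staged_compress(0b10110110, 0b11110000, 8)` extracts the upper nibble, `0b1011`.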
-
Publication number: US20210026600A1
Publication date: 2021-01-28
Application number: US16521740
Filing date: 2019-07-25
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, David Raymond LUTZ, Cédric Denis Robert AIRAUD
Abstract: An apparatus and method are provided for performing an index operation. The apparatus has vector processing circuitry to perform an index operation in each of a plurality of lanes of parallel processing. The index operation requires an index value opm to be multiplied by a multiplier value e to produce a multiplication result. The number of lanes of parallel processing is dependent on a specified element size, and the multiplier value is different, but known, for each lane of parallel processing. The vector processing circuitry comprises mapping circuitry to perform, within each lane, mapping operations on the index value opm in order to generate a plurality of intermediate input values. The plurality of intermediate input values are such that the addition of the plurality of intermediate input values produces the multiplication result. Within each lane the mapping operations are determined by the multiplier value used for that lane. The vector processing circuitry also has vector adder circuitry to perform, within each lane, an addition of at least the plurality of intermediate input values, in order to produce a result vector providing a result value for the index operation performed in each lane. This provides a high-performance, low-latency technique for vectorising index operations.
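Because each lane's multiplier is a fixed, known constant, the product can be formed by adding shifted copies of the index value rather than by using a general multiplier. The sketch below assumes the multiplier for lane e is simply the lane number (as in an INDEX-style operation producing base + e × step) and uses one shifted partial per set bit of e; a real design might choose a different decomposition per lane. The name `vector_index` and its parameters are hypothetical.

```python
def vector_index(base, step, vector_bits=128, esize=32):
    """Compute base + e * step in every lane, where the multiplier e (assumed
    here to be the lane number) is known per lane, using only shifts and adds."""
    lanes = vector_bits // esize  # lane count depends on the element size
    mask = (1 << esize) - 1       # results wrap modulo 2**esize
    result = []
    for e in range(lanes):
        # Mapping: one shifted copy of `step` for each set bit of the known multiplier e.
        partials = [(step << k) & mask for k in range(e.bit_length()) if (e >> k) & 1]
        # Vector adder: sum the intermediate values (plus the base) within the lane.
        result.append((base + sum(partials)) & mask)
    return result
```

With `vector_index(10, 3)` the four 32-bit lanes hold `[10, 13, 16, 19]`; the lane with e = 3 adds `3 << 0` and `3 << 1`.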
-
Publication number: US20210026772A1
Publication date: 2021-01-28
Application number: US16521665
Filing date: 2019-07-25
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, Yohann Fred Arifidy RABEFARIHY, Cédric Denis Robert AIRAUD, Rémi Marius TEYSSIER
IPC: G06F12/0875, G06F12/14, G06F12/0895, G06F9/30
Abstract: An apparatus is provided for determining, for use in a tag-guarded memory, a selected tag value from a plurality of tag values. The apparatus comprises ordered list generation circuitry to receive an excluded tag vector comprising a plurality of fields, where each field is associated with a tag value and identifies whether the associated tag value is excluded from use. The ordered list generation circuitry is arranged to generate, from the excluded tag vector, an ordered list of non-excluded tag values. The apparatus further comprises count determination circuitry to determine, using the excluded tag vector and an identified start tag value, a count value indicative of a number of non-excluded tag values occurring in a region of the excluded tag vector bounded by an initial field and a field corresponding to the start tag value. The apparatus also comprises tag selection circuitry to determine the selected tag value from the ordered list based on the count value and an identified offset which indicates a required number of non-excluded tag values between the start tag value and the selected tag value.
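The three stages (ordered list, count, selection) replace a sequential "skip excluded tags" walk with a single indexed lookup. The sketch below assumes 16 tag values, an excluded tag vector encoded as a bitmask (bit t set means tag t is excluded), and a convention in which the start tag counts as offset 0 if it is not excluded, otherwise the first non-excluded tag after it does, with wrap-around. The all-excluded fallback to tag 0 is also an assumption. The name `select_tag` is hypothetical.

```python
def select_tag(excluded, start, offset, num_tags=16):
    """Pick the tag `offset` non-excluded positions from `start` (sketch)."""
    # Ordered list generation: non-excluded tag values in ascending order.
    ordered = [t for t in range(num_tags) if not (excluded >> t) & 1]
    if not ordered:
        return 0  # assumption: fall back to tag 0 when every tag is excluded
    # Count determination: non-excluded tags in fields 0 .. start-1.
    count = sum(1 for t in ordered if t < start)
    # Tag selection: step `offset` entries along the ordered list, wrapping.
    return ordered[(count + offset) % len(ordered)]
```

For instance, with only tag 0 excluded, `select_tag(0b1, 15, 1)` wraps past tag 15 and returns tag 1.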
-
Publication number: US20240086198A1
Publication date: 2024-03-14
Application number: US17943407
Filing date: 2022-09-13
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, Zichao XIE
CPC classification number: G06F9/384, G06F9/30123
Abstract: An apparatus has processing circuitry with execution units to perform operations, physical registers to store data, and forwarding circuitry to forward the data from the physical registers to the execution units. The forwarding circuitry provides an incomplete set of connections between the physical registers and the execution units such that, for each of at least some of the physical registers, the physical register is connected to only a subset of the execution units. The apparatus also has register renaming circuitry to map logical registers identified by the operations to respective physical registers and register reorganisation circuitry to monitor upcoming operations and to determine, based on the upcoming operations and the connections provided by the forwarding circuitry, whether to perform a register reorganisation procedure to change a mapping between the logical registers and the physical registers. The register reorganisation circuitry is also configured to perform the register reorganisation procedure.
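One way to picture the reorganisation decision: for each upcoming operation, check whether the physical register currently holding its source is wired to the execution unit that will consume it; if not, remap the logical register to a free physical register that is. The greedy policy, the data layout, and the names below (`plan_reorganisation`, `connections`, `free_regs`) are all hypothetical, and the cost of actually moving the value is not modelled.

```python
def plan_reorganisation(rename_map, connections, upcoming, free_regs):
    """
    rename_map:  logical register -> physical register
    connections: physical register -> set of execution units it can forward to
    upcoming:    list of (logical source register, consuming execution unit)
    free_regs:   physical registers currently unallocated
    Returns a new mapping, or None if no reorganisation is needed or possible.
    """
    new_map = dict(rename_map)
    free = list(free_regs)
    changed = False
    for lreg, unit in upcoming:
        if unit in connections[new_map[lreg]]:
            continue  # operand already reachable over the partial forwarding network
        target = next((p for p in free if unit in connections[p]), None)
        if target is None:
            continue  # no suitably connected free register; keep current mapping
        free.remove(target)
        free.append(new_map[lreg])  # old register released once the value is moved
        new_map[lreg] = target
        changed = True
    return new_map if changed else None
```

With `connections = {0: {"ALU0"}, 1: {"ALU1"}}`, a logical register mapped to physical register 0 but consumed by `ALU1` would be remapped to free register 1.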
-
Publication number: US20240078035A1
Publication date: 2024-03-07
Application number: US17900975
Filing date: 2022-09-01
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, Zichao XIE, Leonardo INTESA
IPC: G06F3/06
CPC classification number: G06F3/0655, G06F3/0604, G06F3/0673
Abstract: An apparatus has processing circuitry with one or more execution units to perform operations in response to instructions. The apparatus also has registers to store data accessed by the processing circuitry and forwarding circuitry to forward results of the operations from the execution units to be written back to the registers and to the execution units for use as operands of further operations. The apparatus also has write-back reschedule circuitry which, for each operation, causes an execution unit performing the operation to stall the operation prior to a write-back stage of the execution unit, and determines, based on monitoring subsequent operations, whether to forward the result of the operation to be written back to a register or to forward the result to an execution unit. The write-back reschedule circuitry also controls the forwarding circuitry to forward the result according to the determination.
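A simple decision rule consistent with this description: while the result is stalled, scan a window of subsequent operations; if the destination register is redefined inside the window, no later reader can need the register copy, so forwarding to the consuming execution units is enough; otherwise the result must be written back. The window representation and the name `writeback_decision` are hypothetical, and the abstract does not specify the actual policy.

```python
def writeback_decision(dest, upcoming):
    """
    dest:     register written by the stalled operation
    upcoming: later operations in program order, as (source_registers, dest_register)
    Returns ("forward", consumer_indices) if the value is dead after the
    window, else ("writeback", consumer_indices).
    """
    consumers = []
    for i, (srcs, dst) in enumerate(upcoming):
        if dest in srcs:
            consumers.append(i)  # this operation reads the stalled result
        if dst == dest:
            # Register is redefined inside the window, so forwarding alone suffices.
            return ("forward", consumers)
    return ("writeback", consumers)
```

Note that an operation that both reads and overwrites the register (for example `x1 = x1 + 1`) is recorded as a consumer before the redefinition ends the scan.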
-
Publication number: US20190377706A1
Publication date: 2019-12-12
Application number: US16005790
Filing date: 2018-06-12
Applicant: Arm Limited
Inventor: Cedric Denis Robert AIRAUD, Luca NASSI, Damien Robin MARTIN, Xiaoyang SHEN
Abstract: Apparatuses and methods of data processing are disclosed. An apparatus comprises two data processing clusters, each having multiple data processing lanes to perform single instruction multiple data (SIMD) processing. Decoded instructions are issued to at least one of the two data processing clusters. A decoded SIMD instruction specifying a vector length greater than the width of the data processing lanes of the first data processing cluster has a first part issued to the first data processing cluster for execution. An issuance target for the remaining second part of the decoded SIMD instruction is selected in dependence on a dynamic performance condition. When the dynamic performance condition has a first state, the issuance target is the first data processing cluster, and when it has a second state, the issuance target is the second data processing cluster. When the issuance target is the first data processing cluster, the first and second parts of the decoded SIMD instruction are scheduled in series.
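The issue policy can be summarised as a small scheduling function. The abstract does not say what the dynamic performance condition measures, so it is modelled here as a single hypothetical boolean `second_cluster_free` (True standing for the "second state"); vectors are assumed to be at most twice the first cluster's width, and the output lists `(part, cluster, cycle)` slots with clusters numbered 0 and 1.

```python
def issue_simd(vector_bits, cluster0_bits, second_cluster_free):
    """Return (part, cluster, cycle) issue slots for one decoded SIMD
    instruction (illustrative sketch; clusters are numbered 0 and 1)."""
    assert vector_bits <= 2 * cluster0_bits, "sketch handles at most two parts"
    if vector_bits <= cluster0_bits:
        return [(0, 0, 0)]  # fits in a single pass on the first cluster
    if second_cluster_free:
        # Second state: the remaining part runs on the second cluster in parallel.
        return [(0, 0, 0), (1, 1, 0)]
    # First state: both parts go to the first cluster, scheduled in series.
    return [(0, 0, 0), (1, 0, 1)]
```

A 256-bit instruction on a 128-bit first cluster therefore finishes in one issue cycle when the second cluster is used, and takes two back-to-back cycles otherwise.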
-
Publication number: US20240289130A1
Publication date: 2024-08-29
Application number: US18174207
Filing date: 2023-02-24
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, Zichao XIE, Cédric Denis Robert AIRAUD, Grégorie MARTIN
CPC classification number: G06F9/30098, G06F9/3826, G06F9/3869
Abstract: A data processing apparatus comprises operand routing circuitry configured to prepare operands for processing, and a plurality of processing elements. Each processing element comprises receiving circuitry, processing circuitry, and transmitting circuitry. A group of coupled processing elements comprises a first processing element configured to receive operands from the operand routing circuitry and one or more further processing elements for which the receiving circuitry is coupled to the transmitting circuitry of another processing element in the group. The apparatus also comprises timing circuitry, configured to selectively delay transmission of operands within the group of coupled processing elements to cause operations performed by the group of coupled processing elements to be staggered.
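The effect of the timing circuitry can be illustrated by tracing an operand through a chain of coupled processing elements. The sketch assumes each element takes one cycle to compute and that the timing circuitry inserts a configurable extra `delay` before each onward transmission, so element k starts at cycle k × (1 + delay). The function name, the one-cycle latency, and the `delay` parameter are hypothetical.

```python
def staggered_chain(operand, ops, delay=1):
    """Pass an operand through a group of coupled processing elements.
    ops[k] is the operation of element k; element 0 receives the operand from
    the operand routing circuitry at cycle 0. Returns (element, start_cycle, result)."""
    trace = []
    value, cycle = operand, 0
    for k, op in enumerate(ops):
        value = op(value)               # processing circuitry of element k
        trace.append((k, cycle, value))
        cycle += 1 + delay              # one compute cycle plus the inserted transmit delay
    return trace
```

With `delay=1`, two chained elements start on cycles 0 and 2 rather than on consecutive cycles; setting `delay=0` removes the extra stagger.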
-
Publication number: US20200065109A1
Publication date: 2020-02-27
Application number: US16524667
Filing date: 2019-07-29
Applicant: Arm Limited
Inventor: Xiaoyang SHEN, Damien Robin MARTIN, Cédric Denis Robert AIRAUD, Luca NASSI, François DONATI
Abstract: An apparatus has a processing pipeline, and first and second register files. A temporary-register-using instruction is supported which controls the pipeline to perform an operation using a temporary variable derived from an operand stored in the first register file. In response to the instruction, when a predetermined condition is not satisfied, the pipeline processes at least one register move micro-operation to transfer data from the at least one source register of the first register file to at least one newly allocated temporary register of the second register file. When the condition is satisfied, the operation can be performed using a temporary variable already stored in the temporary register of the second register file used by an earlier temporary-register-using instruction specifying the same source register for determining the temporary variable, in the absence of an intervening instruction for rewriting the source register.
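The reuse condition behaves like a small cache keyed by source register: a register move micro-operation is emitted only when no valid temporary copy of that source exists, and any intervening write to the source invalidates its entry. The class name, the `t0`, `t1`, ... temporary names, and the `("MOV", dst, src)` micro-op tuples are hypothetical, and temporary registers are never reclaimed in this sketch.

```python
class TempRegTracker:
    """Tracks which source registers (first register file) already have a
    valid temporary copy in the second register file (illustrative sketch)."""

    def __init__(self):
        self.temp_for = {}  # source register -> temporary register holding its value
        self.next_temp = 0

    def temp_using_instruction(self, src):
        """Return (temporary register, micro-ops to issue) for an instruction
        needing a temporary variable derived from `src`."""
        if src in self.temp_for:
            # Condition satisfied: reuse the existing temporary, no move needed.
            return self.temp_for[src], []
        temp = f"t{self.next_temp}"  # newly allocated temporary register
        self.next_temp += 1
        self.temp_for[src] = temp
        return temp, [("MOV", temp, src)]

    def write(self, reg):
        """An intervening write to `reg` invalidates any temporary derived from it."""
        self.temp_for.pop(reg, None)
```

Two back-to-back instructions reading `z1` need only one move; a write to `z1` between them forces a fresh move into a new temporary.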