-
Publication No.: US11327752B2
Publication date: 2022-05-10
Application No.: US16487256
Application date: 2018-02-02
Applicant: ARM LIMITED
Inventor: Grigorios Magklis , Nigel John Stephens , Jacob Eapen , Mbou Eyole , David Hennah Mansell
Abstract: A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register, a second source register, and an index. In response to the instruction, control signals are generated that cause processing circuitry to perform a data processing operation with respect to each data group in the first source register and the second source register, generating respective result data groups that together form the result of the data processing operation. Each of the first source register and the second source register has a size that is an integer multiple, at least two, of a predefined data group size, and each data group comprises a plurality of data elements. For each data group, the operands of the data processing operation are a selected data element, identified in the data group of the first source register by the index, and each data element in the data group of the second source register. This provides a technique for element-by-vector operations that is readily scalable as the register width grows.
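A minimal behavioural sketch of the element-by-vector operation described above, assuming registers are modelled as Python lists of elements and the data group size is given in elements (for example, a 128-bit group holding four 32-bit elements). The abstract does not name the operation, so multiplication is used as a stand-in default; the function name `indexed_by_group` is hypothetical.

```python
def indexed_by_group(src1, src2, index, group_elems, op=lambda a, b: a * b):
    """Apply op(selected, element) within each data group.

    selected is the element at position `index` inside the current group of
    src1; it is combined with every element of the same group of src2.
    """
    if len(src1) != len(src2):
        raise ValueError("source registers must have the same length")
    if len(src1) % group_elems != 0:
        raise ValueError("register length must be a multiple of the group size")
    if not 0 <= index < group_elems:
        raise ValueError("index must select an element inside a data group")
    result = []
    for base in range(0, len(src1), group_elems):
        selected = src1[base + index]  # one element per group, chosen by the index
        for j in range(group_elems):
            result.append(op(selected, src2[base + j]))
    return result
```

Because the index is interpreted relative to each group rather than to the whole register, the same code works unchanged for any register length that is a whole number of groups, which is the scalability property the abstract points to.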
-
Publication No.: US11042378B2
Publication date: 2021-06-22
Application No.: US15743735
Application date: 2016-07-28
Applicant: ARM LIMITED
Inventor: Nigel John Stephens , Mbou Eyole , Alejandro Martinez Vicente
Abstract: Data processing apparatus comprises processing circuitry to selectively apply a vector processing operation to data items at positions within data vectors according to the states of a set of respective predicate flags associated with the positions, the data vectors having a data vector processing order, each data vector comprising a plurality of data items having a data item order, the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a propagation instruction to control the instruction processing circuitry to derive a set of predicate flags applicable to a current data vector in dependence upon a set of predicate flags applicable to a preceding data vector in the data vector processing order, wherein when one or more last-most predicate flags of the set applicable to the preceding data vector are inactive, all of the derived predicate flags in the set applicable to the current data vector are inactive.
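A minimal sketch of the propagation rule, assuming predicate flags are modelled as lists of booleans and that `tail_len` last-most flags of the preceding vector are checked. The abstract only states the all-inactive case; passing the current flags through unchanged otherwise is an assumption, and `propagate_predicate` is a hypothetical name.

```python
def propagate_predicate(prev_flags, curr_flags, tail_len=1):
    """Derive the predicate for the current vector from the preceding one.

    If any of the last `tail_len` flags of the preceding vector is inactive,
    every derived flag is inactive. Otherwise (assumption) the current flags
    are returned unchanged.
    """
    if not 1 <= tail_len <= len(prev_flags):
        raise ValueError("tail_len must be between 1 and len(prev_flags)")
    if not all(prev_flags[-tail_len:]):
        return [False] * len(curr_flags)
    return list(curr_flags)
```

In a loop that processes one data vector per iteration, this lets a condition that "broke" partway through one vector switch off all lanes of the next vector without a scalar branch.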
-
Publication No.: US10776124B2
Publication date: 2020-09-15
Application No.: US15769558
Application date: 2016-09-14
Applicant: ARM LIMITED
Inventor: Giacomo Gabrielli , Nigel John Stephens
Abstract: Processing circuitry supports a first type of vector arithmetic instruction specifying at least a first input vector. When at least one exceptional condition is detected for an arithmetic operation performed for the first active data element of the first input vector in a predetermined sequence, the processing circuitry performs at least one response action. When the at least one exceptional condition is detected for a given active data element other than the first active data element in the predetermined sequence, the processing circuitry suppresses the at least one response action and stores element identifying information identifying which data element is the given active data element that triggered the exceptional condition. This can be useful for reducing the hardware resources needed to track the occurrence of exceptional conditions and/or for supporting speculative execution of vector instructions.
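A minimal sketch of the first-active-element rule, using division by zero as a stand-in for the exceptional condition and a Python exception as the "response action". Stopping at the faulting element and leaving later results unset (`None`) are assumptions; the abstract only requires that the faulting element be identified. `first_faulting_divide` is a hypothetical name.

```python
def first_faulting_divide(numerators, denominators, active):
    """Element-wise division over active lanes with first-element fault semantics.

    Returns (results, fault_index). A fault on the first active element raises;
    a fault on a later active element is recorded in fault_index instead.
    """
    results = [None] * len(numerators)
    first = True
    for i, (n, d, is_active) in enumerate(zip(numerators, denominators, active)):
        if not is_active:
            continue  # inactive lanes are skipped and never fault
        if d == 0:
            if first:
                # Response action for the first active element: trap immediately.
                raise ZeroDivisionError(f"fault on first active element {i}")
            # Later element: suppress the trap and record which lane faulted
            # (assumption: processing stops here).
            return results, i
        results[i] = n / d
        first = False
    return results, None
```

Software can then retry from the recorded element, where it becomes the first active element and any genuine fault is taken precisely.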
-
Publication No.: US09875214B2
Publication date: 2018-01-23
Application No.: US14814590
Application date: 2015-07-31
Applicant: ARM LIMITED , APPLE, INC.
Inventor: Mbou Eyole , Nigel John Stephens , Jeffry Gonion , Alex Klaiber , Charles Tucker
CPC classification number: G06F15/8076 , G06F9/30032 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30072 , G06F9/30101 , G06F9/30109 , G06F9/30192 , G06F9/345 , G06F9/3455 , G06F9/355 , G06F9/3887
Abstract: An apparatus and method are provided for transferring a plurality of data structures between memory and a plurality of vector registers, each vector register being arranged to store a vector operand comprising a plurality of data elements. Access circuitry is used to perform access operations to move data elements of vector operands between the data structures in memory and specified vector registers, each data structure comprising multiple data elements stored at contiguous addresses in the memory. Decode circuitry is responsive to a single access instruction identifying a plurality of vector registers and a plurality of data structures that are located discontiguously with respect to each other in the memory, to generate control signals to control the access circuitry to perform a sequence of access operations to move the plurality of data structures between the memory and the plurality of vector registers such that the vector operand in each vector register holds a corresponding data element from each of the plurality of data structures. This provides a very efficient mechanism for performing complex access operations, resulting in an increase in execution speed, and potential reductions in power consumption.
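A minimal sketch of the register layout the abstract describes, assuming memory is an element-addressed Python list and the start address of each (discontiguous) structure is already known. How the hardware generates those addresses is not modelled, and `load_structures` is a hypothetical name.

```python
def load_structures(memory, struct_addrs, num_fields):
    """Load structures of num_fields contiguous elements into num_fields registers.

    Register k ends up holding field k of every structure, in the order the
    structures are listed (a de-interleaving load).
    """
    registers = [[] for _ in range(num_fields)]
    for addr in struct_addrs:  # structures need not be adjacent in memory
        for k in range(num_fields):
            registers[k].append(memory[addr + k])
    return registers
```

For example, three-field structures at addresses 0, 10 and 4 of `list(range(100, 120))` give registers `[100, 110, 104]`, `[101, 111, 105]` and `[102, 112, 106]`, so each register holds one field across all structures.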
-
Publication No.: US09619225B2
Publication date: 2017-04-11
Application No.: US14878188
Application date: 2015-10-08
Applicant: ARM Limited
Inventor: David James Seal , Richard Roy Grisenthwaite , Nigel John Stephens
CPC classification number: G06F9/3016 , G06F7/764 , G06F7/768 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30109 , G06F9/30112 , G06F9/30145 , G06F9/3887
Abstract: A data processing apparatus comprises a processing circuit and instruction decoder. A bitfield manipulation instruction controls the processing apparatus to generate at least one result data element from corresponding first and second source data elements. Each result data element includes a portion corresponding to a bitfield of the corresponding first source data element. Bits of the result data element that are more significant than the inserted bitfield have a prefix value that is selected, based on a control value specified by the instruction, as one of a first prefix value having a zero value, a second prefix value having the value of a portion of the corresponding second source data element, and a third prefix value corresponding to a sign extension of the bitfield of the first source data element. Bitwise logical instructions are also described.
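A minimal per-element sketch of the three prefix choices, assuming 32-bit elements, a field of `width` bits taken from `src_lsb` of the first source and inserted at `dst_lsb` of the result. Taking the bits below the inserted field from the second source is an assumption (the abstract only constrains the more significant bits), and `bitfield_insert` is a hypothetical name.

```python
ELEM_BITS = 32
ELEM_MASK = (1 << ELEM_BITS) - 1


def bitfield_insert(src1, src2, src_lsb, width, dst_lsb, prefix):
    """Insert a bitfield of src1 into a result element; choose the prefix bits.

    prefix: "zero" -> bits above the field are 0
            "src2" -> bits above the field are kept from src2
            "sign" -> bits above the field copy the field's top bit
    """
    if width < 1 or src_lsb < 0 or dst_lsb < 0:
        raise ValueError("invalid bitfield position")
    if src_lsb + width > ELEM_BITS or dst_lsb + width > ELEM_BITS:
        raise ValueError("bitfield does not fit in the element")
    field = (src1 >> src_lsb) & ((1 << width) - 1)
    top = dst_lsb + width  # first bit position above the inserted field
    low_bits = src2 & ((1 << dst_lsb) - 1)  # assumption: low bits come from src2
    result = low_bits | (field << dst_lsb)
    high_mask = ELEM_MASK & ~((1 << top) - 1)
    if prefix == "zero":
        pass
    elif prefix == "src2":
        result |= src2 & high_mask
    elif prefix == "sign":
        if (field >> (width - 1)) & 1:
            result |= high_mask
    else:
        raise ValueError("prefix must be 'zero', 'src2' or 'sign'")
    return result & ELEM_MASK


def bitfield_insert_vector(v1, v2, src_lsb, width, dst_lsb, prefix):
    """Apply the same control values to every pair of corresponding elements."""
    return [bitfield_insert(a, b, src_lsb, width, dst_lsb, prefix) for a, b in zip(v1, v2)]
```

With `src1 = 0xB0`, `src2 = 0xFFFF00AA` and a 4-bit field moved from bit 4 to bit 8, the three prefixes yield `0x00000BAA`, `0xFFFF0BAA` and `0xFFFFFBAA`, showing how one control value selects zero-extension, merging or sign extension.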
-
Publication No.: US09495163B2
Publication date: 2016-11-15
Application No.: US14573193
Application date: 2014-12-17
Applicant: ARM Limited
Inventor: Nigel John Stephens , David James Seal
CPC classification number: G06F9/3557 , G06F9/30007 , G06F9/30112 , G06F9/30167 , G06F9/342 , G06F9/345 , G06F2212/657
Abstract: A data processing apparatus is provided comprising processing circuitry and an instruction decoder responsive to program instructions to control processing circuitry to perform the data processing. The instruction decoder is responsive to an address calculating instruction to perform an address calculating operation for calculating a partial address result from a non-fixed reference address and a partial offset value such that a full address specifying a memory location of an information entity is calculable from said partial address result using at least one supplementary program instruction. The partial offset value has a bit-width greater than or equal to said instruction size and is encoded within at least one partial offset field of said address calculating instruction. A corresponding data processing method, virtual machine and computer program product are also provided.
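A minimal sketch of the partial-address pattern, modelled on the way AArch64 pairs ADRP (page address relative to the program counter) with a following ADD of the low bits. The 4 KiB page size, the use of the program counter as the non-fixed reference address, and the helper names are assumptions; immediate-field encoding limits are not modelled.

```python
PAGE_BITS = 12  # assumption: 4 KiB pages


def partial_address(pc, page_offset):
    """Address calculating step: page of pc plus a signed page offset."""
    return ((pc >> PAGE_BITS) + page_offset) << PAGE_BITS


def full_address(pc, target):
    """Split target into a page offset and low bits, then rebuild it.

    The second step plays the role of the supplementary instruction that
    adds the low-order bits to the partial result.
    """
    page_offset = (target >> PAGE_BITS) - (pc >> PAGE_BITS)
    low_bits = target & ((1 << PAGE_BITS) - 1)
    return partial_address(pc, page_offset) + low_bits
```

Because the offset counts pages rather than bytes, a field narrower than the instruction word can still reach targets far away from the reference address, which is how the partial offset can span more bits than the instruction size.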
-
Publication No.: US11422807B2
Publication date: 2022-08-23
Application No.: US16629178
Application date: 2018-06-27
Applicant: ARM LIMITED
Inventor: Grigorios Magklis , Nigel John Stephens
IPC: G06F9/30
Abstract: An apparatus and method of operating an apparatus are provided. The apparatus is responsive to a bit-testing instruction which specifies a source vector register and an index to perform a bit-testing procedure on plural elements stored in the source vector register to generate plural result bits. The bit-testing procedure comprises, for each processed element of the plural elements, setting a respective result bit of the plural result bits in dependence on a value of a tested bit at a bit position in the processed element of the source vector register indicated by the index. This bit-testing instruction thus enables increased performance of program code which is required to perform multiple bit tests and can be suitably formulated into a vectorised form.
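A minimal sketch of the bit-testing procedure, assuming the source vector register is a list of unsigned element values and that a result bit is 1 exactly when the tested bit is 1 (the abstract only says the result depends on the tested bit). `bit_test` is a hypothetical name.

```python
def bit_test(elements, index, elem_bits=32):
    """Return one result bit per element: the bit at position `index`."""
    if not 0 <= index < elem_bits:
        raise ValueError("index must lie within the element width")
    return [(e >> index) & 1 for e in elements]
```

One call replaces a scalar loop of shift-and-mask tests; for example, testing bit 2 of `[0b101, 0b010, 0b100, 0b111]` yields `[1, 0, 1, 1]`.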
-
Publication No.: US11093243B2
Publication date: 2021-08-17
Application No.: US16630622
Application date: 2018-07-02
Applicant: ARM LIMITED
Inventor: Mbou Eyole , Nigel John Stephens
Abstract: Vector interleaving techniques in a data processing apparatus are disclosed, comprising apparatuses, instructions, methods of operating the apparatuses, and simulator implementations. A vector interleaving instruction specifies a first source register, a second source register, and a destination register. A first set of input data items is retrieved from the first source register and a second set of input data items from the second source register. A data processing operation is performed on selected input data item pairs taken from the first and second sets of input data items to generate a set of result data items, which are stored as a result data vector in the destination register. Result data items dependent on the first source register are stored in a first set of alternating positions in the destination data vector, and result data items dependent on the second source register are stored in a second set of alternating positions in the destination data vector.
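A minimal sketch of one way to read the interleaving rule, assuming the selected pairs are adjacent elements within each source register and the operation defaults to addition. Results from the first source go to even destination positions and results from the second source to odd positions; the pairing choice and the name `interleaved_pairwise` are assumptions.

```python
def interleaved_pairwise(src1, src2, op=lambda x, y: x + y):
    """Combine adjacent pairs in each source and interleave the results."""
    n = len(src1)
    if n != len(src2) or n % 2:
        raise ValueError("sources must have the same, even, number of elements")
    dest = [None] * n
    for k in range(n // 2):
        dest[2 * k] = op(src1[2 * k], src1[2 * k + 1])      # first-source results: even slots
        dest[2 * k + 1] = op(src2[2 * k], src2[2 * k + 1])  # second-source results: odd slots
    return dest
```

Placing each result next to the lanes it was computed from keeps both sources' results lane-aligned in one register, so no extra permute is needed to bring them back together.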
-
Publication No.: US11003450B2
Publication date: 2021-05-11
Application No.: US15759900
Application date: 2016-09-14
Applicant: ARM LIMITED
Inventor: Nigel John Stephens
Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies the start address of the contiguous block using a base register and an immediate offset value specified as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling, which can help improve the performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, reducing the loop control overhead.
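A minimal sketch of the addressing mode and its use in an unrolled loop, assuming memory is an element-addressed list and the block size equals the vector length in elements. The helper names and the unroll factor of two are hypothetical.

```python
def load_contiguous(memory, base, imm, block_elems):
    """Load one block whose start is base + imm * block size."""
    start = base + imm * block_elems
    return memory[start:start + block_elems]


def unrolled_sum(memory, n, vl):
    """Sum n elements, two vector blocks per iteration (unrolled by 2)."""
    if n % (2 * vl):
        raise ValueError("n must be a multiple of 2 * vl in this sketch")
    total = 0
    base = 0
    while base < n:
        v0 = load_contiguous(memory, base, 0, vl)  # block at offset 0 * VL
        v1 = load_contiguous(memory, base, 1, vl)  # block at offset 1 * VL
        total += sum(v0) + sum(v1)
        base += 2 * vl  # one base-register update per two loads
    return total
```

Scaling the immediate by the block size keeps the same offsets (0, 1, ...) valid whatever the vector length, so the unrolled loop needs only one base update per iteration.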
-
Publication No.: US11003447B2
Publication date: 2021-05-11
Application No.: US15743745
Application date: 2016-06-23
Applicant: ARM LIMITED
Inventor: Nigel John Stephens
Abstract: A data processing system (2) supports vector processing operations performed upon vector operands comprising a plurality of vector operand elements. The data processing system includes a processor (4) having an instruction decoder (14) which decodes mixed-element-sized vector arithmetic instructions to generate control signals (16) which control processing circuitry (18) to perform arithmetic operations upon a first vector of first source operand elements a_i of a first bit size A, and a second vector of second source operand elements b_j of a second bit size B. The second bit size B is greater than the first bit size A.
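A minimal sketch of a mixed-element-size addition, assuming both vectors occupy the same register width, so the narrow vector holds B/A times as many elements, and pairing each b_j with a_i where i = (B/A)·j. The choice of addition, unsigned (zero-extended) operands, this pairing and the name `mixed_size_add` are all assumptions; the abstract only states that B is greater than A.

```python
def mixed_size_add(a, b, a_bits, b_bits):
    """Add narrow elements a_i (a_bits wide) to wide elements b_j (b_bits wide).

    Results are b_bits wide and wrap modulo 2**b_bits.
    """
    if b_bits <= a_bits or b_bits % a_bits:
        raise ValueError("B must be a larger multiple of A")
    ratio = b_bits // a_bits
    if len(a) != ratio * len(b):
        raise ValueError("vectors must occupy the same register width")
    mask = (1 << b_bits) - 1
    # Each wide lane j uses the lowest narrow element inside its span.
    return [(b[j] + a[ratio * j]) & mask for j in range(len(b))]
```

For 8-bit `a = [1, 2, 3, 4]` and 16-bit `b = [1000, 65535]`, the result is `[1001, 2]`, the second lane wrapping at 16 bits.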
-