-
Publication (Announcement) No.: US11216277B2
Publication (Announcement) Date: 2022-01-04
Application No.: US16583729
Application Date: 2019-09-26
Applicant: Arm Limited
Inventor: Chiloda Ashan Senarath Pathirane
Abstract: Aspects of the present disclosure relate to an apparatus comprising register circuitry implementing a plurality of registers and processing circuitry to perform data processing operations on data stored in said registers. The apparatus comprises store buffer circuitry to, responsive to a store instruction in respect of given data, temporarily store said given data prior to providing said given data to a memory. Responsive to receiving at the processing circuitry a request to perform a state-saving-triggering operation, the register circuitry is configured to capture in shadow registers of said register circuitry a state of a subset of registers of the plurality of registers, and to provide the captured state from the shadow registers to the memory.
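The capture-then-drain sequence can be sketched in Python as follows. This is a minimal illustration, not the patented circuit. It assumes a list-based register file, a dict standing in for memory, 8-byte words, and a hypothetical `RegisterCircuitry` class, and it does not model the store buffer path.

```python
class RegisterCircuitry:
    """Toy register file whose chosen subset can be snapshotted into shadow registers (hypothetical)."""

    def __init__(self, num_regs, shadowed):
        self.regs = [0] * num_regs          # live registers
        self.shadowed = sorted(shadowed)    # indices of the subset to be saved
        self.shadow = {}                    # shadow copies, filled by capture()

    def capture(self):
        # Snapshot the whole subset at once; later writes to self.regs do not affect it.
        self.shadow = {i: self.regs[i] for i in self.shadowed}

    def drain_to(self, memory, base_addr, word_bytes=8):
        # Provide the captured state from the shadow registers to memory, one word per register.
        for slot, idx in enumerate(self.shadowed):
            memory[base_addr + slot * word_bytes] = self.shadow[idx]


rc = RegisterCircuitry(num_regs=8, shadowed=[3, 1])
rc.regs[1], rc.regs[3] = 10, 30
rc.capture()
rc.regs[1] = 99            # processing continues after the snapshot
mem = {}
rc.drain_to(mem, base_addr=0x1000)
print(mem)  # {4096: 10, 4104: 30}
```

Because the snapshot is taken in one step, processing can overwrite the live registers while the shadow copies are still being written out.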
-
Publication (Announcement) No.: US11429393B2
Publication (Announcement) Date: 2022-08-30
Application No.: US14938285
Application Date: 2015-11-11
Applicant: ARM LIMITED
IPC: G06F9/38
Abstract: An apparatus for data processing and a method of data processing are provided. Data processing operations are performed in response to instructions which reference architectural registers using physical registers to store data values when performing the data processing operations. Mappings between the architectural registers and the physical registers are stored, and when a data hazard condition is identified with respect to out-of-order program execution of an instruction, an architectural register specified in the instruction is remapped to an available physical register. A reorder buffer stores an entry for each destination architectural register specified by the instruction, entries being stored in program order, and an entry specifies a destination architectural register and an original physical register to which the destination architectural register was mapped before the architectural register was remapped to an available physical register.
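The following minimal Python sketch models the bookkeeping described above, assuming a FIFO free list and a hypothetical `RenameUnit` class. The `remapped` flag in each reorder-buffer entry is an addition of this sketch. It records whether the original physical register can be freed on commit or must be restored on a flush.

```python
from collections import deque


class RenameUnit:
    """Toy model of hazard-triggered remapping with a reorder buffer (ROB).

    ROB entries are (dest_arch, original_phys, remapped); the remapped flag is
    an assumption of this sketch, not part of the abstract.
    """

    def __init__(self, num_arch, num_phys):
        self.mapping = list(range(num_arch))          # arch reg i -> phys reg i at reset
        self.free = deque(range(num_arch, num_phys))  # available physical registers
        self.rob = deque()                            # entries in program order

    def dispatch(self, dest_arch, hazard):
        """Record a ROB entry and remap the destination only when a hazard is identified."""
        original = self.mapping[dest_arch]
        if hazard:
            if not self.free:
                raise RuntimeError("no free physical register; dispatch must stall")
            self.mapping[dest_arch] = self.free.popleft()
        self.rob.append((dest_arch, original, hazard))
        return self.mapping[dest_arch]

    def commit(self):
        """Retire the oldest entry; a superseded original register becomes free."""
        _arch, original, remapped = self.rob.popleft()
        if remapped:
            self.free.append(original)

    def flush(self, keep):
        """Squash all but the `keep` oldest entries, restoring mappings newest-first."""
        while len(self.rob) > keep:
            arch, original, remapped = self.rob.pop()
            if remapped:
                self.free.appendleft(self.mapping[arch])  # release the speculative register
                self.mapping[arch] = original             # restore the original mapping


r = RenameUnit(num_arch=4, num_phys=6)
r.dispatch(2, hazard=True)    # r2 moves from p2 to p4
r.dispatch(1, hazard=False)   # no hazard: r1 stays on p1
r.dispatch(2, hazard=True)    # r2 moves from p4 to p5
r.flush(keep=1)               # squash the two youngest entries
print(r.mapping, list(r.free))  # [0, 1, 4, 3] [5]
```

Walking the ROB from youngest to oldest on a flush works because each entry holds the mapping that was in force just before it.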
-
Publication (Announcement) No.: US11036510B2
Publication (Announcement) Date: 2021-06-15
Application No.: US16157400
Application Date: 2018-10-11
Applicant: Arm Limited
Abstract: A merging predicated instruction controls a processing pipeline to perform a processing operation to determine a processing result based on at least one source operand, and to perform a merging operation to merge the processing result with a previous value of a destination register under control of a predicate value identifying, for each of a plurality of portions of the destination register, whether that portion is to be set to a corresponding portion of the processing result or a corresponding portion of the previous value. The merging predicated instruction is permitted to be issued to the pipeline with a timing which results in the previous value of the destination register still being unavailable when the merging predicated instruction is at a given pipeline stage at which the processing result is determined. This can help to improve performance of subsequent instructions which are independent of the merging predicated instruction.
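A minimal Python sketch of merging predication, assuming vectors are plain lists of lanes and the predicate is one bit per lane. The `read_previous` callable is a hypothetical stand-in for the old destination value arriving late. It is read only at the merge step, after the processing result has already been computed.

```python
def merge_predicated(result, previous, predicate):
    """Per-lane merge: take the result where the predicate bit is set, else keep the old value."""
    if not (len(result) == len(previous) == len(predicate)):
        raise ValueError("result, destination and predicate widths must match")
    return [r if p else old for r, old, p in zip(result, previous, predicate)]


def predicated_add_merging(a, b, predicate, read_previous):
    # The processing result depends only on the source operands.
    result = [x + y for x, y in zip(a, b)]
    # The previous destination value is needed only here, at the merge.
    return merge_predicated(result, read_previous(), predicate)


print(predicated_add_merging([1, 2, 3, 4], [10, 20, 30, 40], [1, 0, 1, 0],
                             lambda: [7, 7, 7, 7]))  # [11, 7, 33, 7]
```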
-
Publication (Announcement) No.: US10963253B2
Publication (Announcement) Date: 2021-03-30
Application No.: US16030963
Application Date: 2018-07-10
Applicant: Arm Limited
Inventor: Karel Hubertus Gerardus Walters, Chiloda Ashan Senarath Pathirane, Michael Alexander Kennedy
Abstract: An apparatus comprises instruction decoding circuitry to generate micro-operations in response to program instructions; and processing circuitry to perform data processing in response to the micro-operations generated by the instruction decoding circuitry. In response to a predicated vector instruction, the instruction decoding circuitry reads or predicts an estimated value of the predicate value, and depending on the estimated value, varies a composition of at least one micro-operation generated in response to the predicated vector instruction. This can enable more efficient use of hardware resources in the processing circuitry.
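One way the composition of micro-operations could vary with the predicate estimate is sketched below. A vector operation is cracked into fixed-width micro-operations, and any chunk whose estimated predicate bits are all zero is dropped. The chunking policy and the `crack_predicated_vector` name are assumptions for illustration; the abstract does not say how the composition changes.

```python
def crack_predicated_vector(num_lanes, lanes_per_uop, predicate_estimate):
    """Return (first_lane, lane_count) micro-ops, skipping chunks estimated fully inactive.

    A wrong estimate would need a recovery path (for example re-issuing every
    chunk); that is outside this sketch.
    """
    if len(predicate_estimate) != num_lanes:
        raise ValueError("one predicate bit per lane is required")
    uops = []
    for start in range(0, num_lanes, lanes_per_uop):
        chunk = predicate_estimate[start:start + lanes_per_uop]
        if any(chunk):
            uops.append((start, len(chunk)))
    return uops


print(crack_predicated_vector(8, 4, [1, 1, 1, 1, 1, 1, 1, 1]))  # [(0, 4), (4, 4)]
print(crack_predicated_vector(8, 4, [0, 0, 0, 0, 1, 0, 0, 1]))  # [(4, 4)]
```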
-
Publication (Announcement) No.: US10296349B2
Publication (Announcement) Date: 2019-05-21
Application No.: US14989841
Application Date: 2016-01-07
Applicant: ARM LIMITED
Inventor: Vladimir Vasekin, Antony John Penton, Chiloda Ashan Senarath Pathirane, Andrew James Antony Lees
IPC: G06F9/38
Abstract: Data processing circuitry comprises allocation circuitry to allocate one or more source and destination processor registers, of a set of processor registers each defined by a respective register index, to a processor instruction for use in execution of that processor instruction and to associate, with the processor instruction, information to indicate the register index of the allocated source and destination processor registers; the allocation circuitry being selectively operable to allocate, to a processor instruction, a group of destination processor registers having a subset of their register indices in common and to associate, with the processor instruction, information to indicate the register index of one processor register of the group and identifying information to identify one or more bits of the register index which differ between the processor registers in the allocated group of processor registers.
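The compact group encoding can be illustrated with a minimal Python sketch. It records one member's register index plus a bit mask marking the index bits that differ across the group. The mask representation and the `encode_group`/`decode_group` names are assumptions of this sketch, not the patent's encoding.

```python
from itertools import product


def decode_group(base, diff_mask):
    """Expand (base index, differing-bit mask) into the sorted group of register indices."""
    bits = [1 << b for b in range(diff_mask.bit_length()) if (diff_mask >> b) & 1]
    common = base & ~diff_mask  # bits shared by every register in the group
    group = []
    for choice in product((0, 1), repeat=len(bits)):
        idx = common
        for bit, take in zip(bits, choice):
            if take:
                idx |= bit
        group.append(idx)
    return sorted(group)


def encode_group(indices):
    """Encode a register group as one member's index plus a mask of the bits that differ."""
    base = indices[0]
    diff_mask = 0
    for idx in indices[1:]:
        diff_mask |= base ^ idx
    # The compact form is exact only if the group covers every combination of those bits.
    if decode_group(base, diff_mask) != sorted(set(indices)):
        raise ValueError("indices do not form a complete group over their differing bits")
    return base, diff_mask


print(encode_group([6, 7, 14, 15]))  # (6, 9)
print(decode_group(6, 0b1001))       # [6, 7, 14, 15]
```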
-
Publication (Announcement) No.: US11416252B2
Publication (Announcement) Date: 2022-08-16
Application No.: US15855139
Application Date: 2017-12-27
Applicant: Arm Limited
Inventor: Vladimir Vasekin, Chiloda Ashan Senarath Pathirane, Jungsoo Kim, Alexei Fedorov
Abstract: A data processing system includes an instruction pipeline containing instruction queue circuitry, fusion circuitry and decoder circuitry. The fusion circuitry serves to identify fusible groups of program instructions within a Y-wide window of program instructions and to supply a stream of program instructions, including such replacement fused program instructions, to X-wide decoder circuitry, which decodes X program instructions in parallel using parallel decoders.
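A minimal Python sketch of fusing within a Y-wide window before an X-wide decode stage. The fusion rules in `FUSIBLE_PAIRS` are hypothetical examples, and this sketch does not fuse pairs that straddle two windows. It shows how a window wider than the decoder lets X decoders handle more than X original instructions per cycle.

```python
# Hypothetical fusion rules; real fusible pairs are implementation-specific.
FUSIBLE_PAIRS = {
    ("cmp", "b.cond"): "cmp+b.cond",
    ("adrp", "add"): "adrp+add",
}


def fuse_window(window):
    """Replace each fusible adjacent pair in a window of opcodes by one fused opcode."""
    out, i = [], 0
    while i < len(window):
        pair = tuple(window[i:i + 2])
        if pair in FUSIBLE_PAIRS:
            out.append(FUSIBLE_PAIRS[pair])
            i += 2
        else:
            out.append(window[i])
            i += 1
    return out


def fuse_then_decode(stream, y, x):
    """Fuse Y-wide windows, then hand at most X instructions per cycle to the decoders."""
    cycles, pending = [], []
    for start in range(0, len(stream), y):
        pending.extend(fuse_window(stream[start:start + y]))
        while len(pending) >= x:
            cycles.append(pending[:x])
            pending = pending[x:]
    if pending:
        cycles.append(pending)
    return cycles


prog = ["cmp", "b.cond", "add", "adrp", "add", "mul"]
print(fuse_then_decode(prog, y=6, x=4))
# [['cmp+b.cond', 'add', 'adrp+add', 'mul']]: six instructions, one 4-wide decode cycle
```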
-
Publication (Announcement) No.: US11068238B2
Publication (Announcement) Date: 2021-07-20
Application No.: US16417866
Application Date: 2019-05-21
Applicant: Arm Limited
Abstract: A multiplier circuit is described in which sub-products calculated in a first stage of a carry-save adder (CSA) network are output early, processed by applying a processing function, and re-injected into a subsequent stage of the CSA network to add the processed sub-products. This allows a CSA network provided for multiplication operations to be reused for operations which require sub-products to be processed and added, such as floating-point dot product operations performed on floating-point values represented in bfloat16 format.
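The reuse idea can be sketched with unsigned integers in Python. A 3:2 carry-save compressor reduces partial products for an ordinary multiply. The same reduction is then applied a second time to sub-products that were taken out and passed through a caller-supplied processing function. This is a simplified sketch: a real design keeps the early outputs in carry-save form and handles bfloat16 signs, exponents and normalisation, none of which is modelled here. The function names are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save compressor on non-negative ints: returns (s, cy) with s + cy == a + b + c."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy


def csa_reduce(addends):
    """Reduce any number of addends to a (sum, carry) pair using layers of 3:2 compressors."""
    terms = list(addends)
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(terms[i], terms[i + 1], terms[i + 2]))
        terms = nxt + terms[full:]
    terms += [0] * (2 - len(terms))
    return terms[0], terms[1]


def partial_products(a, b):
    """Shifted copies of a, one per set bit of b (a basic unsigned multiplier)."""
    return [a << i for i in range(b.bit_length()) if (b >> i) & 1]


def multiply(a, b):
    s, cy = csa_reduce(partial_products(a, b))
    return s + cy  # final carry-propagate addition


def dot_with_reinjection(pairs, process):
    """Dot product in which each pair's sub-product is taken out, processed, and re-added.

    `process` stands in for the processing function (for bfloat16 this would
    include exponent alignment). Resolving each sub-product with `+` before
    processing is a simplification of this sketch.
    """
    processed = []
    for a, b in pairs:
        s, cy = csa_reduce(partial_products(a, b))  # first-stage compression
        processed.append(process(s + cy))
    s, cy = csa_reduce(processed)  # re-inject into a later compression stage
    return s + cy


print(multiply(13, 11))                                          # 143
print(dot_with_reinjection([(3, 4), (5, 6)], lambda p: p))       # 42
print(dot_with_reinjection([(3, 4), (5, 6)], lambda p: p << 1))  # 84
```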
-
Publication (Announcement) No.: US10719329B2
Publication (Announcement) Date: 2020-07-21
Application No.: US16021178
Application Date: 2018-06-28
Applicant: Arm Limited
Abstract: An apparatus and method are provided for using predicted result values. The apparatus has a processing unit that comprises processing circuitry for executing a sequence of instructions, and value prediction circuitry for identifying a predicted result value for at least one instruction. A result producing structure is provided that is responsive to a request issued from the processing unit when the processing circuitry is executing a first instruction, to produce a result value for the first instruction and return that result value to the processing unit. While waiting for the result value from the result producing structure, the processing circuitry can be arranged to speculatively execute at least one dependent instruction using a predicted result value for the first instruction as obtained from the value prediction circuitry. The request issued from the processing unit includes a signature value indicative of the predicted result value, and the result producing structure references the signature value in order to detect whether a mispredict condition exists indicating that the predicted result value differs from the result value. The apparatus further provides a mispredict signal transmission path via which the result producing structure, when the mispredict condition is detected, can assert a mispredict signal for receipt by the processing unit prior to the result value being available to the processing unit. Such an approach can reduce the misprediction penalty associated with using a mispredicted result value.
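A minimal Python sketch of the signature check. The 8-bit XOR-fold signature, the `ResultProducer` class and the callback used as the mispredict signal path are all assumptions for illustration. Because a short signature can alias, this sketch still compares the full value when it arrives. The early signal only speeds up the cases where the signatures differ.

```python
def signature(value, bits=8):
    """Fold a 64-bit value into a short signature by XOR-ing bits-wide slices (assumed scheme)."""
    mask = (1 << bits) - 1
    v = value & ((1 << 64) - 1)
    sig = 0
    while v:
        sig ^= v & mask
        v >>= bits
    return sig


class ResultProducer:
    """Stands in for the result producing structure, e.g. a cache returning load data."""

    def __init__(self, backing):
        self.backing = backing  # address -> value

    def serve(self, address, predicted_sig, mispredict_signal):
        actual = self.backing[address]
        if signature(actual) != predicted_sig:
            # Early path: signal the mispredict before the value itself is returned.
            mispredict_signal(address)
        return actual  # normal, later return path


def access_with_prediction(producer, address, predicted):
    early = []
    actual = producer.serve(address, signature(predicted), early.append)
    # Different values can share a signature, so the full value is still compared on arrival.
    mispredicted = bool(early) or actual != predicted
    return actual, bool(early), mispredicted


cache = ResultProducer({0x10: 0x1234, 0x20: 0x0101})
print(access_with_prediction(cache, 0x10, 0x1234))  # (4660, False, False)
print(access_with_prediction(cache, 0x10, 0x1235))  # (4660, True, True)
print(access_with_prediction(cache, 0x20, 0x0000))  # (257, False, True): signatures alias
```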
-
Publication (Announcement) No.: US10552160B2
Publication (Announcement) Date: 2020-02-04
Application No.: US15987113
Application Date: 2018-05-23
Applicant: ARM Limited
IPC: G06F9/38
Abstract: A processing pipeline for processing instructions, with instructions from multiple threads in flight concurrently, may have control circuitry to detect a stalling event associated with a given thread. In response, at least one instruction of the given thread may be flushed from the pipeline, and the control circuitry may trigger fetch circuitry to reduce the fraction of fetched instructions that are fetched from the given thread. A mechanism is also described to determine when to trigger a predetermined action when a delay in accessing information becomes greater than a delay threshold, and to update the delay threshold based on a difference between a return delay when the information is returned from the storage circuitry and the delay threshold.
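Both mechanisms can be sketched minimally in Python. The 4:1 fetch weighting, the proportional `gain` of 0.25 and the names `fetch_fractions` and `DelayMonitor` are assumptions. The abstract says only that the stalled thread's fetch share is reduced and that the threshold is updated from the difference between the return delay and the threshold. Flushing the stalled thread's instructions is not modelled.

```python
from fractions import Fraction


def fetch_fractions(threads, stalled, normal_weight=4, stalled_weight=1):
    """Share of fetch slots per thread, with stalled threads given a smaller weight."""
    weights = {t: stalled_weight if t in stalled else normal_weight for t in threads}
    total = sum(weights.values())
    return {t: Fraction(w, total) for t, w in weights.items()}


class DelayMonitor:
    """Trigger an action once an access has waited longer than an adaptive threshold."""

    def __init__(self, threshold, gain=0.25):
        self.threshold = threshold
        self.gain = gain  # assumed fraction of the difference applied per update

    def should_trigger(self, elapsed):
        return elapsed > self.threshold

    def on_return(self, return_delay):
        # Nudge the threshold toward the delay actually observed for this access.
        self.threshold += self.gain * (return_delay - self.threshold)


print(fetch_fractions(["T0", "T1"], stalled={"T1"}))
# {'T0': Fraction(4, 5), 'T1': Fraction(1, 5)}
m = DelayMonitor(threshold=100)
print(m.should_trigger(120))  # True
m.on_return(180)
print(m.threshold)            # 120.0
```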