Processing of plural-register-load instruction

    公开(公告)号:US11314509B2

    公开(公告)日:2022-04-26

    申请号:US16823695

    申请日:2020-03-19

    Applicant: Arm Limited

    Inventor: Abhishek Raja

    Abstract: An apparatus comprises processing circuitry to issue load operations to load data from memory. In response to a plural-register-load instruction specifying at least two destination registers to be loaded with data from respective target addresses, the processing circuitry permits issuing of separate load operations corresponding to the plural-register-load instruction. Load tracking circuitry maintains tracking information for one or more issued load operations. When the plural-register-load instruction is subject to an atomicity requirement and the plurality of load operations are issued separately, the load tracking circuitry detects, based on the tracking information, whether a loss-of-atomicity condition has occurred for the load operations corresponding to the plural-register-load instruction, and requests re-processing of the plural-register-load instruction when the loss-of-atomicity condition is detected.

    Circuitry and method
    3.
    发明授权

    公开(公告)号:US11989583B2

    公开(公告)日:2024-05-21

    申请号:US17218425

    申请日:2021-03-31

    Applicant: Arm Limited

    Abstract: Circuitry comprises two or more clusters of execution units, each cluster comprising one or more execution units to execute processing instructions; and scheduler circuitry to maintain one or more queues of processing instructions, the scheduler circuitry comprising picker circuitry to select a queued processing instruction for issue to an execution unit of one of the clusters of execution units for execution; in which: the scheduler circuitry is configured to maintain dependency data associated with each queued processing instruction, the dependency data for a queued processing instruction indicating any source operands which are required to be available for use in execution of that queued processing instruction and to inhibit issue of that queued processing instruction until all of the required source operands for that queued processing instruction are available and is configured to be responsive to an indication to the scheduler circuitry of the availability of the given operand as a source operand for use in execution of queued processing instructions; and the scheduler circuitry is responsive to an indication of availability of one or more last awaited source operands for a given queued processing instruction, to inhibit issue by the scheduler circuitry of the given queued processing instruction to an execution unit in a cluster of execution units other than a cluster of execution units containing an execution unit which generated at least one of those last awaited source operands.

    Faulting address prediction for prefetch target address

    公开(公告)号:US11782845B2

    公开(公告)日:2023-10-10

    申请号:US17541007

    申请日:2021-12-02

    Applicant: Arm Limited

    CPC classification number: G06F12/1027

    Abstract: An apparatus comprises memory management circuitry to perform a translation table walk for a target address of a memory access request and to signal a fault in response to the translation table walk identifying a fault condition for the target address, prefetch circuitry to generate a prefetch request to request prefetching of information associated with a prefetch target address to a cache; and faulting address prediction circuitry to predict whether the memory management circuitry would identify the fault condition for the prefetch target address if the translation table walk was performed by the memory management circuitry for the prefetch target address. In response to a prediction that the fault condition would be identified for the prefetch target address, the prefetch circuitry suppresses the prefetch request and the memory management circuitry prevents the translation table walk being performed for the prefetch target address of the prefetch request.

    Granule protection information compression

    公开(公告)号:US11461247B1

    公开(公告)日:2022-10-04

    申请号:US17378866

    申请日:2021-07-19

    Applicant: Arm Limited

    Abstract: Address translation circuitry translates a target virtual address specified by a memory access request into a target physical address associated with a selected physical address space. Granule protection information (GPI) loading circuitry loads from a memory system at least one granule protection descriptor providing GPI indicating, for at least one granule of physical addresses, which physical address spaces is allowed access to the at least one granule. GPI compressing circuitry compresses the GPI to generate compressed GPI. A GPI cache to caches the compressed GPI. Filtering circuitry determines, on a hit in the GPI cache, whether the memory access request should be allowed to access the target physical address, based on whether the compressed GPI cached in the GPI cache for the target physical address indicates that the selected physical address space is allowed access to the target physical address. This allows more efficient caching of granule protection information.

    Apparatus and method for accessing an address translation cache

    公开(公告)号:US10545877B2

    公开(公告)日:2020-01-28

    申请号:US15945900

    申请日:2018-04-05

    Applicant: Arm Limited

    Abstract: An apparatus and method are provided for accessing an address translation cache. The address translation cache has a plurality of entries, where each entry is used to store address translation data used when converting a virtual address into a corresponding physical address of a memory system. The virtual address is generated from a plurality of source values. Allocation circuitry is responsive to received address translation data, to allocate an entry within the address translation cache to store the received address translation data. A hash value indication is associated with the allocated entry, where the hash value indication is computed from the plurality of source values used to generate a virtual address associated with the received address translation data. Lookup circuitry is responsive to an access request associated with a target virtual address, to perform a lookup process employing a target hash value computed from the plurality of source values used to generate the target virtual address, in order to identify any candidate matching entry in the address translation cache. When there is at least one candidate matching entry, a virtual address check process is then performed in order to determine whether any candidate matching entry is an actual matching entry whose address translation data enables the target virtual address to be translated to a corresponding target physical address. Such an approach can significantly improve the performance of accesses to the address translation cache, and can also give rise to power consumption savings.

    Store buffer
    7.
    发明授权

    公开(公告)号:US12223202B2

    公开(公告)日:2025-02-11

    申请号:US17693817

    申请日:2022-03-14

    Applicant: Arm Limited

    Abstract: An apparatus comprises processing circuitry to issue store operations to store data to a data store and load operations to load data from the data store and a store buffer comprising entries to store entry information corresponding to store operations in advance of the store operations completing. Store buffer lookup circuitry is provided to lookup, in response to a load operation, whether the store buffer contains a corresponding entry corresponding to an older store operation for which target addresses of the load operation and the older store operation satisfy an address comparison condition. The store buffer lookup circuitry is configured to perform store-to-load forwarding in response to the load operation when the corresponding entry is a first type of store buffer entry satisfying a forwarding condition, and delay processing of the load operation when the corresponding entry is a second type of store buffer entry satisfying the forwarding condition.

    Controlling data allocation to storage circuitry

    公开(公告)号:US12182427B2

    公开(公告)日:2024-12-31

    申请号:US17966071

    申请日:2022-10-14

    Applicant: Arm Limited

    Abstract: An apparatus is provided for controlling the operating mode of control circuitry, such that the control circuitry may change between two operating modes. In an allocation mode, data that is loaded in response to an instruction is allocated into storage circuitry from an intermediate buffer, and the data is read from the storage circuitry. In a non-allocation mode, the data is not allocated to the storage circuitry, and is read directly from intermediate buffer. The control of the operating mode may be performed by mode control circuitry, and the mode may be changed in dependence on the type of instruction that calls the data, and whether the data may be used again in the near future, or whether it is expected to be used only once.

    Handling of single-copy-atomic load/store instruction with a memory access request shared by micro-operations

    公开(公告)号:US11748101B2

    公开(公告)日:2023-09-05

    申请号:US17374149

    申请日:2021-07-13

    Applicant: Arm Limited

    CPC classification number: G06F9/30043 G06F9/3017 G06F9/30145 G06F9/3836

    Abstract: In response to a single-copy-atomic load/store instruction for requesting an atomic transfer of a target block of data between the memory system and the registers, where the target block has a given size greater than a maximum data size supported for a single load/store micro-operation by a load/store data path, instruction decoding circuitry maps the single-copy-atomic load/store instruction to two or more mapped load/store micro-operations each for requesting transfer of a respective portion of the target block of data. In response to the mapped load/store micro-operations, load/store circuitry triggers issuing of a shared memory access request to the memory system to request the atomic transfer of the target block of data of said given size to or from the memory system, and triggers separate transfers of respective portions of the target block of data over the load/store data path.

    Predicated vector load micro-operation for performing a complete vector load when issued before a predicate operation is available and a predetermined condition is unsatisfied

    公开(公告)号:US11714644B2

    公开(公告)日:2023-08-01

    申请号:US17459130

    申请日:2021-08-27

    Applicant: Arm Limited

    Inventor: Abhishek Raja

    CPC classification number: G06F9/30043 G06F9/30036 G06F9/30145 G06F9/3836

    Abstract: A predicated vector load micro-operation specifies a load target address, a destination vector register for which active vector elements of the destination vector register are to be loaded with data associated with addresses identified based on the load target address, and a predicate operand indicative of whether each vector element of the destination vector register is active or inactive. A predetermined type of predicated vector load micro-operation can be issued to the processing circuitry before the predicate operand is determined to meet an availability condition, and if issued in this way memory access circuitry can determine, based on the load target address, whether the predetermined type of predicated vector load micro-operation satisfies a predetermined condition, and if the predetermined condition is unsatisfied, perform a complete vector load assuming all vector elements of the destination vector register are active vector elements, independent of whether the predicate operand when available identifies any inactive vector element of the destination vector register.

Patent Agency Ranking