Technique for efficient utilisation of an address translation cache

    公开(公告)号:US10552338B2

    公开(公告)日:2020-02-04

    申请号:US15437581

    申请日:2017-02-21

    Applicant: ARM Limited

    Abstract: An apparatus and method are provided for making efficient use of address translation cache resources. The apparatus has an address translation cache having a plurality of entries, where each entry is used to store address translation data used when converting a virtual address into a corresponding physical address of a memory system. Each item of address translation data has a page size indication for a page within the memory system that is associated with that address translation data. Allocation circuitry performs an allocation process to determine the address translation data to be stored in each entry. Further, mode control circuitry is used to switch a mode of operation of the apparatus between a non-skewed mode and at least one skewed mode, dependent on a page size analysis operation. The address translation cache is organised as a plurality of portions, and in the non-skewed mode the allocation circuitry is arranged, when performing the allocation process, to permit the address translation data to be allocated to any of the plurality of portions. In contrast, when in the at least one skewed mode, the allocation circuitry is arranged to reserve at least one portion for allocation of address translation data associated with pages of a first page size and at least one other portion for allocation of address translation data associated with pages of a second page size different to the first page size.

    Control of bulk memory instructions

    公开(公告)号:US12050805B2

    公开(公告)日:2024-07-30

    申请号:US17875758

    申请日:2022-07-28

    Applicant: Arm Limited

    Abstract: An apparatus supports decoding and execution of a bulk memory instruction specifying a block size parameter. The apparatus comprises control circuitry to determine whether the block size corresponding to the block size parameter exceeds a predetermined threshold, and performs a micro-architectural control action to influence the handling of at least one bulk memory operation by memory operation processing circuitry. The micro-architectural control action varies depending on whether the block size exceeds the predetermined threshold, and further depending on the states of other components and operations within or coupled with the apparatus. The micro-architectural control action could include an alignment correction action, cache allocation control action, or processing circuitry selection action.

    Allocation of store requests
    13.
    发明授权

    公开(公告)号:US11977738B2

    公开(公告)日:2024-05-07

    申请号:US17903293

    申请日:2022-09-06

    Applicant: Arm Limited

    CPC classification number: G06F3/0611 G06F3/0656 G06F3/0673

    Abstract: There is provided an apparatus, method and medium. The apparatus comprises a store buffer to store a plurality of store requests, where each of the plurality of store requests identifies a storage address and a data item to be transferred to storage beginning at the storage address, where the data item comprises a predetermined number of bytes. The apparatus is responsive to a memory access instruction indicating a store operation specifying storage of N data items, to determine an address allocation order of N consecutive store requests based on a copy direction hint indicative of whether the memory access instruction is one of a sequence of memory access instructions each identifying one of a sequence of sequentially decreasing addresses, and to allocate the N consecutive store requests to the store buffer in the address allocation order.

    Memory address translation
    14.
    发明授权

    公开(公告)号:US11709782B2

    公开(公告)日:2023-07-25

    申请号:US17512888

    申请日:2021-10-28

    Applicant: Arm Limited

    CPC classification number: G06F12/1027 G06F2212/68

    Abstract: Circuitry comprises a translation lookaside buffer to store memory address translations, each memory address translation being between an input memory address range defining a contiguous range of one or more input memory addresses in an input memory address space and a translated output memory address range defining a contiguous range of one or more output memory addresses in an output memory address space; in which the translation lookaside buffer is configured selectively to store the memory address translations as a cluster of memory address translations, a cluster defining memory address translations in respect of a contiguous set of input memory address ranges by encoding one or more memory address offsets relative to a respective base memory address; memory management circuitry to retrieve data representing memory address translations from a memory, for storage by the translation lookaside buffer, when a required memory address translation is not stored by the translation lookaside buffer; detector circuitry to detect an action consistent with access, by the translation lookaside buffer, to a given cluster of memory address translations; and prefetch circuitry, responsive to a detection of the action consistent with access to a cluster of memory address translations, to prefetch data from the memory representing one or more further memory address translations of a further set of input memory address ranges adjacent to the contiguous set of input memory address ranges for which the given cluster defines memory address translations.

    Skipping tag check for tag-checked load operation

    公开(公告)号:US11221951B1

    公开(公告)日:2022-01-11

    申请号:US17028248

    申请日:2020-09-22

    Applicant: Arm Limited

    Abstract: A tag check performed for a memory access operation comprises determining whether an address tag associated with a target address of the access corresponds to a guard tag stored in the memory system associated with a memory system location to be accessed. A given tag check architecturally required for a tag-checked load operation can be skipped when a number of tag-check-skip conditions are satisfied, including at least: that there is an older tag-checked store operation awaiting a pending tag check, for which a guard tag checked in the pending tag check is associated with a same block of one or more memory system locations as a guard tag to be checked in the given tag check; and that the address tag for the tag-checked load operation is the same as the address tag for the older tag-checked store operation.

    Apparatus and method for maintaining address translation data within an address translation cache

    公开(公告)号:US10372618B2

    公开(公告)日:2019-08-06

    申请号:US15293467

    申请日:2016-10-14

    Applicant: ARM LIMITED

    Abstract: An apparatus and method are provided for maintaining address translation data within an address translation cache. The address translation cache has a plurality of entries, where each entry is used to store address translation data used when converting a virtual address into a corresponding physical address of a memory system. Control circuitry is used to perform an allocation process to determine the address translation data to be stored in each entry. The address translation cache is used to store address translation data of a plurality of different types representing address translation data specified at respective different levels of address translation within a multiple-level page table walk. The plurality of different types comprises a final level type of address translation data that identifies a full translation from the virtual address to the physical address, and at least one intermediate level type of address translation data that identifies a partial translation of the virtual address. The control circuitry is arranged, when performing the allocation process, to apply an allocation policy that permits each of the entries to be used for any of the different types of address translation data, and to store type identification data in association with each entry to enable the type of the address translation data stored therein to be determined. Such an approach enables very efficient usage of the address translation cache resources, for example by allowing the proportion of the entries used for full address translation data and the proportion of the entries used for partial address translation data to be dynamically adapted to changing workload conditions.

    Technique for processing lookup requests, in a cache storage able to store data items of multiple supported types, in the presence of a pending invalidation request

    公开(公告)号:US12061546B2

    公开(公告)日:2024-08-13

    申请号:US17973433

    申请日:2022-10-25

    Applicant: Arm Limited

    CPC classification number: G06F12/0808 G06F12/0238 G06F2212/1021

    Abstract: Each entry in a cache has type information associated therewith to indicate a type associated with the data item stored in that entry. Lookup circuitry responds to a given lookup request by performing a lookup procedure to determine whether a hit is detected, by default performing the lookup procedure for a given subset of multiple supported types. Invalidation circuitry processes an invalidation request specifying invalidation parameters used to determine an invalidation address range and invalidation type information, in order to invalidate any data items held in the cache storage that are both associated with the invalidation address range and have an associated type that is indicated by the invalidation type information. Whilst processing of the invalidation request is yet to be completed, filtering circuitry performs a filtering operation for a received lookup request, in order to determine, in dependence on an address indication provided by the received lookup request and one or more of the invalidation parameters of the invalidation request, intersection indication data identifying, for a given type in the given subset, whether an intersection is considered to exist between any entries that would be accessed during performance of the lookup procedure for the received lookup request for the given type and any entries that will be invalidated during processing of the invalidation request, and operation of the lookup circuitry is controlled in dependence on the intersection indication data.

    Early cache querying
    18.
    发明授权

    公开(公告)号:US11989132B2

    公开(公告)日:2024-05-21

    申请号:US17864625

    申请日:2022-07-14

    Applicant: Arm Limited

    CPC classification number: G06F12/0897 G06F2212/60

    Abstract: There is provided a data processing apparatus in which receive circuitry receives a result signal from a lower level cache and a higher level cache in respect of a first instruction block. The lower level cache and the higher level cache are arranged hierarchically and transmit circuitry transmits, to the higher level cache, a query for the result signal. In response to the result signal originating from the higher level cache containing requested data, the transmit circuitry transmits a further query to the higher level cache for a subsequent instruction block at an earlier time than the further query is transmitted to the higher level cache when the result signal containing the requested data originates from the lower level cache.

    Apparatus and method with value prediction for load operation

    公开(公告)号:US11948013B2

    公开(公告)日:2024-04-02

    申请号:US17670762

    申请日:2022-02-14

    Applicant: Arm Limited

    Inventor: Abhishek Raja

    CPC classification number: G06F9/505 G06N5/02

    Abstract: An apparatus has processing circuitry, load tracking circuitry and load prediction circuitry to determine a prediction for a predicted load operation. It is determined whether the prediction is correct, and whether the tracking information indicates that, for a given younger load operation issued before it is known whether the prediction is correct, there is a risk of second target data associated with that given load operation having changed after having been loaded. Independent of whether the addresses of the predicted load operation and younger load operation correspond, at least the given load operation is re-processed when the prediction is correct and the tracking information indicates there is a risk of the second target data having changes after being loaded. This protects against ordering violations.

    Determining a restart point in out-of-order execution

    公开(公告)号:US11762664B2

    公开(公告)日:2023-09-19

    申请号:US17569157

    申请日:2022-01-05

    Applicant: Arm Limited

    Abstract: There is provided a data processing apparatus comprising decode circuitry responsive to receipt of a block of instructions to generate control signals indicative of each of the block of instructions, and to analyse the block of instructions to detect a potential hazard instruction. The data processing apparatus is provided with decode circuitry to encode information indicative of a clean restart point into the control signals associated with the potential hazard instruction. The data processing apparatus is provided with data processing circuity to perform out-of-order execution of at least some of the block of instructions, and control circuitry responsive to a determination, at execution of the potential hazard instruction, that data values used as operands for the potential hazard instruction have been modified by out-of-order execution of a subsequent instruction, to restart execution from the clean restart point and to flush held data values from the data processing circuitry.

Patent Agency Ranking