Determining prefetch patterns with discontinuous strides

    Publication No.: US11385896B2

    Publication date: 2022-07-12

    Application No.: US15930907

    Filing date: 2020-05-13

    Applicant: Arm Limited

    Abstract: An apparatus and method are provided. The apparatus comprises storage circuitry to store a plurality of data elements. Processing circuitry executes a stream of instructions comprising access instructions that access some of the data elements at given locations. Training circuitry determines a pattern of the given locations based on the access instructions. Prefetch circuitry performs prefetches based on the pattern and filter circuitry filters the access instructions used by the training circuitry to determine the pattern by including discontinuous access instructions whose given location raises a discontinuity with the given location of a previous access instruction. In this way, it is possible to perform prefetching by calculating, rather than guessing, at a cumulative stride between the access instructions.
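
As a rough illustration of the filtering idea in this abstract, the Python sketch below keeps only the accesses whose address does not follow the previous address by a regular step, then takes the differences between the kept addresses to obtain a cumulative stride. The function names, the `regular_stride` parameter, and the definition of a discontinuity used here are assumptions made for illustration; the abstract does not specify them.

```python
def discontinuous_accesses(addresses, regular_stride):
    """Keep addresses that break the regular step from the previous address.

    Assumption: a "discontinuity" is any delta that differs from regular_stride.
    """
    kept = []
    prev = None
    for addr in addresses:
        if prev is not None and addr - prev != regular_stride:
            kept.append(addr)
        prev = addr
    return kept


def cumulative_strides(kept):
    """Differences between consecutive discontinuous accesses."""
    return [b - a for a, b in zip(kept, kept[1:])]


# Four sequential accesses per block, with blocks 100 apart (made-up trace).
trace = [0, 1, 2, 3, 100, 101, 102, 103, 200, 201, 202, 203, 300]
kept = discontinuous_accesses(trace, regular_stride=1)
strides = cumulative_strides(kept)
```

In this made-up trace the stride between discontinuous accesses is computed directly from the filtered addresses, so a prefetcher could target the start of the next block without tracking every intermediate access.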

    Methods and apparatus for processing prefetch pattern storage data

    Publication No.: US12235768B2

    Publication date: 2025-02-25

    Application No.: US18350135

    Filing date: 2023-07-11

    Applicant: Arm Limited

    Abstract: Aspects of the present disclosure relate to an apparatus comprising prefetch pattern storage circuitry and pattern training circuitry. The pattern training circuitry detects patterns of data access for updating one or more corresponding pattern storage entries. The pattern training circuitry comprises a plurality of training entries, associated with a given accessed storage location. Each said training entry comprises a plurality of regions. For a given training entry, at least one region is configured to store information for which a subsequent access offset is positive, and at least one region is configured to store information for which said offset is negative. The pattern training circuitry is configured to transmit data indicative of said information to the prefetch pattern storage circuitry. The prefetch pattern storage circuitry is responsive to receiving said transmitted data to update at least one corresponding pattern storage element.

    Technique for predicting behaviour of control flow instructions

    Publication No.: US12182574B2

    Publication date: 2024-12-31

    Application No.: US18312052

    Filing date: 2023-05-04

    Applicant: Arm Limited

    Abstract: An apparatus is provided having pointer storage to store pointer values for a plurality of pointers, with the pointer values of the pointers being differentially incremented in response to a series of increment events. Tracker circuitry maintains a plurality of tracker entries, each tracker entry identifying a control flow instruction and a current active pointer (from amongst the pointers) to be associated with that control flow instruction. Cache circuitry maintains a plurality of cache entries, each cache entry storing a resolved behaviour of an instance of a control flow instruction identified by a tracker entry along with an associated tag value generated when the resolved behaviour was allocated into that cache entry. For a given entry the associated tag value may be generated in dependence on an address indication of the control flow instruction whose resolved behaviour is being stored in that entry and the current active pointer associated with that control flow instruction. Prediction circuitry is responsive to a prediction trigger associated with a replay of a given instance of a given control flow instruction identified by a tracker entry, to cause a lookup operation to be performed by the cache circuitry using a comparison tag value generated in dependence on the address indication of the given control flow instruction and the current active pointer. In the event of a hit being detected in a given cache entry, the resolved behaviour stored in the given cache entry is used as the predicted behaviour of the given instance of the given control flow instruction, provided a prediction confidence metric is met.

    Methods and apparatus for training prefetch information

    Publication No.: US11599473B1

    Publication date: 2023-03-07

    Application No.: US17505957

    Filing date: 2021-10-20

    Applicant: Arm Limited

    Abstract: Aspects of the present disclosure relate to an apparatus comprising prefetch information storage circuitry and prefetch training circuitry. The prefetch training circuitry comprises a plurality of entries, and is configured to: allocate a given entry to a given data address region; receive access information indicative of data accesses within the given data address region; based on said access information, train prefetch information associated with the given data address region, the prefetch information being indicative of a pattern of said data accesses within the given data address region; and responsive to an eviction condition being met after an elapsed period, since said allocation of the given entry, has exceeded a threshold, perform an eviction comprising transferring the prefetch information associated with the given data address region to the prefetch information storage circuitry.
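
The eviction rule described in this abstract can be sketched in Python as follows. This is a minimal model under stated assumptions: the class `TrainingEntry`, the function `try_evict`, the use of a set of offsets as the "prefetch information", and the integer timestamps are all hypothetical and are not taken from the patent.

```python
class TrainingEntry:
    """Hypothetical training entry tracking accesses within one address region."""

    def __init__(self, region_base, allocated_at):
        self.region_base = region_base
        self.allocated_at = allocated_at  # time of allocation (arbitrary units)
        self.offsets = set()              # observed offsets within the region

    def record_access(self, addr):
        self.offsets.add(addr - self.region_base)


def try_evict(entry, now, threshold, eviction_condition, pattern_storage):
    """Transfer trained information only if the entry is older than threshold."""
    if eviction_condition and now - entry.allocated_at > threshold:
        pattern_storage[entry.region_base] = sorted(entry.offsets)
        return True
    return False


storage = {}
entry = TrainingEntry(region_base=0x1000, allocated_at=0)
entry.record_access(0x1000)
entry.record_access(0x1040)

# Too early: the elapsed period has not exceeded the threshold.
early = try_evict(entry, now=5, threshold=10,
                  eviction_condition=True, pattern_storage=storage)
# Later: the same condition now triggers the transfer.
late = try_evict(entry, now=11, threshold=10,
                 eviction_condition=True, pattern_storage=storage)
```

The point of the age check in this sketch is that an entry gets some minimum time to accumulate accesses before its pattern is written to the longer-term storage.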

    Store buffer
    Patent type: granted invention patent

    Publication No.: US12223202B2

    Publication date: 2025-02-11

    Application No.: US17693817

    Filing date: 2022-03-14

    Applicant: Arm Limited

    Abstract: An apparatus comprises processing circuitry to issue store operations to store data to a data store and load operations to load data from the data store and a store buffer comprising entries to store entry information corresponding to store operations in advance of the store operations completing. Store buffer lookup circuitry is provided to lookup, in response to a load operation, whether the store buffer contains a corresponding entry corresponding to an older store operation for which target addresses of the load operation and the older store operation satisfy an address comparison condition. The store buffer lookup circuitry is configured to perform store-to-load forwarding in response to the load operation when the corresponding entry is a first type of store buffer entry satisfying a forwarding condition, and delay processing of the load operation when the corresponding entry is a second type of store buffer entry satisfying the forwarding condition.
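
The lookup behaviour in this abstract can be modelled with a short Python sketch. Several simplifications are assumed here and are not from the patent: the address comparison condition is an exact address match, the forwarding condition is treated as always satisfied on a match, and the names `EntryType`, `StoreEntry`, and `lookup` are hypothetical. The abstract does not say what distinguishes the two entry types, so the sketch simply tags each entry.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EntryType(Enum):
    FIRST = auto()   # matching loads may be forwarded from this entry
    SECOND = auto()  # matching loads are delayed instead


@dataclass
class StoreEntry:
    addr: int
    data: int
    kind: EntryType


def lookup(store_buffer, load_addr):
    """Search older stores, youngest first, for one matching the load address.

    store_buffer is ordered oldest to youngest.
    """
    for entry in reversed(store_buffer):
        if entry.addr == load_addr:  # assumed address comparison condition
            if entry.kind is EntryType.FIRST:
                return ("forward", entry.data)
            return ("delay", None)
    return ("miss", None)


buffer = [
    StoreEntry(0x100, 1, EntryType.FIRST),
    StoreEntry(0x200, 2, EntryType.SECOND),
    StoreEntry(0x100, 3, EntryType.FIRST),
]
```

With this buffer, a load from `0x100` would take its data from the youngest matching store, a load from `0x200` would be delayed, and a load from any other address would go to the data store.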

    Compression of entries in a reorder buffer

    Publication No.: US12175251B2

    Publication date: 2024-12-24

    Application No.: US18107139

    Filing date: 2023-02-08

    Applicant: Arm Limited

    Abstract: There is provided an apparatus, method and medium. The apparatus comprises processing circuitry to process instructions and a reorder buffer identifying a plurality of entries having state information associated with execution of one or more of the instructions. The apparatus comprises allocation circuitry to allocate entries in the reorder buffer, and to allocate at least one compressed entry corresponding to a plurality of the instructions. The apparatus comprises memory access circuitry responsive to an address associated with a memory access instruction corresponding to access-sensitive memory and the memory access instruction corresponding to the compressed entry, to trigger a reallocation procedure comprising flushing the memory access instruction and triggering reallocation of the memory access instruction without the compression. The allocation circuitry is responsive to a frequency of occurrence of memory access instructions addressing the access-sensitive memory meeting a predetermined condition, to suppress the compression whilst the predetermined condition is met.

    Control flow prediction using pointers

    Publication No.: US11983533B2

    Publication date: 2024-05-14

    Application No.: US17851266

    Filing date: 2022-06-28

    Applicant: Arm Limited

    CPC classification number: G06F9/30058 G06F9/3861

    Abstract: There is provided a data processing apparatus comprising history storage circuitry that stores sets of behaviours of helper instructions for a control flow instruction. Pointer storage circuitry stores pointers, each associated with one of the sets. The behaviours in the one of the sets are indexed according to one of the pointers associated with that one of the sets. Increment circuitry increments at least some of the pointers in response to an increment event and prediction circuitry determines a predicted behaviour of the control flow instruction using one of the sets of behaviours.

    Multiple stride prefetching
    Patent type: granted invention patent

    Publication No.: US10769070B2

    Publication date: 2020-09-08

    Application No.: US16140625

    Filing date: 2018-09-25

    Applicant: Arm Limited

    Abstract: Apparatuses and methods for prefetch generation are disclosed. Prefetching circuitry receives addresses specified by load instructions and can cause retrieval of a data value from an address before that address is received. Stride determination circuitry determines stride values as a difference between a current address and a previously received address. Plural stride values corresponding to a sequence of received addresses are determined. Multiple stride storage circuitry stores the plurality of stride values determined by the stride determination circuitry. New address comparison circuitry determines whether a current address corresponds to a matching stride value based on the plurality of stride values stored in the multiple stride storage circuitry. Prefetch initiation circuitry can cause a data value to be retrieved from a further address, wherein the further address is the current address modified by the matching stride value of the plurality of stride values. By the use of multiple stride values, more complex load address patterns can be prefetched.
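
A toy Python model may help show why storing several strides covers patterns that a single-stride prefetcher misses. The sketch assumes a repeating stride pattern of known length `period` and predicts the next stride from the one that followed the matching stride last time; the class name, the fixed period, and this particular matching rule are illustrative assumptions rather than the mechanism claimed in the patent.

```python
from collections import deque


class MultiStridePrefetcher:
    """Toy model that remembers the last `period` strides."""

    def __init__(self, period):
        self.period = period
        self.strides = deque(maxlen=period)  # oldest stride at index 0
        self.last_addr = None

    def access(self, addr):
        """Record a load address; return an address to prefetch, or None."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Match: the new stride equals the one seen `period` accesses ago.
            if len(self.strides) == self.period and stride == self.strides[0]:
                # Predict the stride that followed it in the previous repetition.
                next_stride = self.strides[1] if self.period > 1 else stride
                prefetch = addr + next_stride
            self.strides.append(stride)  # drops the oldest stride when full
        self.last_addr = addr
        return prefetch


# Strides repeat as +8, +8, +64 (made-up trace).
pf = MultiStridePrefetcher(period=3)
results = [pf.access(a) for a in [0, 8, 16, 80, 88, 96, 160]]
```

After one full repetition of the pattern has been observed, each access in this trace yields the address of the access that follows it, including the large +64 jump.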

    Replacement control for candidate producer-consumer relationships trained for prefetch generation

    Publication No.: US12045170B2

    Publication date: 2024-07-23

    Application No.: US17545121

    Filing date: 2021-12-08

    Applicant: Arm Limited

    CPC classification number: G06F12/0862 G06N20/00 G06F2212/602

    Abstract: Prefetch generation circuitry generates requests to prefetch data to a cache, where the prefetch generation circuitry is configured to initiate a producer prefetch to request return of producer data having a producer address and to initiate at least one consumer prefetch to request prefetching of consumer data to the cache, the consumer data having an address derived from the producer data returned in response to the producer prefetch. Training circuitry updates, based on executed load operations, a training table indicating candidate producer-consumer relationships being trained for use by the prefetch generation circuitry in generating the producer/consumer prefetches. Replacement control circuitry controls replacement of candidate producer-consumer relationships based on a producer-data-consumer-operand (PD-CO) match-based replacement policy criterion, which depends on whether a PD-CO match condition, indicative of the producer data for a producer load matching an address operand of a consumer load, is satisfied for existing/new candidate producer-consumer relationships.
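
The producer/consumer relationship in this abstract resembles pointer chasing: the data returned by one load supplies the address of the next. The Python sketch below models only that prefetch step, not the training table or the replacement policy. The dictionary standing in for memory, the set standing in for the cache, and the `consumer_offset` parameter are assumptions for illustration.

```python
def producer_consumer_prefetch(memory, cache, producer_addr, consumer_offset=0):
    """Prefetch the producer line, then a consumer line derived from its data.

    Returns the consumer address, or None if it does not map to known memory.
    """
    producer_data = memory[producer_addr]  # data returned by producer prefetch
    cache.add(producer_addr)

    # Assumed derivation: consumer address = producer data plus a fixed offset.
    consumer_addr = producer_data + consumer_offset
    if consumer_addr not in memory:
        return None
    cache.add(consumer_addr)
    return consumer_addr


# Address 0x10 holds a pointer to 0x40; the consumer reads a field at +8.
memory = {0x10: 0x40, 0x48: 7}
cache = set()
consumer = producer_consumer_prefetch(memory, cache, 0x10, consumer_offset=8)
```

Because the consumer address is unknown until the producer data returns, a stride-based prefetcher cannot cover this pattern, which is why the relationship has to be learned from executed loads.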
