-
Publication No.: US12223202B2
Publication date: 2025-02-11
Application No.: US17693817
Filing date: 2022-03-14
Applicant: Arm Limited
Inventor: Abhishek Raja, Balaji Vijayan, Alexander Cole Shulyak
Abstract: An apparatus comprises processing circuitry to issue store operations to store data to a data store and load operations to load data from the data store, and a store buffer comprising entries to store entry information corresponding to store operations in advance of the store operations completing. Store buffer lookup circuitry is provided to look up, in response to a load operation, whether the store buffer contains a corresponding entry corresponding to an older store operation for which target addresses of the load operation and the older store operation satisfy an address comparison condition. The store buffer lookup circuitry is configured to perform store-to-load forwarding in response to the load operation when the corresponding entry is a first type of store buffer entry satisfying a forwarding condition, and delay processing of the load operation when the corresponding entry is a second type of store buffer entry satisfying the forwarding condition.
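The lookup behaviour in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not say what distinguishes the two entry types, what the address comparison condition is, or what the forwarding condition is. Here the comparison is assumed to be an exact address match, the forwarding condition is assumed to hold for every match, and the names `StoreEntry`, `entry_type`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    address: int
    data: int
    entry_type: int  # 1 or 2; the abstract does not define the difference


def lookup(store_buffer, load_address):
    """Model a load looking up a store buffer ordered oldest to youngest.

    Every entry is assumed to be older than the load.
    """
    for entry in reversed(store_buffer):      # youngest matching store first
        if entry.address == load_address:     # assumed comparison condition
            if entry.entry_type == 1:
                return ("forward", entry.data)  # store-to-load forwarding
            return ("delay", None)              # second type: delay the load
    return ("load_from_data_store", None)       # no corresponding entry


buffer = [
    StoreEntry(address=0x10, data=1, entry_type=1),
    StoreEntry(address=0x20, data=2, entry_type=2),
    StoreEntry(address=0x10, data=3, entry_type=1),
]
```

With this buffer, a load from `0x10` is forwarded the value of the youngest matching store, a load from `0x20` is delayed, and a load from `0x30` falls through to the data store.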
-
Publication No.: US11775440B2
Publication date: 2023-10-03
Application No.: US17579842
Filing date: 2022-01-20
Applicant: Arm Limited
Inventor: Alexander Cole Shulyak, Balaji Vijayan, Karthik Sundaram, Yasuo Ishii, Joseph Michael Pusdesris
IPC: G06F12/0862
CPC classification number: G06F12/0862, G06F2212/1024, G06F2212/602
Abstract: Indirect prefetch circuitry initiates a producer prefetch requesting return of producer data having a producer address and at least one consumer prefetch to request prefetching of consumer data having a consumer address derived from the producer data. A producer prefetch filter table stores producer filter entries indicative of previous producer addresses of previous producer prefetches. Initiation of a requested producer prefetch for producer data having a requested producer address is suppressed when a lookup of the producer prefetch filter table determines that the requested producer address hits against a producer filter entry of the table. The lookup of the producer prefetch filter table for the requested producer address depends on a subset of bits of the requested producer address including at least one bit which distinguishes different chunks of data within a same cache line.
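The filter lookup described above can be sketched in software. The abstract only says that the lookup uses a subset of address bits that includes at least one bit distinguishing chunks within a cache line. The line size, chunk size, key width, table capacity, and replacement policy below are all assumptions, and `filter_key` and `ProducerPrefetchFilter` are hypothetical names.

```python
LINE_BYTES = 64    # assumed cache line size
CHUNK_BYTES = 16   # assumed chunk size (four chunks per line)
KEY_BITS = 10      # assumed width of the filter key


def filter_key(addr):
    """Take address bits above the chunk offset, so the key keeps the
    bits that distinguish chunks within the same cache line."""
    return (addr // CHUNK_BYTES) & ((1 << KEY_BITS) - 1)


class ProducerPrefetchFilter:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = []  # keys of previous producer prefetches, oldest first

    def should_issue(self, addr):
        """Return False (suppress) on a filter hit; otherwise record and issue."""
        key = filter_key(addr)
        if key in self.entries:
            return False
        self.entries.append(key)
        if len(self.entries) > self.capacity:
            self.entries.pop(0)  # assumed FIFO replacement
        return True
```

Because the key includes chunk-selecting bits, a second producer prefetch to a different chunk of an already-seen cache line is not suppressed, while a repeat within the same chunk is.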
-
Publication No.: US10817426B2
Publication date: 2020-10-27
Application No.: US16139160
Filing date: 2018-09-24
Applicant: Arm Limited
Inventor: Krishnendra Nathella, Chris Abernathy, Huzefa Moiz Sanjeliwala, Dam Sunwoo, Balaji Vijayan
IPC: G06F12/0862, G06F9/30
Abstract: A variety of data processing apparatuses are provided in which stride determination circuitry determines a stride value as a difference between a current address and a previously received address. Stride storage circuitry stores an association between stride values determined by the stride determination circuitry and a frequency during a training period. Prefetch circuitry causes a further data value to be proactively retrieved from a further address. The further address is the current address modified by a stride value in the stride storage circuitry having a highest frequency during the training period. The variety of data processing apparatuses are directed towards improving efficiency by variously disregarding certain candidate stride values, considering additional further addresses for prefetching by using multiple stride values, using feedback to adjust the training process and compensating for page table boundaries.
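The core stride-selection idea (count stride frequencies during training, then prefetch at the current address plus the most frequent stride) can be sketched as follows. This is a minimal illustration, not the claimed circuitry: it omits the refinements the abstract lists (disregarding candidate strides, multiple strides, feedback, page boundaries), and the function names and the example trace are made up.

```python
from collections import Counter


def train_strides(addresses):
    """Count how often each stride (current address minus previous address)
    occurs over a training period."""
    counts = Counter()
    for prev, cur in zip(addresses, addresses[1:]):
        counts[cur - prev] += 1
    return counts


def next_prefetch_address(current, counts):
    """Return the current address modified by the most frequent stride,
    or None if no stride has been observed."""
    if not counts:
        return None
    stride, _ = counts.most_common(1)[0]
    return current + stride


# Hypothetical training trace: mostly a 0x40 stride with one 0x8 outlier.
trace = [0x1000, 0x1040, 0x1080, 0x1088, 0x10C8, 0x1108]
counts = train_strides(trace)
```

For this trace the stride `0x40` is seen four times and `0x8` once, so the prefetch target from `0x1108` is `0x1148`.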
-
Publication No.: US11385896B2
Publication date: 2022-07-12
Application No.: US15930907
Filing date: 2020-05-13
Applicant: Arm Limited
Inventor: Alexander Cole Shulyak, Joseph Michael Pusdesris, Adrian Montero, Balaji Vijayan
Abstract: An apparatus and method are provided. The apparatus comprises storage circuitry to store a plurality of data elements. Processing circuitry executes a stream of instructions comprising access instructions that access some of the data elements at given locations. Training circuitry determines a pattern of the given locations based on the access instructions. Prefetch circuitry performs prefetches based on the pattern and filter circuitry filters the access instructions used by the training circuitry to determine the pattern by including discontinuous access instructions whose given location raises a discontinuity with the given location of a previous access instruction. In this way, it is possible to perform prefetching by calculating, rather than guessing, at a cumulative stride between the access instructions.
-
Publication No.: US11194574B2
Publication date: 2021-12-07
Application No.: US16521663
Filing date: 2019-07-25
Applicant: Arm Limited
Inventor: Miles Robert Dooley, Balaji Vijayan, Huzefa Moiz Sanjeliwala, Abhishek Raja, Sharmila Shridhar
IPC: G06F9/30
Abstract: An apparatus is described, comprising load issuing circuitry configured to issue load operations to load data from memory, and memory ordering tracking storage circuitry configured to store memory ordering tracking information on issued load operations. The apparatus also includes control circuitry configured to access the memory ordering tracking storage circuitry to determine, using the memory ordering tracking information, whether at least one load operation has been issued in disagreement with a memory ordering requirement, and, if so, to determine whether to re-issue one or more issued load operations or to continue issuing load operations despite disagreement with the memory ordering requirement. Furthermore, the control circuitry is capable of merging the memory ordering tracking information for a plurality of issued load operations into a merged entry in the memory ordering tracking storage circuitry.
-
Publication No.: US11188475B1
Publication date: 2021-11-30
Application No.: US17061965
Filing date: 2020-10-02
Applicant: Arm Limited
Inventor: Joseph Michael Pusdesris, Balaji Vijayan
IPC: G06F12/0897, G06F12/0871, G06F12/14, G06F12/02, G06F12/0864
Abstract: A technique is provided for managing caches in a cache hierarchy. An apparatus has processing circuitry for performing operations and a plurality of caches for storing data for reference by the processing circuitry when performing the operations. The plurality of caches form a cache hierarchy including a given cache at a given hierarchical level and a further cache at a higher hierarchical level. The given cache is a set associative cache having a plurality of cache ways, and the given cache and the further cache are arranged such that the further cache stores a subset of the data in the given cache. In response to an allocation event causing data for a given memory address to be stored in the further cache, the given cache issues a way indication to the further cache identifying which cache way in the given cache the data for the given memory address is stored in. In response to the allocation event, the further cache not only stores the data for the given memory address, but also retains the way indication whilst the data for the given memory address remains stored within the further cache. When the further cache subsequently issues a message to the given cache relating to the data for the given memory address, it provides the way indication to the given cache for use in controlling an access to the given cache.
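The way-indication exchange can be modelled roughly as below. This is a sketch under assumptions: data payloads are omitted, the set index is a simple modulo, victim selection is naive, and the class and method names (`GivenCache`, `FurtherCache`, `message_for`, `access_with_way`) are hypothetical rather than taken from the patent.

```python
class GivenCache:
    """Set-associative cache at the given hierarchical level."""

    def __init__(self, num_sets=4, num_ways=4):
        self.num_sets = num_sets
        self.sets = [[None] * num_ways for _ in range(num_sets)]

    def allocate(self, addr):
        """Store addr in a way of its set and return the way indication."""
        ways = self.sets[addr % self.num_sets]
        way = ways.index(None) if None in ways else 0  # naive victim choice
        ways[way] = addr
        return way

    def access_with_way(self, addr, way):
        """Check only the indicated way instead of searching every way."""
        return self.sets[addr % self.num_sets][way] == addr


class FurtherCache:
    """Higher-level cache holding a subset of the given cache's data."""

    def __init__(self):
        self.lines = {}  # address -> way indication retained with the line

    def allocate(self, addr, way):
        self.lines[addr] = way

    def message_for(self, addr):
        """Build a message to the given cache, carrying the way indication."""
        return addr, self.lines[addr]


given, further = GivenCache(), FurtherCache()
way = given.allocate(0x40)
further.allocate(0x40, way)        # allocation event: way indication passed up
addr, hinted_way = further.message_for(0x40)
```

The point of the retained indication is that `access_with_way` inspects a single way of the set, rather than comparing against all ways of the given cache.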
-
Publication No.: US20200097409A1
Publication date: 2020-03-26
Application No.: US16139160
Filing date: 2018-09-24
Applicant: Arm Limited
Inventor: Krishnendra Nathella, Chris Abernathy, Huzefa Moiz Sanjeliwala, Dam Sunwoo, Balaji Vijayan
IPC: G06F12/0862, G06F9/30
Abstract: A variety of data processing apparatuses are provided in which stride determination circuitry determines a stride value as a difference between a current address and a previously received address. Stride storage circuitry stores an association between stride values determined by the stride determination circuitry and a frequency during a training period. Prefetch circuitry causes a further data value to be proactively retrieved from a further address. The further address is the current address modified by a stride value in the stride storage circuitry having a highest frequency during the training period. The variety of data processing apparatuses are directed towards improving efficiency by variously disregarding certain candidate stride values, considering additional further addresses for prefetching by using multiple stride values, using feedback to adjust the training process and compensating for page table boundaries.