Abstract:
Systems and methods for populating a cache using a hardware prefetcher are disclosed. A method for prefetching cache entries includes determining an initial stride value based on at least a first and second demand miss address in the cache, verifying the initial stride value based on a third demand miss address in the cache, prefetching a predetermined number of cache entries based on the verified initial stride value, determining an expected next miss address in the cache based on the verified initial stride value and addresses of the prefetched cache entries; and confirming the verified initial stride value based on comparing the expected next miss address to a next demand miss address in the cache. If the verified initial stride value is confirmed, additional cache entries are prefetched. If the verified initial stride value is not confirmed, further prefetching is stalled and an alternate stride value is determined.
Abstract:
Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an autoincrement-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop, and a remaining loop count as a difference between the maximum loop count and a number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate an actual number of cache lines to prefetch to be less than or equal to the remaining loop count, when the remaining loop count is less than the selected number of cache lines.