Abstract:
Data processing circuitry comprises a cache memory to cache a subset of data elements from a main memory; a processing element to execute program code to access data elements having respective memory addresses, the processing element being configured to access the data elements in the cache memory and, in the case of a cache miss, to fetch the data elements from the main memory; prefetch circuitry, responsive to an access to a current data element, to initiate prefetching into the cache memory of a data element at a memory address defined by a current offset value relative to the address of the current data element; and offset value selection circuitry comprising: an address table to store memory addresses for which a data element accessed by the processing element resulted in a cache miss or an access to a previously prefetched data element; and detector circuitry to detect, for each of a group of candidate offset values, one or more respective metrics representing a proportion of a set of data element accesses which resulted in a cache miss or an access to a previously prefetched data element, for which the memory address for that data element access differs by the candidate offset value from a memory address in the address table; in which the detector circuitry is configured to process the group of candidate offset values as successive complementary sub-groups of one or more of the group of candidate offset values and to set a next instance of the current offset value in response to processing each sub-group, in dependence upon both the proportions indicated by the one or more metrics detected for that sub-group and the one or more metrics previously detected for the current offset value.
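The sub-group selection described above can be modelled in software roughly as follows. This is a minimal sketch under stated assumptions, not the claimed circuitry: the class and method names, the fixed number of accesses scored per sub-group, and the rule "replace the current offset only if a sub-group candidate scores strictly higher" are all hypothetical choices made for illustration.

```python
from collections import deque


class SubgroupOffsetSelector:
    """Hypothetical software model of sub-group based offset selection."""

    def __init__(self, candidates, subgroup_size, accesses_per_subgroup, table_size=256):
        # Split the candidate offsets into complementary sub-groups.
        self.subgroups = [candidates[i:i + subgroup_size]
                          for i in range(0, len(candidates), subgroup_size)]
        self.accesses_per_subgroup = accesses_per_subgroup
        # Address table: accesses that missed or hit a previously prefetched element.
        self.table = deque(maxlen=table_size)
        self.current_offset = None
        self.current_metric = 0.0
        self._group_index = 0
        self._reset_counts()

    def _reset_counts(self):
        self._counts = {o: 0 for o in self.subgroups[self._group_index]}
        self._seen = 0

    def observe(self, address):
        """Call for each access that missed or hit a previously prefetched element."""
        for offset in self._counts:
            if (address - offset) in self.table:
                self._counts[offset] += 1
        self.table.append(address)
        self._seen += 1
        if self._seen == self.accesses_per_subgroup:
            self._finish_subgroup()

    def _finish_subgroup(self):
        # Proportion of observed accesses explained by each candidate offset.
        metrics = {o: c / self._seen for o, c in self._counts.items()}
        best = max(metrics, key=metrics.get)
        if self.current_offset is None or metrics[best] > self.current_metric:
            # A candidate in this sub-group beats the current offset's metric.
            self.current_offset, self.current_metric = best, metrics[best]
        elif self.current_offset in metrics:
            # The current offset was re-measured in this sub-group: refresh its metric.
            self.current_metric = metrics[self.current_offset]
        self._group_index = (self._group_index + 1) % len(self.subgroups)
        self._reset_counts()


# Usage: a stream with a stride of 3 is best explained by offset 3.
sel = SubgroupOffsetSelector([1, 2, 3, 4, 5, 6], subgroup_size=2, accesses_per_subgroup=10)
for addr in range(0, 90, 3):
    sel.observe(addr)
print(sel.current_offset)  # 3
```

Processing one sub-group at a time means only a few candidate counters need to be live at once, at the cost of taking several sub-group periods to sweep the full candidate group.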
Abstract:
A data processing apparatus is provided. Instruction send circuitry sends an instruction to an external processor to be executed by the external processor. Allocation circuitry allocates a specified one of several registers for a result of the instruction having been executed on the external processor and data receive circuitry receives the result of the instruction having been executed on the external processor and stores the result in the specified one of the several registers. In response to a condition being met: the specified one of the several registers is dereserved prior to the result being received by the data receive circuitry, and the result is discarded by the data receive circuitry when the result is received by the data receive circuitry.
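The dereserve-then-discard behaviour above can be illustrated with a small model. This is a sketch only: the names are hypothetical, the "condition" is assumed here to be something like a pipeline flush, and the per-instruction tag is an assumed mechanism for recognising a late result so that it does not overwrite a register that has since been reallocated.

```python
class ExternalResultRegisters:
    """Hypothetical model of register allocation for results from an external processor."""

    def __init__(self, num_registers):
        self.values = [0] * num_registers
        self.free = set(range(num_registers))
        self.pending = {}        # instruction tag -> allocated register
        self.cancelled = set()   # tags whose results must be discarded on arrival
        self._next_tag = 0

    def send(self):
        """Send an instruction and allocate a register for its result.

        Raises ValueError if no register is free (a real design would stall).
        """
        reg = min(self.free)
        self.free.remove(reg)
        tag = self._next_tag
        self._next_tag += 1
        self.pending[tag] = reg
        return tag, reg

    def dereserve(self, tag):
        """Condition met (assumed: a flush): free the register before the result arrives."""
        reg = self.pending.pop(tag)
        self.free.add(reg)
        self.cancelled.add(tag)

    def receive(self, tag, value):
        """Store the result, or discard it if its register was dereserved."""
        if tag in self.cancelled:
            self.cancelled.remove(tag)
            return False
        reg = self.pending.pop(tag)
        self.values[reg] = value
        return True
```

Because the register is freed immediately, a new instruction can reuse it while the stale result is still in flight; the tag check is what keeps that stale result out of the reused register.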
Abstract:
A technique is provided for training a prediction apparatus. The apparatus has an input interface for receiving a sequence of training events indicative of program instructions, and identifier value generation circuitry for performing an identifier value generation function to generate, for a given training event received at the input interface, an identifier value for that given training event. The identifier value generation function is arranged such that the generated identifier value is dependent on at least one register referenced by a program instruction indicated by that given training event. Prediction storage is provided with a plurality of training entries, where each training entry is allocated an identifier value as generated by the identifier value generation function, and is used to maintain training data derived from training events having that allocated identifier value. Matching circuitry is then responsive to the given training event to detect whether the prediction storage has a matching training entry (i.e. an entry whose allocated identifier value matches the identifier value for the given training event). If so, it causes the training data in the matching training entry to be updated in dependence on the given training event.
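A register-keyed training table of the kind described above might be sketched as follows. The abstract only requires that the identifier depend on at least one register referenced by the instruction; using the base register number alone as the identifier, the stride-and-confidence training data, and the LRU replacement are all illustrative assumptions, and every name is hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class TrainingEvent:
    pc: int          # program counter of the load
    base_reg: int    # base register referenced by the load
    address: int     # data address accessed


@dataclass
class TrainingEntry:
    last_address: int
    stride: int = 0
    confidence: int = 0


def identifier_value(event: TrainingEvent) -> int:
    # Assumption: the identifier is just the base register number, so loads
    # at different PCs that use the same register train one shared entry.
    return event.base_reg


class PredictionStorage:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()  # identifier -> TrainingEntry, in LRU order

    def train(self, event: TrainingEvent) -> bool:
        """Return True if a matching entry was found and updated."""
        ident = identifier_value(event)
        entry = self.entries.get(ident)
        if entry is None:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[ident] = TrainingEntry(last_address=event.address)
            return False
        stride = event.address - entry.last_address
        if stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            entry.stride, entry.confidence = stride, 0
        entry.last_address = event.address
        self.entries.move_to_end(ident)
        return True
```

With this choice of identifier, an unrolled loop whose loads sit at different PCs but share a base register trains a single entry, where a PC-indexed table would split the training across several entries.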
Abstract:
Data processing circuitry comprises a cache memory to cache a subset of data elements from a main memory; a processing element to execute program code to access data elements having respective memory addresses, the processing element being configured to access the data elements in the cache memory and, in the case of a cache miss, to fetch the data elements from the main memory; prefetch circuitry, responsive to an access to a current data element, to initiate prefetching into the cache memory of a data element at a memory address defined by a current offset value relative to the address of the current data element; offset value selection circuitry comprising: an address table to store memory addresses for which a data element accessed by the processing element resulted in a cache miss or an access to a previously prefetched data element; detector circuitry to detect, for each of a group of candidate offset values, one or more respective metrics representing a proportion of a set of data element accesses which resulted in a cache miss or an access to a previously prefetched data element, for which the memory address for that data element access differs by the candidate offset value from a memory address in the address table; in which the detector circuitry is configured to set a next instance of the current offset value in response to the one or more detected metrics; verification circuitry to detect, at one or more predetermined stages with respect to the processing of the group of candidate offset values by the offset value selection circuitry, one or more verification metrics representing a proportion of a set of data element accesses which resulted in a cache miss or an access to a previously prefetched data element, for which the memory address for that data element access differs by the current offset value from a memory address in the address table, and to detect whether the one or more verification metrics comply with a predetermined condition; and control circuitry to inhibit prefetching at least until a next selection of a current offset value by the offset value selection circuitry, in response to a detection by the verification circuitry that the one or more verification metrics do not comply with the predetermined condition.
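The verification-and-inhibit loop above can be sketched as a small controller. Assumptions, all hypothetical: the "predetermined condition" is taken to be "verification metric at least a fixed threshold", the address table is passed in as a plain set, and selecting a new current offset is what lifts the inhibition.

```python
class VerifiedPrefetchControl:
    """Hypothetical model of offset verification plus prefetch inhibition."""

    def __init__(self, threshold):
        # Assumed predetermined condition: metric >= threshold.
        self.threshold = threshold
        self.current_offset = None
        self.prefetch_enabled = False

    def select_offset(self, offset):
        """Offset selection has chosen a new current offset; prefetching resumes."""
        self.current_offset = offset
        self.prefetch_enabled = True

    def verify(self, accesses, table):
        """Proportion of qualifying accesses explained by the current offset."""
        if self.current_offset is None or not accesses:
            metric = 0.0
        else:
            hits = sum(1 for a in accesses if (a - self.current_offset) in table)
            metric = hits / len(accesses)
        if metric < self.threshold:
            # Inhibit prefetching until the next offset selection.
            self.prefetch_enabled = False
        return metric

    def prefetch_address(self, address):
        """Address to prefetch for an access, or None while inhibited."""
        if not self.prefetch_enabled:
            return None
        return address + self.current_offset
```

The point of the check is that an offset chosen earlier may stop matching the access pattern part-way through the next candidate sweep; switching prefetching off in that window avoids issuing useless prefetches until a fresh offset is chosen.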
Abstract:
Apparatus and a corresponding method of operating a hub device, and a target device, in a coherent interconnect system are presented. The hub device receives, from a requesting master device, a cache pre-population request of a set of coherency protocol transactions in the system, specifying at least one data item, and responds by causing a cache pre-population trigger of the set of coherency protocol transactions, specifying the at least one data item, to be transmitted to a target device. This trigger can cause the target device to request that the specified at least one data item be retrieved and brought into cache. Because the target device can decide whether or not to respond to the trigger, it does not receive cached data unsolicited, which simplifies its configuration while still allowing some data to be pre-cached.
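The request, trigger and target-initiated fetch sequence above can be modelled with plain objects. This does not reproduce any real coherency protocol's message set: the class names, the `accept` policy callback and the target name `"target0"` are hypothetical, chosen only to show that data moves only when the target itself asks for it.

```python
class HubDevice:
    """Hypothetical hub: forwards pre-population requests as triggers."""

    def __init__(self, memory):
        self.memory = memory   # address -> data
        self.targets = {}      # target name -> TargetDevice

    def read(self, address):
        """Service a read that the target itself chose to issue."""
        return self.memory[address]

    def on_prepopulate_request(self, target_name, addresses):
        # The hub sends only a trigger per data item; it never pushes data.
        target = self.targets[target_name]
        return [target.on_prepopulate_trigger(self, a) for a in addresses]


class TargetDevice:
    def __init__(self, accept):
        self.cache = {}
        self.accept = accept   # target-local policy deciding whether to act

    def on_prepopulate_trigger(self, hub, address):
        if not self.accept(address):
            return False       # trigger ignored; nothing arrives unsolicited
        self.cache[address] = hub.read(address)
        return True


# Usage: the target only pre-caches addresses below 0x180.
hub = HubDevice({0x100: "a", 0x200: "b"})
target = TargetDevice(accept=lambda addr: addr < 0x180)
hub.targets["target0"] = target
print(hub.on_prepopulate_request("target0", [0x100, 0x200]))  # [True, False]
```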
Abstract:
A data processing apparatus includes a processor and a hierarchical data storage system, including a memory and a cache, for storing data and instructions in storage locations identified by physical addresses. The apparatus includes address translation circuitry for mapping virtual addresses to physical addresses, and load store circuitry for receiving data access requests from the processor. The load store circuitry accesses the address translation circuitry to identify the physical addresses that correspond to the virtual addresses of the received data access requests, and accesses the corresponding physical addresses in the hierarchical data storage system. Preload circuitry receives preload requests from the processor indicating the virtual addresses of storage locations that are to be preloaded. Prefetch circuitry monitors at least some of the accesses performed by the load store circuitry, predicts addresses to be accessed subsequently, and transmits the predicted addresses to the preload circuitry as preload requests.
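The routing of prefetch predictions through the preload path can be sketched as below. The abstract does not say how addresses are predicted, so a simple two-in-a-row stride detector stands in for the prefetch circuitry; the 4 KiB page size, the dictionary page table, dropping preloads that have no translation, and all names are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


class PreloadUnit:
    """Hypothetical preload circuitry: translates a virtual address, then fills the cache."""

    def __init__(self, page_table, memory):
        self.page_table = page_table  # virtual page number -> physical page number
        self.memory = memory          # physical address -> data
        self.cache = {}               # physical address -> data

    def preload(self, vaddr):
        page = self.page_table.get(vaddr // PAGE_SIZE)
        if page is None:
            return None  # no translation available: drop the preload
        paddr = page * PAGE_SIZE + vaddr % PAGE_SIZE
        self.cache[paddr] = self.memory.get(paddr, 0)
        return paddr


class StridePrefetcher:
    """Watches load/store virtual addresses and issues preload requests."""

    def __init__(self, preload_unit):
        self.preload_unit = preload_unit
        self.last = None
        self.stride = None

    def observe(self, vaddr):
        if self.last is not None:
            stride = vaddr - self.last
            if stride != 0 and stride == self.stride:
                # Same stride twice in a row: predict the next virtual address.
                self.preload_unit.preload(vaddr + stride)
            self.stride = stride
        self.last = vaddr
```

Because the prefetcher hands over virtual addresses, it reuses the preload path's translation step instead of needing its own; in this sketch a prediction that crosses into page 1 is translated through the page table just like a software preload would be.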