Abstract:
An apparatus comprises prediction circuitry (40, 100, 80) for determining, based on current prediction policy information (43, 82, 104), a predicted behaviour to be used for processing instructions. The current prediction policy information is updated based on an outcome of processing of instructions. A storage structure (50) stores at least one entry identifying previous prediction policy information (60) for a corresponding block of instructions. In response to an instruction from a block having a corresponding entry in the storage structure (50) which identifies the previous prediction policy information (60), the current prediction policy information (43, 82, 104) can be reset based on the previous prediction policy information 60 identified in the corresponding entry of the storage structure (50).
Abstract:
An apparatus comprises processing circuitry to process data access operations specifying a virtual address of data to be loaded from or stored to a data store, and proxy identifier determining circuitry to determine a proxy identifier for a data access operation to be processed by the data access circuitry, the proxy identifier having fewer bits than a physical address corresponding to the virtual address specified by the data access operation. The processing circuitry comprises at least one buffer to buffer information (including the proxy identifier) associated with one or more pending data access operations awaiting processing. Address translation circuitry determines the physical address corresponding to the virtual address specified for a data access operation after that data access operation has progressed beyond said at least one buffer.
Abstract:
Data storage apparatus comprises detection circuitry configured to detect a match between a multi-bit reference memory address and a test address, the test address being a combination of a multi-bit base address and a multi-bit address offset, the detection circuitry comprising: a comparator configured to compare, as a first comparison, a first subset of bits of the reference memory address with a combination of the corresponding first subset of bits of the base address and the corresponding first subset of bits of the address offset; the comparator being configured to compare, as a second comparison, a second, different subset of bits of the reference memory address with the corresponding second subset of bits of the base address; a detector configured to detect the match between the reference memory address and the test address when both of the first comparison and the second comparison detect a respective match; and control circuitry configured to control operation of the data storage apparatus in dependence upon the reference memory address when a match is detected by the detector.
Abstract:
An apparatus and method are provided for avoiding conflicting entries in a storage structure. The apparatus comprises a storage structure having a plurality of entries for storing data, and allocation circuitry, responsive to a trigger event for allocating new data into the storage structure, to determine a victim entry into which the new data is to be stored, and to allocate the new data into the victim entry upon determining that the new data is available. Conflict detection circuitry is used to detect when the new data will conflict with data stored in one or more entries of the storage structure, and to cause the data in said one or more entries to be invalidated. The conflict detection circuitry is arranged to perform, prior to a portion of the new data required for conflict detection being available, at least one initial stage detection operation to determine, based on an available portion of the new data, candidate entries whose data may conflict with the new data. A record of the candidate entries in then maintained, and, once the portion of the new data required for conflict detection is available, the conflict detection circuitry then performs a final stage detection operation to determine whether any of the candidate entries do contain data that conflicts with the new data. Any entries identified by the final stage detection operation as containing data that conflicts with the new data are then invalidated. This provides a particularly efficient mechanism for avoiding conflicting entries in a storage structure.
Abstract:
A processing pipeline for processing instructions with instructions from multiple threads in flight concurrently may have control circuitry to detect a stalling event associated with a given thread. In response, at least one instruction of the given thread may be flushed from the pipeline, and the control circuitry may trigger fetch circuitry to reduce a fraction of the fetched instructions which are fetched from the given thread. A mechanism is also described to determine when to trigger a predetermined action when a delay in accessing information becomes greater than a delay threshold, and to update the delay threshold based on a difference between a return delay when the information is returned from the storage circuitry and the delay threshold.
Abstract:
Apparatus for processing data 2 is provided with fetch circuitry 16 for fetching program instructions for execution from one or more active threads of instructions having respective program counter values. Pipeline circuitry 22, 24 has a first operating mode and a second operating mode. Mode switching circuitry 30 switches the pipeline circuitry 22, 24, between the first operating mode and the second operating mode in dependence upon a number of active threads of program instructions having program instructions available to be executed. The first operating mode has a lower average energy consumption per instruction executed than the second operating mode and the second operating mode has a higher average rate of instruction execution for a single thread than the first operating mode. The first operating mode may utilise a barrel processing pipeline 22 to perform interleaved multiple thread processing. The second operating mode may utilise an out-of-order processing pipeline 24 for performing out-of-order processing.
Abstract:
A processing pipeline for processing instructions with instructions from multiple threads in flight concurrently may have control circuitry to detect a stalling event associated with a given thread. In response, at least one instruction of the given thread may be flushed from the pipeline, and the control circuitry may trigger fetch circuitry to reduce a fraction of the fetched instructions which are fetched from the given thread. A mechanism is also described to determine when to trigger a predetermined action when a delay in accessing information becomes greater than a delay threshold, and to update the delay threshold based on a difference between a return delay when the information is returned from the storage circuitry and the delay threshold.
Abstract:
A mechanism is provided for improving performance when executing unaligned load instructions which load an unaligned block of data from a data store. In a first unaligned load handling mode, a final load operation of a series of load operations performed for the instruction loads a full data word extending beyond the end of the unaligned block of data to be loaded by that instruction. If an initial portion of the unaligned block of data to be loaded by a subsequent unaligned load instruction corresponds to the excess part in the stream buffer for the earlier instruction, then an initial load operation for the subsequent instruction can be suppressed. A mechanism is also described for allowing series of dependent data access operations triggered by a given instruction to be halted partway through when a stall condition arises, and resumed partway through later, by defining overlapping sequences of transactions.
Abstract:
An apparatus comprises a processing pipeline comprising out-of-order execution circuitry and second execution circuitry. Control circuitry monitors at least one reordering metric indicative of an extent to which instructions are executed out of order by the out-of-order execution circuitry, and controls whether instructions are executed using the out-of-order execution circuitry or the second execution circuitry based on the reordering metric. A speculation metric indicative of a fraction of executed instructions that are flushed due to a mis-speculation can also be used to determine whether to execute instructions on first or second execution circuitry having different performance or energy consumption characteristics.
Abstract:
A processing pipeline may have first and second execution circuits having different performance or energy consumption characteristics. Instruction supply circuitry may support different instruction supply schemes with different energy consumption or performance characteristics. This can allow a further trade-off between performance and energy efficiency. Architectural state storage can be shared between the execute units to reduce the overhead of switching between the units. In a parallel execution mode, groups of instructions can be executed on both execute units in parallel.