Abstract:
An apparatus and method are provided for processing instructions. The apparatus has execution circuitry for executing instructions, where each instruction requires an associated operation to be performed using one or more source operand values in order to produce a result value. Issue circuitry is used to maintain a record of pending instructions awaiting execution by the execution circuitry, and prediction circuitry is used to produce a predicted source operand value for a chosen pending instruction. Optimisation circuitry is then arranged to detect an optimisation condition for the chosen pending instruction when the predicted source operand value is such that, having regard to the associated operation for the chosen pending instruction, the result value is known without performing the associated operation. In response to detection of the optimisation condition, an optimisation operation is implemented instead of causing the execution circuitry to perform the associated operation in order to execute the chosen pending instruction. This can lead to significant performance and/or power consumption improvements.
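As a very rough behavioural sketch of the optimisation condition (the `Op` enum, the `knownResult` helper and the chosen operations are illustrative assumptions, not taken from the abstract), certain predicted operand values make the result knowable without executing the operation:

```cpp
#include <cstdint>
#include <iostream>
#include <optional>

// Illustrative operation set; a real pipeline would cover many more cases.
enum class Op { Mul, And, Or };

// If the predicted source operand trivialises the operation, the result is
// known without dispatching the instruction to the execution circuitry.
std::optional<uint64_t> knownResult(Op op, uint64_t predictedSrc) {
    switch (op) {
    case Op::Mul: if (predictedSrc == 0) return 0; break;          // x * 0 == 0
    case Op::And: if (predictedSrc == 0) return 0; break;          // x & 0 == 0
    case Op::Or:  if (predictedSrc == ~0ULL) return ~0ULL; break;  // x | ~0 == ~0
    }
    return std::nullopt;  // optimisation condition not detected
}

int main() {
    if (auto r = knownResult(Op::Mul, 0))
        std::cout << "skip execution, result = " << *r << '\n';  // prints 0
}
```

The optimisation operation could then simply write the known result and retire the instruction without ever occupying an execution slot.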
Abstract:
An apparatus comprises processing circuitry 14 to perform data processing in response to instructions, the processing circuitry supporting speculative processing of read operations for reading data from a memory system 20, 22; and control circuitry 12, 14, 20 to identify whether a sequence of instructions to be processed by the processing circuitry includes a speculative side-channel hint instruction indicative of whether there is a risk of information leakage if at least one subsequent read operation is processed speculatively, and to determine whether to trigger a speculative side-channel mitigation measure depending on whether the instructions include the speculative side-channel hint instruction. This can help to reduce the performance impact of measures taken to protect against speculative side-channel attacks.
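A minimal sketch of the decision logic, assuming a hypothetical "subsequent reads are safe" hint encoding (the `SsbHintSafe` marker and `mitigationNeeded` helper are invented names; the abstract leaves the precise hint semantics open):

```cpp
#include <iostream>
#include <vector>

enum class Instr { Add, Store, Load, SsbHintSafe };

// Decide whether a speculative side-channel mitigation (e.g. delaying the
// read until it becomes non-speculative) must be applied to a speculative
// load. The hint tells the control circuitry that later reads carry no
// leakage risk, so the mitigation can be skipped.
bool mitigationNeeded(const std::vector<Instr>& window) {
    bool readsMarkedSafe = false;
    for (Instr i : window) {
        if (i == Instr::SsbHintSafe) readsMarkedSafe = true;
        if (i == Instr::Load) return !readsMarkedSafe;
    }
    return false;  // no speculative read in the window
}

int main() {
    std::cout << mitigationNeeded({Instr::Add, Instr::Load}) << '\n';                     // 1: mitigate
    std::cout << mitigationNeeded({Instr::SsbHintSafe, Instr::Add, Instr::Load}) << '\n'; // 0: safe
}
```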
Abstract:
An apparatus and method are provided for controlling allocation of instructions into an instruction cache storage. The apparatus comprises processing circuitry to execute instructions, fetch circuitry to fetch instructions from memory for execution by the processing circuitry, and an instruction cache storage to store instructions fetched from the memory by the fetch circuitry. Cache control circuitry is responsive to the fetch circuitry fetching a target instruction from a memory address determined as a target address of an instruction flow changing instruction, at least when the memory address is within a specific address range, to prevent allocation of the fetched target instruction into the instruction cache storage unless the fetched target instruction is of at least one specific type. It has been found that such an approach can inhibit speculation-based cache timing side-channel attacks.
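A minimal sketch of the allocation filter, under the assumption that the permitted instruction type is a landing-pad-style marker (the `InstrType::LandingPad` name and the address range are illustrative):

```cpp
#include <cstdint>
#include <iostream>

// Illustrative classification; "LandingPad" stands in for whatever specific
// type(s) the cache control circuitry permits at a branch target.
enum class InstrType { LandingPad, Other };

struct FetchedInstr {
    uint64_t  addr;
    bool      isBranchTarget;  // fetched as the target of a flow-changing instruction
    InstrType type;
};

// Allocation is refused for branch targets in the protected range unless
// the fetched instruction is of the permitted type.
bool allowAllocation(const FetchedInstr& f, uint64_t rangeLo, uint64_t rangeHi) {
    if (!f.isBranchTarget) return true;
    bool inProtectedRange = f.addr >= rangeLo && f.addr < rangeHi;
    return !inProtectedRange || f.type == InstrType::LandingPad;
}

int main() {
    FetchedInstr gadget{0x5000, true, InstrType::Other};
    std::cout << allowAllocation(gadget, 0x4000, 0x8000) << '\n';  // 0: not cached
}
```

Refusing to allocate mis-speculated target fetches means their presence cannot later be probed through cache timing.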
Abstract:
A processing pipeline may have first and second execution circuits having different performance or energy consumption characteristics. Instruction supply circuitry may support different instruction supply schemes with different energy consumption or performance characteristics, allowing a further trade-off between performance and energy efficiency. Architectural state storage can be shared between the execution circuits to reduce the overhead of switching between them. In a parallel execution mode, groups of instructions can be executed on both execution circuits in parallel.
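A minimal sketch of the shared-state aspect only (the unit names and their behaviour are placeholders): because both execution circuits operate on a single copy of the architectural state, switching between them involves no state transfer.

```cpp
#include <array>
#include <cstdint>
#include <iostream>

// One copy of the architectural state, shared by both execution circuits,
// so switching between them requires no register transfer.
struct ArchState { std::array<uint64_t, 32> regs{}; };

struct BigUnit    { void execute(ArchState& s) { s.regs[0] += 1; } };  // high performance
struct LittleUnit { void execute(ArchState& s) { s.regs[0] += 1; } };  // low energy

int main() {
    ArchState state;        // the shared architectural state storage
    BigUnit big;
    LittleUnit little;
    big.execute(state);     // run on the fast unit...
    little.execute(state);  // ...then switch, with zero transfer cost
    std::cout << state.regs[0] << '\n';  // 2: both updates landed in one place
}
```

In the parallel execution mode, both units would dispatch instruction groups against this same shared state.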
Abstract:
In response to a transfer stimulus, performance of a processing workload is transferred from a source processing circuitry to a destination processing circuitry, in preparation for the source processing circuitry to be placed in a power saving condition following the transfer. To reduce the number of memory fetches required by the destination processing circuitry following the transfer, a cache of the source processing circuitry is maintained in a powered state for a snooping period. During the snooping period, cache snooping circuitry snoops data values in the source cache and retrieves the snooped data values for the destination processing circuitry.
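A minimal behavioural sketch of the read path during the snooping period (the map standing in for the source cache and all function names are illustrative):

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Stand-ins for the source core's still-powered cache and main memory.
std::unordered_map<uint64_t, int> sourceCache = {{0x100, 42}};
int memoryRead(uint64_t) { return -1; }  // placeholder slow path

// While the snooping period is active, the snoop circuitry checks the
// source cache first, saving a memory fetch on a hit.
int destinationRead(uint64_t addr, bool snoopPeriodActive) {
    if (snoopPeriodActive) {
        auto it = sourceCache.find(addr);
        if (it != sourceCache.end()) return it->second;  // snoop hit
    }
    return memoryRead(addr);
}

int main() {
    std::cout << destinationRead(0x100, true) << '\n';   // 42, from source cache
    std::cout << destinationRead(0x100, false) << '\n';  // -1, via memory
}
```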
Abstract:
Coherency control circuitry (10) supports processing of a safe-speculative-read transaction received from a requesting master device (4). The safe-speculative-read transaction is of a type requesting that target data is returned to a requesting cache (11) of the requesting master device (4) while prohibiting any change in coherency state associated with the target data in other caches (12) in response to the safe-speculative-read transaction. In response, at least when the target data is cached in a second cache associated with a second master device, at least one of the coherency control circuitry (10) and the second cache (12) is configured to return a safe-speculative-read response while maintaining the target data in the same coherency state within the second cache. This helps to mitigate speculative side-channel attacks.
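A much-simplified model of the contrast between an ordinary read and the safe-speculative-read (the states and functions are illustrative; real coherency protocols involve far more machinery):

```cpp
#include <iostream>

enum class State { Invalid, Shared, Exclusive, Modified };

struct CacheLine { State state; int data; };

// An ordinary read forces the second cache to downgrade an Exclusive or
// Modified line, leaving an observable trace of the speculative access.
int ordinaryRead(CacheLine& line) {
    if (line.state == State::Modified || line.state == State::Exclusive)
        line.state = State::Shared;  // visible coherency state change
    return line.data;
}

// The safe-speculative-read returns the data while leaving the coherency
// state in the second cache exactly as it was.
int safeSpeculativeRead(const CacheLine& line) {
    return line.data;  // no state transition at all
}

int main() {
    CacheLine a{State::Modified, 7}, b{State::Modified, 7};
    safeSpeculativeRead(a);
    ordinaryRead(b);
    std::cout << (a.state == State::Modified) << ' '
              << (b.state == State::Shared) << '\n';  // 1 1
}
```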
Abstract:
An apparatus comprises prediction circuitry (40, 100, 80) for determining, based on current prediction policy information (43, 82, 104), a predicted behavior to be used for processing instructions. The current prediction policy information is updated based on an outcome of processing of instructions. A storage structure (50) stores at least one entry identifying previous prediction policy information (60) for a corresponding block of instructions. In response to an instruction from a block having a corresponding entry in the storage structure (50) which identifies the previous prediction policy information (60), the current prediction policy information (43, 82, 104) can be reset based on the previous prediction policy information (60) identified in the corresponding entry of the storage structure (50).
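A minimal sketch of the save/restore idea, with a loop trip count standing in as an illustrative piece of prediction policy (all names here are assumptions):

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Illustrative "policy": e.g. a loop-termination predictor's trip count.
struct PredictionPolicy { int tripCount = 0; };

PredictionPolicy current;                                 // updated as outcomes arrive
std::unordered_map<uint64_t, PredictionPolicy> perBlock;  // the storage structure

// On entering a block with a stored entry, reset the current policy from
// the previously learnt policy rather than re-learning from scratch.
void onBlockEntry(uint64_t blockAddr) {
    auto it = perBlock.find(blockAddr);
    if (it != perBlock.end()) current = it->second;
}

void onBlockExit(uint64_t blockAddr) {
    perBlock[blockAddr] = current;  // snapshot the learnt policy
}

int main() {
    current.tripCount = 17;  // learnt while executing block 0x40
    onBlockExit(0x40);
    current.tripCount = 3;   // perturbed by other code
    onBlockEntry(0x40);
    std::cout << current.tripCount << '\n';  // 17 again
}
```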
Abstract:
A data processing apparatus 2 performs multi-threaded processing using a processing pipeline 6, 8, 10, 12, 14, 16, 18. Flush control circuitry 30 is responsive to multiple different types of flush trigger. Different types of flush trigger result in different sets of state being flushed for the thread which gave rise to the flush trigger, with state for other threads not being flushed. For example, a relatively low latency stall may result in flushing back to a first flush point, whereas a longer latency stall results in flushing back to a second flush point and the loss of more state data. The state flushed back to the first flush point may be a subset of the state flushed back to the second flush point.
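A minimal sketch of per-thread, trigger-dependent flush depth (the numeric "ages" and flush points are purely illustrative):

```cpp
#include <iostream>

enum class FlushTrigger { ShortStall, LongStall };

// Illustrative per-thread state: an "age" marking the newest surviving
// in-flight state, plus the two flush points.
struct Thread {
    int youngestOp  = 100;
    int firstPoint  = 90;   // shallow rollback target
    int secondPoint = 60;   // deep rollback target (loses more state)
};

// Only the triggering thread is flushed; the depth depends on the trigger.
void flush(Thread& t, FlushTrigger trigger) {
    t.youngestOp = (trigger == FlushTrigger::ShortStall) ? t.firstPoint
                                                         : t.secondPoint;
}

int main() {
    Thread a, b;  // b models another thread, which is left untouched
    flush(a, FlushTrigger::LongStall);
    std::cout << a.youngestOp << ' ' << b.youngestOp << '\n';  // 60 100
}
```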
Abstract:
An apparatus comprises a processing pipeline having out-of-order execution circuitry and second execution circuitry. Control circuitry monitors at least one reordering metric indicative of the extent to which instructions are executed out of order by the out-of-order execution circuitry, and controls whether instructions are executed using the out-of-order execution circuitry or the second execution circuitry based on the reordering metric. A speculation metric, indicative of the fraction of executed instructions that are flushed due to a mis-speculation, can also be used to determine whether to execute instructions on the first or second execution circuitry, which have different performance or energy consumption characteristics.
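A minimal sketch of the metric-driven selection (the thresholds and names are illustrative assumptions; real values would be tuned empirically):

```cpp
#include <iostream>

// The monitor accumulates how often instructions complete out of program
// order and how often mis-speculation forces a flush; low metrics suggest
// the out-of-order circuitry is delivering little benefit.
struct Monitor {
    long reordered = 0, executed = 0, flushed = 0;
    double reorderingMetric() const { return executed ? double(reordered) / executed : 0.0; }
    double speculationMetric() const { return executed ? double(flushed) / executed : 0.0; }
};

enum class Unit { OutOfOrder, Second };

// Illustrative thresholds for switching to the second execution circuitry.
Unit choose(const Monitor& m) {
    if (m.reorderingMetric() < 0.05 && m.speculationMetric() < 0.02)
        return Unit::Second;  // little reordering benefit: save energy
    return Unit::OutOfOrder;
}

int main() {
    Monitor m{2, 100, 1};  // 2% reordered, 1% flushed
    std::cout << (choose(m) == Unit::Second) << '\n';  // 1
}
```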
Abstract:
A hierarchical cache arrangement comprises at least a unified cache used to store both instructions and data values, and a further cache coupled between processing circuitry and the unified cache. The unified cache has a plurality of cache lines, each identified as either an instruction cache line or a data cache line. Each data cache line stores at least one data value and associated information. Pre-decode circuitry associated with the unified cache performs a first pre-decode operation on a received instruction in order to generate a corresponding partially pre-decoded instruction for storing in an instruction cache line. Further pre-decode circuitry is associated with the further cache and, when a partially pre-decoded instruction is routed to the further cache, performs a further pre-decode operation on it to generate a corresponding pre-decoded instruction for storage in the further cache.
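A minimal sketch of the two-stage pre-decode flow (the field layouts and bit positions are invented for illustration):

```cpp
#include <cstdint>
#include <iostream>

// Raw encoding -> partially pre-decoded form (unified cache) -> fully
// pre-decoded form (further cache). Field layouts are invented.
struct Partial { uint32_t raw; uint8_t instrClass; };
struct Full    { uint32_t raw; uint8_t instrClass; uint8_t destReg; };

// First pre-decode pass, done once when the line is filled into the
// unified cache.
Partial partialPredecode(uint32_t raw) {
    return {raw, uint8_t(raw >> 28)};  // illustrative class field
}

// Second pass, done only when a partially pre-decoded instruction is
// routed on to the further cache.
Full furtherPredecode(const Partial& p) {
    return {p.raw, p.instrClass, uint8_t(p.raw & 0x1F)};  // illustrative field
}

int main() {
    Full f = furtherPredecode(partialPredecode(0xE3A00001u));
    std::cout << int(f.instrClass) << ' ' << int(f.destReg) << '\n';  // 14 1
}
```

Splitting the work this way means the expensive part of pre-decode is paid only for instructions that actually reach the further cache.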