Abstract:
An apparatus comprises a cache memory to store data as a plurality of cache lines, each having a data size and an associated physical address in a memory, and access circuitry to access the data stored in the cache memory. Detection circuitry detects, for at least a set of sub-units of the cache lines stored in the cache memory, whether a number of accesses by the access circuitry to a given sub-unit exceeds a predetermined threshold, each sub-unit having a data size smaller than the data size of a cache line. Prediction circuitry generates a prediction, for a given region of a plurality of regions of physical address space, of whether data stored in that region comprises streaming data, in which each of one or more portions of a given cache line is predicted to be subject to a maximum of one read operation, or multiple-access data, in which each of the one or more portions of the given cache line is predicted to be subject to more than one read operation. The prediction circuitry is configured to generate the prediction in response to a detection by the detection circuitry of whether the number of accesses to a sub-unit of a cache line having an associated physical address in the given region exceeds the predetermined threshold. Allocation circuitry selectively allocates a next cache line to the cache memory in dependence upon the prediction applicable to the region of physical address space containing that next cache line.
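As an illustration of the mechanism described above, the following minimal software sketch models the detection and prediction steps. The region size, sub-unit size, threshold value, and all names are assumptions chosen for the example, not values from the disclosure.

```python
# Illustrative model: per-region streaming vs. multiple-access prediction.
# Region size, sub-unit size and threshold are assumed example values.

REGION_BITS = 12        # assumed 4 KiB regions of physical address space
LINE_BYTES = 64         # assumed cache line size
SUBUNIT_BYTES = 16      # assumed sub-unit size (smaller than a line)
THRESHOLD = 1           # assumed predetermined access-count threshold

class RegionPredictor:
    def __init__(self):
        self.counts = {}           # (line address, sub-unit index) -> count
        self.multi_access = set()  # regions predicted to hold multiple-access data

    def record_access(self, paddr):
        line = paddr // LINE_BYTES
        sub = (paddr % LINE_BYTES) // SUBUNIT_BYTES
        key = (line, sub)
        self.counts[key] = self.counts.get(key, 0) + 1
        # Detection: a sub-unit accessed more than THRESHOLD times marks
        # its whole region as multiple-access rather than streaming.
        if self.counts[key] > THRESHOLD:
            self.multi_access.add(paddr >> REGION_BITS)

    def predict_streaming(self, paddr):
        return (paddr >> REGION_BITS) not in self.multi_access

p = RegionPredictor()
p.record_access(0x1000)               # first read of a sub-unit
print(p.predict_streaming(0x1000))    # True: still looks like streaming
p.record_access(0x1000)               # second read exceeds the threshold
print(p.predict_streaming(0x1000))    # False: region now multiple-access
```

The allocation circuitry would consult a prediction like predict_streaming() for the region containing a candidate line; one plausible policy (an assumption, since the abstract only requires that allocation depend on the prediction) is to bypass allocation for lines in streaming regions.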
Abstract:
An apparatus and method are provided for estimating a shift amount when employing processing circuitry to perform a subtraction operation to subtract a second significand value of a second floating-point operand from a first significand value of a first floating-point operand in order to generate a difference value. Shift estimation circuitry then determines an estimated shift amount to be applied to the difference value. The shift estimation circuitry comprises significand analysis circuitry to generate, from analysis of the significand values of the two floating-point operands, a first bit string identifying a most significant bit position within the difference value that is predicted to have its bit set to a determined value. In parallel, shift limiting circuitry generates from an exponent value a second bit string identifying a shift limit bit position. The shift limiting circuitry has computation circuitry to perform, for each bit position in at least a subset of bit positions of the second bit string, an associated computation using bits of the exponent value to determine a value for that bit position within the second bit string. The associated computation is different for different bit positions. Combining circuitry then generates a combined bit string from the first and second bit strings, and shift determination circuitry determines the estimated shift amount from the combined bit string.
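The behavioural sketch below captures the intent of the shift estimation: the normalisation shift implied by the difference is capped by an exponent-derived limit. It is deliberately not the hardware structure; it computes the difference exactly, whereas the disclosed significand analysis circuitry predicts the first bit string from the operand bits alone, and it derives the limit with one expression, whereas the disclosed shift limiting circuitry performs a different computation per bit position in parallel. The significand width and minimum-exponent handling are assumptions.

```python
# Behavioural sketch of the shift estimation, using exact arithmetic rather
# than the parallel per-bit-position computations the hardware performs.

SIG_BITS = 8  # assumed small significand width for illustration

def estimated_shift(sig_a, sig_b, exponent):
    diff = sig_a - sig_b
    assert diff > 0
    # First bit string: flags the most significant set bit of the
    # difference (computed exactly here; real leading-zero anticipation
    # predicts it from the operands, possibly off by one place).
    first = 1 << (diff.bit_length() - 1)
    # Second bit string: a shift limit derived from the exponent, so the
    # normalisation shift never drags the exponent below its minimum.
    limit_pos = max(SIG_BITS - 1 - exponent, 0)
    second = 1 << limit_pos
    # Combined bit string: the more significant marker wins, i.e. the
    # shift is capped at the limit position.
    combined = first | second
    top = combined.bit_length() - 1
    return SIG_BITS - 1 - top   # left-shift amount to normalise

# Example: 0b01000000 - 0b00111111 = 1, but a small exponent only permits
# a shift of 3, so the estimate is capped.
print(estimated_shift(0b01000000, 0b00111111, 3))   # -> 3 (limited)
print(estimated_shift(0b01000000, 0b00111111, 20))  # -> 7 (full normalise)
```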
Abstract:
Data storage apparatus comprises detection circuitry configured to detect a match between a multi-bit reference memory address and a test address, the test address being a combination of a multi-bit base address and a multi-bit address offset. The detection circuitry comprises a comparator configured to compare, as a first comparison, a first subset of bits of the reference memory address with a combination of the corresponding first subset of bits of the base address and the corresponding first subset of bits of the address offset, and to compare, as a second comparison, a second, different subset of bits of the reference memory address with the corresponding second subset of bits of the base address. A detector is configured to detect the match between the reference memory address and the test address when both of the first comparison and the second comparison detect a respective match, and control circuitry is configured to control operation of the data storage apparatus in dependence upon the reference memory address when a match is detected by the detector.
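A minimal sketch of the split comparison follows. The address width, the boundary between the two subsets, and the premise that the offset is small enough that adding it to the base cannot carry into the second (upper) subset are all assumptions made for the example.

```python
# Illustrative split comparison: lower bits are checked against base+offset,
# upper bits against the base alone (offset assumed not to reach them).

SPLIT = 12                      # assumed boundary between the two subsets
LOW_MASK = (1 << SPLIT) - 1

def addresses_match(reference, base, offset):
    # First comparison: lower subset of the reference against the
    # combination (sum) of the lower subsets of base and offset.
    low_ok = (reference & LOW_MASK) == (
        ((base & LOW_MASK) + (offset & LOW_MASK)) & LOW_MASK)
    # Second comparison: upper subset of the reference against the upper
    # subset of the base alone.
    high_ok = (reference >> SPLIT) == (base >> SPLIT)
    return low_ok and high_ok

base, offset = 0x8000_1200, 0x40
print(addresses_match(base + offset, base, offset))  # True
print(addresses_match(base + 0x80, base, offset))    # False
```

Splitting the comparison this way avoids a full-width, carry-propagating addition before the compare, at the cost of assuming the offset cannot perturb the upper subset of bits.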
Abstract:
An apparatus and method are provided for processing instructions from a plurality of threads. The apparatus comprises a processing pipeline to process instructions, including fetch circuitry to fetch instructions from a plurality of threads for processing by the processing pipeline, and execution circuitry to execute the fetched instructions. Execution hint instruction handling circuitry is responsive to the fetch circuitry fetching an execution hint instruction for a first thread to treat the execution hint instruction, at least in the presence of a suspension condition, as a predicted branch instruction with a predicted behaviour, and to cause the fetch circuitry to suspend fetching of instructions for the first thread. The execution circuitry is arranged to execute the predicted branch instruction with a behaviour different to the predicted behaviour, in order to trigger a misprediction condition. The fetch circuitry is then responsive to the misprediction condition to resume fetching of instructions for the first thread. This provides a reliable mechanism for temporarily suspending fetching of instructions for a thread in response to a hint instruction, whilst still reliably resuming fetching in due course.
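The toy model below illustrates the suspend/resume trick: the hint is injected as a branch deliberately predicted the wrong way, so the ordinary misprediction recovery path restarts fetch. All names, the instruction encoding, and the two-state pipeline are assumptions for the sketch.

```python
# Toy model of suspending and resuming fetch for one thread via a
# deliberately mispredicted branch. Structure is illustrative only.

from collections import deque

class ThreadFetcher:
    def __init__(self):
        self.suspended = set()
        self.pipeline = deque()

    def fetch(self, thread, instr):
        if thread in self.suspended:
            return False                  # fetch suppressed for this thread
        if instr == "HINT":
            # Treat the hint as a branch predicted "not taken" and stop
            # fetching; the eventual execution will disagree on purpose.
            self.pipeline.append((thread, "BRANCH", "not-taken"))
            self.suspended.add(thread)
        else:
            self.pipeline.append((thread, instr, None))
        return True

    def execute_one(self):
        thread, instr, predicted = self.pipeline.popleft()
        if instr == "BRANCH" and predicted == "not-taken":
            # Executed behaviour deliberately differs from the prediction,
            # so the normal misprediction recovery resumes fetching.
            self.suspended.discard(thread)

f = ThreadFetcher()
f.fetch(0, "HINT")
print(f.fetch(0, "ADD"))   # False: thread 0 is suspended
f.execute_one()            # misprediction fires, fetch resumes
print(f.fetch(0, "ADD"))   # True
```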
Abstract:
An apparatus includes a processing pipeline comprising a plurality of stages, the plurality of stages including at least one instruction fusing stage to detect whether a block of instructions to be processed comprises a fusible group of instructions, and to generate a fused instruction to be processed by a subsequent stage of the processing pipeline when said block of instructions comprises said fusible group. However, when said block of instructions comprises a partial subset of said fusible group of instructions, the instruction fusing stage is configured to delay handling of said partial subset of said fusible group of instructions until the instruction fusing stage has determined whether at least one subsequent block of instructions to be processed comprises a remaining subset of instructions of said fusible group.
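The following sketch shows the delayed handling of a partial fusible group across block boundaries. The choice of a two-instruction compare-and-branch pair as the fusible group is an assumed example, not the disclosure's definition.

```python
# Illustrative fuser: detects an assumed two-instruction fusible pair and
# holds a trailing partial match until the next block arrives.

FUSIBLE_PAIR = ("CMP", "BEQ")   # assumed example of a fusible group

def fuse_blocks(blocks):
    pending = []                 # partial subset carried between blocks
    for block in blocks:
        instrs = pending + block
        pending = []
        out = []
        i = 0
        while i < len(instrs):
            if instrs[i:i + 2] == list(FUSIBLE_PAIR):
                out.append("CMP+BEQ")        # the fused instruction
                i += 2
            elif i == len(instrs) - 1 and instrs[i] == FUSIBLE_PAIR[0]:
                pending = [instrs[i]]        # delay the partial subset
                i += 1
            else:
                out.append(instrs[i])
                i += 1
        yield out
    if pending:
        yield pending            # no continuation arrived; emit unfused

# The CMP at the end of the first block is held back and fused with the
# BEQ that begins the second block.
print(list(fuse_blocks([["ADD", "CMP"], ["BEQ", "SUB"]])))
# -> [['ADD'], ['CMP+BEQ', 'SUB']]
```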
Abstract:
A data processing apparatus has a pipeline for performing a processing operation involving a conditional step which is required only if at least one input operand satisfies a predetermined condition. Control circuitry detects whether the condition is satisfied. If not, then the pipeline is controlled to perform the operation bypassing the conditional step to generate the output operand a first number of cycles later than a start cycle in which the operation starts, and the output operand is forwarded over a forwarding path. If the condition is satisfied, then the pipeline performs the operation including the conditional step to generate the output operand a second number of cycles later than the start cycle, where the second number is greater than the first number. The output operand is written to a destination register the same number of cycles later than the start cycle regardless of whether the condition is satisfied.
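A small timing sketch makes the scheduling point concrete: the fast-path result can be forwarded early, but the register write always lands at the slow-path latency, so the writeback timing is identical either way. The latency values are assumptions for illustration.

```python
# Toy timing model: the result is forwarded early on the fast path but the
# register file write always happens at the slow-path latency, keeping the
# writeback port schedule identical in both cases. Latencies are assumed.

FAST_CYCLES = 2    # assumed first number of cycles (conditional step bypassed)
SLOW_CYCLES = 3    # assumed second number of cycles (conditional step taken)

def schedule(start_cycle, needs_conditional_step):
    if needs_conditional_step:
        ready = start_cycle + SLOW_CYCLES   # full path, no early forward
    else:
        ready = start_cycle + FAST_CYCLES   # result forwarded a cycle early
    writeback = start_cycle + SLOW_CYCLES   # fixed regardless of the path
    return ready, writeback

print(schedule(10, False))  # (12, 13): forwarded at 12, written back at 13
print(schedule(10, True))   # (13, 13)
```

Fixing the writeback cycle means the register file write port is reserved the same way for every issue of the operation, avoiding a structural hazard between fast-path and slow-path instances.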
Abstract:
A data processing device comprises a plurality of system registers and a set of interrupt handling registers for controlling handling of an incoming interrupt. The device also includes processing circuitry configured to execute software at a plurality of execution levels, interrupt controller circuitry configured to route the incoming interrupt to interrupt handling software configured to run at one of the plurality of execution levels, and register access control circuitry configured to dynamically control access to at least some of the interrupt handling registers in dependence upon the execution level to which the incoming interrupt is routed. Interrupt handling software configured to run at a particular execution level does not have access to interrupt handling registers for handling a different incoming interrupt that is routed to interrupt handling software configured to run at a more privileged execution level.
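The access rule can be sketched as below. Modelling execution levels as integers with larger numbers more privileged, the routing table contents, and the assumption that more privileged software retains access to less privileged interrupts' registers are all illustrative choices, not details from the disclosure.

```python
# Illustrative access check. Execution levels are modelled as integers with
# larger numbers being more privileged; routing entries are assumed values.

routed_to = {"IRQ_A": 1, "IRQ_B": 2}   # interrupt -> execution level handling it

def may_access(requesting_el, interrupt):
    # Software may touch the handling registers of an interrupt only if the
    # interrupt is not routed to a more privileged execution level.
    return requesting_el >= routed_to[interrupt]

print(may_access(1, "IRQ_A"))  # True: handled at the requester's own level
print(may_access(1, "IRQ_B"))  # False: routed to more privileged software
print(may_access(2, "IRQ_A"))  # True (assumed): privileged software keeps access
```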
Abstract:
Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each storing at least four data elements, are referenced by a matrix multiply instruction, and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix, are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
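The semantics can be sketched for the smallest case the abstract admits: each source register holding four data elements, interpreted as a 2x2 matrix. The row-major layout and element type are assumptions for the example.

```python
# Illustrative semantics for the matrix multiply operation, assuming each
# source register holds a 2x2 matrix as four elements in row-major order.

def matmul_2x2(src1, src2):
    a = [src1[0:2], src1[2:4]]   # first matrix, extracted by rows
    b = [src2[0:2], src2[2:4]]   # second matrix
    dst = []
    for i in range(2):
        for j in range(2):
            # One dot product per result element: row i of the first
            # matrix with column j of the second.
            dst.append(a[i][0] * b[0][j] + a[i][1] * b[1][j])
    return dst                   # square result, applied to the destination

print(matmul_2x2([1, 2, 3, 4], [5, 6, 7, 8]))  # [19, 22, 43, 50]
```

Even at this size the density claim is visible: two source operands and one destination yield four dot products (eight multiplies), whereas vector-by-element approaches need more register operand reads to perform the same work.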
Abstract:
Coherency control circuitry (10) supports processing of a safe-speculative-read transaction received from a requesting master device (4). The safe-speculative-read transaction is of a type requesting that target data be returned to a requesting cache (11) of the requesting master device (4) while prohibiting any change in coherency state associated with the target data in other caches (12) in response to the safe-speculative-read transaction. In response, at least when the target data is cached in a second cache associated with a second master device, at least one of the coherency control circuitry (10) and the second cache (12) is configured to return a safe-speculative-read response while maintaining the target data in the same coherency state within the second cache. This helps to mitigate speculative side-channel attacks.
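The contrast with an ordinary coherent read can be modelled as follows. The MESI-style state names and the single second cache are assumptions for the sketch.

```python
# Toy coherency model: a safe-speculative-read returns data from a second
# cache without downgrading that cache's coherency state, unlike an
# ordinary read. State names follow the common MESI convention (assumed).

caches = {"second": {0x80: ("Exclusive", b"\x2a")}}  # line -> (state, data)

def safe_speculative_read(addr):
    for cache in caches.values():
        if addr in cache:
            _, data = cache[addr]
            # Return the data but leave the holder's state untouched, so a
            # later squash of the speculation leaves no visible trace here.
            return data
    return None                          # miss: fetch from memory instead

def ordinary_read(addr):
    for cache in caches.values():
        if addr in cache:
            _, data = cache[addr]
            cache[addr] = ("Shared", data)   # normal read downgrades the holder
            return data
    return None

print(safe_speculative_read(0x80), caches["second"][0x80][0])  # b'*' Exclusive
print(ordinary_read(0x80), caches["second"][0x80][0])          # b'*' Shared
```

Because the second cache's state never changes on the speculative path, an attacker cannot use coherency-state transitions provoked by speculative reads as a side channel.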
Abstract:
An apparatus and method of operating a data processing apparatus are disclosed. The apparatus comprises data processing circuitry to perform data processing operations in response to a sequence of instructions, wherein the data processing circuitry is capable of performing speculative execution of at least some of the sequence of instructions. A cache structure comprising entries stores temporary copies of data items which are subjected to the data processing operations. Speculative execution tracking circuitry monitors the correctness of the speculative execution and, responsive to an indication of incorrect speculative execution, causes entries in the cache structure allocated by the incorrect speculative execution to be evicted from the cache structure.
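A compact model of the tracking-and-eviction behaviour is sketched below. The per-speculation-window bookkeeping, identifiers, and names are assumptions chosen for the example.

```python
# Illustrative tracker: cache entries allocated under speculation are
# remembered per speculation window and evicted wholesale on a squash.

class SpeculativeCache:
    def __init__(self):
        self.entries = {}        # address -> data
        self.speculative = {}    # speculation id -> addresses it allocated

    def allocate(self, addr, data, spec_id=None):
        self.entries[addr] = data
        if spec_id is not None:
            self.speculative.setdefault(spec_id, []).append(addr)

    def resolve(self, spec_id, correct):
        for addr in self.speculative.pop(spec_id, []):
            if not correct:
                self.entries.pop(addr, None)  # evict the bad path's allocations

c = SpeculativeCache()
c.allocate(0x100, "speculatively loaded line", spec_id=7)
c.resolve(7, correct=False)
print(0x100 in c.entries)   # False: no footprint left by the wrong path
```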