Abstract:
A processing pipeline may have first and second execution circuits with different performance or energy consumption characteristics. Instruction supply circuitry may support different instruction supply schemes, also with different energy consumption or performance characteristics, allowing a further trade-off between performance and energy efficiency. Architectural state storage can be shared between the execution circuits to reduce the overhead of switching between them. In a parallel execution mode, groups of instructions can be executed on both execution circuits in parallel.
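As a behavioural illustration of the shared-state idea, the Python sketch below (all class and unit names are assumptions, not taken from the abstract) models two execute units with different per-operation energy costs operating on a single shared register file, so that switching from one unit to the other requires no copying of architectural state.

```python
# Illustrative sketch only: two execute units share one architectural
# register file, so switching units does not require state transfer.

class SharedRegisterFile:
    def __init__(self, num_regs=16):
        self.regs = [0] * num_regs

class ExecuteUnit:
    def __init__(self, name, energy_per_op, rf):
        self.name = name
        self.energy_per_op = energy_per_op  # higher for the faster unit
        self.rf = rf                        # shared architectural state

    def execute(self, op, dst, src1, src2):
        if op == "add":
            self.rf.regs[dst] = self.rf.regs[src1] + self.rf.regs[src2]
        return self.energy_per_op

rf = SharedRegisterFile()
big = ExecuteUnit("big", 5, rf)
little = ExecuteUnit("little", 1, rf)

# Switching from 'big' to 'little' is cheap: both see the same registers.
rf.regs[1], rf.regs[2] = 3, 4
big.execute("add", 0, 1, 2)      # r0 = 7, computed on the fast unit
little.execute("add", 3, 0, 1)   # r3 = 10, slow unit sees r0 immediately
```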
Abstract:
An apparatus comprises a reduction tree to rank a given item of a set of M items relative to other items of the set of M items, in dependence on ranking information indicating an order of preference for the set of M items. The reduction tree has a number of levels of node circuits arranged in a tree structure, each node circuit configured to generate a plurality of node output signals indicative of whether a corresponding subset of the set of M items includes at least N more preferred items than the given item, where N≥2. A node circuit at a level of the reduction tree other than a first level is configured to combine the node output signals generated by at least two node circuits at a previous level of the reduction tree, such that the number of items in the corresponding subset increases through successive levels of the reduction tree, until the subset of items corresponding to a root node circuit at a final level of the reduction tree comprises the set of M items.
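One way to model the reduction tree in software is sketched below. This is an interpretive analogue, not the claimed circuit: each node's output signals are encoded as a count, saturated at N, of items in that node's subset that are more preferred than the given item; combining two child nodes is then a saturating add, and the root node, whose subset covers all M items, indicates whether at least N items outrank the given item.

```python
# Software model (illustrative only) of the reduction tree described above.

N = 2  # threshold from the abstract (N >= 2)

def leaf(item_rank, given_rank):
    # 1 if this item is more preferred (lower rank value) than the given item
    return min(N, 1 if item_rank < given_rank else 0)

def combine(left, right):
    # Saturating add: the node output signals ("at least 1 ... at least N
    # more preferred items") are encoded here as a count clamped to N.
    return min(N, left + right)

def at_least_n_more_preferred(ranks, given_index):
    given_rank = ranks[given_index]
    level = [leaf(r, given_rank)
             for i, r in enumerate(ranks) if i != given_index]
    while len(level) > 1:                  # successive tree levels
        if len(level) % 2:                 # pad odd-sized levels
            level.append(0)
        level = [combine(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0] >= N                   # root subset covers all M items

ranks = [3, 0, 2, 1]  # lower value = more preferred
print(at_least_n_more_preferred(ranks, 0))  # True: items 1-3 outrank item 0
```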
Abstract:
An apparatus (2) has a processing pipeline (4) supporting at least a first processing mode and a second processing mode with different energy consumption or performance characteristics. A storage structure (22, 30, 36, 50, 40, 64, 44) is accessible in both the first and second processing modes. When the second processing mode is selected, control circuitry (70) triggers a subset (102) of the entries of the storage structure to be placed in a power saving state.
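A minimal sketch of the mode-dependent power gating might look as follows (the entry count, the choice of which half is gated, and all names are assumptions for illustration): on entering the second processing mode, a subset of the structure's entries is invalidated and placed in a power saving state, while the remainder stays accessible in both modes.

```python
# Illustrative model: the storage structure is usable in both modes, but in
# the second mode a subset of its entries is placed in a power saving state.

class StorageStructure:
    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.powered = [True] * num_entries

    def set_mode(self, mode):
        # In the second mode, gate the upper half of the entries and
        # invalidate them so no stale data can be read later.
        half = len(self.entries) // 2
        for i in range(half, len(self.entries)):
            self.powered[i] = (mode == "first")
            if mode == "second":
                self.entries[i] = None

    def usable_indices(self):
        return [i for i, p in enumerate(self.powered) if p]

s = StorageStructure(8)
s.set_mode("second")
print(s.usable_indices())  # [0, 1, 2, 3]: only the powered subset is used
```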
Abstract:
A processor has a processing pipeline with first, second and third stages. An instruction at the first stage takes fewer cycles to reach the second stage than the third stage. The second and third stages each have a duplicated processing resource. For a pending instruction which requires the duplicated resource and can be processed using the duplicated resource at either of the second and third stages, the first stage determines whether a required operand would be available when the pending instruction would reach the second stage. If the operand would be available, the pending instruction is processed using the duplicated resource at the second stage, while if the operand would not be available in time then the instruction is processed using the duplicated resource at the third stage. This technique helps to reduce delays caused by data dependency hazards.
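The stage-selection decision can be sketched in a few lines of Python (the cycle counts and function name are illustrative assumptions): if the source operand will be ready by the time the instruction reaches the second stage, the earlier copy of the duplicated resource is used; otherwise the instruction is routed to the copy at the third stage rather than stalling on the dependency.

```python
# Illustrative stage-selection check for the duplicated resource.

CYCLES_TO_STAGE2 = 2   # assumed pipeline depths, for illustration
CYCLES_TO_STAGE3 = 4

def choose_stage(current_cycle, operand_ready_cycle):
    if operand_ready_cycle <= current_cycle + CYCLES_TO_STAGE2:
        return "stage2"   # operand arrives in time for the earlier copy
    return "stage3"       # later copy hides the remaining operand latency

print(choose_stage(current_cycle=10, operand_ready_cycle=11))  # stage2
print(choose_stage(current_cycle=10, operand_ready_cycle=13))  # stage3
```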
Abstract:
An apparatus supports decoding and execution of a bulk memory instruction specifying a block size parameter. The apparatus comprises control circuitry to determine whether the block size corresponding to the block size parameter exceeds a predetermined threshold, and to perform a micro-architectural control action to influence the handling of at least one bulk memory operation by memory operation processing circuitry. The micro-architectural control action varies depending on whether the block size exceeds the predetermined threshold, and further depending on the state of other components and operations within or coupled with the apparatus. The micro-architectural control action could include an alignment correction action, a cache allocation control action, or a processing circuitry selection action.
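A hypothetical decision routine in this spirit is sketched below; the threshold value and the action names are invented for illustration and do not come from the abstract.

```python
# Hypothetical selection of micro-architectural control actions for a bulk
# memory operation, based on block size versus a threshold.

BLOCK_SIZE_THRESHOLD = 4096  # bytes; illustrative value

def control_actions(block_size, dest_addr):
    actions = []
    if block_size > BLOCK_SIZE_THRESHOLD:
        actions.append("no-allocate")        # large copy: bypass cache fills
        actions.append("offload-to-dma")     # favour a bulk-copy engine
        if dest_addr % 64:                   # misaligned large block
            actions.append("align-head")     # peel bytes to a line boundary
    else:
        actions.append("allocate-in-cache")  # small copy: keep data close
        actions.append("use-load-store-unit")
    return actions

print(control_actions(64 * 1024, dest_addr=0x1003))
# ['no-allocate', 'offload-to-dma', 'align-head']
```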
Abstract:
An apparatus and method are provided for processing instructions. The apparatus has execution circuitry for executing instructions, where each instruction requires an associated operation to be performed using one or more source operand values in order to produce a result value. Issue circuitry is used to maintain a record of pending instructions awaiting execution by the execution circuitry, and prediction circuitry is used to produce a predicted source operand value for a chosen pending instruction. Optimisation circuitry is then arranged to detect an optimisation condition for the chosen pending instruction when the predicted source operand value is such that, having regard to the associated operation for the chosen pending instruction, the result value is known without performing the associated operation. In response to detection of the optimisation condition, an optimisation operation is implemented instead of causing the execution circuitry to perform the associated operation in order to execute the chosen pending instruction. This can lead to significant performance and/or power consumption improvements.
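For a concrete sense of the optimisation condition, consider operations whose result is fixed by a single operand, such as multiplication by zero. The Python sketch below (operation names assumed) returns the known result when the predicted operand makes execution unnecessary, and None when the associated operation must still be performed.

```python
# Illustrative check for the optimisation condition: for some predicted
# source operand values the result is known without executing, e.g.
# x * 0 == 0, x & 0 == 0, x | ~0 == ~0.

MASK = 0xFFFFFFFF  # model 32-bit operands

def trivial_result(op, predicted_operand):
    if op == "mul" and predicted_operand == 0:
        return 0
    if op == "and" and predicted_operand == 0:
        return 0
    if op == "or" and predicted_operand == MASK:
        return MASK
    return None  # no optimisation condition: execute normally

r = trivial_result("mul", 0)
if r is not None:
    print("skip execution, result =", r)  # skip execution, result = 0
```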
Abstract:
An apparatus comprises prediction circuitry (40, 100, 80) for determining, based on current prediction policy information (43, 82, 104), a predicted behaviour to be used for processing instructions. The current prediction policy information is updated based on an outcome of processing of instructions. A storage structure (50) stores at least one entry identifying previous prediction policy information (60) for a corresponding block of instructions. In response to an instruction from a block having a corresponding entry in the storage structure (50) which identifies the previous prediction policy information (60), the current prediction policy information (43, 82, 104) can be reset based on the previous prediction policy information (60) identified in the corresponding entry of the storage structure (50).
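A software analogue of this reset mechanism is sketched below, under the assumption (not stated in the abstract) that the storage structure can be modelled as a table keyed by instruction-block address: re-entering a block that hits in the table restores the live prediction policy from the stored snapshot instead of relearning it from scratch.

```python
# Illustrative model of saving and restoring prediction policy per block.

class PolicyStore:
    def __init__(self):
        self.table = {}          # block address -> saved policy info

    def save(self, block_addr, policy):
        self.table[block_addr] = dict(policy)

    def restore(self, block_addr, current_policy):
        saved = self.table.get(block_addr)
        if saved is not None:
            current_policy.update(saved)   # reset from the stored snapshot
        return current_policy

store = PolicyStore()
store.save(0x4000, {"branch_bias": "taken", "confidence": 3})
live = {"branch_bias": "not-taken", "confidence": 0}
print(store.restore(0x4000, live))
# {'branch_bias': 'taken', 'confidence': 3}
```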
Abstract:
An apparatus and method are provided for managing a branch information storage. The apparatus has a processor to process instructions, comprising fetch circuitry to fetch instructions from a plurality of threads for processing by the processor. The branch information storage has a plurality of entries, each entry storing a virtual address identifier for a branch instruction, branch information about the branch instruction, and thread identifier information indicating which of the plurality of threads that entry is valid for. The fetch circuitry is arranged to access the branch information storage using a virtual address of an instruction to be fetched for one of the plurality of threads, in order to determine whether a hit condition exists, and in that event to obtain the branch information stored in the entry that gave rise to the hit condition. The apparatus also has address translation circuitry to apply an address translation regime to convert the virtual address into a physical address, at least one address translation regime being specified for each thread. When allocating an entry into the branch information storage, allocation circuitry is arranged to determine, for at least one branch instruction for a current thread, whether the address translation regime is shared between the current thread and at least one other thread. In that event, the allocation circuitry then identifies within the thread identifier information of the allocated entry both the current thread and any other thread for which the address translation regime is shared. Such an approach can significantly alleviate the space constraints on the branch information storage, when employed within an apparatus that supports fine-grained multithreading.
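The thread-sharing allocation can be modelled roughly as follows; the bitmap encoding and regime identifiers are assumptions made for illustration. Each entry carries a thread-identifier bitmap, and allocation sets the bit for every thread whose address translation regime matches the current thread's, so a single entry can produce hits for several threads.

```python
# Illustrative model of branch-information entries shared across threads
# that use the same address translation regime.

class BranchInfoStorage:
    def __init__(self):
        self.entries = {}   # virtual address -> (thread_bitmap, branch_info)

    def allocate(self, vaddr, branch_info, current_tid, regimes):
        bitmap = 0
        for tid, regime in enumerate(regimes):
            if regime == regimes[current_tid]:   # shared translation regime
                bitmap |= 1 << tid
        self.entries[vaddr] = (bitmap, branch_info)

    def lookup(self, vaddr, tid):
        entry = self.entries.get(vaddr)
        if entry and entry[0] & (1 << tid):      # entry valid for this thread
            return entry[1]
        return None                              # miss

regimes = ["A", "A", "B", "A"]   # threads 0, 1 and 3 share regime "A"
bis = BranchInfoStorage()
bis.allocate(0x1000, {"target": 0x2000, "taken": True},
             current_tid=0, regimes=regimes)
print(bis.lookup(0x1000, 1))  # hit: thread 1 shares thread 0's regime
print(bis.lookup(0x1000, 2))  # None: thread 2 uses a different regime
```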
Abstract:
Processing circuitry includes execute circuitry for executing micro-operations in response to instructions fetched from a data store. Control circuitry is provided to determine, based on availability of at least one processing resource, how many micro-operations are to be executed by the execute circuitry in response to a given set of one or more instructions fetched from the data store.
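As a loose illustration of resource-dependent micro-operation counts (the interface and the bulk-move example are assumptions, not from the abstract), the sketch below cracks the same fetched instruction into fewer, wider micro-operations when a wide datapath is available, and into more, narrower ones when it is not.

```python
# Illustrative resource-dependent cracking of one instruction into
# a variable number of micro-operations.

def crack(total_bytes, wide_unit_available):
    width = 32 if wide_unit_available else 8   # bytes moved per micro-op
    count = -(-total_bytes // width)           # ceiling division
    return [("move", width)] * count

print(len(crack(64, wide_unit_available=True)))   # 2 micro-ops
print(len(crack(64, wide_unit_available=False)))  # 8 micro-ops
```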