Abstract:
Address translation circuitry and a method of operating such a translation circuitry are provided. The address translation circuitry is configured to receive a first address used in a first addressing system and to translate it into a second address used in a second addressing system. Translation pipeline circuitry has plural pipeline stages configured to translate the first address into the second address over the course of the plural pipeline stages. Address comparison circuitry is configured to identify an address match condition when a received first address at least partially matches a previously received first address. Insertion circuitry is configured to determine a stage of progress of the previously received first address in the plural pipeline stages and to cause content of the stage of progress of the previously received first address to be unchanged at a next pipeline cycle when the address comparison circuitry identifies the address match condition.
Abstract:
There is provided an apparatus configured to operate as a shader core, the shader core configured to perform a complex rendering process comprising a rendering process and a machine learning process, the shader core comprising: one or more tile buffers configured to store data locally to the shader core, wherein during the rendering process, the one or more tile buffers are configured to store rendered fragment data relating to a tile; and during the machine learning process, the one or more tile buffers are configured to store an input feature map, kernel weights or an output feature map relating to the machine learning process.
Abstract:
A warp processing unit controls, in dependence on a warp program counter shared between a plurality of threads processing respective graphics fragments, fetching of a next instruction to be executed for at least some of the plurality of threads. In response to a determination that a given subset of threads is to be discarded when at least one other subset of threads is to continue, the warp processing unit processes the given subset of threads in a discarded state. For a thread processed in the discarded state, execution of instructions continues for the discarded thread, and at least one of: generation of data access messages triggered by the discarded thread is suppressed; and at least one processing operation, which would be deferred until completion of the discarded thread had the thread not been discarded, is enabled to be commenced independently of an outcome of the discarded thread.
Abstract:
A graphics processing apparatus and method of performing graphics processing are provided. The graphics processing apparatus comprises a sequence of processing stages capable of performing graphics processing to generate a frame of display data. The graphics processing is performed on a tile-by-tile basis. The graphics processing apparatus is capable of determining if a current tile subject to the graphics processing is empty. At least one processing stage of the sequence of processing stages is omitted for graphics processing of the current tile in dependence on whether the current tile is empty.
Abstract:
There is provided a processor configured to transfer data to a plurality of processor circuits. The apparatus includes broadcast circuitry that broadcasts first machine learning data to at least a subset of the plurality of processor circuits.
Abstract:
An apparatus and method are provided for executing thread groups. The apparatus comprises scheduling circuitry for selecting for execution a first thread group from a plurality of thread groups, and thread processing circuitry that is responsive to the scheduling circuitry to execute active threads of the first thread group in dependence on a common program counter shared between the active threads. In response to an exit event occurring for the first thread group, the thread processing circuitry determines whether a program counter check condition is present, and this can be used to trigger program counter checking circuitry to perform a program counter check operation to update the common program counter and an active thread indication for the first thread group. The thread processing circuitry is provided with register storage in which program counter information for each thread of the first thread group can be stored, and the program counter checking circuitry is arranged to have access to that register storage when performing the program counter check operation. Further, the scheduling circuitry is arranged to select, for execution by the thread processing circuitry, a different thread group whilst awaiting performance of the program counter check operation by the program counter checking circuitry for the first thread group. This provides an area efficient mechanism for handling divergence and re-convergence of threads within thread groups, in a manner that avoids impacting performance.
Abstract:
A data processing apparatus has control circuitry for detecting whether a first micro-operation to be processed by a first processing lane would give the same result as a second micro-operation processed by a second processing lane. if they would give the same result, then the first micro-operation is prevented from being processed by the first processing lane and the result of the second micro-operation is output as the result of the first micro-operation. This avoids duplication of processing, to save energy for example.
Abstract:
Disclosed is a method of handling thread termination events within a graphics processor when a group of plural execution lanes are executing in a co-operative state. When a group of lanes is in the co-operative state, in response to the graphics processor encountering an event that means that a subset of one or more execution threads associated with the group of execution lanes in the co-operative state should be terminated: it is determined whether a condition to immediately terminate the subset of one or more execution threads is met. When the condition is not met, the group of execution lanes continue their execution in the co-operative state, but a record is stored to track that the threads in the subset of one or more execution threads should subsequently be terminated.
Abstract:
A data processing apparatus has control circuitry for detecting whether a current micro-operation to be processed by processing circuitry is for the same data processing operation and specifies the same at least one operand as the last valid micro-operation processed by the processing circuitry. If so, then the control circuitry prevents the processing circuitry processing the current micro-operation so that an output register is not updated in response to the current micro-operation, and outputs the current value stored in the output register as the result of the current micro-operation. This allows power consumption to be reduced or performance to be improved by not repeating the same computation.
Abstract:
A data processing apparatus has control circuitry for detecting whether a current micro-operation to be processed by a processing pipeline would give the same result as an earlier micro-operation. If so, then the current micro-operation is passed through the processing pipeline, with at least one pipeline stage passed by the current micro-operation being placed in a power saving state during a processing cycle in which the current micro-operation is at that pipeline stage. The result of the earlier micro-operation is then output as a result of said current micro-operation. This allows power consumption to be reduced by not repeating the same computation.