Abstract:
A graphics processor includes a vertex shader 20 that processes input attribute values from a vertex buffer 26 to generate output vertex shaded attribute values 28 to be used by a rasteriser/fragment shader 22 of the graphics processor when processing an image for display. Vertex shader output attributes for which the vertex shader input attributes that the vertex shader output attribute depends on are defined solely on a per-vertex basis or solely on a per-instance basis are identified. Then, for such vertex shader output attributes, the vertex shader 20 stores, for use by the rasteriser/fragment shader 22 of the graphics processor when processing an image for display, only one copy of the vertex shader output attribute for a given vertex or instance, respectively, irrespective of the number of instances or vertices, respectively, that the output attribute value applies to.
Abstract:
A thread group generator generates from a received workload a plurality of thread groups. Each thread group consists of a plurality of threads, and at least one thread group has an interthread dependency existing between the plurality of threads. Each thread may be either an active thread whose output is required to form the result data, or a dummy thread required to resolve the inter-thread dependency for one of the active threads but whose output is not required to form the result data. A thread execution unit then executes each thread within a thread group received from the generator by executing a predetermined program. Execution flow modification circuitry is responsive to the received thread group having at least one dummy thread, to cause the unit to selectively omit at least part of the execution of at least one of the plurality of instructions when executing each dummy thread.
Abstract:
A data processing system includes an execution pipeline that includes one or more programmable execution stages which execute execution threads to execute instructions to perform data processing operations. Instructions to be executed by a group of execution threads are first fetched into an instruction cache and then read from the instruction cache for execution by the thread group. When an instruction to be executed by a thread group is present in a cache line in the instruction cache, or is to be fetched into an allocated cache line in the instruction cache, a pointer to the location of the instruction in the instruction cache is stored for the thread group. This stored pointer is then used to retrieve the instruction for execution by the thread group from the instruction cache.
Abstract:
A tile-based graphics processing pipeline that uses primitive lists that can encompass plural rendering tiles includes a primitive list reading unit that reads primitive lists for a tile being rendered to determine primitives to be processed for the tile and a rasteriser that rasterises input primitives to generate graphics fragments to be processed. The pipeline further comprises a comparison unit between the primitive list reading unit and the rasteriser that for primitives that have been read from primitive lists that include plural rendering tiles, compares the location of the primitive in the render target to the location of the tile being rendered, and then either sends the primitive onwards to the rasteriser if the comparison determines that the primitive could lie at least partially within the tile, or does not send the primitive to the rasteriser if the comparison determines that the primitive definitely does not lie within the tile.
Abstract:
A graphics processing pipeline 1 includes a rasteriser 3 that tests patches representing respective different regions of a render output against the edges of primitives 2 to determine if the primitive at least partially covers the patch and an early depth test stage 4 that performs early depth tests for primitives in respect of patches of the render output that the primitive has been found by the rasteriser at least partially to cover, by using depth test information 5 associated with a patch indicating the number and distribution of different depth value regions associated with the patch to determine the depth value region or regions associated with the patch that the primitive should be depth tested against, and then performing a depth test or tests for the primitive in respect of the respective determined depth value region or regions associated with the patch.
Abstract:
When an OpenCL kernel is to be executed, a bitfield index representation to be used for the indices of the kernel invocations is determined based on the number of bits needed to represent the maximum value that will be needed for each index dimension for the kernel. A bitfield placement data structure 33 describing how the bitfield index representation is partitioned is then prepared together with a maximum value data structure 32 indicating the maximum index dimension values to be used for the kernel. A processor then executes the kernel invocations 36 across the index space indicated by the maximum value data structure 32. A bitfield index representation 35, 37, 38 configured in accordance with the bitfield placement data structure 33 is associated with each kernel invocation to indicate its index.
Abstract:
A graphics processing pipeline determines whether respective graphics processing operations, such as respective blends, respective depth tests, etc., to be performed at a stage of the graphics processing pipeline would produce the same result for each sampling point of a set of plural sampling points represented by a fragment being processed by the graphics processing pipeline. If it is determined that respective graphics processing operations would produce the same result for each of the sampling points, then only a single instance of the graphics processing operation is performed and the result of that graphics processing operation is associated with each of the sampling points. The number of instances of the graphics processing operations needed to process the set of plural sampling points which the fragment represents is reduced in comparison to conventional multisampling graphics processing techniques which perform graphics processing operations for fragments on a “per sample” basis. The determination of whether or not the same result would be produced for each sampling point of the set of plural sampling points is facilitated by providing metadata which indicates whether or not fragment data and/or stored sample data for use when processing the sampling points is the same.
Abstract:
In a tile-based graphics processing system having plural rendering processors, the set of tiles 31 to be processed to generate an output frame 30 for display is partitioned among the different rendering processors by defining respective tile traversal paths 32, 33, 34, 35 for each rendering processor that start at a tile initially allocated to the processor and that, at least for the initial tiles along the path, traverse to spatially adjacent tiles in the output, and that will traverse every tile to be rendered if followed to their end. The next tile for a given rendering processor to process is then selected as being the next tile along its defined path, unless the next tile in the path has already been processed (or is already being processed) by another rendering processor, in which case the next tile to be allocated to the rendering processor is selected to be a free tile further on in the tile traversal path for that processor.
Abstract:
When performing texture processing operations in a graphics processing system, for a texture processing operation that requires M input texture data elements from an array of texture data elements, each of the M texture data elements is selected from a different set of texture data elements having a different set of positions within the texture data array. The texture processing operation is then performed using the M texture data elements.
Abstract:
There is disclosed a method of processing an input set of indices that may contain one or more primitive restarts to determine which indices correspond to complete primitives. A modified version of the set of indices can then be written out that contains complete primitives. In particular this is done by determining, for each index in the set of indices, the index position of the start of a sequence of indices for a sequence of primitives that the index is part of, and then determined from this whether or not the index position corresponds to the start of a complete primitive.