Abstract:
The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
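The data flow in this abstract (two loads from a first memory to a second memory, a multiply in the ALU, and a store to a general purpose register) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `LoadControlUnit`, the functions `alu_matmul` and `run_matmul`, and the dictionary-based memories are hypothetical names chosen for illustration, not structures taken from the disclosure.

```python
# Hypothetical software model of the flow described above; all names are illustrative.

class LoadControlUnit:
    """Copies named data sets from a first memory to a second memory."""

    def __init__(self, first_memory, second_memory):
        self.first_memory = first_memory
        self.second_memory = second_memory

    def load(self, key):
        # Copy row by row so the second memory holds its own copy of the data.
        self.second_memory[key] = [row[:] for row in self.first_memory[key]]


def alu_matmul(inputs, weights):
    """Plain triple-loop matrix multiplication: out = inputs x weights."""
    rows, inner, cols = len(inputs), len(weights), len(weights[0])
    if any(len(row) != inner for row in inputs):
        raise ValueError("inner dimensions do not match")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += inputs[i][k] * weights[k][j]
    return out


def run_matmul(first_memory):
    """Load input and weight data, multiply them, and store the result in a GPR."""
    second_memory = {}
    gpr = {}  # stands in for a general purpose register visible to the ALU
    lcu = LoadControlUnit(first_memory, second_memory)
    lcu.load("input")   # first load instruction: input data
    lcu.load("weight")  # second load instruction: weight data
    gpr["output"] = alu_matmul(second_memory["input"], second_memory["weight"])
    return gpr
```

Calling `run_matmul({"input": [[1, 2], [3, 4]], "weight": [[5, 6], [7, 8]]})` leaves the product of the two matrices under the `"output"` key of the returned register model.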
Abstract:
This disclosure describes an adaptive memory address scanning technique that defines an address scanning pattern, to be used for a particular surface, based on one or more properties of the surface. In addition, a number, shape, and arrangement of sub-primitives of a surface to process in parallel may be determined. In one example of the disclosure, a memory accessing method for graphics processing comprises determining, by a graphics processing unit (GPU), properties of a surface, determining, by the GPU, a memory address scanning technique based on the determined properties of the surface, and performing, by the GPU, at least one of a read or a write of data associated with the surface in a memory based on the determined memory address scanning technique.
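One way to picture a scanning pattern that depends on surface properties is to pick a tile shape from the surface's aspect ratio and then visit addresses tile by tile. The sketch below assumes this interpretation; the functions `choose_tile_shape` and `scan_addresses`, the aspect-ratio threshold of 4, and the row-major address formula are all assumptions made for illustration, not details from the disclosure.

```python
import math


def choose_tile_shape(width, height, tile_pixels=16):
    """Pick a (tile_width, tile_height) that follows the surface's aspect ratio.

    The threshold of 4 is an arbitrary illustrative choice.
    """
    if width >= 4 * height:
        return (tile_pixels, 1)   # very wide surface: scan in long rows
    if height >= 4 * width:
        return (1, tile_pixels)   # very tall surface: scan in long columns
    side = math.isqrt(tile_pixels)
    return (side, side)           # roughly square surface: square tiles


def scan_addresses(width, height, tile_pixels=16):
    """Return linear (row-major) addresses in the order the tiles are visited."""
    tile_w, tile_h = choose_tile_shape(width, height, tile_pixels)
    order = []
    for tile_y in range(0, height, tile_h):
        for tile_x in range(0, width, tile_w):
            # Visit every pixel of this tile, clipping at the surface edge.
            for y in range(tile_y, min(tile_y + tile_h, height)):
                for x in range(tile_x, min(tile_x + tile_w, width)):
                    order.append(y * width + x)
    return order
```

Every address is visited exactly once whichever tile shape is chosen; only the visiting order changes with the surface's dimensions.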
Abstract:
Methods, systems, and devices for graphics processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias-enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias-enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias-enabled, and processing the alias instruction.
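The steps listed in this abstract (check whether an instruction is alias-enabled, allocate a slot from a partitioned table, generate an alias instruction) can be sketched as follows. The abstract does not say how the table is partitioned or what an alias instruction contains, so the equal-size slots, the `AliasLookupTable` class, the `make_alias_instruction` function, and the dictionary layout of the instructions are all hypothetical.

```python
class AliasLookupTable:
    """Hypothetical alias lookup table partitioned into equally sized slots."""

    def __init__(self, num_entries, num_slots):
        if num_slots <= 0 or num_entries % num_slots != 0:
            raise ValueError("entries must divide evenly into slots")
        self.slot_size = num_entries // num_slots
        self.free_slots = list(range(num_slots))

    def allocate(self):
        """Return the lowest free slot index, or None if the table is full."""
        return self.free_slots.pop(0) if self.free_slots else None


def make_alias_instruction(instruction, alias_enabled_ops, table):
    """Generate an alias instruction if the op is alias-enabled and a slot is free."""
    if instruction["op"] not in alias_enabled_ops:
        return None  # not alias-enabled: no slot is consumed
    slot = table.allocate()
    if slot is None:
        return None  # no free slot in the table
    return {
        "op": "alias",
        "slot": slot,
        "first_entry": slot * table.slot_size,  # first table entry of the slot
        "source": instruction["op"],
    }
```

In this model an instruction that is not alias-enabled leaves the table untouched, so slots are only spent on instructions that can use them.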
Abstract:
A graphics processing unit (GPU) may perform three-dimensional (3D) graphics processing in accordance with a 3D graphics pipeline using a first plurality of graphics processing hardware units of the GPU. The GPU may further perform a two-dimensional (2D) graphics operation using a second plurality of graphics processing hardware units of the GPU not used in performing the 3D graphics processing and one or more graphics processing hardware units of the first plurality of graphics processing hardware units of the GPU.
Abstract:
A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
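The credit and queue behavior described above can be modeled with a counter and a FIFO. This is a simplified sketch, assuming one credit per bGPR loading opportunity and strict in-order return of waves; the class `BlockGprScheduler` and its method names are hypothetical and not taken from the disclosure.

```python
from collections import deque


class BlockGprScheduler:
    """Toy model of credit-gated wave loading with in-order return."""

    def __init__(self, credits=1):
        self.credits = credits
        self.loaded_waves = deque()  # waves currently in the pipeline, oldest first

    def acquire(self, wave_id):
        """Try to take a credit so the wave may use the bGPR for loading."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def loaded(self, wave_id):
        """Wave has been loaded into the pipeline: record it and refund the credit."""
        self.loaded_waves.append(wave_id)
        self.credits += 1

    def return_next(self, slot_available):
        """Return the oldest loaded wave to the bGPR if a physical slot is free."""
        if not slot_available or not self.loaded_waves:
            return None
        return self.loaded_waves.popleft()
```

With a single credit, a second wave cannot begin loading until the first wave's load completes and its credit is refunded, yet it can start while the first wave is still being processed in the pipeline.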