Abstract:
Apparatuses, systems, and techniques to transform and store information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed and stored.
Abstract:
Apparatuses, systems, and techniques to access a multidimensional tensor from memory while minimizing tile quantization is disclosed. In at least one embodiment, a processor includes one or more circuits to cause a first one or more portions of at least one tensor to be accessed from a memory using a first technique and a second one or more portions of the at least one tensor to be accessed from the memory using a second technique based, at least in part, on an input to combine the first and second techniques.
Abstract:
A system, method, and computer program product are provided for redistributing multi-sample processing workloads between threads. A workload for a plurality of multi-sample pixels is received and each thread in a parallel thread group is associated with a corresponding multi-sample pixel of the plurality of pixels. The workload is redistributed between the threads in the parallel thread group based on a characteristic of the workload and the workload is processed by the parallel thread group. In one embodiment, the characteristic is rasterized coverage information for the plurality of multi-sample pixels.
Abstract:
A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and is analyzed to identify subsets of samples of a multi-sample pixel that have equal data, such that data for one sample in a subset represents multi-sample pixel data for all samples in the subset. An encoding state is generated that indicates which samples of the multi-sample pixel are included in each one of the subsets.
Abstract:
A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and an encoding state associated with the multi-sample pixel data is determined. Data for one sample of a multi-sample pixel and the encoding state are provided to a processing unit. The one sample of the multi-sample pixel is processed by the processing unit to generate processed data for the one sample that represents processed multi-sample pixel data for all samples of the multi-sample pixel or two or more samples of the multi-sample pixel.
Abstract:
Apparatuses, systems, and techniques to store one or more tensor maps in one or more cache storages. In at least one embodiment, a processor includes one or more tensor acceleration logic circuits to cause one or more tensor maps to be stored in one or more cache storages.
Abstract:
Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more GPU caches.
Abstract:
Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
Abstract:
Apparatuses, systems, and techniques to generate a tensor mapping. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a mapping from a first tensor to a second tensor to be generated.
Abstract:
Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about a memory transaction corresponding to the translation. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about one or more memory transactions corresponding to the translation.