摘要:
In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
摘要:
In an embodiment, at least one computer readable storage medium has instructions stored thereon for causing a system to send, from a processor to a task execution device, a first call to execute a first subroutine of a set of chained subroutines. The first subroutine may have a first subroutine output argument that includes a first token to indicate that first output data from execution of the first subroutine is intermediate data of the set of chained subroutines. The instructions are also for causing the system, responsive to inclusion of the first token in the first subroutine output argument, to enable the processor to execute one or more operations while the task execution device executes the first subroutine. Other embodiments are described and claimed.
摘要:
Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread, enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in case of domain shaders or multiple primitives when primitive object instance count is greater than one in the case of geometry shaders and multiple triangles in case of pixel shaders.
摘要:
In some embodiments, a data structure may be received in a first processing system. The data structure may represent a plurality of instructions for a second processing system. For at least one instruction of the plurality of instructions, a determination may be made as to whether the instruction can be replaced by a compact instruction for the second processing system. A compact instruction may be generated if the instruction can be replaced by a compact instruction. In some embodiments, an instruction may be received in a processing system. A determination may be made as to whether the instruction is a compact instruction. A decompacted instruction may be generated if the instruction is a compact instruction.
摘要:
Apparatus and methods for desynchronizing object-oriented software applications in managed runtime environments are disclosed. Apparatus and methods for desynchronizing synchronized program code determine a type of the program code during just-in-time compilation of the program code and modify the program code during just-in-time compilation of the program code based on the type of the program code to desynchronize the program code.
摘要:
In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively couple to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.
摘要:
An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes logic a plurality of execution units to execute single instruction, multiple data (SIMD) and flow control logic to detect a diverging control flow in a plurality of SIMD channels and reduce the execution of the control flow to a subset of the SIMD channels.
摘要:
In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively couple to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.