摘要:
In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
摘要:
A method and apparatus to facilitate shared pointers in a heterogeneous platform. In one embodiment of the invention, the heterogeneous or non-homogeneous platform includes, but is not limited to, a central processing core or unit, a graphics processing core or unit, a digital signal processor, an interface module, and any other form of processing cores. The heterogeneous platform has logic to facilitate sharing of pointers to a location of a memory shared by the CPU and the GPU. By sharing pointers in the heterogeneous platform, the data or information sharing between different cores in the heterogeneous platform can be simplified.
摘要:
Embodiments of a hardware accelerator having a circuit configurable to perform a plurality of matrix operations and Fast Fourier Transforms (FFT) are presented herein.
摘要:
A system and method for prioritizing the transmission of packets in a wireless local area network. A station selects a packet from local priority queuing and identifies the priority bits of the packet. The station declares the priority of the selected packet based on the binary value of the priority bits. If the station detects that another station has selected a packet with a higher priority, then the station ceases to contend for transmission during the current transmission cycle.
摘要:
Methods and apparatuses to reduce the number of IO requests to memory when executing a program that iteratively processes contiguous data are provided. A first set of data elements may be loaded in a first register and a second set of data elements may be loaded in a second register. The first set of data elements and the second set of data elements can be used during the execution of a program to iteratively process the data elements. For each of a plurality of iterations, a corresponding set of data elements to be used during the execution of an operation for the iteration may be selected from the first set of data elements stored in the first register and the second set of data elements stored in the second register. In this way, the same data elements are not re-loaded from memory during each iteration.
摘要:
A method and apparatus to facilitate shared pointers in a heterogeneous platform. In one embodiment of the invention, the heterogeneous or non-homogeneous platform includes, but is not limited to, a central processing core or unit, a graphics processing core or unit, a digital signal processor, an interface module, and any other form of processing cores. The heterogeneous platform has logic to facilitate sharing of pointers to a location of a memory shared by the CPU and the GPU. By sharing pointers in the heterogeneous platform, the data or information sharing between different cores in the heterogeneous platform can be simplified.
摘要:
A single instruction multiple data processor may accomplish register allocation by identifying live ranges that have incompatible write masks during compilation. Then, edges are added in an interference graph between live ranges that have incompatible masks so that those live ranges will not be assigned to the same physical register.
摘要:
A single instruction multiple data processor may accomplish register allocation by identifying live ranges that have incompatible write masks during compilation. Then, edges are added in an interference graph between live ranges that have incompatible masks so that those live ranges will not be assigned to the same physical register.
摘要:
Methods and apparatuses to reduce the number of sequential operations such as atomic operations in an application to be performed on a shared memory cell may be provided. A translation unit can detect in the application multiple atomic operations to be performed on the same memory and replaces the multiple atomic operations with an equivalent single atomic operation. In some implementations, the application includes shader code. In some implementations, each of the multiple atomic operations increment a value stored at the same memory by an update amount. The translation unit may calculate the partial prefix sum over all the atomic operations and replace the multiple atomic operations with a single atomic operation to increment the value stored at memory by the sum of the update amounts.
摘要:
Methods and systems may provide for identifying a plurality of subject commands that reference a common screen location and access a read/write resource, and serializing the plurality of subject commands according to a predefined order. Additionally, execution of the plurality of subject commands may be deferred until one or more additional commands referencing the common screen location are executed. In one example, the plurality of subject commands are serialized in response to a serialization command.