Abstract:
A technique to share execution resources is disclosed. In one embodiment, a CPU and a GPU share resources according to workload, power considerations, or available resources by scheduling or transferring instructions and information between the CPU and the GPU.
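As a rough illustration of the scheduling decision described above, the C sketch below chooses between a CPU path and a GPU path for a unit of work based on estimated parallelism, current load, and a power budget. The work_item and system_state types, the thresholds, and the run_on_cpu/run_on_gpu stubs are hypothetical stand-ins for whatever mechanism an embodiment actually uses to transfer instructions and information between the CPU and GPU.

/* Minimal sketch, assuming a hypothetical software dispatcher: pick a
 * CPU or GPU target for a unit of work from load and a power budget.
 * All types, thresholds, and the run_on_* stubs are illustrative. */
#include <stdio.h>

typedef struct {
    int id;
    int parallelism;   /* estimated data-parallel width of the work */
} work_item;

typedef struct {
    double cpu_load;       /* 0.0 .. 1.0 */
    double gpu_load;       /* 0.0 .. 1.0 */
    double power_budget_w; /* remaining package power budget, watts */
} system_state;

static void run_on_cpu(const work_item *w) { printf("work %d -> CPU\n", w->id); }
static void run_on_gpu(const work_item *w) { printf("work %d -> GPU\n", w->id); }

/* Dispatch: highly parallel work goes to the GPU when it has headroom and
 * the power budget allows it; otherwise the work stays on the CPU. */
static void dispatch(const work_item *w, const system_state *s)
{
    int gpu_suitable = w->parallelism >= 64;
    int gpu_has_headroom = s->gpu_load < 0.8 && s->power_budget_w > 15.0;

    if (gpu_suitable && gpu_has_headroom)
        run_on_gpu(w);
    else
        run_on_cpu(w);
}

int main(void)
{
    system_state s = { .cpu_load = 0.6, .gpu_load = 0.3, .power_budget_w = 25.0 };
    work_item scalar_task = { .id = 1, .parallelism = 4 };
    work_item vector_task = { .id = 2, .parallelism = 256 };

    dispatch(&scalar_task, &s);  /* expected: CPU */
    dispatch(&vector_task, &s);  /* expected: GPU */
    return 0;
}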
Abstract:
A device, system, and method are described for assigning values to data fields in a first register, where each data field in the first register corresponds to a data element to be written into a second register. For each data field in the first register, a first value indicates that the corresponding data element has not yet been written into the second register, and a second value indicates that it has. The values of the data fields in the first register are read; for each data field holding the first value, the corresponding data element is gathered and written into the second register, and the data field is changed from the first value to the second value. Other embodiments are described and claimed.
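The mask-driven gather loop described above can be sketched in C as follows. The eight-lane width, the plain arrays standing in for the first (mask) and second (destination) registers, and the masked_gather name are assumptions for illustration only.

/* Minimal sketch of the mask-assisted gather: a mask array tracks, per
 * element, whether the corresponding element has already been written
 * into the destination array.  Width and names are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define VLEN 8

/* Gather memory[index[i]] into dest[i] for every lane whose mask field
 * still holds the "not written" value (1).  After a lane is filled, its
 * mask field is flipped to the "written" value (0), so a re-executed
 * gather (e.g. after a fault) only touches the remaining lanes. */
static void masked_gather(uint8_t mask[VLEN], int32_t dest[VLEN],
                          const int32_t *memory, const uint32_t index[VLEN])
{
    for (int i = 0; i < VLEN; i++) {
        if (mask[i]) {                    /* first value: not yet written */
            dest[i] = memory[index[i]];   /* gather the element           */
            mask[i] = 0;                  /* second value: now written    */
        }
    }
}

int main(void)
{
    int32_t memory[16];
    for (int i = 0; i < 16; i++) memory[i] = 100 + i;

    uint32_t index[VLEN] = { 3, 7, 1, 15, 0, 9, 4, 12 };
    uint8_t  mask[VLEN]  = { 1, 1, 1, 1, 1, 1, 1, 1 };  /* nothing written yet */
    int32_t  dest[VLEN]  = { 0 };

    masked_gather(mask, dest, memory, index);

    for (int i = 0; i < VLEN; i++)
        printf("dest[%d] = %d (mask %d)\n", i, dest[i], mask[i]);
    return 0;
}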
Abstract:
A processing core is described having execution unit logic circuitry with a first register to store a first vector input operand, a second register to store a second vector input operand, and a third register to store a packed data structure containing scalar input operands a, b, and c. The execution unit logic circuitry further includes a multiplier to perform the operation (a*(first vector input operand)) + (b*(second vector input operand)) + c.
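Per lane, the arithmetic above works out to result[i] = a*v1[i] + b*v2[i] + c. The scalar C loop below is a sketch of that computation; the eight-lane width and the packed_scalars struct standing in for the third register are illustrative assumptions, not the hardware datapath.

/* Minimal sketch of the operation the abstract attributes to the
 * execution unit: result[i] = a*v1[i] + b*v2[i] + c, with a, b, c taken
 * from a single packed scalar operand.  Names are illustrative. */
#include <stdio.h>

#define VLEN 8

typedef struct { float a, b, c; } packed_scalars;  /* stands in for the third register */

static void vec_scale_add(float result[VLEN],
                          const float v1[VLEN], const float v2[VLEN],
                          packed_scalars s)
{
    for (int i = 0; i < VLEN; i++)
        result[i] = s.a * v1[i] + s.b * v2[i] + s.c;
}

int main(void)
{
    float v1[VLEN], v2[VLEN], r[VLEN];
    for (int i = 0; i < VLEN; i++) { v1[i] = (float)i; v2[i] = (float)(i * i); }

    packed_scalars s = { .a = 2.0f, .b = 0.5f, .c = 1.0f };
    vec_scale_add(r, v1, v2, s);

    for (int i = 0; i < VLEN; i++)
        printf("r[%d] = %.1f\n", i, r[i]);   /* 2*i + 0.5*i*i + 1 */
    return 0;
}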
Abstract:
Techniques to control power and processing among a plurality of asymmetric cores are disclosed. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among the plurality of cores according to the performance and power needs of the system.
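A hedged sketch of the migration idea follows: a thread with high recent utilization is placed on a big core for performance, while light or power-constrained threads migrate to a little core. The core types, the utilization threshold, and the migrate_to helper are hypothetical illustrations, not the claimed mechanism.

/* Minimal sketch of an asymmetric-core placement policy under the stated
 * assumptions.  All names and thresholds are illustrative. */
#include <stdio.h>

typedef enum { CORE_BIG, CORE_LITTLE } core_type;

typedef struct {
    int id;
    double utilization;   /* recent CPU demand, 0.0 .. 1.0 */
    core_type placement;
} thread_info;

static void migrate_to(thread_info *t, core_type target)
{
    if (t->placement != target) {
        t->placement = target;
        printf("thread %d migrated to %s core\n",
               t->id, target == CORE_BIG ? "big" : "little");
    }
}

/* Re-evaluate placement: demanding threads go to the big core for
 * performance, light threads to the little core to reduce power. */
static void rebalance(thread_info *t, int power_constrained)
{
    if (!power_constrained && t->utilization > 0.7)
        migrate_to(t, CORE_BIG);
    else
        migrate_to(t, CORE_LITTLE);
}

int main(void)
{
    thread_info ui     = { .id = 1, .utilization = 0.9, .placement = CORE_LITTLE };
    thread_info logger = { .id = 2, .utilization = 0.1, .placement = CORE_BIG };

    rebalance(&ui, 0);      /* busy thread, no power cap -> big core    */
    rebalance(&logger, 0);  /* mostly idle thread        -> little core */
    return 0;
}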
Abstract:
A prefetching scheme detects when a load misses the lower-level cache and hits the next level of cache. The scheme uses this information about a miss in the lower-level cache and a hit in the next higher level of cache memory to initiate a sidedoor prefetch load that fetches the previous or next cache line into the lower-level cache. To generate the address for the sidedoor prefetch, a history of cache accesses is maintained in a queue.
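The sketch below illustrates the idea in C: on a lower-level miss that hits the next level, the line address is pushed into a small history queue, the direction of the access stream is guessed from the most recent entry, and a sidedoor prefetch of the adjacent line is issued. The queue depth, the direction heuristic, and the prefetch_into_l1 hook are assumptions for illustration.

/* Minimal sketch: record L1-miss/L2-hit line addresses in a small
 * history queue and issue a "sidedoor" prefetch of the previous or next
 * line into the lower-level cache.  Sizes and hooks are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE     64u
#define HISTORY_DEPTH 8

static uint64_t history[HISTORY_DEPTH];
static int      history_head;

static void prefetch_into_l1(uint64_t line_addr)
{
    printf("sidedoor prefetch of line 0x%llx\n", (unsigned long long)line_addr);
}

/* Called on a load that missed the lower-level cache but hit the next level. */
static void on_l1_miss_l2_hit(uint64_t addr)
{
    uint64_t line = addr & ~(uint64_t)(LINE_SIZE - 1);

    /* Guess the direction from the most recently recorded miss: if the
     * stream is walking downward, fetch the previous line, else the next. */
    uint64_t prev_line = history[(history_head + HISTORY_DEPTH - 1) % HISTORY_DEPTH];
    uint64_t target = (prev_line != 0 && line < prev_line)
                          ? line - LINE_SIZE   /* descending stream  */
                          : line + LINE_SIZE;  /* ascending / default */

    history[history_head] = line;               /* update the history queue */
    history_head = (history_head + 1) % HISTORY_DEPTH;

    prefetch_into_l1(target);
}

int main(void)
{
    on_l1_miss_l2_hit(0x1000);   /* no history yet -> prefetch next line 0x1040 */
    on_l1_miss_l2_hit(0x1040);   /* ascending      -> prefetch 0x1080           */
    on_l1_miss_l2_hit(0x0fc0);   /* descending     -> prefetch 0x0f80           */
    return 0;
}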
Abstract:
A method and apparatus for efficiently handling texture sampling are described herein. A compiler or other software is capable of breaking a texture sampling operation for a pixel into a pre-fetch operation and a use operation. A processing element, in response to executing the pre-fetch operation, delegates computation of the texture sample for the pixel to a hardware texture sample unit. While the hardware texture sample unit performs the texture sample and provides the result, i.e. a textured pixel (texel), to a destination address, the processing element is capable of executing other independent code. After an amount of time, the processing element executes the use operation, such as a load operation to load the texel from the destination address.
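The pre-fetch/use split can be sketched in C with a worker thread standing in for the hardware texture sample unit: the pre-fetch call hands off the sample and returns, independent work runs in the meantime, and the use call waits for and loads the texel from the destination slot. The pthread worker, the trivial sampler, and the type names are illustrative stand-ins only.

/* Minimal sketch of the pre-fetch / use split, with a POSIX thread
 * standing in for the hardware texture sample unit.  All names and the
 * trivial sampling function are illustrative. */
#include <pthread.h>
#include <stdio.h>

typedef struct { float r, g, b; } texel;

typedef struct {
    int u, v;          /* texture coordinates for the pixel        */
    texel *dest;       /* destination address for the result       */
    pthread_t worker;  /* stands in for the hardware sample unit   */
} sample_request;

/* "Hardware" texture sampling: here just a trivial function of (u, v). */
static void *sample_worker(void *arg)
{
    sample_request *req = arg;
    req->dest->r = (float)req->u * 0.01f;
    req->dest->g = (float)req->v * 0.01f;
    req->dest->b = 0.5f;
    return NULL;
}

/* Pre-fetch operation: delegate the sample and return immediately. */
static void texture_prefetch(sample_request *req)
{
    pthread_create(&req->worker, NULL, sample_worker, req);
}

/* Use operation: wait for completion and load the texel. */
static texel texture_use(sample_request *req)
{
    pthread_join(req->worker, NULL);
    return *req->dest;
}

int main(void)
{
    texel slot;
    sample_request req = { .u = 10, .v = 20, .dest = &slot };

    texture_prefetch(&req);          /* issue the sample              */

    int acc = 0;                     /* independent work runs here    */
    for (int i = 0; i < 1000; i++)
        acc += i;

    texel t = texture_use(&req);     /* later, consume the texel      */
    printf("independent sum = %d, texel = (%.2f, %.2f, %.2f)\n",
           acc, t.r, t.g, t.b);
    return 0;
}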
Abstract:
In an embodiment, a processor includes a decode logic to receive and decode a first memory access instruction to store data in a cache memory with a replacement state indicator of a first level, and to send the decoded first memory access instruction to a control logic. In turn, the control logic is to store the data in a first way of a first set of the cache memory and to store the replacement state indicator of the first level in a metadata field of the first way responsive to the decoded first memory access instruction. Other embodiments are described and claimed.
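A minimal sketch of the described behavior, assuming a small software model of a set-associative cache: the store routine fills a way and writes the instruction-supplied replacement-state level into that way's metadata field, and victim selection treats higher levels as more evictable. The set/way counts, the level encoding, and the function names are illustrative assumptions, not the claimed hardware.

/* Minimal sketch: a store carries a replacement-state level that is
 * written into the filled way's metadata; eviction prefers the way with
 * the highest level.  Geometry and encoding are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 4
#define NUM_WAYS 4

typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  repl_state;   /* metadata field: 0 = keep longest .. 3 = evict first */
    uint32_t data;
} cache_line;

static cache_line cache[NUM_SETS][NUM_WAYS];

/* Pick a victim way: an invalid way if any, else the way whose
 * replacement-state level marks it as most evictable. */
static int choose_victim(int set)
{
    int victim = 0;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!cache[set][w].valid)
            return w;
        if (cache[set][w].repl_state > cache[set][victim].repl_state)
            victim = w;
    }
    return victim;
}

/* Models the decoded memory access instruction: store the data and tag
 * the filled way with the instruction's replacement-state level. */
static void store_with_repl_hint(uint64_t addr, uint32_t data, uint8_t repl_level)
{
    int set = (int)(addr % NUM_SETS);
    int way = choose_victim(set);

    cache[set][way].valid      = 1;
    cache[set][way].tag        = addr / NUM_SETS;
    cache[set][way].repl_state = repl_level;   /* metadata written on fill */
    cache[set][way].data       = data;

    printf("addr 0x%llx -> set %d way %d, repl level %u\n",
           (unsigned long long)addr, set, way, repl_level);
}

int main(void)
{
    store_with_repl_hint(0x100, 0xAAAA, 0);  /* keep this line resident longer */
    store_with_repl_hint(0x104, 0xBBBB, 3);  /* marked for early eviction      */
    return 0;
}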