摘要:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type
摘要:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.
摘要:
Briefly, in accordance with one or more embodiments, an apparatus comprises a processor to monitor cache utilization of an application during execution of the application for a workload; and a memory to store cache utilization statistics responsive to the monitored cache utilization. The processor is to determine an optimal cache configuration for the application based at least in part on the cache utilization statistics for the workload such that a smallest amount of cache is turned on for subsequent executions of the workload by the application.
摘要:
In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
摘要:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes sorting logic to sort processing threads into thread groups based on bit depth of floating point thread operations.
摘要:
In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
摘要:
Briefly, in accordance with one or more embodiments, an apparatus comprises a processor to render data from an application to be displayed on a display panel, and a memory to store the compressed final display surface writes. The processor is to compress final display surface writes of the data to be displayed on the display panel in a format to be displayed on the display to allow a display engine coupled to the display to stream the compressed final display surface writes to the display.
摘要:
Briefly, in accordance with one or more embodiments, a processor receives an incoming data stream that includes alpha channel data, and a memory stores an application programming interface (API). The API is to route the alpha channel data to a fixed point blending unit to perform one or more blending operations using fixed point representation of the alpha channel data. The API is further to route the incoming data stream to a floating point blending unit to perform operations involving floating point representation of the incoming data.
摘要:
Briefly, in accordance with one or more embodiments, a processor receives an incoming data stream that includes alpha channel data, and a memory stores an application programming interface (API). The API is to route the alpha channel data to a fixed point blending unit to perform one or more blending operations using fixed point representation of the alpha channel data. The API is further to route the incoming data stream to a floating point blending unit to perform operations involving floating point representation of the incoming data.
摘要:
Methods and apparatus relating to a current change mitigation policy for limiting voltage droop in graphics logic are described. In an embodiment, logic inserts one or more bubbles in one or more Execution Unit (EU) logic pipelines or one or more sampler logic pipelines of a processor. The bubbles at least temporarily reduce execution of operations in one or more subsystems of the processor based at least partially on a comparison of a first value and one or more clamping threshold values. The first value is determined based at least partially on a summation of products of one or more event counts and dynamic capacitance weights for one or more subsystems of the processor. Other embodiments are also disclosed and claimed.