Abstract:
One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
Abstract:
Methods and apparatus relating to techniques for resource load balancing based on usage and/or power limits are described. In an embodiment, resource load balancing logic causes a first resource of a processor to operate at a first frequency and a second resource of the processor to operate at a second frequency. Memory stores a plurality of frequency values. The resource load balancing logic also selects the first frequency and the second frequency based on the stored plurality of frequency values. Operation of the first resource at the first frequency and the second resource at the second frequency in turn causes the processor to operate under a power budget. The resource load balancing logic causes change to the first frequency and the second frequency in response to a determination that operation of the processor is different than the power budget. Other embodiments are also disclosed and claimed.
Abstract:
A mechanism is described for facilitating storage management for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting one or more components associated with machine learning, where the one or more components include memory and a processor coupled to the memory, and where the processor includes a graphics processor. The method may further include allocating a storage portion of the memory and a hardware portion of the processor to a machine learning training set, where the storage and hardware portions are precise for implementation and processing of the training set.
Abstract:
Embodiments of the invention relate to a method and apparatus for a zero voltage processor sleep state. A processor may include a dedicated cache memory. A voltage regulator may be coupled to the processor to provide an operating voltage to the processor. During a transition to a zero voltage power management state for the processor, the operational voltage applied to the processor by the voltage regulator may be reduced to approximately zero and the state variables associated with the processor may be saved to the dedicated cache memory.
Abstract:
In an embodiment, a processor includes a core to execute instructions, an agent to perform an operation independently of the core, a fabric to couple the core and agent and including a plurality of domains and a logic to receive isochronous parameter information from the agent and environmental information of a platform and to generate first and second values, and a power controller to control a frequency of the domains based at least in part on the first and second values. Other embodiments are described and claimed.
Abstract:
Embodiments of the invention relate to a method and apparatus for a zero voltage processor sleep state. A processor may include a dedicated cache memory. A voltage regulator may be coupled to the processor to provide an operating voltage to the processor. During a transition to a zero voltage power management state for the processor, the operational voltage applied to the processor by the voltage regulator may be reduced to approximately zero and the state variables associated with the processor may be saved to the dedicated cache memory.
Abstract:
Embodiments of the invention relate to a method and apparatus for a zero voltage processor sleep state. A processor may include a dedicated cache memory. A voltage regulator may be coupled to the processor to provide an operating voltage to the processor. During a transition to a zero voltage power management state for the processor, the operational voltage applied to the processor by the voltage regulator may be reduced to approximately zero and the state variables associated with the processor may be saved to the dedicated cache memory.
Abstract:
With the progress toward multi-core processors, each core is can not readily ascertain the status of the other dies with respect to an idle or active status. A proposal for utilizing an interface to transmit core status among multiple cores in a multi-die microprocessor is discussed. Consequently, this facilitates thermal management by allowing an optimal setting for setting performance and frequency based on utilizing each core status.