Abstract:
Closed loop performance controllers of asymmetric multiprocessor systems may be configured and operated to improve performance and power efficiency of such systems by adjusting control effort parameters that determine the dynamic voltage and frequency state of the processors and coprocessors of the system in response to the workload. One example of such an arrangement includes applying hysteresis to the control effort parameter and/or seeding the control effort parameter so that the processor or coprocessor receives a returning workload in a higher performance state. Another example of such an arrangement includes deadline driven control, in which the control effort parameter for one or more processing agents may be increased in response to deadlines not being met for a workload and/or decreased in response to deadlines being met too far in advance. The performance increase/decrease may be determined by comparison of various performance metrics for each of the processing agents.
Abstract:
A processing system can include a plurality of processing clusters. Each processing cluster can include a plurality of processor cores and a last level cache. Each processor core can include one or more dedicated caches and a plurality of counters. The plurality of counters may be configured to count different types of cache fills. The plurality of counters may be configured to count different types of cache fills, including at least one counter configured to count total cache fills and at least one counter configured to count off-cluster cache fills. Off-cluster cache fills can include at least one of cross-cluster cache fills and cache fills from system memory. The processing system can further include one or more controllers configured to control performance of one or more of the clusters, the processor cores, the fabric, and the memory responsive to cache fill metrics derived from the plurality of counters.
Abstract:
In an embodiment, an electronic device includes a package power zone controller. The device monitors the overall power consumption of multiple components of a “package.” The package power zone controller may detect workloads in which the package components (e.g. different types of processors, peripheral hardware, etc.) are each consuming relatively low levels of power, but the overall power consumption is greater than a desired target. The package power zone controller may implement various mechanisms to reduce power consumption in such cases.
Abstract:
Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing system. Each core type can run at a plurality of selectable voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
Abstract:
Embodiments provide for a computer implemented method comprising sampling one or more power and performance metrics of a processor; determining an energy cost per instruction based on the one or more power and performance metrics; determining an efficiency metric based on the energy cost per instruction; computing an efficiency control error based on a difference between a current efficiency metric and a target efficiency metric; setting an efficiency control effort based on the efficiency control error; determining a performance control effort, the performance control effort determined by a performance controller for the processor; and adjusting the performance control effort based on the efficiency control effort, wherein adjusting the performance control effort reduces power consumption of the processor.
Abstract:
Systems are provided that support millimeter-wave wireless communications between hosts and electronic devices. A host may be formed using a personal computer associated with a user or computing equipment associated with a public establishment. Content can be automatically synchronized between the host and the user's electronic device over a millimeter-wave wireless communications link in a communications band such as a 60 GHz wireless communications band. Synchronization operations may be performed based on user content preferences. Content preference information may be gathered explicitly from a user using on-screen options or may be gathered by monitoring user media playback activities and media rating activities. The content preference information may be transmitted automatically from an electronic device to a host when the electronic device is brought within range of the host. Synchronization operations may be performed automatically when a user is in proximity of a point-of-sale terminal or ticketing equipment.
Abstract:
A device implementing adaptive memory performance control by thread group may include a memory and at least one processor. The at least one processor may be configured to execute a group of threads on one or more cores. The at least one processor may be configured to monitor a plurality of metrics corresponding to the group of threads executing on one or more cores. The metrics may include, for example, a core stall ratio and/or a power metric. The at least one processor may be configured to determine, based at least in part on the plurality of metrics, a memory bandwidth constraint with respect to the group of threads executing on the one or more cores. The at least one processor may be configured to, in response to determining the memory bandwidth constraint, increase a memory performance corresponding to the group of threads executing on the one or more cores.
Abstract:
The invention provides a technique for targeted scaling of the voltage and/or frequency of a processor included in a computing device. One embodiment involves scaling the voltage/frequency of the processor based on the number of frames per second being input to a frame buffer in order to reduce or eliminate choppiness in animations shown on a display of the computing device. Another embodiment of the invention involves scaling the voltage/frequency of the processor based on a utilization rate of the GPU in order to reduce or eliminate any bottleneck caused by slow issuance of instructions from the CPU to the GPU. Yet another embodiment of the invention involves scaling the voltage/frequency of the CPU based on specific types of instructions being executed by the CPU. Further embodiments include scaling the voltage and/or frequency of a CPU when the CPU executes workloads that have characteristics of traditional desktop/laptop computer applications.
Abstract:
Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing system. Each core type can run at a plurality of selectable voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
Abstract:
A processing system can include a plurality of processing clusters. Each processing cluster can include a plurality of processor cores and a last level cache. Each processor core can include one or more dedicated caches and a plurality of counters. The plurality of counters may be configured to count different types of cache fills. The plurality of counters may be configured to count different types of cache fills, including at least one counter configured to count total cache fills and at least one counter configured to count off-cluster cache fills. Off-cluster cache fills can include at least one of cross-cluster cache fills and cache fills from system memory. The processing system can further include one or more controllers configured to control performance of one or more of the clusters, the processor cores, the fabric, and the memory responsive to cache fill metrics derived from the plurality of counters.