Abstract:
In one embodiment, a processor includes one or more cores including a cache memory hierarchy; a performance monitor coupled to the one or more cores, the performance monitor to monitor performance of the one or more cores, the performance monitor to calculate pipeline cost metadata based at least in part on count information associated with the cache memory hierarchy; and a power controller coupled to the performance monitor, the power controller to receive the pipeline cost metadata and determine a low power state for the one or more cores to enter based at least in part on the pipeline cost metadata. Other embodiments are described and claimed.
Abstract:
In one embodiment, a processor includes: a plurality of cores; a power controller including a logic to autonomously demote a first request for at least one core of the plurality of cores to enter a first low power state, to cause the at least one core to enter a second low power state, the first low power state a deeper low power state than the second low power state; and an interface to receive an input from a system software, the input including at least one demotion control parameter, where the logic is to autonomously demote the first request based at least in part on the at least one demotion control parameter. Other embodiments are described and claimed.
Abstract:
In one embodiment, a processor includes a plurality of cores and a power controller including a first logic, responsive to a determination that the processor resided in a forced idle state for less than a threshold duration, to update a first counter and, responsive to a value of the first counter that exceeds a control threshold, prevent the processor from entry into the forced idle state. Other embodiments are described and claimed.
Abstract:
A heterogeneous processor architecture is described. For example, a processor according to one embodiment of the invention comprises: a set of two or more small physical processor cores; at least one large physical processor core having relatively higher performance processing capabilities and relatively higher power usage relative to the small physical processor cores; virtual-to-physical (V-P) mapping logic to expose the set of two or more small physical processor cores to software through a corresponding set of virtual cores and to hide the at least one large physical processor core from the software.
Abstract:
A processor comprising: execution logic to execute a first thread including an accelerator invocation instruction to invoke an accelerator command; an accelerator to execute an accelerator thread in response to the accelerator command, the accelerator to store state data associated with the accelerator thread in a application memory area in memory, wherein prior to executing the accelerator thread, the accelerator is to lock entries in a translation lookaside buffer (TLB) associated with the accelerator thread to prevent an exception which might otherwise result.
Abstract:
A processor includes multiple physical cores that support multiple logical cores of different core types, where the core types include a big core type and a small core type. A multi-threaded application includes multiple software threads are concurrently executed by a first subset of logical cores in a first time slot. Based on data gathered from monitoring the execution in the first time slot, the processor selects a second subset of logical cores for concurrent execution of the software threads in a second time slot. Each logical core in the second subset has one of the core types that matches the characteristics of one of the software threads.
Abstract:
An apparatus is described having multiple cores, each core having: a) an accelerator; and, b) a general purpose CPU coupled to the accelerator. The general purpose CPU has functional unit logic circuitry to execute an instruction that returns an amount of storage space to store context information of the accelerator.
Abstract:
Some implementations disclosed herein provide techniques and arrangements for a synchronous software interface for a specialized logic engine. The synchronous software interface may receive, from a first core of a plurality of cores, a control block including a transaction for execution by the specialized logic engine. The synchronous software interface may send the control block to the specialized logic engine and wait to receive a confirmation from the specialized logic engine that the transaction was successfully executed.
Abstract:
Page faults arising in a graphics processing unit may be handled by an operating system running on the central processing unit. In some embodiments, this means that unpinned memory can be used for the graphics processing unit. Using unpinned memory in the graphics processing unit may expand the capabilities of the graphics processing unit in some cases.
Abstract:
Dynamic interrupt steering remaps the handling of interrupts away from processor units executing important workloads. During the operation of a computing system, important workload utilization rates for processor units handling interrupts are determined and those processor units with utilization rates about a threshold value are made unavailable for handling interrupts. Interrupts are dynamically remapped to processor units available for interrupt handling based on processor unit idle state and, in the case of heterogeneous computing systems, processor unit type. Processor units are capable of idle state demotion by, in response to receiving a request to enter into a deep idle state, determining if its interrupt handling rate is greater than a threshold value, and if so, placing itself into a shallower idle state than requested. This avoids the computing system from incurring the expensive idle state exit latency and power costs associated with exiting from a deep idle state.