Abstract:
Techniques and mechanisms for determining a latency event to be represented in performance monitoring information. In an embodiment, circuit blocks of a pipeline experience respective latency events at various times during tasks performed by the pipeline to service a workload. The circuit blocks send to an evaluation circuit of the pipeline respective event signals which each indicate whether a respective latency event has been detected. The event signals are communicated in parallel with at least a portion of the pipeline. In response to a trigger event in the pipeline, the evaluation circuit selects an event signal, based on relative priorities of the event signals, which provides a sample indicating a detected latency event. Based on the selected event signal, a representation of the indicated latency event is provided to a latency event count or other value of the performance monitoring information. In another embodiment, different time delays are applied to various event signals.
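As a rough illustration only: the selection described above amounts to a priority pick over asserted event signals when a trigger fires. The C++ sketch below models that pick with hypothetical names (EventSignal, select_event) and assumes a lower number means higher relative priority; none of these details are specified by the abstract.

```cpp
// Minimal sketch of priority-based latency-event selection; names and the
// lower-number-wins priority convention are assumptions, not patent details.
#include <cstdint>
#include <optional>
#include <vector>

struct EventSignal {
    bool asserted;  // latency event currently detected by the circuit block
    int  priority;  // lower value = higher relative priority (assumption)
    int  event_id;  // which latency event this signal represents
};

// On a trigger event, pick the asserted signal with the highest relative
// priority; the caller then credits that event in the monitoring counters.
std::optional<int> select_event(const std::vector<EventSignal>& signals) {
    const EventSignal* best = nullptr;
    for (const auto& s : signals) {
        if (s.asserted && (!best || s.priority < best->priority))
            best = &s;
    }
    return best ? std::optional<int>(best->event_id) : std::nullopt;
}

int main() {
    std::vector<EventSignal> signals = {
        {false, 0, /*event_id=*/1},  // not asserted
        {true,  2, /*event_id=*/2},  // asserted, lower priority
        {true,  1, /*event_id=*/3},  // asserted, higher priority -> selected
    };
    std::vector<uint64_t> event_counts(4, 0);
    if (auto id = select_event(signals))
        ++event_counts[*id];  // representation added to a latency event count
    return 0;
}
```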
Abstract:
Systems, methods, and apparatuses relating to circuitry to implement out-of-order access to a shared microcode sequencer by a clustered decode pipeline are described. In one embodiment, a hardware processor core includes a first decode cluster comprising a plurality of decoder circuits, a second decode cluster comprising a plurality of decoder circuits, a fetch circuit to fetch a first block of instructions and send the first block of instructions to the first decode cluster for decoding, and fetch a second block of instructions younger in program order than the first block of instructions and send the second block of instructions to the second decode cluster for decoding, a microcode sequencer comprising a memory that stores a plurality of micro-operations, and an arbitration circuit to arbitrate access by the first decode cluster and the second decode cluster to a shared read port of the memory, wherein the arbitration circuit is to allow the second decode cluster decoding the second block of instructions access to the shared read port of the memory instead of the first decode cluster decoding the first block of instructions when an instruction of the second block of instructions has a number of corresponding micro-operations in the microcode sequencer below an arbitration threshold.
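A minimal sketch of that arbitration rule, under assumptions: the names (ClusterRequest, arbitrate) and the threshold value are illustrative, and the abstract does not specify behavior when the rule does not apply, so the sketch defaults to the older cluster.

```cpp
// Rough model of arbitration for a shared microcode-sequencer read port.
#include <cstddef>

struct ClusterRequest {
    bool   requesting;  // cluster wants the shared read port
    size_t uop_count;   // micro-operations the pending instruction expands into
};

enum class Grant { None, Older, Younger };

// Default to the older cluster (program order), but let the younger cluster
// go first when its instruction expands into fewer micro-operations than the
// arbitration threshold, as the abstract describes.
Grant arbitrate(const ClusterRequest& older, const ClusterRequest& younger,
                size_t threshold) {
    if (younger.requesting && younger.uop_count < threshold)
        return Grant::Younger;  // out-of-order access to the shared read port
    if (older.requesting)
        return Grant::Older;
    if (younger.requesting)
        return Grant::Younger;
    return Grant::None;
}

int main() {
    ClusterRequest older{true, 12}, younger{true, 3};
    Grant g = arbitrate(older, younger, /*threshold=*/8);  // -> Grant::Younger
    return g == Grant::Younger ? 0 : 1;
}
```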
Abstract:
In one embodiment, an apparatus includes: at least one core to execute instructions; and a plurality of fixed counters coupled to the at least one core, the plurality of fixed counters to count events during execution on the at least one core, at least some of the plurality of fixed counters to count event information of a highest level of a hierarchical performance monitoring organization. Other embodiments are described and claimed.
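The abstract does not enumerate the hierarchy levels; the sketch below assumes the common top-down analysis convention (retiring, bad speculation, frontend bound, backend bound) for the highest level, purely as an illustration of fixed counters feeding a hierarchical breakdown.

```cpp
// Illustrative model of fixed counters dedicated to the top level of a
// hierarchical performance-monitoring organization. The four category names
// are an assumption; the abstract itself does not enumerate them.
#include <cstdint>
#include <cstdio>

struct FixedCounters {
    // Highest-level event counts, one fixed counter per category.
    uint64_t retiring = 0;
    uint64_t bad_speculation = 0;
    uint64_t frontend_bound = 0;
    uint64_t backend_bound = 0;

    uint64_t total_slots() const {
        return retiring + bad_speculation + frontend_bound + backend_bound;
    }
};

int main() {
    FixedCounters fc{700, 50, 150, 100};
    // Software reads the fixed counters and derives top-level fractions,
    // drilling into lower hierarchy levels only for the dominant category.
    std::printf("frontend bound: %.1f%%\n",
                100.0 * fc.frontend_bound / fc.total_slots());
    return 0;
}
```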
Abstract:
Systems, methods, and apparatuses relating to circuitry to implement toggle point insertion for a clustered decode pipeline are described. In one example, a hardware processor core includes a first decode cluster comprising a plurality of decoder circuits, a second decode cluster comprising a plurality of decoder circuits, and a toggle point control circuit to toggle between sending instructions requested for decoding between the first decode cluster and the second decode cluster, wherein the toggle point control circuit is to: determine a location in an instruction stream as a candidate toggle point to switch the sending of the instructions requested for decoding between the first decode cluster and the second decode cluster, track a number of times a characteristic of multiple previous decodes of the instruction stream is present for the location, and cause insertion of a toggle point at the location, based on the number of times, to switch the sending of the instructions requested for decoding between the first decode cluster and the second decode cluster.
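A sketch of the bookkeeping this implies, with hypothetical names (ToggleTracker, observe) and an illustrative threshold: count how often the characteristic is present at a location across decodes, and insert a toggle point once the count is high enough.

```cpp
// Toggle-point tracking: per-location occurrence counts across decodes.
#include <cstdint>
#include <unordered_map>

class ToggleTracker {
public:
    explicit ToggleTracker(unsigned threshold) : threshold_(threshold) {}

    // Called when a decode of the instruction stream shows the tracked
    // characteristic at `location`. Returns true when the characteristic has
    // been seen often enough that a toggle point should be inserted there.
    bool observe(uint64_t location) {
        unsigned n = ++counts_[location];
        return n >= threshold_;
    }

private:
    unsigned threshold_;
    std::unordered_map<uint64_t, unsigned> counts_;  // location -> occurrences
};

int main() {
    ToggleTracker tracker(/*threshold=*/3);
    bool insert = false;
    for (int decode = 0; decode < 3; ++decode)
        insert = tracker.observe(/*location=*/0x401000);
    // After three decodes exhibiting the characteristic, a toggle point is
    // inserted so instruction delivery switches between the two clusters.
    return insert ? 0 : 1;
}
```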
Abstract:
An embodiment of an integrated circuit may comprise a core and an instruction decoder communicatively coupled to the core to decode one or more instructions for execution by the core, where the instruction decoder includes two or more decode clusters in a parallel arrangement, and circuitry to offline a decode cluster of the two or more decode clusters. Other embodiments are disclosed and claimed.
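A toy model of the routing consequence of offlining a cluster, assuming a round-robin dispatch policy that the abstract does not specify; names are hypothetical.

```cpp
// Dispatch skips any decode cluster that circuitry has taken offline.
#include <array>
#include <cstddef>

struct Decoder {
    bool online = true;
};

// Pick the next cluster to receive an instruction block, skipping any
// cluster that has been taken offline (e.g. to save power).
std::size_t next_cluster(const std::array<Decoder, 2>& clusters,
                         std::size_t& rr) {
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        std::size_t c = (rr + i) % clusters.size();
        if (clusters[c].online) { rr = (c + 1) % clusters.size(); return c; }
    }
    return 0;  // unreachable if at least one cluster stays online
}

int main() {
    std::array<Decoder, 2> clusters;
    clusters[1].online = false;  // circuitry offlines cluster 1
    std::size_t rr = 0;
    std::size_t c0 = next_cluster(clusters, rr);  // cluster 0
    std::size_t c1 = next_cluster(clusters, rr);  // cluster 0 again: 1 skipped
    return (c0 == 0 && c1 == 0) ? 0 : 1;
}
```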
Abstract:
A method and apparatus for disabling ways of a cache memory in response to history-based usage patterns is herein described. Way predicting logic keeps track of cache accesses to the ways and determines whether some ways are to be disabled to save power, based upon way power signals having a logical state representing a predicted miss to the way. One or more counters associated with the ways count accesses, wherein a power signal is set to the logical state representing a predicted miss when one of said one or more counters reaches a saturation value. Control logic adjusts said one or more counters associated with the ways according to the accesses.
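One plausible reading of the counter scheme, sketched below with hypothetical names (WayPredictor, access): each way's counter counts accesses since that way last hit, and reaching the saturation value sets the way's power signal to the predicted-miss state. The reset-on-hit detail is an assumption, not stated by the abstract.

```cpp
// History-based way disabling via per-way saturating counters.
#include <cstdint>
#include <vector>

class WayPredictor {
public:
    WayPredictor(unsigned ways, unsigned saturation)
        : counters_(ways, 0), saturation_(saturation) {}

    // Record an access that hit in `hit_way`. Counters of untouched ways move
    // toward saturation; a saturated counter sets that way's power signal to
    // the "predicted miss" state so the way can be disabled.
    void access(unsigned hit_way) {
        for (unsigned w = 0; w < counters_.size(); ++w) {
            if (w == hit_way)
                counters_[w] = 0;            // recently used: keep powered
            else if (counters_[w] < saturation_)
                ++counters_[w];
        }
    }

    bool predicted_miss(unsigned way) const {  // the way power signal
        return counters_[way] >= saturation_;
    }

private:
    std::vector<uint32_t> counters_;  // one counter per way
    unsigned saturation_;
};

int main() {
    WayPredictor pred(/*ways=*/4, /*saturation=*/8);
    for (int i = 0; i < 8; ++i) pred.access(/*hit_way=*/0);
    // Ways 1-3 saw no hits for 8 accesses: their power signals predict a miss.
    return pred.predicted_miss(3) && !pred.predicted_miss(0) ? 0 : 1;
}
```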
Abstract:
In one embodiment, an apparatus comprises: a branch prediction circuit to predict whether a branch is to be taken; a fetch circuit, in a single fetch cycle, to send a first portion of a fetch region of instructions to a first decode cluster and send a second portion of the fetch region to a second decode cluster; the first decode cluster comprising a first plurality of decode circuits to decode one or more instructions in the first portion of the fetch region; and the second decode cluster comprising a second plurality of decode circuits to decode one or more other instructions in the second portion of the fetch region. Other embodiments are described and claimed.
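A minimal sketch of dispatching both portions of one fetch region in a single cycle; the midpoint split and all names are assumptions (a real design would split at an instruction or predicted-branch boundary).

```cpp
// Split one fetch region into two portions, one per decode cluster, both
// dispatched in the same fetch cycle.
#include <cstddef>
#include <cstdint>
#include <utility>

struct FetchRegion {
    uint64_t base;   // linear address of the region
    size_t   bytes;  // region size
};

struct Portion {
    uint64_t base;
    size_t   bytes;
};

// Returns {portion for the first cluster, portion for the second cluster}.
std::pair<Portion, Portion> split_region(const FetchRegion& r,
                                         size_t split_at) {
    Portion first{r.base, split_at};
    Portion second{r.base + split_at, r.bytes - split_at};
    return {first, second};
}

int main() {
    FetchRegion region{0x401000, 64};
    auto [p0, p1] = split_region(region, /*split_at=*/32);
    return (p0.bytes + p1.bytes == region.bytes) ? 0 : 1;
}
```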
Abstract:
Techniques and mechanisms for providing branch prediction information to facilitate instruction decoding by a processor. In an embodiment, entries of a branch prediction table (BTB) each identify, for a corresponding instruction, whether a prediction based on the instruction (if any) is eligible to be communicated, with another prediction, in a single fetch cycle. A branch prediction unit of the processor determines a linear address of a fetch region which is under consideration, and performs a search of the BTB based on the linear address. A result of the search is evaluated to detect for any hit entry which indicates a double prediction eligibility. In another embodiment, where it is determined that double prediction eligibility is indicated for an earliest one of the instructions represented by the hit entries, multiple predictions are communicated in a single fetch cycle.
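A sketch of the eligibility check, with hypothetical structures (BtbEntry, lookup); only the per-entry eligibility flag and the earliest-hit-decides rule come from the abstract.

```cpp
// Search the BTB for hits in the fetch region, then let the earliest hit's
// eligibility flag decide whether two predictions go out in one fetch cycle.
#include <cstdint>
#include <vector>

struct BtbEntry {
    uint64_t addr;             // address of the corresponding instruction
    bool     double_eligible;  // prediction may pair with another in one cycle
};

struct Predictions {
    int count;  // 0, 1, or 2 predictions issued this fetch cycle
};

Predictions lookup(const std::vector<BtbEntry>& btb,
                   uint64_t region_base, uint64_t region_end) {
    const BtbEntry* earliest = nullptr;
    int hits = 0;
    for (const auto& e : btb) {
        if (e.addr >= region_base && e.addr < region_end) {
            ++hits;
            if (!earliest || e.addr < earliest->addr) earliest = &e;
        }
    }
    if (earliest && earliest->double_eligible && hits >= 2)
        return {2};  // two predictions in a single fetch cycle
    return {earliest ? 1 : 0};
}

int main() {
    std::vector<BtbEntry> btb = {{0x401010, true}, {0x401020, false}};
    Predictions p = lookup(btb, 0x401000, 0x401040);
    return p.count == 2 ? 0 : 1;
}
```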