Abstract:
A processor includes a core to execute a transaction with a memory via cache; and cache controller having an index mapper circuit to: identify a physical memory address associated with the transaction and having a plurality of bits; determine, based on the plurality of bits, a first set of bits encoding a tag value, a second set of bits encoding a page index value, and a third set of bits encoding a line index value; determine a mapping function corresponding to the tag value; determine, using the mapping function, a bit-placement order; combine, based on the order, second and third set of bits to form an index; generate, using the index, a mapping from the address to a cache line index value identifying a cache line in the cache; and wherein the cache controller is further to access, using the mapping and in response to the transaction, the cache line.
Abstract:
In one embodiment of the invention, a processor comprising an upper level cache and at least one processor core. The at least one processor core includes one or more registers and a plurality of instruction processing stages: a decode unit to decode an instruction requiring an input of a plurality of data elements, wherein a size of each of the plurality of data elements is less than a cache line size of the processor; an execution unit to load the plurality of data elements to the one or more registers of the processor, without loading data elements spatially adjacent to the plurality of data elements or the plurality of data elements in an upper level cache.
Abstract:
In an embodiment, a processor includes one or more cores including a first core operable at an operating voltage between a minimum operating voltage and a maximum operating voltage. The processor also includes a power control unit including first logic to enable coupling of ancillary logic to the first core responsive to the operating voltage being less than or equal to a threshold voltage, and to disable the coupling of the ancillary logic to the first core responsive to the operating voltage being greater than the threshold voltage. Other embodiments are described and claimed.
Abstract:
In an embodiment, a processor includes one or more cores including a first core operable at an operating voltage between a minimum operating voltage and a maximum operating voltage. The processor also includes a power control unit including first logic to enable coupling of ancillary logic to the first core responsive to the operating voltage being less than or equal to a threshold voltage, and to disable the coupling of the ancillary logic to the first core responsive to the operating voltage being greater than the threshold voltage. Other embodiments are described and claimed.
Abstract:
An apparatus and method to reduce bandwidth and latency associated with probabilistic caches. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions and process data, one or more of the cores to generate a request for a first cache line; a cache controller comprising cache lookup logic to determine a first way of a cache in which to search for the first cache line based on a first set of tag bits comprising one or more bits associated with the first cache line; the cache lookup logic to compare a second set of tag bits of the first cache line with a third set of tag bits of an existing cache line stored in the first way, wherein if the second set of tag bits and the third set of tag bits to not match, then the cache lookup logic to determine that the first cache line is not in the first way and to compare a fourth set of tag bits of the first cache line with a fifth set of tag bits of the existing cache line, wherein responsive to a match between the fourth set of tag bits and the fifth set of tag bits, the cache lookup logic to determine that the first cache line is stored in a second way and to responsively read the first cache line from the second way.
Abstract:
A processor includes a core to execute a transaction with a memory via cache; and cache controller having an index mapper circuit to: identify a physical memory address associated with the transaction and having a plurality of bits; determine, based on the plurality of bits, a first set of bits encoding a tag value, a second set of bits encoding a page index value, and a third set of bits encoding a line index value; determine a mapping function corresponding to the tag value; determine, using the mapping function, a bit-placement order; combine, based on the order, second and third set of bits to form an index; generate, using the index, a mapping from the address to a cache line index value identifying a cache line in the cache; and wherein the cache controller is further to access, using the mapping and in response to the transaction, the cache line.
Abstract:
An embodiment of a semiconductor package apparatus may include technology to identify a nested loop in a set of executable instructions, and determine at runtime if the nested loop is a candidate for cache blocking. Other embodiments are disclosed and claimed.
Abstract:
A mechanism for tracking the control flow of instructions in an application and performing one or more optimizations of a processing device, based on the control flow of the instructions in the application, is disclosed. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations that indicate whether optimizations may be performed for different blocks of instructions. The control flow data may also be used to track the execution of the instructions to determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions.
Abstract:
Described herein are mechanisms for continuous automatic tuning of code regions for optimal hardware configurations for the code regions. One mechanism automatically tunes the tunable parameters for a demarcated code region by calculating metrics while executing the code region with different sets of tunable parameters and selecting one of the different sets based on the calculated metrics.
Abstract:
Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.