Abstract:
A processor of an aspect includes an on-die programmable architecturally-visible storage. The processor also includes a decode unit to receive a data access instruction of an instruction set of the processor. The data access instruction to indicate a data address that is to be associated with data to be stored in the on-die programmable architecturally-visible storage, to indicate a data size associated with the data to be stored in the on-die programmable architecturally-visible storage, and to indicate a destination storage location of the processor. An execution unit is coupled with the decode unit and the on-die programmable architecturally-visible storage. The execution unit is on-die with the on-die programmable storage. The execution unit is operable, in response to the data access instruction, to store the data, which is associated with the data address and the data size, in the destination storage location that is to be indicated by the instruction.
Abstract:
An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.
Abstract:
In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.
Abstract:
In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.
Abstract:
In an embodiment, a processor a plurality of cores to independently execute instructions, the cores including a plurality of counters to store performance information, and a power controller coupled to the plurality of cores, the power controller having a logic to receive performance information from at least some of the plurality of counters, determine a number of cores to be active and a performance state for the number of cores for a next operation interval, based at least in part on the performance information and model information, and cause the number of cores to be active during the next operation interval, the performance information associated with execution of a workload on one or more of the plurality of cores. Other embodiments are described and claimed.
Abstract:
Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.
Abstract:
In one embodiment, A processor includes a logic to receive performance monitoring information from at least some of a plurality of cores and determine, according to a power management model, a performance state for one or more of the plurality of cores based on the performance monitoring information, and a second logic to receive the performance monitoring information and dynamically update the power management model according to a reinforcement learning process. Other embodiments are described and claimed.
Abstract:
Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop as; and setting the dynamic adjustment value based on the identified dependency.
Abstract:
Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.
Abstract:
Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed.