摘要:
A mechanism and functionality are provided for generating and using compiler optimized function variants. These variants may be used, for example, in situations where return values of functions called by code are not thereafter used by the code calling the functions. In particular, for a function called by computer code, at least two variants for the function may be generated. A function call, for calling the function, within original computer code may be analyzed to determine which variant of the at least two variants to use for the function call. The function call may be modified in the original computer code, to generate modified computer code, based on results of the analysis identifying which variant of the at least two variants to use for the function call.
摘要:
A mechanism and functionality are provided for generating and using compiler optimized function variants. These variants may be used, for example, in situations where return values of functions called by code are not thereafter used by the code calling the functions. In particular, for a function called by computer code, at least two variants for the function may be generated. A function call, for calling the function, within original computer code may be analyzed to determine which variant of the at least two variants to use for the function call. The function call may be modified in the original computer code, to generate modified computer code, based on results of the analysis identifying which variant of the at least two variants to use for the function call.
摘要:
A partitioned NUMA machine is managed to dynamically transform its partition layout state based on NUMA considerations. The NUMA machine includes two or more NUMA nodes that are operatively interconnected by one or more internodal communication links. Each node includes one or more CPUs and associated memory circuitry. Two or more logical partitions each comprise at a CPU and memory circuit allocation on at least one NUMA node. Each partition respectively runs at least one associated data processing application. The partitions are dynamically managed at runtime to transform the distributed data processing machine from a first partition layout state to a second partition layout state that is optimized for the data processing applications according to whether a given partition will most efficiently execute within a single NUMA node or by spanning across a node boundary. The optimization is based on access latency and bandwidth in the NUMA machine.
摘要:
A partitioned NUMA machine is managed to dynamically transform its partition layout state based on NUMA considerations. The NUMA machine includes two or more NUMA nodes that are operatively interconnected by one or more internodal communication links. Each node includes one or more CPUs and associated memory circuitry. Two or more logical partitions each comprise at a CPU and memory circuit allocation on at least one NUMA node. Each partition respectively runs at least one associated data processing application. The partitions are dynamically managed at runtime to transform the distributed data processing machine from a first partition layout state to a second partition layout state that is optimized for the data processing applications according to whether a given partition will most efficiently execute within a single NUMA node or by spanning across a node boundary. The optimization is based on access latency and bandwidth in the NUMA machine.
摘要:
A mechanism is provided for scheduling tasks across multiple processor units of differing capacity. In a multiple processor unit system with processor units of disparate speeds, it is advantageous to have the most processing-intensive tasks run on the processor units with the highest capacity. All tasks are initially scheduled on the lowest capacity processor units. Because processor units with higher capacity are more likely to have idle time, these higher capacity processor units may pull one or more tasks onto themselves from the same or lower capacity processor units. A processor unit will attempt to pull tasks that utilize a larger percentage of the timeslice. When a higher capacity processor unit is overloaded or near capacity, the higher capacity processor unit may push tasks to processor units with the same or lower capacity. A processor unit will attempt to push tasks that utilize a smaller percentage of the timeslice.
摘要:
Methods, apparatus, and products for determining thermal characteristics of instruction sets comprising one or more computer program instructions executed by a computer processor are disclosed that include tracking, in a performance counter, a number of classes of instructions run during execution of a plurality of instruction sets; identifying, for each instruction set, from the performance counter, a number of each class of instructions run during execution of the instruction set; and ranking the instruction sets in dependence upon the number of each class of instructions run during execution of each instruction set and a profile of thermal characteristics of classes of instructions.
摘要:
A mechanism is provided for scheduling tasks across multiple processor units of differing capacity. In a multiple processor unit system with processor units of disparate speeds, it is advantageous to have the most processing-intensive tasks run on the processor units with the highest capacity. All tasks are initially scheduled on the lowest capacity processor units. Because processor units with higher capacity are more likely to have idle time, these higher capacity processor units may pull one or more tasks onto themselves from the same or lower capacity processor units. A processor unit will attempt to pull tasks that utilize a larger percentage of the timeslice. When a higher capacity processor unit is overloaded or near capacity, the higher capacity processor unit may push tasks to processor units with the same or lower capacity. A processor unit will attempt to push tasks that utilize a smaller percentage of the timeslice.
摘要:
Methods, apparatus, and products for determining thermal characteristics of instruction sets comprising one or more computer program instructions executed by a computer processor are disclosed that include tracking, in a performance counter, a number of classes of instructions run during execution of a plurality of instruction sets; identifying, for each instruction set, from the performance counter, a number of each class of instructions run during execution of the instruction set; and ranking the instruction sets in dependence upon the number of each class of instructions run during execution of each instruction set and a profile of thermal characteristics of classes of instructions.
摘要:
Methods and arrangements of assigning tasks to processors are discussed. Embodiments include transformations, code, state machines or other logic to detect an attempt to execute an instruction of a task on a processor not supporting the instruction (non-supporting processor). The method may involve selecting a processor supporting the instruction (supporting physical processor). In many embodiments, the method may include storing data about the attempt to execute the instruction and, based upon the data, making another assignment of the task to a physical processor supporting the instruction. In some embodiments, the method may include representing the instruction set of a virtual processor as the union of the instruction sets of the physical processors comprising the virtual processor and assigning a task to the virtual processor based upon the representing.
摘要:
Methods and arrangements of assigning tasks to processors are discussed. Embodiments include transformations, code, state machines or other logic to detect an attempt to execute an instruction of a task on a processor not supporting the instruction (non-supporting processor). The method may involve selecting a processor supporting the instruction (supporting physical processor). In many embodiments, the method may include storing data about the attempt to execute the instruction and, based upon the data, making another assignment of the task to a physical processor supporting the instruction. In some embodiments, the method may include representing the instruction set of a virtual processor as the union of the instruction sets of the physical processors comprising the virtual processor and assigning a task to the virtual processor based upon the representing.