Abstract:
Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine if any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutators performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.
Abstract:
Methods and apparatus to dynamically insert prefetch instructions are disclosed. In an example method, one or more samples associated with cache misses are identified from a performance monitoring unit in a processor system. Based on sample information associated with the one or more samples, delinquent information is generated. To dynamically insert one or more prefetch instructions, a prefetch point is identified based on the delinquent information.
Abstract:
Methods and apparatus to dynamically insert prefetch instructions are disclosed. In an example method, one or more samples associated with cache misses are identified from a performance monitoring unit in a processor system. Based on sample information associated with the one or more samples, delinquent information is generated. To dynamically insert one or more prefetch instructions, a prefetch point is identified based on the delinquent information.
Abstract:
Methods and apparatus to insert prefetch instructions based on garbage collector analysis and compiler analysis are disclosed. In an example method, one or more batches of samples associated with cache misses from a performance monitoring unit in a processor system are received. One or more samples from the one or more batches of samples based on delinquent information are selected. A performance impact indicator associated with the one or more samples is generated. Based on the performance indicator, at least one of a garbage collector analysis and a compiler analysis is initiated to identify one or more delinquent paths. Based on the at least one of the garbage collector analysis and the compiler analysis, one or more prefetch points to insert prefetch instructions are identified.
Abstract:
Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine if any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutators performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.
Abstract:
Methods and apparatus to insert prefetch instructions based on garbage collector analysis and compiler analysis are disclosed. In an example method, one or more batches of samples associated with cache misses from a performance monitoring unit in a processor system are received. One or more samples from the one or more batches of samples based on delinquent information are selected. A performance impact indicator associated with the one or more samples is generated. Based on the performance indicator, at least one of a garbage collector analysis and a compiler analysis is initiated to identify one or more delinquent paths. Based on the at least one of the garbage collector analysis and the compiler analysis, one or more prefetch points to insert prefetch instructions are identified.