Abstract:
A method of assigning virtual memory to physical memory in a data processing system allocates a set of contiguous physical memory pages for a new page mapping, instructs the memory controller to move the virtual memory pages according to the new page mapping, and then allows access to the virtual memory pages using the new page mapping while the memory controller is still copying the virtual memory pages to the set of physical memory pages. The memory controller can use a mapping table which temporarily stores entries of the old and new page addresses and releases each entry as its copying is completed. The translation lookaside buffer (TLB) entries in the processor cores are updated with the new page addresses before the memory controller completes copying of the memory pages. The invention can be extended to non-uniform memory access (NUMA) systems. For systems with cache memory, any cache entry which is affected by the page move can be updated by modifying its address tag according to the new page mapping. This tag modification may be limited to cache entries in a dirty coherency state. The cache can further relocate a cache entry when its modified address tag places it in a different congruence class.
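As an illustration of the redirect-while-copying idea above, the following minimal sketch shows a memory-controller remap table that serves accesses issued against the new mapping while a page copy is still in flight. All names (remap_entry, mc_translate, PAGE_SIZE, the slot count) are illustrative assumptions, not the patented implementation.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE   4096u
#define REMAP_SLOTS 16

struct remap_entry {
    bool     valid;          /* entry in use: copy still in progress */
    uint64_t old_base;       /* old physical page address            */
    uint64_t new_base;       /* new physical page address            */
    uint32_t copied_bytes;   /* bytes already moved to the new page  */
};

static struct remap_entry table[REMAP_SLOTS];

/* Translate a physical address that the cores already issue against the
 * NEW mapping (their TLBs were updated early).  If the targeted part of
 * the page has not been copied yet, the controller services the access
 * from the old page; once copying finishes, the entry is released. */
uint64_t mc_translate(uint64_t phys_addr)
{
    for (int i = 0; i < REMAP_SLOTS; i++) {
        struct remap_entry *e = &table[i];
        if (!e->valid)
            continue;
        if (phys_addr >= e->new_base && phys_addr < e->new_base + PAGE_SIZE) {
            uint32_t offset = (uint32_t)(phys_addr - e->new_base);
            if (offset >= e->copied_bytes)      /* not yet copied       */
                return e->old_base + offset;    /* serve from old page  */
            return phys_addr;                   /* already copied       */
        }
    }
    return phys_addr;  /* no in-flight move covers this address */
}
```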
Abstract:
A method and apparatus for managing cache injection in a multiprocessor system reduces processing time associated with direct memory access (DMA) transfers in a symmetric multiprocessor (SMP) or a non-uniform memory access (NUMA) multiprocessor environment. The method and apparatus either detect the target processor for DMA completion or direct DMA completion processing to a particular processor, thereby enabling cache injection into a cache coupled with the processor that executes the DMA completion routine and processes the data injected into the cache. The target processor may be identified by determining which processor handles the interrupt that occurs on completion of the DMA transfer. Alternatively, or in conjunction with target processor identification, an interrupt handler may queue a deferred procedure call to the target processor to process the transferred data. In NUMA multiprocessor systems, the completing processor and target memory are chosen so that the target memory is readily accessible to the processor and its associated cache.
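The following sketch shows only the decision flow described above: pick the processor that will run the DMA completion work, inject the transferred data into that processor's cache, and optionally queue a deferred procedure call to the same processor. The types and functions (dpc_queue_on_cpu, cache_inject, current_cpu_id) are hypothetical placeholders, not a real driver API.

```c
#include <stdint.h>
#include <stddef.h>

struct dma_transfer {
    void  *buffer;      /* data written to memory by the DMA engine */
    size_t length;
    int    target_cpu;  /* chosen completion processor              */
};

extern int  current_cpu_id(void);                      /* CPU taking the IRQ */
extern void dpc_queue_on_cpu(int cpu, void (*fn)(void *), void *arg);
extern void cache_inject(int cpu, const void *buf, size_t len);

static void dma_completion_dpc(void *arg)
{
    struct dma_transfer *xfer = arg;
    /* The completion routine runs on xfer->target_cpu, so the injected
     * lines are already resident in the cache it reads from. */
    (void)xfer;  /* ... consume xfer->buffer ... */
}

void dma_completion_irq(struct dma_transfer *xfer)
{
    /* Option 1: the target is simply the processor that handles the
     * completion interrupt. */
    xfer->target_cpu = current_cpu_id();

    /* Inject the freshly transferred lines into that processor's cache. */
    cache_inject(xfer->target_cpu, xfer->buffer, xfer->length);

    /* Option 2 (may be combined with option 1): queue a deferred
     * procedure call to the same processor to process the data. */
    dpc_queue_on_cpu(xfer->target_cpu, dma_completion_dpc, xfer);
}
```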
Abstract:
A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request to prefetch a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data cache line being returned by the data prefetch request, so that a subsequent data prefetch request completes the return of its data cache line at effectively the same time that the processor issues a demand for that cache line.
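A minimal sketch of the kind of prefetch-distance feedback described above follows. The names and the adjustment step are assumptions: the engine widens the distance when demands arrive before the prefetched line returns (a late prefetch) and narrows it when lines sit unused for too long (an early prefetch).

```c
#include <stdint.h>

struct prefetch_engine {
    uint32_t distance;       /* lines ahead of the demand stream */
    uint32_t min_distance;
    uint32_t max_distance;
};

/* Called when a demand load hits a line the engine prefetched.
 * return_time / demand_time are cycle stamps for the prefetch return
 * and the demand issue. */
void adjust_distance(struct prefetch_engine *pe,
                     uint64_t return_time, uint64_t demand_time)
{
    if (return_time > demand_time) {
        /* Prefetch was late: the demand stalled waiting for the line. */
        if (pe->distance < pe->max_distance)
            pe->distance++;
    } else if (demand_time - return_time > 1000 /* cycles, tunable */) {
        /* Prefetch was far too early: risks eviction before use. */
        if (pe->distance > pe->min_distance)
            pe->distance--;
    }
    /* Otherwise the line returned "just in time": keep the distance. */
}
```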
Abstract:
A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching across different streams within received accesses. Multiple (simple) history tables are provided within, or associated with, the prefetch engine. Each of the multiple tables is utilized to detect a different access pattern. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match against different patterns. In this manner, different patterns may be detected by different tables within a single prefetch engine.
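The sketch below illustrates the cascaded lookup described above: each history table is indexed by a different slice of the address and is checked in a fixed order, and an address that does not match one table's pattern falls through to the next. Table sizes, index bits, and the simple stride match are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_TABLES 3
#define TABLE_SIZE 64

struct history_entry {
    bool     valid;
    uint64_t last_addr;
    int64_t  stride;        /* pattern tracked by this table */
};

struct history_table {
    unsigned index_shift;   /* which address bits select the entry */
    struct history_entry entries[TABLE_SIZE];
};

static struct history_table tables[NUM_TABLES] = {
    { .index_shift = 6  },  /* e.g. cache-line granularity   */
    { .index_shift = 12 },  /* e.g. page granularity         */
    { .index_shift = 21 },  /* e.g. large-region granularity */
};

/* Returns true and fills *prefetch_addr if some table recognizes the
 * access as part of a stream. */
bool detect_stream(uint64_t addr, uint64_t *prefetch_addr)
{
    for (int t = 0; t < NUM_TABLES; t++) {
        struct history_table *ht = &tables[t];
        unsigned idx = (unsigned)((addr >> ht->index_shift) % TABLE_SIZE);
        struct history_entry *e = &ht->entries[idx];

        if (e->valid && (int64_t)(addr - e->last_addr) == e->stride) {
            /* Pattern matched in this table: issue the next prefetch. */
            *prefetch_addr = addr + (uint64_t)e->stride;
            e->last_addr = addr;
            return true;
        }

        /* No match here: record the candidate stride and fall through
         * to the next table so other patterns can be checked. */
        e->stride    = (int64_t)(addr - e->last_addr);
        e->last_addr = addr;
        e->valid     = true;
    }
    return false;
}
```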
Abstract:
A method and memory controller for adaptive row management within a memory subsystem provides metrics for evaluating row access behavior and dynamically adjusts the row management policy of the memory subsystem in conformity with the measured metrics to reduce the average latency of the memory subsystem. Counters provided within the memory controller track the number of consecutive row accesses and, optionally, the number of total accesses over a measurement interval. The count of consecutive row accesses can be used to control the closing of rows for subsequent accesses, reducing memory latency. The count may be validated using a second counter or storage for improved accuracy; alternatively, the row close count may be set via program or logic control in conformity with the ratio of consecutive row hits to total accesses. Row closure may be controlled by selecting between a mode that always closes a row (non-page mode), a mode that always holds a row open (page mode), or intelligently closing rows after a count interval (row hold count) determined from the consecutive row access measurements. The logic and counters may be incorporated within the memory controller or within the memory devices, and the controller or memory devices may provide I/O ports or memory locations for reading the count values and/or setting a row management mode or row hold count.
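A minimal sketch of this counter-driven policy selection follows; it is not the controller's actual logic. Counters measure same-row reuse over an interval, and the result selects a row-management policy or a row hold count. The thresholds and interval handling are illustrative assumptions.

```c
#include <stdint.h>

enum row_policy { CLOSE_PAGE, OPEN_PAGE, TIMED_CLOSE };

struct row_stats {
    uint64_t last_row;
    uint32_t consecutive_hits;   /* current run of same-row accesses  */
    uint32_t max_run;            /* longest run seen this interval    */
    uint32_t total_accesses;
    uint32_t hit_accesses;       /* accesses that reused the open row */
};

void record_access(struct row_stats *s, uint64_t row)
{
    s->total_accesses++;
    if (row == s->last_row) {
        s->consecutive_hits++;
        s->hit_accesses++;
        if (s->consecutive_hits > s->max_run)
            s->max_run = s->consecutive_hits;
    } else {
        s->consecutive_hits = 0;
        s->last_row = row;
    }
}

/* At the end of a measurement interval, derive the policy (or a row
 * hold count) from the measured row reuse. */
enum row_policy choose_policy(const struct row_stats *s, uint32_t *hold_count)
{
    uint32_t hit_pct = s->total_accesses
                     ? (100u * s->hit_accesses) / s->total_accesses : 0;

    if (hit_pct < 10) {            /* rows rarely reused: close early */
        *hold_count = 0;
        return CLOSE_PAGE;
    }
    if (hit_pct > 75) {            /* rows heavily reused: keep open  */
        *hold_count = UINT32_MAX;
        return OPEN_PAGE;
    }
    *hold_count = s->max_run;      /* close after the typical run     */
    return TIMED_CLOSE;
}
```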
Abstract:
A multiprocessor system includes a plurality of data processing nodes. Each node has a processor coupled to a system memory, a cache memory, and a cache directory. The cache directory contains cache coherency information for a predetermined range of system memory addresses. An interconnection enables the nodes to exchange messages. A node initiating a function shipping request identifies an intermediate destination directory based on a list of the function's operands and sends a message indicating the function and its corresponding operands to the identified destination directory. The destination cache directory determines a target node based, at least in part, on its cache coherency status information to reduce memory access latency by selecting a target node where all or some of the operands are valid in the local cache memory. The destination directory then ships the function to the target node over the interconnection.
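The sketch below shows, under assumed data structures, how an intermediate destination directory might score nodes by how many of the function's operands its coherency state shows as valid in each node's local cache, then ship the function to the best-scoring node. All types and helper names (dir_entry, dir_lookup, ship_function) are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_NODES 8

/* Per-operand coherency info kept by the directory: bitmask of nodes
 * that hold a valid copy of the operand's cache line. */
struct dir_entry {
    uint64_t addr;
    uint8_t  valid_nodes;    /* bit n set => node n has a valid copy */
};

extern struct dir_entry *dir_lookup(uint64_t addr);
extern void ship_function(int target_node, uint64_t func_id,
                          const uint64_t *operands, size_t n_operands);

void route_function(uint64_t func_id,
                    const uint64_t *operands, size_t n_operands)
{
    int score[MAX_NODES] = { 0 };

    for (size_t i = 0; i < n_operands; i++) {
        struct dir_entry *e = dir_lookup(operands[i]);
        if (!e)
            continue;
        for (int n = 0; n < MAX_NODES; n++)
            if (e->valid_nodes & (1u << n))
                score[n]++;          /* operand already cached at node n */
    }

    int target = 0;
    for (int n = 1; n < MAX_NODES; n++)
        if (score[n] > score[target])
            target = n;

    /* Ship the function to the node where the most operands are valid,
     * reducing the memory accesses it must perform remotely. */
    ship_function(target, func_id, operands, n_operands);
}
```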
Abstract:
An analysis and visualization depicts how an application leverages computer processor cores over time. The analysis and visualization enables a developer to readily identify the degree of concurrency exploited by an application at runtime. Information regarding processes or threads running on the processor cores over time is received, analyzed, and presented to indicate which portions of the processor cores are used by the application, idle, or used by other processes in the system. The analysis and visualization can help a developer understand contention for processor resources, confirm the degree of concurrency, or identify serial regions of execution that might provide opportunities for exploiting parallelism.
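The following small sketch illustrates only the bookkeeping behind such a view: for each time slice, count how many cores ran the profiled application, sat idle, or ran other processes. The sample format and names are assumptions, not the tool's actual data model.

```c
#include <stdint.h>
#include <stddef.h>

struct sample {            /* one scheduler observation              */
    uint32_t time_slice;   /* index of the time slice                */
    uint32_t core;         /* which core was observed                */
    int32_t  pid;          /* -1 => idle, otherwise owning process   */
};

struct slice_usage { uint32_t app, idle, other; };

void summarize(const struct sample *samples, size_t n,
               int32_t app_pid, struct slice_usage *out, size_t n_slices)
{
    for (size_t i = 0; i < n; i++) {
        const struct sample *s = &samples[i];
        if (s->time_slice >= n_slices)
            continue;
        if (s->pid == app_pid)
            out[s->time_slice].app++;
        else if (s->pid < 0)
            out[s->time_slice].idle++;
        else
            out[s->time_slice].other++;  /* contention from other processes */
    }
}
```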
Abstract:
A data processing system includes a system memory and a cache hierarchy that caches contents of the system memory. According to one method of data processing, a storage modifying operation having a cacheable target real memory address is received. A determination is made whether or not the storage modifying operation has an associated bypass indication. In response to determining that the storage modifying operation has an associated bypass indication, the cache hierarchy is bypassed, and an update indicated by the storage modifying operation is performed in the system memory. In response to determining that the storage modifying operation does not have an associated bypass indication, the update indicated by the storage modifying operation is performed in the cache hierarchy.
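A minimal sketch of that dispatch decision follows, under assumed interfaces: a store path that honors the bypass indication by updating system memory directly and otherwise performing the update in the cache hierarchy. The helper functions and the store_op layout are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

struct store_op {
    uint64_t    real_addr;   /* cacheable target real memory address */
    const void *data;
    size_t      len;
    bool        bypass;      /* associated bypass indication         */
};

extern void cache_hierarchy_update(uint64_t addr, const void *data, size_t len);
extern void system_memory_update(uint64_t addr, const void *data, size_t len);

void dispatch_store(const struct store_op *op)
{
    if (op->bypass) {
        /* Bypass the cache hierarchy; update system memory directly. */
        system_memory_update(op->real_addr, op->data, op->len);
    } else {
        /* Normal path: perform the update in the cache hierarchy.    */
        cache_hierarchy_update(op->real_addr, op->data, op->len);
    }
}
```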
Abstract:
Methods and systems are disclosed for measuring performance event rates at a computer and reporting the performance event rates using timelines. A particular method tracks, for a time period, the occurrences of a particular event at a computer. Event rates corresponding to different time segments within the time period are calculated, and the time segments are assigned colors based on their associated event rates. The event rates are used to display a colored timeline for the time period, including displaying a colored timeline portion for each time segment in its associated color.
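The sketch below illustrates the described flow with assumed details: bucket event timestamps into fixed time segments, compute a rate per segment, and map each rate to a color for the timeline. The thresholds and the three-color scale are assumptions, not the disclosed method's parameters.

```c
#include <stddef.h>

enum color { GREEN, YELLOW, RED };   /* low, medium, high event rate */

/* events: timestamps in seconds; the period is split into n_segments
 * equal segments; colors receives one color per segment. */
void build_timeline(const double *events, size_t n_events,
                    double period_start, double period_end,
                    enum color *colors, size_t n_segments)
{
    double seg_len = (period_end - period_start) / (double)n_segments;

    for (size_t s = 0; s < n_segments; s++) {
        double lo = period_start + seg_len * (double)s;
        double hi = lo + seg_len;
        size_t count = 0;

        for (size_t i = 0; i < n_events; i++)
            if (events[i] >= lo && events[i] < hi)
                count++;

        double rate = (double)count / seg_len;   /* events per second */

        /* Map the segment's rate to a color (thresholds illustrative). */
        if (rate < 10.0)       colors[s] = GREEN;
        else if (rate < 100.0) colors[s] = YELLOW;
        else                   colors[s] = RED;
    }
}
```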