摘要:
In one embodiment, a method comprises communicating with one or more other nodes in a system from a first node in the system in response to a trap experienced by a processor in the first node during a memory operation, wherein the trap is signalled in the processor in response to one or more permission bits stored with a cache line in a cache accessible during performance of the memory operation; determining that the cache line is part of a memory transaction in a second node that is one of the other nodes, wherein a memory transaction comprises two or more memory operations that appear to execute atomically in isolation; and resolving a conflict between the memory operation and the memory transaction.
摘要:
In one embodiment, a processor comprises a coherence trap unit and a trap logic coupled to the coherence trap unit. The coherence trap unit is also coupled to receive data accessed in response to the processor executing a memory operation. The coherence trap unit is configured to detect that the data matches a designated value indicating that a coherence trap is to be initiated to coherently perform the memory operation. The trap logic is configured to trap to a designated software routine responsive to the coherence trap unit detecting the designated value. In some embodiments, a cache tag in a cache may track whether or not the corresponding cache line has the designated value, and the cache tag may be used to trigger a trap in response to an access to the corresponding cache line.
摘要:
In one embodiment, a memory controller for a node in a multi-node computer system comprises logic and a control unit. The logic is configured to determine if an address corresponding to a request received by the memory controller on an intranode interconnect is a remote address or a local address. A first portion of the memory in the node is allocated to store copies of remote data and a remaining portion stores local data. The control unit is configured to write writeback data to a location in the first portion. The writeback data corresponds to a writeback request from the intranode interconnect that has an associated remote address detected by the logic. The control unit is configured to determine the location responsive to the associated remote address and one or more indicators that identify the first portion in the memory.
摘要:
A system, method, and computer program product that captures performance-characteristic data from the execution of a program and models system performance based on that data. Performance-characterization data based on easily captured reuse distance metrics is targeted. Reuse distance for one memory operation may be measured as the number of memory operations that have been performed since the memory object it accesses was last accessed. Separate call stacks leading up to the same memory operation are identified and statistics are separated for the different call stacks. Methods for efficiently capturing this kind of metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited associativity cache organizations. Methods for assessing cache utilization as well as parallel execution are covered. The method includes modeling multithreaded memory systems and detecting false sharing coherence misses.
摘要:
A system for, method of and computer program product captures performance-characteristic data from the execution of a program and models system performance based on that data. Performance-characterization data based on easily captured reuse distance metrics is targeted, defined as the total number of memory references between two accesses to the same piece of data. Methods for efficiently capturing this kind of metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited associativity cache organizations.
摘要:
A system for, method of and computer program product captures performance-characteristic data from the execution of a program and models system performance based on that data. Performance-characterization data based on easily captured reuse distance metrics is targeted, defined as the total number of memory references between two accesses to the same piece of data. Methods for efficiently capturing this kind of metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited associativity cache organizations. Methods for assessing cache utilization as well as parallel execution are covered.
摘要:
A system for, method of and computer program product captures performance-characteristic data from the execution of a program and models system performance based on that data. Performance-characterization data based on easily captured reuse distance metrics is targeted, defined as the total number of memory references between two accesses to the same piece of data. Methods for efficiently capturing this kind of metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited associativity cache organizations. Methods for assessing cache utilization as well as parallel execution are covered.
摘要:
A system for, method of and computer program product captures performance-characteristic data from the execution of a program and models system performance based on that data. Performance-characterization data based on easily captured reuse distance metrics is targeted, defined as the total number of memory references between two accesses to the same piece of data. Methods for efficiently capturing this kind of metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited associativity cache organizations.
摘要:
A system for, method of and computer program product captures performance-characteristic data from the execution of a program and models system performance based on that data. Performance-characterization data based on easily captured reuse distance metrics is targeted, defined as the total number of memory references between two accesses to the same piece of data. Methods for efficiently capturing this kind of metrics are described. These data can be refined into easily interpreted performance metrics, such as performance data related to caches with LRU replacement and random replacement strategies in combination with fully associative as well as limited associativity cache organizations. Methods for assessing cache utilization as well as parallel execution are covered.
摘要:
In one embodiment, a node for a multi-node computer system comprises a coherence directory configured to store coherence states for coherence units in a local memory of the node and a coherence controller configured to receive a coherence request for a requested coherence unit. The requested coherence unit is included in a memory region that includes at least two coherence units, and the coherence controller is configured to read coherence states corresponding to two or more coherence units from the coherence directory responsive to the coherence request. The two or more coherence units are included in a previously-accessed memory region, and the coherence controller is configured to provide the requested coherence unit with a predicted coherence state responsive to the coherence states in the previously accessed memory region.