摘要:
A method and system for analyzing cycles per instruction (CPI) performance in a processor. A completion table corresponds to the instructions in a group to be processed by the processor. An empty completion table indicates that there has been some type of catastrophe that caused a table flush. While the table is empty, a performance monitoring counter (PMC), located in a performance monitoring unit (PMU) in the processor, counts the number of clock cycles that the table is empty. Preferably, a separate PMC is utilized depending on the reason that the completion table is empty. A second PMC likewise counts the number of clock cycles spent re-filling the empty completion table. A third PMC counts the number of clock cycles spent actually executing the instructions in the completion table. The information in the PMC's can be used to evaluate the true cause for degradation of CPI performance.
摘要:
A shared memory multiprocessor is provided which includes a plurality of nodes connected to one another. Each node includes: a main memory for storing data; a cache memory for storing a copy of data obtained from the main memory; and a CPU for accessing the main memory and the cache memory and processing data. The node further includes a directory and a memory region group. The directory is made up of directory entries each including one or more directory bits which each indicate whether the cache memory of another node stores a copy of a part of a memory region group of the main memory of this node. The memory region group includes of memory regions having the same memory address portion including a cache index portion. Each node is assigned to one of the one or more directory bits.
摘要:
A directory of each node in a shared memory multiprocessor is made up of directory entries each including one or more directory bits indicating whether the cache memory of another node stores a copy of a part of a memory region group of the main memory of one node. The memory region group includes memory regions having the same memory address portion including a cache index portion. Each node is assigned one of the directory bits. When accessing the main memory, the node checks whether the directory bits of the directory entry corresponding to a memory region to be accessed are set to a predetermined value, and if one or more of the directory bits of the directory entry are set to the predetermined value, an access address is multicast or broadcast to other nodes to perform coherency control.
摘要:
A method for determining the latency for a particular level of memory within a hierarchical memory system is disclosed. A performance monitor counter is allocated to count the number of loads (load counter) and for counting the number of cycles (cycle counter). The method begins with a processor determining which load to select for measurement. In response to the determination, the cycle counter value is stored in a rewind register. The processor issues the load and begins counting cycles. In response to the load completing, the level of memory for the load is determined. If the load was executed from the desired memory level, the load counter is incremented. Otherwise, the cycle counter is rewound to its previous value.
摘要:
A method and system for identifying instruction completion delays for a group of instructions in a computer processor. Each instruction in the group of instructions has a status indicator that identifies what is preventing that instruction from completing execution. Examples of completion delays are cache misses, data dependencies or simply the time required for an execution unit in the computer processor to process the instruction. As each instruction finishes executing, its associated status indicator is cleared to indicate that the instruction is no longer waiting to execute. The last instruction to execute is the instruction that is holding up completion of the entire group, and thus the cause for the completion delay of the last instruction is recorded as the cause of completion delay for the entire group.
摘要:
A circuit and method for maintaining a correct value in performance monitor counter within a speculative computer microprocessor is disclosed. In response to determining the begin of speculative execution within the microprocessor, the value of the performance monitor counter is stored in a rewind register. The performance monitor counter is incremented in response to predetermined events. If the microprocessor determines the speculative execution was incorrect, the value of the rewind register is loaded into the counter, restoring correct value for the counter.