Abstract:
A data processor (40) keeps track of misses to a cache (71) so that multiple misses within the same cache line can be merged or folded at reload time. A load/store unit (60) includes a completed store queue (61) for presenting store requests to the cache (71) in order. If a store request misses in the cache (71), the completed store queue (61) requests the cache line from a lower-level memory system (90) and thereafter inactivates the store request. When a reload cache line is received, the completed store queue (61) compares the reload address to all entries. If at least one address matches the reload address, one entry's data is merged with the cache line prior to storage in the cache (71). Other matching entries become active and are allowed to reaccess the cache (71). A miss queue (80) coupled between the load/store unit (60) and the lower-level memory system (90) implements reload folding to improve efficiency.
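The reload-time merging described above can be sketched as follows. This is a minimal illustration, not the patented design: the 32-byte line size, the class and method names, the dictionary standing in for the cache (71), and the choice to merge the oldest matching entry are all assumptions made for the example. Stores are also assumed not to cross a line boundary.

```python
from dataclasses import dataclass

LINE_SIZE = 32  # assumed cache line size in bytes (not given in the abstract)


@dataclass(eq=False)
class StoreEntry:
    addr: int            # byte address of the store
    data: bytes          # bytes to write
    active: bool = True  # inactive while waiting for a reload


def line_addr(addr: int) -> int:
    """Address of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)


class CompletedStoreQueue:
    """Hypothetical model of a completed store queue with reload merging."""

    def __init__(self):
        self.entries = []    # oldest store first
        self.cache = {}      # line address -> bytearray (stand-in for the cache)
        self.requested = []  # lines requested from lower-level memory

    def push(self, addr: int, data: bytes) -> None:
        self.entries.append(StoreEntry(addr, data))

    def present(self):
        """Present the oldest active store to the cache."""
        entry = next((e for e in self.entries if e.active), None)
        if entry is None:
            return None
        line = line_addr(entry.addr)
        if line in self.cache:
            off = entry.addr - line
            self.cache[line][off:off + len(entry.data)] = entry.data
            self.entries.remove(entry)
            return "hit"
        # Miss: request the line once, then inactivate the store.
        if line not in self.requested:
            self.requested.append(line)
        entry.active = False
        return "miss"

    def reload(self, line: int, line_data: bytes) -> None:
        """Compare the reload address to all entries and merge one of them."""
        buf = bytearray(line_data)
        matches = [e for e in self.entries if line_addr(e.addr) == line]
        if matches:
            first = matches[0]  # assumption: the oldest match is the one merged
            off = first.addr - line
            buf[off:off + len(first.data)] = first.data
            self.entries.remove(first)
            for e in matches[1:]:
                e.active = True  # remaining matches may reaccess the cache
        self.cache[line] = buf
        if line in self.requested:
            self.requested.remove(line)
```

In this sketch, two stores that miss in the same line produce a single request to lower-level memory. When the line returns, the first store is merged into it before it is written to the cache, and the second store is reactivated and then hits on its next access.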
Abstract:
The present invention discloses a method and apparatus that uses extensions to the TLB entry to dynamically identify pages of memory that can be weakly ordered or must be strongly ordered and enforces the appropriate memory model on those pages of memory. Such identification and memory model enforcement allow for more efficient execution of memory instructions in a hierarchical memory design in cases where memory instructions can be executed out of order. From the page table, the memory manager constructs TLB entries that associate page frame numbers of memory operands with page-granular client usage data and a memory order tag. The memory order tag identifies the memory model that is currently being enforced for the associated page of memory. The memory manager updates the memory order tag of the TLB entry in accordance with changes in the client usage information. In the preferred embodiment, the TLB structure is a global TLB shared by all processors. In alternative embodiments, the TLB structure may comprise either multiple distributed TLBs with shared knowledge, each assigned to a different processor, or a combination of multiple local TLBs, each assigned to a different processor, that exchange information with a global TLB, which in turn provides data to the memory manager to access the hierarchical memory system.
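One way to picture a TLB entry that carries client usage data and a memory order tag is sketched below. The abstract does not state the rule that maps usage to a memory model, so the policy here (strong ordering once more than one client uses a page, weak ordering otherwise) is purely an assumption for illustration, as are all class, field, and method names.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryOrder(Enum):
    WEAK = "weak"
    STRONG = "strong"


@dataclass
class TLBEntry:
    page_frame_number: int
    clients: set = field(default_factory=set)     # page-granular client usage
    order_tag: MemoryOrder = MemoryOrder.WEAK     # model currently enforced


class MemoryManager:
    """Hypothetical manager of a single global TLB shared by all processors."""

    def __init__(self, page_table: dict):
        self.page_table = page_table  # virtual page number -> page frame number
        self.tlb = {}                 # virtual page number -> TLBEntry

    def access(self, client_id: str, vpn: int) -> TLBEntry:
        """Record that a client uses a page, building the entry if needed."""
        entry = self.tlb.get(vpn)
        if entry is None:
            entry = TLBEntry(self.page_table[vpn])
            self.tlb[vpn] = entry
        entry.clients.add(client_id)
        self._update_tag(entry)
        return entry

    def release(self, client_id: str, vpn: int) -> None:
        """Record that a client no longer uses a page."""
        entry = self.tlb[vpn]
        entry.clients.discard(client_id)
        self._update_tag(entry)

    @staticmethod
    def _update_tag(entry: TLBEntry) -> None:
        # Assumed policy: a page used by several clients must be strongly
        # ordered; a page with at most one client can be weakly ordered.
        shared = len(entry.clients) > 1
        entry.order_tag = MemoryOrder.STRONG if shared else MemoryOrder.WEAK
```

The point of the sketch is only that the tag is recomputed whenever the usage information changes, so a page can move between weak and strong ordering over its lifetime.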
Abstract:
A store queue for use in a data processor (10) with a memory storage system has a first-in-first-out ("FIFO") queue (48) and control circuitry (52). The control circuitry maintains three pointers which index the entries in the FIFO queue: a dispatch pointer (D), a completion pointer (C), and an oldest miss pointer (OM). The control circuitry stores each store instruction in the entry designated by the dispatch pointer and then increments the dispatch pointer. The control circuitry increments the completion pointer when the data processor indicates that the previously designated store instruction is the oldest instruction in the data processor, that is, when the instruction is "completed." The control circuitry increments the oldest miss pointer after it presents the previously designated store instruction to the memory system.
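The three-pointer scheme can be sketched as a small circular buffer. This is a simplified illustration under stated assumptions: the pointers are kept as ever-increasing counters and reduced modulo the queue size when indexing, the memory system is a plain dictionary, and a store is presented to memory only after it has completed. None of these details come from the abstract, and the names are hypothetical.

```python
class StoreQueue:
    """Hypothetical FIFO store queue indexed by D, C, and OM pointers."""

    def __init__(self, size: int, memory: dict):
        self.size = size
        self.slots = [None] * size
        self.memory = memory  # stand-in for the memory system
        self.d = 0            # dispatch pointer: next free entry
        self.c = 0            # completion pointer: next store to complete
        self.om = 0           # oldest miss pointer: next store to present

    def dispatch(self, addr: int, value: int) -> None:
        """Store the instruction at D, then increment D."""
        if self.d - self.om >= self.size:
            raise OverflowError("store queue is full")
        self.slots[self.d % self.size] = (addr, value)
        self.d += 1

    def complete(self) -> None:
        """Increment C when the store at C is the oldest instruction."""
        if self.c >= self.d:
            raise IndexError("no dispatched store left to complete")
        self.c += 1

    def present(self) -> bool:
        """Present the store at OM to memory, then increment OM."""
        if self.om >= self.c:  # assumption: only completed stores are presented
            return False
        addr, value = self.slots[self.om % self.size]
        self.memory[addr] = value
        self.om += 1
        return True
```

With this layout the invariant OM <= C <= D always holds, and an entry becomes reusable once OM has moved past it.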