Abstract:
A cache controller for a processor in a remote node of a system bus in a multiway multiprocessor link sends out a cache deallocate address transaction (CDAT) for a given cache line when that cache line is flushed and information from memory in a home node is no longer deemed valid for that cache line of that remote node processor. A local snoop of that CDAT is then performed as a background function by other processors in the same remote node. If the snoop results indicate that the same information is valid in another cache, and that cache decides it is better to keep the information valid in that remote node, then the information remains there. If the snoop results indicate that the information is not valid among caches in that remote node, or will be flushed due to the CDAT, the system memory directory in the home node of the multiprocessor link is notified and changes state in response. The system achieves higher performance because the cache line maintenance functions are performed in the background rather than on mainstream demand.
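The decision described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented logic: the names `Action` and `resolve_cdat`, and the representation of each peer's snoop result as a `(holds_valid_copy, will_keep)` pair, are assumptions introduced here.

```python
from enum import Enum


class Action(Enum):
    KEEP_IN_REMOTE_NODE = "keep"        # another local cache retains the line
    NOTIFY_HOME_DIRECTORY = "notify"    # home node directory must change state


def resolve_cdat(peer_snoop_results):
    """Decide the outcome of a CDAT after the local snoop.

    peer_snoop_results is a list of (holds_valid_copy, will_keep) pairs,
    one per other cache in the same remote node (hypothetical encoding).
    """
    for holds_valid_copy, will_keep in peer_snoop_results:
        # A peer that holds the line and chooses to keep it means the
        # information stays valid in this remote node.
        if holds_valid_copy and will_keep:
            return Action.KEEP_IN_REMOTE_NODE
    # No peer holds the line, or every holder will flush it because of
    # the CDAT: the home node directory is notified.
    return Action.NOTIFY_HOME_DIRECTORY
```

For example, `resolve_cdat([(True, False)])` models a peer that holds the line but will flush it, so the home directory is notified.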
Abstract:
A data processing system includes an interconnect, a plurality of nodes coupled to the interconnect that each include at least one agent, response logic within each node, and a queue. In response to snooping a transaction on the interconnect, each agent outputs a snoop response. In addition, the queue, which has an associated agent, allocates an entry to service the transaction. The response logic within each node accumulates a partial combined response of its node and any preceding node until a complete combined response for all of the plurality of nodes is obtained. However, prior to the associated agent receiving the complete combined response, the queue speculatively deallocates the entry if the partial combined response indicates that an agent other than the associated agent will service the transaction.
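A minimal sketch of the accumulation and the speculative deallocation follows. It assumes, purely for illustration, that each snoop response is either `None` or the identifier of an agent that will service the transaction, that the first such agent wins, and that the queue observes the partial combined response as it passes through the queue's own node. The function name `run_transaction` and its parameters are hypothetical.

```python
def run_transaction(node_responses, queue_node, associated_agent):
    """Accumulate snoop responses node by node.

    node_responses: list (one item per node, in interconnect order) of
        lists of snoop responses; each response is an agent id that will
        service the transaction, or None.
    queue_node: index of the node containing the queue.
    associated_agent: id of the agent associated with the queue.

    Returns (complete_combined_response, deallocated_early).
    """
    partial = None
    deallocated_early = False
    for node, responses in enumerate(node_responses):
        # Merge this node's responses into the partial combined response.
        for servicer in responses:
            if partial is None and servicer is not None:
                partial = servicer
        # The queue sees the partial response at its own node, before the
        # complete combined response exists. If another agent will service
        # the transaction, the entry is released speculatively.
        if node == queue_node and partial is not None and partial != associated_agent:
            deallocated_early = True
    return partial, deallocated_early
```

If the servicing agent only appears in a node after the queue's node, the entry stays allocated until the complete combined response arrives, which the sketch reports as `deallocated_early == False`.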
Abstract:
A multiprocessor computer system in which snoop operations of the caches are synchronized to allow the issuance of a cache operation during a cycle which is selected based on the particular manner in which the caches have been synchronized. Each cache controller is aware of when these synchronized snoop tenures occur, and can target these cycles for certain types of requests that are sensitive to snooper retries, such as kill-type operations. The synchronization may set up a priority scheme for systems with multiple interconnect buses, or may synchronize the refresh cycles of the DRAM memory of the snooper's directory. In another aspect of the invention, windows are created during which a directory will not receive write operations (i.e., the directory is reserved for only read-type operations). The invention may be implemented in a cache hierarchy which provides memory arranged in banks, the banks being similarly synchronized. The invention is not limited to any particular type of instruction, and the synchronization functionality may be hardware or software programmable.
Abstract:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the data storage and the execution resources, that supplies instructions within the data storage to the execution resources. The execution resources include a plurality of load-store units that each process only instructions that access data having associated addresses within a respective one of a plurality of subsets of an address space. The load-store units can have diverse hardware such that a maximum number of instructions that can be concurrently executed is different for different load-store units or such that some of the load-store units are restricted to executing certain classes of instructions.
Abstract:
A method of operating a multi-level memory hierarchy of a computer system and apparatus embodying the method, wherein instructions issue having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. Further branch predictions or stream associations that were made based on an earlier speculative choice are linked by using a tag pool which assigns bit fields in the tag pool entries to the level of speculation depth. Each entry shares in common the bit field values associated with earlier branches or stream associations. When a branch or stream predicted entry is no longer needed, that entry can be cancelled and all entries that were to be loaded dependent on that entry can likewise be cancelled by walking through all entries sharing the bit fields corresponding to the speculation depth of the cancelled entry and tagging those entries as invalid.
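The cancellation walk can be illustrated with a small Python model. This is a sketch under assumptions: each tag pool entry is represented as a tuple holding one field value per speculation level (outermost first), so an entry at depth d shares its first d - 1 values with the entries it depends on. The names `TagEntry` and `cancel` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    fields: tuple        # one bit-field value per speculation level
    valid: bool = True


def cancel(pool, cancelled):
    """Invalidate `cancelled` and every entry that depends on it.

    An entry depends on `cancelled` when it shares all of the bit-field
    values up to the speculation depth of the cancelled entry.
    """
    depth = len(cancelled.fields)
    for entry in pool:
        if entry.fields[:depth] == cancelled.fields:
            entry.valid = False


pool = [
    TagEntry((1,)),        # first speculative choice
    TagEntry((1, 0)),      # made on top of (1,)
    TagEntry((1, 0, 1)),   # made on top of (1, 0)
    TagEntry((1, 1)),      # sibling of (1, 0)
    TagEntry((0,)),        # unrelated choice
]
cancel(pool, pool[1])      # cancels (1, 0) and (1, 0, 1) only
```

Entries that are shallower than the cancelled entry, or that diverge from it at an earlier level, are left valid.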
Abstract:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, have diverse cache hardware and each preferably stores only data having associated addresses within a respective one of a plurality of subsets of an address space. The diverse cache hardware can include, for example, differing cache sizes, differing associativities, differing sectoring, and differing inclusivities.
Abstract:
A method of operating a multi-level memory hierarchy of a computer system and apparatus embodying the method, wherein instructions issue having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. After a predetermined number of cycles has elapsed, the speculative load request is cancelled if the request has not already been completed.
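The timeout rule in the last sentence can be sketched as follows. The class `PrefetchRequest`, the function `expire_speculative`, and the cycle-counting scheme are assumptions made for illustration; the abstract does not specify how cycles are tracked.

```python
class PrefetchRequest:
    def __init__(self, address, speculative, issue_cycle):
        self.address = address
        self.speculative = speculative    # False means a demand load
        self.issue_cycle = issue_cycle
        self.completed = False
        self.cancelled = False


def expire_speculative(requests, current_cycle, timeout_cycles):
    """Cancel speculative requests still outstanding after the timeout.

    Demand loads are never cancelled, since the processing unit will
    need their data. Completed requests are left untouched.
    """
    for req in requests:
        age = current_cycle - req.issue_cycle
        if req.speculative and not req.completed and age >= timeout_cycles:
            req.cancelled = True


demand = PrefetchRequest(0x1000, speculative=False, issue_cycle=0)
stale = PrefetchRequest(0x2000, speculative=True, issue_cycle=0)
fresh = PrefetchRequest(0x3000, speculative=True, issue_cycle=90)
done = PrefetchRequest(0x4000, speculative=True, issue_cycle=0)
done.completed = True

expire_speculative([demand, stale, fresh, done], current_cycle=100, timeout_cycles=64)
```

In this run only `stale` is cancelled: `demand` is not speculative, `fresh` is only 10 cycles old, and `done` has already completed.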
Abstract:
A processor having a hashed and partitioned storage subsystem includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a cache subsystem including a plurality of caches that store data utilized by the execution unit. Each cache among the plurality of caches stores only data having associated addresses within a respective one of a plurality of subsets of an address space. In one preferred embodiment, the execution units of the processor include a number of load-store units (LSUs) that each process only instructions that access data having associated addresses within a respective one of the plurality of address subsets. The processor may further be incorporated within a data processing system having a number of interconnects and a number of sets of system memory hardware that each have affinity to a respective one of the plurality of address subsets.
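As a rough illustration of address partitioning, the sketch below routes each access to the one cache that owns its address subset. The hash used here (cache-line number modulo the number of subsets), the constants `LINE_BYTES` and `NUM_SUBSETS`, and the class `PartitionedCaches` are all assumptions; the abstract does not state which hash function is used.

```python
LINE_BYTES = 64      # assumed cache line size
NUM_SUBSETS = 4      # assumed number of address subsets


def address_subset(addr):
    """Map an address to its subset (hypothetical hash: line number mod N)."""
    return (addr // LINE_BYTES) % NUM_SUBSETS


class PartitionedCaches:
    """One cache per address subset; each holds only its own subset."""

    def __init__(self):
        self.caches = [dict() for _ in range(NUM_SUBSETS)]

    def store(self, addr, value):
        self.caches[address_subset(addr)][addr] = value

    def load(self, addr):
        # Only the owning cache is consulted, so no other cache can
        # ever hold data for this address.
        return self.caches[address_subset(addr)].get(addr)


caches = PartitionedCaches()
caches.store(0x40, "line 1 data")    # line number 1, so subset 1
```

The same `address_subset` function could steer an instruction to the load-store unit, interconnect, or system memory with affinity to that subset, which is the arrangement the abstract describes.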
Abstract:
A method and apparatus for efficiently managing caches with non-power-of-two congruence classes allows for increasing the number of congruence classes in a cache when not enough area is available to double the cache size. One or more congruence classes within the cache have their associative sets split so that a number of congruence classes are created with reduced associativity. The management method and apparatus allow access to the congruence classes without introducing any additional cycles of delay or complex logic.
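One possible reading of the set-splitting idea is sketched below. It is not taken from the patent: the constants, the choice of which classes are split, and the use of one extra address bit to select a half are assumptions made only to show how splitting associative sets yields a non-power-of-two class count.

```python
BASE_CLASSES = 4          # assumed power-of-two number of base classes
WAYS = 8                  # assumed associativity of an unsplit class
SPLIT_CLASSES = {0, 1}    # assumed: these base classes are split in half


def lookup(line_addr):
    """Return (class_id, ways) for a cache line address.

    class_id is (base, half); half is None for an unsplit class.
    ways is the range of way indices that the class may use.
    """
    base = line_addr % BASE_CLASSES
    if base in SPLIT_CLASSES:
        # One extra address bit picks which half of the ways to use,
        # creating two classes with reduced associativity.
        half = (line_addr // BASE_CLASSES) % 2
        start = half * (WAYS // 2)
        return (base, half), range(start, start + WAYS // 2)
    return (base, None), range(WAYS)


# 2 split base classes give 4 classes of 4 ways; 2 unsplit classes keep 8 ways.
distinct_classes = {lookup(a)[0] for a in range(64)}
```

With these assumed values the cache presents 6 congruence classes, and the class is chosen with a modulo and a single bit test rather than a multi-cycle computation.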
Abstract:
A method of improving memory access for a computer system, by sending load requests to a lower level storage subsystem along with associated information pertaining to intended use of the requested information by the requesting processor, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information), or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data and translation caches.