摘要:
An apparatus and method of transmitting data from a PCI 2.1 compliant device is provided. PCI devices (i.e., PCI bridges) designed in accordance with the 2.1 PCI specification have a load data ordering feature that prohibits load data to bypass DMA write data in the bridge. The present apparatus and method allow load data to bypass DMA write data in the PCI bridge if the bridge is in an error state.
摘要:
A cache controller for a processor in a remote node of a system bus in a multiway multiprocessor link sends out a cache deallocate address transaction (CDAT) for a given cache line when that cache line is flushed and information from memory in a home node is no longer deemed valid for that cache line of that remote node processor. A local snoop of that CDAT transaction is then performed as a background function by other processors in the same remote node. If the snoop results indicate that same information is valid in another cache, and that cache decides it better to keep it valid in that remote node, then the information remains there. If the snoop results indicate that the information is not valid among caches in that remote node, or will be flushed due to the CDAT, the system memory directory in the home node of the multiprocessor link is notified and changes state in response to this. The system has higher performance due to the cache line maintenance functions being performed in the background rather than based on mainstream demand.
摘要:
In addition to an address tag, a coherency state and an LRU position, each cache directory entry includes historical processor access information for the corresponding cache line. The historical processor access information includes different subentries for each different processor which has accessed the corresponding cache line, with subentries being “pushed” along the stack when a new processor accesses the subject cache line. Each subentries contains the processor identifier for the corresponding processor which accessed the cache line, one or more opcodes identifying the operations which were performed by the processor, and timestamps associated with each opcode. This historical processor access information may then be utilized by the cache controller to influence victim selection, coherency state transitions, LRU state transitions, deallocation timing, and other cache management functions so that smaller caches are given the effectiveness of very large caches through more intelligent cache management.
摘要:
An apparatus and method of assigning communication channels for transmitting data through a host bridge are provided. In a preferred embodiment, a determination is made as to whether data is being transmitted through any one of the channels. If data is not being transmitted through one the channels, that channel is designated as the transmission channel for the present data transaction. If data is being transmitted through all of the channels, a least most recently used channel is selected as the data transmission channel. If however, more than one channel is not transmitting data, the data transmission channel assignments are made among the idle channels from a lowest channel number (e.g., channel 0) to a highest channel number (e.g., channel 7) or vice versa.
摘要:
A cache coherency protocol that includes a modified-invalid (Mi) state, which enables execution of a DMA Claim or DClaim operation to assign sole ownership of a cache line to a device that is going to overwrite the entire cache line without cache-to-cache data transfer. The protocol enables completion of speculatively-issued full cache line writes without requiring cache-to-cache transfer of data on the data bus during a preceding DMA Claim or DClaim operation. The modified-invalid (Mi) state assigns sole ownership of the cache line to an I/O device that has speculatively-issued a DMA Write or a processor that has speculatively-issued a DCBZ operation to overwrite the entire cache line, and the Mi state prevents data being sent to the cache line from another cache since the data will most probably be overwritten.
摘要:
A data processing system includes a plurality of nodes, which each contain at least one agent and each have an associated node identifier, and memory distributed among the plurality of nodes. The data processing system further includes an interconnect containing a segmented data channel, where each node contains a segment of the segmented data channel and each segment is coupled to at least one other segment by destination logic. In response to snooping a write request of a master agent on the interconnect, a target agent that will service the write request places its node identifier in a snoop response. When the master agent receives the combined response, which contains the node identifier of the target agent, the master agent issues on the segmented data channel a write data transaction specifying the node identifier of the target agent as a destination identifier. In response to receipt of the write data transaction, the destination logic transmits the write data transaction to a next segment only if the destination identifier does not match a node identifier associated with a node containing a current segment.
摘要:
A set-associative cache memory having asymmetric latency among sets is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. The cache memory further includes a means for accessing at least one of the sets faster than the remaining sets having an identical access latency.
摘要:
A multiprocessor computer system in which snoop operations of the caches are synchronized to allow the issuance of a cache operation during a cycle which is selected based on the particular manner in which the caches have been synchronized. Each cache controller is aware of when these synchronized snoop tenures occur, and can target these cycles for certain types of requests that are sensitive to snooper retries, such as kill-type operations. The synchronization may set up a priority scheme for systems with multiple interconnect buses, or may synchronize the refresh cycles of the DRAM memory of the snooper's directory. In another aspect of the invention, windows are created during which a directory will not receive write operations (i.e., the directory is reserved for only read-type operations). The invention may be implemented in a cache hierarchy which provides memory arranged in banks, the banks being similarly synchronized. The invention is not limited to any particular type of instruction, and the synchronization functionality may be hardware or software programmable.
摘要:
A set-associative cache memory having incremental access latencies among sets is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. In accordance with a preferred embodiment of the present invention, the cache memory further includes a means for accessing each of the sets with an access time dependent on a relative location of each of the sets such that access latency varies incrementally among sets.
摘要:
A set-associative cache memory having a mechanism for migrating a most recently used set is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. The cache memory further includes a migration means for directing the information from a cache “hit” to a predetermined set of the cache memory.