Abstract:
A set-associative cache memory having incremental access latencies among sets is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. In accordance with a preferred embodiment of the present invention, the cache memory further includes a means for accessing each set with an access time that depends on the set's relative location, such that access latency varies incrementally among the sets.
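As a rough illustration of the idea, the following Python sketch models one congruence class whose sets (the ways of the class, in the abstract's terminology) have incrementally increasing access latencies. The latency constants and class names are assumptions for illustration, not taken from the disclosure.

```python
# Toy model of a congruence class in which access latency grows
# incrementally with the set's relative position (assumed constants).
BASE_LATENCY = 2   # cycles to reach the nearest set (assumed)
LATENCY_STEP = 1   # extra cycles per set position (assumed)

class CongruenceClass:
    def __init__(self, num_sets):
        self.tags = [None] * num_sets   # one tag per set

    def access(self, tag):
        """Return (hit, latency); latency depends on which set holds the tag."""
        for set_idx, stored in enumerate(self.tags):
            if stored == tag:
                return True, BASE_LATENCY + set_idx * LATENCY_STEP
        return False, None

cc = CongruenceClass(num_sets=8)
cc.tags[0], cc.tags[5] = 0x1A, 0x2B
print(cc.access(0x1A))  # (True, 2) -- nearest set, fastest access
print(cc.access(0x2B))  # (True, 7) -- farther set, slower access
```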
Abstract:
A set-associative cache memory having a mechanism for migrating a most recently used set is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. The cache memory further includes a migration means for directing the information from a cache “hit” to a predetermined set of the cache memory.
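A minimal sketch of the migration idea, assuming the predetermined target is the fastest set (set 0) and that migration is a simple swap; both choices are illustrative, not from the disclosure. Combined with the incremental-latency organization above, this keeps the most recently used line in the lowest-latency set.

```python
# Toy migration: on a hit, swap the hit line into set 0, the assumed
# predetermined (fastest) set, so the MRU line sees the lowest latency.
class MigratingClass:
    def __init__(self, tags):
        self.tags = list(tags)

    def access(self, tag):
        for set_idx, stored in enumerate(self.tags):
            if stored == tag:
                # Migrate the hit line into the predetermined set.
                self.tags[0], self.tags[set_idx] = self.tags[set_idx], self.tags[0]
                return True   # hit; line now resides in set 0
        return False          # miss

cc = MigratingClass([0xA, 0xB, 0xC, 0xD])
cc.access(0xC)
print(cc.tags)  # [0xC, 0xB, 0xA, 0xD] -- hit line migrated to set 0
```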
Abstract:
A method of operating a processing unit of a computer system by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used: the first prefetch unit is hardware-independent and dynamically monitors one or more active streams associated with operations carried out by a core of the processing unit, while the second prefetch unit is aware of the lower-level storage subsystem and sends with the prefetch request an indication that a prefetch value is to be loaded into a lower-level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value. The prefetch limit of cache usage may be established as a maximum number of sets in a congruence class usable by the requesting processing unit. A flag in a directory of the cache may be set to indicate that the prefetch value was retrieved as the result of a prefetch operation. In the implementation wherein the cache is a multi-level cache, a second flag in the cache directory may be set to indicate that the prefetch value has been sourced to an upstream cache. A cache line containing prefetch data can be automatically invalidated after a preset amount of time has passed since the prefetch value was requested.
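The prefetch-limit, directory-flag, and timed-invalidation behavior can be sketched as follows. The limit value, time-to-live, and all field names are invented for the sketch, not taken from the disclosure.

```python
import time

PREFETCH_SET_LIMIT = 2   # assumed: max sets per congruence class for prefetch data
PREFETCH_TTL = 0.001     # assumed lifetime (seconds) before auto-invalidation

class Line:
    def __init__(self, tag):
        self.tag = tag
        self.prefetched = True         # directory flag: value arrived via prefetch
        self.sourced_upstream = False  # second flag: value sent to an upstream cache
        self.fill_time = time.monotonic()

class CongruenceClass:
    def __init__(self):
        self.lines = []

    def prefetch_fill(self, tag):
        self.expire_stale()
        prefetched = [l for l in self.lines if l.prefetched]
        if len(prefetched) >= PREFETCH_SET_LIMIT:
            # Prefetch limit met: reallocate a line holding an earlier
            # prefetch value to receive the new one.
            self.lines.remove(prefetched[0])
        self.lines.append(Line(tag))

    def expire_stale(self):
        # Automatically invalidate prefetched lines older than the preset time.
        now = time.monotonic()
        self.lines = [l for l in self.lines
                      if not (l.prefetched and now - l.fill_time > PREFETCH_TTL)]

cc = CongruenceClass()
for tag in (0x1, 0x2, 0x3):
    cc.prefetch_fill(tag)
print([hex(l.tag) for l in cc.lines])  # ['0x2', '0x3'] -- limit of 2 enforced
```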
Abstract:
A host bridge having a dataflow controller is provided. In a preferred embodiment, the host bridge contains a read command path having a mechanism for requesting and receiving data from an upstream device. The host bridge also contains a write command path having means for receiving data from a downstream device and for transmitting the received data to an upstream device. A target controller receives the read and write commands from the downstream device and steers the read command toward the read command path and the write command toward the write command path. A bus controller requests control of an upstream bus before transmitting the read command's request for data and before transmitting the data of the write command.
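An illustrative sketch of the two command paths, with the target controller steering by command type and the bus controller acquiring the upstream bus before any transfer; all names are invented here, not from the disclosure.

```python
# Hypothetical host-bridge sketch: separate read/write command paths.
class UpstreamBus:
    def request_grant(self):
        print("upstream bus granted")
    def send(self, packet):
        print("upstream:", packet)

class HostBridge:
    def __init__(self, bus):
        self.bus = bus
        self.read_path = []    # read commands awaiting upstream data requests
        self.write_path = []   # (command, data) awaiting upstream transmission

    def receive_command(self, command, data=None):
        # Target controller: steer by command type.
        if command["op"] == "read":
            self.read_path.append(command)
        else:
            self.write_path.append((command, data))

    def flush_upstream(self):
        # Bus controller: request control of the bus first, then transmit.
        self.bus.request_grant()
        for cmd in self.read_path:
            self.bus.send({"op": "read_request", "addr": cmd["addr"]})
        for cmd, data in self.write_path:
            self.bus.send({"op": "write", "addr": cmd["addr"], "data": data})
        self.read_path.clear(); self.write_path.clear()

bridge = HostBridge(UpstreamBus())
bridge.receive_command({"op": "read", "addr": 0x100})
bridge.receive_command({"op": "write", "addr": 0x200}, data=b"\x42")
bridge.flush_upstream()
```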
Abstract:
Upon snooping an operation in which an intervention is permitted or required, an intervening cache may elect to source only that portion of a requested cache line which is actually required, rather than the entire cache line. For example, if the intervening cache determines that the requesting cache would likely be required to invalidate the cache line soon after receipt, less than the full cache line may be sourced to the requesting cache. The requesting cache will not cache less than a full cache line, but may forward the received data to the processor supported by the requesting cache. Data bus bandwidth utilization may therefore be reduced. Additionally, the need to subsequently invalidate the cache line within the requesting cache is avoided, together with the possibility that the requesting cache will retry an operation requiring invalidation of the cache line.
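A minimal sketch of the sourcing decision, assuming a 128-byte line and a boolean predicate for "the requester will likely invalidate soon"; both are illustrative assumptions.

```python
# Hypothetical partial-intervention decision in the intervening cache.
LINE_SIZE = 128  # bytes (assumed)

def intervene(line_data, offset, size, requester_will_invalidate_soon):
    if requester_will_invalidate_soon:
        # Partial intervention: the requester forwards these bytes to its
        # processor without caching them, saving data-bus bandwidth and a
        # later invalidation (and the possible retry that goes with it).
        return line_data[offset:offset + size]
    return line_data  # normal case: source the full cache line

line = bytes(range(LINE_SIZE))
print(len(intervene(line, 8, 8, True)))   # 8   -- only the requested portion
print(len(intervene(line, 8, 8, False)))  # 128 -- the entire line
```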
Abstract:
In addition to an address tag, a coherency state and an LRU position, each cache directory entry includes historical processor access, snoop operation, and system controller hint information for the corresponding cache line. Each entry includes different subentries for different processors which have accessed the corresponding cache line, with each subentry containing a processor access sequence segment, a snoop operation sequence segment, and a system controller hint history segment. In addition to an address tag, each directory entry in the system controller's bus transaction sequence log contains one or more opcodes identifying bus operations addressing the corresponding cache line, a processor identifier associated with each opcode, and a timestamp associated with each opcode. Along with each system bus transaction's opcode, the individual snoop responses that were received from one or more snoopers and the hint information that was provided to the requestor and the snoopers may also be included. This information may then be utilized by the system controller to append hints to the combined snoop responses in order to influence how cache controllers (the requestor(s), snoopers, or both) handle victim selection, coherency state transitions, LRU state transitions, deallocation timing, and other cache management functions.
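One possible layout of such an enriched directory entry, sketched in Python; every field name and the hint policy shown are invented for illustration, not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ProcessorSubentry:
    processor_id: int
    access_history: List[Tuple[str, int]] = field(default_factory=list)  # (opcode, timestamp)
    snoop_history: List[Tuple[str, int]] = field(default_factory=list)   # snooped ops
    hint_history: List[str] = field(default_factory=list)                # controller hints seen

@dataclass
class DirectoryEntry:
    address_tag: int
    coherency_state: str = "I"
    lru_position: int = 0
    subentries: List[ProcessorSubentry] = field(default_factory=list)

def combined_response_with_hint(entry, snoop_responses):
    # Placeholder policy: hint early deallocation if the history shows the
    # line has been snooped away before; a real controller would use the
    # full access/snoop/hint logs.
    hint = "deallocate_early" if any(s.snoop_history for s in entry.subentries) else None
    return {"responses": snoop_responses, "hint": hint}

entry = DirectoryEntry(address_tag=0x7F0)
entry.subentries.append(ProcessorSubentry(1, snoop_history=[("RWITM", 42)]))
print(combined_response_with_hint(entry, ["shared", "null"]))
```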
Abstract:
In addition to an address tag, a coherency state and an LRU position, each cache directory entry includes historical processor access and snoop operation information for the corresponding cache line. The historical processor access and snoop operation information includes a different subentry for each processor which has accessed the corresponding cache line, with subentries being "pushed" along the stack when a new processor accesses the subject cache line. Each subentry contains the processor identifier for the corresponding processor which accessed the cache line, a processor access history segment, and a snoop operation history segment. The processor access history segment contains one or more opcodes identifying the operations which were performed by the processor, and timestamps associated with each opcode. The snoop operation history segment contains, for each operation snooped by the respective processor, a processor identifier for the processor originating the snooped operation, an opcode identifying the snooped operation, and a timestamp identifying when the operation was snooped. This historical processor access and snoop operation information may then be utilized by the cache controller to influence victim selection, coherency state transitions, LRU state transitions, deallocation timing, and other cache management functions, so that smaller caches achieve the effectiveness of very large caches through more intelligent cache management.
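The "push" behavior can be sketched as a small stack of per-processor subentries, most recent accessor first; the dict structure and field names here are illustrative assumptions.

```python
# Hypothetical per-line history stack: newest accessor's subentry on top.
def record_access(subentries, processor_id, opcode, timestamp):
    for i, sub in enumerate(subentries):
        if sub["pid"] == processor_id:
            sub = subentries.pop(i)   # existing subentry moves to the top
            break
    else:
        sub = {"pid": processor_id, "accesses": [], "snoops": []}
    sub["accesses"].append((opcode, timestamp))
    subentries.insert(0, sub)         # "push" along the stack

def record_snoop(subentries, observer_pid, origin_pid, opcode, timestamp):
    # Log, in the observer's subentry, who originated the snooped op and when.
    for sub in subentries:
        if sub["pid"] == observer_pid:
            sub["snoops"].append((origin_pid, opcode, timestamp))

stack = []
record_access(stack, 0, "RWITM", 100)
record_access(stack, 1, "READ", 120)
record_snoop(stack, 0, 1, "READ", 120)  # P0 snooped P1's read
print([s["pid"] for s in stack])        # [1, 0] -- newest accessor on top
```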
Abstract:
A data processing system includes an interconnect, a plurality of nodes coupled to the interconnect that each include at least one agent, response logic within each node, and a queue. In response to snooping a transaction on the interconnect, each agent outputs a snoop response. In addition, the queue, which has an associated agent, allocates an entry to service the transaction. The response logic within each node accumulates a partial combined response of its node and any preceding node until a complete combined response for all of the plurality of nodes is obtained. However, prior to the associated agent receiving the complete combined response, the queue speculatively deallocates the entry if the partial combined response indicates that an agent other than the associated agent will service the transaction.
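A minimal sketch of the speculative deallocation, assuming the partial combined response carries the identity of the agent that will service the transaction; names and structure are invented for illustration.

```python
# Hypothetical queue that frees its entry as soon as the accumulating
# partial combined response shows another agent will service the
# transaction, without waiting for the complete combined response.
class SnoopQueue:
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.entries = {}   # transaction tag -> allocated entry

    def snoop(self, tag):
        self.entries[tag] = {"tag": tag}   # allocate on snooping the transaction

    def partial_combined_response(self, tag, servicing_agent):
        # Speculatively deallocate if some other agent has claimed the work.
        if servicing_agent is not None and servicing_agent != self.agent_id:
            self.entries.pop(tag, None)

q = SnoopQueue(agent_id=3)
q.snoop(tag=0x10)
q.partial_combined_response(tag=0x10, servicing_agent=1)
print(q.entries)  # {} -- freed before the complete combined response arrives
```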
Abstract:
A multiprocessor computer system in which snoop operations of the caches are synchronized to allow the issuance of a cache operation during a cycle which is selected based on the particular manner in which the caches have been synchronized. Each cache controller is aware of when these synchronized snoop tenures occur, and can target these cycles for certain types of requests that are sensitive to snooper retries, such as kill-type operations. The synchronization may set up a priority scheme for systems with multiple interconnect buses, or may synchronize the refresh cycles of the DRAM memory of the snooper's directory. In another aspect of the invention, windows are created during which a directory will not receive write operations (i.e., the directory is reserved for only read-type operations). The invention may be implemented in a cache hierarchy which provides memory arranged in banks, the banks being similarly synchronized. The invention is not limited to any particular type of instruction, and the synchronization functionality may be hardware or software programmable.
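A small timing sketch of targeting synchronized tenures: assuming the snoop tenures recur with a known period and phase (both values invented here), a cache controller holds a retry-sensitive kill-type operation until the next synchronized cycle.

```python
# Hypothetical tenure timing (period and phase are assumed values).
TENURE_PERIOD = 8   # a synchronized snoop tenure every 8 cycles
TENURE_PHASE = 2    # phase of the tenure within the period

def next_kill_slot(current_cycle):
    """Cycle of the next synchronized snoop tenure at or after now."""
    wait = (TENURE_PHASE - current_cycle) % TENURE_PERIOD
    return current_cycle + wait

print(next_kill_slot(5))   # 10 -- hold the kill-type op until the synced cycle
print(next_kill_slot(10))  # 10 -- already on a tenure boundary, issue now
```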