摘要:
In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs a horizontal storage device at the same level as the storage device initiating the combined operation to allocate and store either the cast out or target data. A horizontal storage device having available space—i.e., an invalid or modified data element in a congruence class for the victim—stores either the target or the cast out data for subsequent access by an intervention. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.
摘要:
Combined response logic for a bus receives a combined data access and cast out/deallocate operation initiating by a storage device within a specific level of a storage hierarchy, with a coherency state and LRU position of the cast out/deallocate victim appended. Snoopers on the bus drive snoop responses to the combined operation with the coherency state and/or LRU position of locally-stored cache lines corresponding to the victim appended. The combined response logic determines, from the coherency state and LRU position information appended to the combined operation and the snoop responses, whether an update of the LRU position and/or coherency state of a cache line corresponding to the victim within one of the snoopers is required. If so, the combined response logic selects a snooper storage device to have at least the LRU position of a respective cache line corresponding to the victim updated, and appends an update command identifying the selected snooper to the combined response. The snooper selected to be updated may be randomly chosen, selected based on LRU position of the cache line corresponding to the victim within respective storage, or selected based on other criteria.
摘要:
In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs the storage device initiating the combined operation not to allocate storage for the target of the data access. Instead, the target of the data access may be passed directly to an in-line processor core without storage, may be stored in a horizontal storage device, or may be stored in an in-line, noninclusive, lower level storage device. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.
摘要:
A data processing system includes a processor, a system memory, one or more input/output channel controllers (IOCC), and a system bus connecting the processor, the memory and the IOCCs together for communicating instructions, address and data between the various elements of a system. The IOCC includes a paged cache storage having a number of lines wherein each line of the page may be, for example, 32 bytes. Each page in the cache also has several attribute bits for that page including the so called WIM and attribute bits. The W bit is for controlling write through operations; the I bit controls cache inhibit; and the M bit controls memory coherency. Since the IOCC is unaware of these page table attribute bits for the cache lines being DMAed to system memory, IOCC must maintain memory consistency and cache coherency without sacrificing performance. For DMA write data to system memory, new cache attributes called global, cachable and demand based write through are created. Individual writes within a cache line are gathered by the IOCC and only written to system memory when the I/O bus master accesses a different cache line or relinquishes the I/O bus.
摘要:
Following a cache miss by an operation, the address for the operation is transmitted on the bus coupling the cache to lower levels of the storage hierarchy. A portion of the address including the index field is transmitted during a first bus cycle, and may be employed to begin directory lookups in lower level storage devices before the address tag is received. The remainder of the address is transmitted during subsequent bus cycles, which should be in time for address tag comparisons with the congruence class elements. To allow multiple directory lookups to be occurring concurrently in a pipelined directory, a portion of multiple addresses for several data access operations, each portion including the index field for the respective address, may be transmitted during the first bus cycle or staged in consecutive bus cycles, with the remainders of each address—including the cache tags—transmitted during the subsequent bus cycles. This allows directory lookups utilizing the index fields to be processed concurrently within a lower level storage device for multiple operations, with the address tags being provided later, but still timely for tag comparisons at the end of the directory lookup. Where the lower level storage device operates at a higher frequency than the bus, overall latency is reduced and directory bandwidth is more efficiently utilized.
摘要:
In response to receiving a combined address for related data access and cast out operations, including an index identifying a congruence class containing both the target of the data access and the victim of the cast out, a single directory access is performed utilizing the index to locate the congruence class. Address tags within the congruence class are then compared to the address tag for the data access operation and the address tag for the cast out operation concurrently, generating separate hit signals as appropriate. Only a single directory access is required, however, rather than two separate directory accesses as required in the known art, taking advantage of the fact that both the data access target and the cast out victim belong to a single congruence class. Response latency is also improved, as is address bus bandwidth utilization.
摘要:
A combined address bus transaction contains the address tag for a data access operation target, the address tag for a victim to be replaced, and the address index field identifying the congruence class including both the target and the victim. Directory state information such as coherency state and/or LRU position for the cast out victim may also be appended to the index field and target and victim address tags within the bus operation. Address bus bandwidth utilization is thereby improved, eliminating duplicate transmission of the index field employed by separate data access and cast out operations in accordance with the existing practice. The victim may be prospectively selected concurrently with the determination of whether the target may be found within the storage device forming the combined address, improving overall performance for that device.
摘要:
An effectively “conditional”, cast out operation or cast out portion of a combined operation including a related data access may be cancelled by the combined response to the operation. The combined response logic receives coherency state and/or LRU position information for cache lines corresponding to the cast out victim within snoopers and vertically in-line storage. The combined response logic may also receive information regarding the presence of shared or invalid cache lines in snoopers or lower level storage within the congruence class for the victim, or information regarding the read-once nature of the data access target. Based on these responses, the combined response logic determines whether the cast out should be cancelled and, if so, selects and drives the appropriate combined response code.
摘要:
Upon snooping a combined data access and cast out/deallocate operation initiating by a horizontal storage device, snoop logic determines, from LRU position information appended to the combined response to the combined operation, whether the coherency state and/or LRU position of the victim may be upgraded within the subject storage device. If so, the coherency state or LRU position is upgraded to improve global data storage management. For instance, a cache line within a snooping storage device may be altered to assume the coherency state of the victim within the storage device initiating the combined operation to improve data storage management under a given replacement policy.
摘要:
Combined response logic for a bus receives a combined data access and cast out/deallocate operation initiating by a storage device within a specific level of a storage hierarchy with a coherency state of the cast out/deallocate victim appended. Snoopers on the bus drive snoop responses to the combined operation with the coherency state and/or LRU position of locally-stored cache lines corresponding to the victim appended. The combined response logic determines, from the coherency state information appended to the combined operation and the snoop responses, whether a coherency upgrade is possible. If so, the combined response logic selects a snooper storage device to upgrade the coherency state of a respective cache line corresponding to the victim, and appends an upgrade directive to the combined response. The snooper selected to upgrade the coherency state of a cache line corresponding the victim may be randomly chosen or, as an optimization, be chosen for having the highest LRU position for the respective cache line.