Abstract:
Depending on a processor or instruction mode, a data cache block store (dcbst) or equivalent instruction is treated differently. The instruction has a coherency maintenance mode, in which it is used to maintain coherency between bifurcated data and instruction caches. This mode may be entered by setting bits in a processor register or by setting hint bits within the instruction. In coherency maintenance mode, the instruction both pushes modified data to system memory and invalidates the corresponding entry in the instruction caches. A processor may issue subsequent instruction cache block invalidate (icbi) or equivalent instructions that target the same cache location after executing a data cache block store or equivalent instruction in coherency maintenance mode. Those subsequent instructions are treated as no-ops. Executing the data cache block store instruction in coherency maintenance mode initiates a novel operation on the system bus. This bus operation directs other devices having bifurcated data and instruction caches to clean the specified cache entry in their data cache to at least the point of instruction/data cache coherency, and to invalidate the specified cache entry in their instruction cache. When used repeatedly in sequence to write one or more pages of data to system memory, this coherency mechanism saves processor cycles and reduces both address and data bus traffic.
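The dcbst behaviour described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions: the class names `Processor` and `System`, the `coherency_mode` flag (standing in for the register or hint bits), the `cleaned` set used to turn a later icbi into a no-op, and the bus-operation labels are all hypothetical. None of them describe real hardware or an actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical processor with split (bifurcated) data and instruction caches."""
    name: str
    coherency_mode: bool = False  # stands in for the register or hint bits
    dcache: dict = field(default_factory=dict)  # block addr -> (data, modified)
    icache: dict = field(default_factory=dict)  # block addr -> instruction bytes
    cleaned: set = field(default_factory=set)   # blocks whose next icbi is a no-op

    def push(self, addr, memory):
        """Write modified data for `addr` back to memory and mark the line clean."""
        if addr in self.dcache:
            data, modified = self.dcache[addr]
            if modified:
                memory[addr] = data
                self.dcache[addr] = (data, False)


class System:
    def __init__(self, processors):
        self.processors = processors
        self.memory = {}
        self.bus_log = []  # bus operations, in issue order

    def dcbst(self, cpu, addr):
        op = "clean_invalidate_icache" if cpu.coherency_mode else "clean"
        self.bus_log.append((op, addr))
        for p in self.processors:         # every device cleans its D-cache copy
            p.push(addr, self.memory)
            if cpu.coherency_mode:
                p.icache.pop(addr, None)  # ...and drops its stale I-cache copy
        if cpu.coherency_mode:
            cpu.cleaned.add(addr)

    def icbi(self, cpu, addr):
        if addr in cpu.cleaned:
            cpu.cleaned.discard(addr)  # already covered by the dcbst: no-op
            return
        self.bus_log.append(("icbi", addr))
        for p in self.processors:
            p.icache.pop(addr, None)
```

In this model, a dcbst issued in coherency maintenance mode produces a single bus operation, and the icbi that follows it generates no bus traffic at all. That is the saving the abstract describes when whole pages are written this way.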
Abstract:
A method of managing and speculatively issuing architectural operations in a computer system. A first architectural operation is snooped and translated into a plurality of granular architectural operations that together effect a large-scale architectural operation. The first architectural operation can be a cache instruction directed to a memory block. In that case, a plurality of cache instructions are issued, each directed to one of the memory blocks contained in the page associated with that memory block. The granular architectural operations are transmitted to a processor bus of the computer system. A processor bus history table may store a record of the large-scale architectural operation, allowing the table to filter out any later architectural operation that the large-scale operation subsumes. The history table monitors the processor bus to ensure that the large-scale architectural operations it records remain valid.
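The page expansion and history-table filtering described above can be sketched as follows. This sketch assumes a 4 KiB page and 128-byte cache blocks. It also assumes that a store snooped to a recorded page is what makes the record stale. The `HistoryTable` class, its method names, and the operation labels are hypothetical.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
BLOCK_SIZE = 128   # assumed cache block size in bytes


def page_of(addr):
    """Base address of the page containing `addr`."""
    return addr - addr % PAGE_SIZE


class HistoryTable:
    def __init__(self):
        self.entries = set()  # (operation, page base) pairs already expanded

    def snoop(self, op, addr):
        """Return the granular per-block operations to issue for (op, addr)."""
        key = (op, page_of(addr))
        if key in self.entries:
            return []  # subsumed by an earlier page-wide operation: filter it
        self.entries.add(key)
        base = key[1]
        return [(op, base + off) for off in range(0, PAGE_SIZE, BLOCK_SIZE)]

    def observe_bus(self, op, addr):
        """Discard records invalidated by later bus traffic (assumed: stores)."""
        if op == "store":
            page = page_of(addr)
            self.entries = {e for e in self.entries if e[1] != page}
```

With these sizes, one snooped operation expands into 32 block operations. Any later operation of the same kind to the same page is dropped until a store to that page invalidates the record.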
Abstract:
A device snooping the system bus of a multiprocessor system may detect an operation requesting data that is resident in its local memory in a coherency state requiring the device to source the data. In that case, the device attempts an intervention. If a second device impedes the intervention by asserting a retry, the device sets a flag to record the failed intervention. On a subsequent snoop hit to the same cache location, the device again asserts an intervention. If the snooped operation is again retried, the device alters the coherency state of the requested cache item toward the final coherency state that the original request was expected to produce. If the requested cache item holds modified data resident in the device's local memory, this action may include a push operation that writes the item to system memory. Other devices may snoop this operation from the system bus to update their local memories. If the requested cache item is in a coherency state other than modified, the action may simply change the coherency state to shared or invalid.
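The retry-history mechanism described above can be modelled as a small state machine. The sketch makes several assumptions. The protocol is modelled as MESI, with interventions sourced from the Modified and Exclusive states. The "final" state after the second failed intervention is taken to be Shared. State changes after a successful intervention are not modelled. `InterventionSnooper` and its method are hypothetical names.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class InterventionSnooper:
    def __init__(self):
        self.lines = {}          # addr -> State
        self.retry_flag = set()  # addrs whose last intervention was retried
        self.memory_pushes = []  # addrs written back to system memory

    def on_snoop_read(self, addr, retried):
        """Handle a snooped read; `retried` says whether another device asserted retry."""
        state = self.lines.get(addr, State.INVALID)
        if state not in (State.MODIFIED, State.EXCLUSIVE):
            return "no_intervention"  # assumed: only M/E lines must be sourced here
        if not retried:
            self.retry_flag.discard(addr)
            return "intervened"
        if addr not in self.retry_flag:
            self.retry_flag.add(addr)  # history: first intervention was retried
            return "retried_once"
        # Second retried intervention: move toward the expected final state.
        self.retry_flag.discard(addr)
        if state is State.MODIFIED:
            self.memory_pushes.append(addr)  # push modified data to system memory
        self.lines[addr] = State.SHARED
        return "state_altered"
```

The flag means the device only pays for the push or state change once a retry has recurred. A single retry costs nothing beyond one bit of history per line.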
Abstract:
Cache and architectural specific functions within a cache controller are layered and provided with generic interfaces, isolating the complexities of each and allowing the overall functionality to be further divided into distinct, largely autonomous functional units. Each functional unit handles a certain type of operation and may be easily replicated or removed from the design to provide a number of cache designs with varied price and performance.
Abstract:
A method of synchronizing an initiating processing unit with the other processing units in a multiprocessor computer system. A unique tag is assigned to each processing unit, and synchronization messages carry the initiating processing unit's unique tag. Each processing unit has a snoop queue that receives snoop operations, together with their tags, for instructions issued by an initiating processing unit. Each processor examines its snoop queue to determine whether any queued snoop operation carries the initiating processing unit's tag. Any processing unit that finds such an operation sends a retry message to the initiating processing unit. In response to the retry message, the initiating processing unit reissues the synchronization message. The other processors then re-examine their snoop queues to determine whether any queued snoop operation still carries the initiating processing unit's tag.
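The tag-based synchronization loop described above can be sketched as follows. This sketch assumes that each snooper retires one queued operation between sync attempts; real hardware would drain its queues concurrently. `SyncSnooper`, `synchronize`, and `max_attempts` are hypothetical names.

```python
from collections import deque


class SyncSnooper:
    def __init__(self):
        self.queue = deque()  # pending (operation, initiator tag) pairs

    def respond_to_sync(self, tag):
        """Retry if any queued snoop operation still carries the initiator's tag."""
        return "retry" if any(t == tag for _, t in self.queue) else "ack"

    def drain_one(self):
        if self.queue:
            self.queue.popleft()


def synchronize(tag, snoopers, max_attempts=100):
    """Reissue the sync until no snooper holds an operation tagged `tag`.

    Returns the number of sync attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        responses = [s.respond_to_sync(tag) for s in snoopers]
        if "retry" not in responses:
            return attempt
        for s in snoopers:  # queues make progress before the sync is reissued
            s.drain_one()
    raise RuntimeError("sync did not complete")
```

Operations from other processing units (a different tag) do not hold up the sync. Only operations carrying the initiator's own tag cause a retry.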
Abstract:
A method is disclosed for accessing values stored in a cache used by a processor of a computer system, such that two read operations can occur simultaneously. Memory blocks from a memory device are loaded into respective cache lines, and the address tags associated with those memory blocks are written into two redundant cache directories. A first memory block can then be read from the cache using the first cache directory, while a second memory block is simultaneously read using the second cache directory. The cache can have a single cache entry array, or two redundant cache entry arrays connected respectively to the two cache directories. If an error occurs when an address tag in one cache directory is examined, the redundant copy of that tag can be used instead by examining the corresponding line of the other cache directory.
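The redundant-directory scheme described above can be modelled as follows. This is a minimal sketch assuming a direct-mapped cache with a single entry array, and a `bad` set that stands in for whatever error detection (for example parity) flags a corrupt tag. `DualDirectoryCache` and its methods are hypothetical.

```python
class DualDirectoryCache:
    """Direct-mapped cache with two redundant tag directories (hypothetical)."""

    def __init__(self, num_lines, block_size=64):
        self.num_lines = num_lines
        self.block_size = block_size
        self.dirs = [[None] * num_lines, [None] * num_lines]  # two tag copies
        self.data = [None] * num_lines                          # one entry array
        self.bad = [set(), set()]  # directory entries flagged as erroneous

    def _index_tag(self, addr):
        block = addr // self.block_size
        return block % self.num_lines, block // self.num_lines

    def fill(self, addr, value):
        idx, tag = self._index_tag(addr)
        self.dirs[0][idx] = tag  # both directories receive the same tag
        self.dirs[1][idx] = tag
        self.data[idx] = value

    def _lookup(self, d, addr):
        idx, tag = self._index_tag(addr)
        stored = self.dirs[d][idx]
        if idx in self.bad[d]:  # error in this copy: use the redundant tag
            stored = self.dirs[1 - d][idx]
        return self.data[idx] if stored == tag else None

    def read_pair(self, addr_a, addr_b):
        """Serve two reads in the same cycle, one through each directory."""
        return self._lookup(0, addr_a), self._lookup(1, addr_b)
```

Because each read consults its own directory, neither lookup waits for the other. The second copy of every tag also doubles as the error-recovery source.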
Abstract:
Cache and architectural specific functions within a cache controller are layered to permit complex operations to be split into equivalent simple operations. Architectural variants of basic operations may thus be devolved into distinct cache and architectural operations and handled separately. The logic supporting the complex operations may thus be simplified and run faster.
Abstract:
A method of bypassing defects in a cache used by a processor of a computer system. A repair mask holds an array of bit fields corresponding to the cache lines in the cache. When a particular cache line is identified as defective, the corresponding bit field in the repair mask is set to mark that line as defective, and further access to the line is prevented on the basis of that bit field. The repair mask can be used to ensure that the defective cache line never produces a cache hit and is never chosen as a victim for cache replacement. In a set-associative cache, the defective line is thereby effectively removed from its congruence class. This approach lets the cache use all non-defective cache lines without reserving any lines for redundancy.
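The repair-mask behaviour described above can be sketched as a set-associative cache with LRU replacement. In this sketch the mask is a per-way boolean, and a masked way can neither hit nor be chosen as the victim. LRU is an assumed replacement policy, `RepairableCache` is a hypothetical name, and the model assumes every set keeps at least one usable way.

```python
class RepairableCache:
    """N-way set-associative cache with a per-line repair mask (hypothetical)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.defective = [[False] * ways for _ in range(num_sets)]  # repair mask
        self.lru = [list(range(ways)) for _ in range(num_sets)]     # front = LRU

    def mark_defective(self, set_idx, way):
        self.defective[set_idx][way] = True
        self.tags[set_idx][way] = None

    def access(self, block):
        """Return True on a hit; on a miss, fill a non-defective victim way."""
        s, tag = block % self.num_sets, block // self.num_sets
        for way, t in enumerate(self.tags[s]):
            if t == tag and not self.defective[s][way]:  # masked ways never hit
                self._touch(s, way)
                return True
        # Victim: least recently used way that the repair mask allows.
        victim = next(w for w in self.lru[s] if not self.defective[s][w])
        self.tags[s][victim] = tag
        self._touch(s, victim)
        return False

    def _touch(self, s, way):
        self.lru[s].remove(way)
        self.lru[s].append(way)
```

Marking one way of a 2-way set defective leaves that congruence class behaving as a direct-mapped set. The rest of the cache keeps its full associativity.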
Abstract:
A method of accessing a cache used by a processor of a computer system that eliminates the arbitration logic otherwise required to handle operations from multiple snooping devices. The cache provides a plurality of cache directories, each connected directly to one of a plurality of snooping devices through its own interconnect. An operation from a given snooping device is handled by the corresponding cache directory, which issues a response on the corresponding interconnect. For example, a first cache directory may be connected to a first interconnect on the processor side of the cache, and a second cache directory to a second interconnect on the system bus side of the cache. This construction handles operations from multiple snooping devices without critical-path arbitration logic. The physical placement of the multiple cache directories also improves cache access.
Abstract:
A method of improving the memory latency of a read-type operation in a multiprocessor computer system is disclosed. After a value (data or instruction) is loaded from system memory into at least two caches, the caches are marked as containing shared, unmodified copies of the value. A requesting processing unit may then issue a message indicating that it wants to read the value. A given one of the caches snoops this message from an interconnect connected to the requesting processing unit and transmits a response indicating that it can source the value. System logic detects the response and forwards it to the requesting processing unit, and the cache then sources the value to an interconnect connected to the requesting processing unit. System memory also detects the message and would normally source the value, but the response informs it that the cache will source the value instead. Because cache latency can be much lower than memory latency, this protocol can substantially improve read performance.
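The read path described above can be sketched as a single resolution function. This sketch makes several assumptions. The latencies are placeholders, not measurements. "S" marks a shared, unmodified copy. System logic is modelled as simply picking the first shared copy it sees. `read_request` and its parameters are hypothetical.

```python
CACHE_LATENCY = 10     # assumed cycles for a peer cache to source the value
MEMORY_LATENCY = 100   # assumed cycles for system memory to source the value


def read_request(addr, caches, memory):
    """Resolve a snooped read and return (value, source, latency).

    `caches` maps a cache name to {addr: (state, value)}.
    """
    # Every cache snoops the request; those holding a shared copy may respond.
    candidates = [name for name, lines in caches.items()
                  if lines.get(addr, ("I", None))[0] == "S"]
    if candidates:
        # System logic forwards one response; memory is told to stand down.
        source = candidates[0]
        return caches[source][addr][1], source, CACHE_LATENCY
    return memory[addr], "memory", MEMORY_LATENCY
```

When a shared copy exists, the read completes at cache latency. Otherwise it falls back to memory exactly as before, so the protocol only ever shortens the path.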