Abstract:
A method for improving performance of a multiprocessor data processing system having processor groups with shared caches. When a processor within a processor group snoops a modification to a shared cache line in the cache of another processor outside the group, the coherency state of the cache line within the group's shared cache is set to a first coherency state indicating that the line has been modified by a processor outside the group and has not yet been updated within the group's cache. When a first processor in the group later issues a request for the cache line, the request is issued on the system bus or interconnect. If the response to the request indicates that the processor should utilize super-coherent data, the coherency state of the cache line is set to a processor-specific super-coherent state. This state indicates that subsequent requests for the cache line by the first processor should be provided said super-coherent data, while a subsequent request by a next processor in the group that has not yet issued a request for the cache line on the system bus may still go to the system bus to request the line. The processor-specific super-coherent states are set individually but are typically changed to another coherency state (e.g., Modified or Invalid) as a group.
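As a rough illustration of the per-line bookkeeping this abstract describes, here is a minimal C++ sketch; the state names, the group size, and the per-processor bit vector are assumptions, since the abstract gives no encodings.

```cpp
#include <bitset>

// State names, group size, and the bit vector are assumptions for this sketch.
enum class State {
    Invalid,
    Shared,
    ModifiedRemote,   // first coherency state: line modified outside the
                      // group, group's copy not yet updated
    Modified
};

constexpr int kGroupSize = 4;   // processors sharing this cache

struct GroupCacheLine {
    State state = State::Invalid;
    // One "super-coherent" bit per processor, set individually after that
    // processor's bus request is answered with "use super-coherent data".
    std::bitset<kGroupSize> superCoherent;
};

// Snooped modification by a processor outside the group.
void snoopExternalModify(GroupCacheLine& line) {
    line.state = State::ModifiedRemote;
    line.superCoherent.reset();
}

// Returns true if processor `cpu` must issue its request on the system bus.
bool mustGoToBus(const GroupCacheLine& line, int cpu) {
    return line.state == State::ModifiedRemote && !line.superCoherent[cpu];
}

// Bus response told this processor to utilize super-coherent (stale but
// architecturally permitted) data: set only its own bit.
void useSuperCoherent(GroupCacheLine& line, int cpu) {
    line.superCoherent.set(cpu);
}

// Leaving the super-coherent regime (e.g., to Modified or Invalid) clears
// the per-processor bits together, as a group.
void leaveSuperCoherent(GroupCacheLine& line, State next) {
    line.state = next;
    line.superCoherent.reset();
}
```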
Abstract:
Disclosed is a method of operation within a processor that permits load instructions to be issued speculatively. An instruction sequence is received that includes multiple barrier instructions and a load instruction that follows the barrier instructions. In response to the barrier instructions, barrier operations are issued on an interconnect coupled to the processor, and while the barrier operations are pending, a load request associated with the load instruction is speculatively issued. When the load request is issued, a flag is set to indicate that it was issued speculatively; the flag is reset when acknowledgments of all the barrier operations are received. Data returned before the acknowledgments arrive is temporarily held and is forwarded to the register and/or execution unit of the processor only after the acknowledgments are received. If a snoop invalidate is detected for the speculatively issued load request before completion of the barrier operations, the held data is discarded and the load request is re-issued.
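The flag-and-hold mechanism can be sketched as a small data structure. The following C++ is illustrative only; the type and member names (LoadUnit, heldData, and so on) are invented for the sketch.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// All type and member names here are invented for the sketch.
struct LoadRequest {
    std::uint64_t addr;
    bool speculative;                       // set when issued while barrier
                                            // operations are still pending
    std::optional<std::uint64_t> heldData;  // data returned early is buffered,
                                            // not forwarded
};

struct LoadUnit {
    int pendingBarrierAcks = 0;
    std::vector<LoadRequest> loads;

    void issueBarrierOperation() { ++pendingBarrierAcks; }

    // Issue the load even though barrier operations are still pending,
    // flagging it as speculative.
    void issueLoad(std::uint64_t addr) {
        loads.push_back({addr, pendingBarrierAcks > 0, std::nullopt});
    }

    // Data returned before all acks: hold it rather than forwarding it to
    // the register/execution unit.
    void dataReturned(LoadRequest& r, std::uint64_t data) { r.heldData = data; }

    // One barrier acknowledgment arrives; when all are in, the flags are
    // reset and any held data may be forwarded.
    void barrierAck() {
        if (pendingBarrierAcks > 0 && --pendingBarrierAcks == 0)
            for (auto& r : loads) r.speculative = false;
    }

    // A snoop invalidate hits a speculative load: discard the held data;
    // the load request would then be re-issued.
    void snoopInvalidate(LoadRequest& r) {
        if (r.speculative) r.heldData.reset();
    }
};
```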
Abstract:
The present invention is a method and apparatus for preventing deadlocks arising from the execution of singly-initiated, singly-sourced, variable-delay system bus operations. In general, each snooper accepts a given operation at the same time according to an agreed-upon condition. In other words, the snooper in a given cache can accept an operation and begin working on it even while retrying the operation. Furthermore, none of the active snoopers releases an operation until all the active snoopers are done with it. In other words, execution of a given operation is started by the snoopers at the same time and finished by each of them at the same time. This prevents a ping-pong deadlock by keeping any one cache from finishing the operation before any of the others.
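A minimal C++ simulation of the lockstep rule, assuming four snoopers with arbitrary per-cache delays; the cycle-counting model is an invention of the sketch, not the patent's hardware.

```cpp
#include <array>
#include <cstdio>

// Hypothetical snooper model: names and delays are illustrative.
struct Snooper {
    bool accepted = false;   // snooper has latched the operation
    bool finished = false;   // snooper's local work is complete
    int  workLeft = 0;       // cycles of variable-delay work remaining
};

int main() {
    std::array<Snooper, 4> snoopers;
    const int delays[4] = {2, 5, 3, 7};   // variable per-cache delays

    // Agreed-upon condition: all snoopers accept the operation in the same
    // cycle, even if they would otherwise be responding Retry.
    for (int i = 0; i < 4; ++i) {
        snoopers[i].accepted = true;
        snoopers[i].workLeft = delays[i];
    }

    // No snooper releases the operation until every snooper is finished,
    // so no cache can "ping-pong" ahead of the others.
    bool allDone = false;
    for (int cycle = 1; !allDone; ++cycle) {
        allDone = true;
        for (auto& s : snoopers) {
            if (s.workLeft > 0 && --s.workLeft == 0) s.finished = true;
            if (!s.finished) allDone = false;
        }
        if (allDone)
            std::printf("all snoopers release the operation at cycle %d\n",
                        cycle);
    }
}
```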
Abstract:
An integrated circuit comprises a semiconductor substrate having integrated circuitry formed therein. According to the present invention, the integrated circuitry includes a plurality of subcircuits, including first and second subcircuits that concurrently operate at diverse first and second frequencies, respectively. According to one embodiment, the integrated circuit has a clock signal that alternates between an active state and an inactive state at a third frequency and is broadcast to all of the subcircuits. In this embodiment, at least one subcircuit among the plurality of subcircuits, for example, a processor, operates in response to the clock signal at the third frequency, which is higher than the first frequency. According to another embodiment, the subcircuits each communicate with at least one other subcircuit via a latch-to-latch interface.
Abstract:
A method is disclosed for managing architectural operations in a computer system whose architecture includes components having varying coherency granule sizes. A queue is provided for receiving, as entries, a plurality of the architectural operations. The entries of the queue are compared with a new architectural operation to determine whether the new operation is redundant with any of the entries; if it is not, it is loaded into the queue. The computer system may include a cache having a processor granularity size and a system bus granularity size larger than the processor granularity size, where the architectural operations are cache instructions. The comparison may be performed in an associative manner based on the varying coherency granule sizes.
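The redundancy test reduces to masking addresses to the coarser granule before comparing. A hedged C++ sketch, with hypothetical 32-byte processor and 128-byte bus granule sizes:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical granule sizes for illustration only.
constexpr std::uint64_t kProcGranule = 32;
constexpr std::uint64_t kBusGranule  = 128;

struct ArchOp {
    std::uint64_t addr;   // target address of the cache instruction
};

// A new operation is redundant if an existing queue entry already covers
// its address at the coarser (system bus) granularity, since that entry
// will act on the whole larger granule.
bool isRedundant(const std::deque<ArchOp>& queue, const ArchOp& op) {
    const std::uint64_t busLine = op.addr & ~(kBusGranule - 1);
    for (const ArchOp& e : queue)                     // associative compare
        if ((e.addr & ~(kBusGranule - 1)) == busLine)
            return true;
    return false;
}

void enqueue(std::deque<ArchOp>& queue, const ArchOp& op) {
    if (!isRedundant(queue, op))
        queue.push_back(op);   // only non-redundant operations are loaded
}
```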
Abstract:
A register associated with the architected logic queue of a memory-coherent device within a multiprocessor system contains a flag that is set whenever an architected operation enters the initiating device's architected logic queue to be issued on the system bus. The flag remains set even after the architected logic queue is drained, and is reset only when a synchronization instruction is received from a local processor, providing historical information regarding architected operations that may still be pending in other devices. This historical information is used to determine whether a synchronization operation should be presented on the system bus, allowing unnecessary synchronization operations to be filtered out. When a local processor issues a synchronization instruction to the device managing the architected logic queue, the instruction is accepted if the architected logic queue is empty; otherwise the synchronization instruction is retried back to the local processor until the queue becomes empty. If the flag is set when the synchronization instruction is accepted, the synchronization operation is presented on the system bus. If the flag is not set, the synchronization operation is unnecessary and is not presented on the system bus.
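The filtering decision can be captured in a few lines; this C++ sketch invents the ArchLogicQueue type and its SyncResult outcomes to mirror the three cases above (retry, present on bus, filter out).

```cpp
#include <deque>

// Invented type; operations are represented as plain ints for the sketch.
struct ArchLogicQueue {
    std::deque<int> ops;       // architected operations awaiting the bus
    bool historyFlag = false;  // set whenever an operation enters the queue

    void enqueue(int op) {
        ops.push_back(op);
        historyFlag = true;    // remains set even after the queue drains
    }

    void drainOne() { if (!ops.empty()) ops.pop_front(); }

    enum class SyncResult { Retry, PresentOnBus, FilterOut };

    SyncResult sync() {
        if (!ops.empty())
            return SyncResult::Retry;         // retried back to processor
        if (historyFlag) {
            historyFlag = false;              // reset only by the sync
            return SyncResult::PresentOnBus;  // prior operations may still
                                              // be pending in other devices
        }
        return SyncResult::FilterOut;         // nothing to synchronize
    }
};
```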
Abstract:
A method and system for controlling access to a shared resource in a data processing system are described. According to the method, a number of requests for access to the resource are generated by a number of requestors that share the resource. Each of the requestors is dynamically associated with a priority weight in response to events in the data processing system; the priority weight indicates the probability that the associated requestor will be assigned the highest current priority. Each requestor is then assigned a current priority that is determined substantially randomly with respect to previous priorities of the requestors. In response to the current priorities of the requestors, a request for access to the resource is granted.
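One plausible reading of "substantially randomly with respect to previous priorities" is a fresh weighted draw per arbitration cycle. The sketch below uses std::discrete_distribution for that draw; the weight-update policy is a placeholder, since the abstract only says weights change in response to system events.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// The weight-update policy here is a placeholder, not the patent's.
struct Arbiter {
    std::vector<double> weights;   // priority weight per requestor
    std::mt19937 rng{12345};

    explicit Arbiter(int n) : weights(n, 1.0) {}

    // Dynamically re-weight a requestor in response to a system event.
    void onEvent(int requestor, double newWeight) {
        weights[requestor] = newWeight;
    }

    // Grant one request. The probability that a requestor draws the highest
    // current priority is proportional to its weight, and each draw is
    // independent of previous draws. Assumes at least one requestor is
    // asserting its request.
    int grant(const std::vector<bool>& requesting) {
        std::vector<double> w(weights.size(), 0.0);
        for (std::size_t i = 0; i < w.size(); ++i)
            if (requesting[i]) w[i] = weights[i];
        std::discrete_distribution<int> draw(w.begin(), w.end());
        return draw(rng);
    }
};
```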
Abstract:
A method of reducing errors in a cache memory of a computer system (e.g., an L2 cache) by periodically issuing a series of purge commands to the L2 cache, sequentially flushing cache lines from the L2 cache to an L3 cache in response to the purge commands, and correcting single-bit errors in the cache lines as they are flushed to the L3 cache. Purge commands are issued only when the processor cores associated with the L2 cache have an idle cycle available in a store pipe to the cache. The flush rate of the purge commands can be programmably set, and the purge mechanism can be implemented either in software running on the computer system or in hardware integrated with the L2 cache. In the software case, the purge mechanism can be incorporated into the operating system; in the hardware case, a purge engine can be provided that advantageously utilizes the store pipe between the L1 and L2 caches. The L2 cache can be forced to victimize cache lines by setting tag bits for the cache lines to a value that misses in the L2 cache (e.g., cache-inhibited space). With the eviction mechanism of the cache placed in a direct-mapped mode, the address misses result in eviction of the cache lines, thereby flushing them to the L3 cache.
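The sequential walk the purge engine performs can be sketched in C++; everything here (the tiny geometry, storePipeIdle, flushToL3WithEccCorrection) is an illustrative stand-in for the hardware described above.

```cpp
#include <cstdio>

// Illustrative sizes only; real L2 geometries are far larger.
constexpr int kSets = 4;
constexpr int kWays = 2;

// Stub: in hardware this reports an idle cycle in the store pipe between
// the L1 and L2 caches; here it is always idle.
bool storePipeIdle() { return true; }

// Stub for the flush: a real purge writes the line back to the L3 cache,
// correcting any single-bit error on the way.
void flushToL3WithEccCorrection(int set, int way) {
    std::printf("purged set %d way %d -> L3 (single-bit errors corrected)\n",
                set, way);
}

// Walk the L2 sequentially, issuing one purge command per idle store-pipe
// cycle; in the patent the walk rate (the scrub rate) is programmable.
void purgePass() {
    for (int set = 0; set < kSets; ++set)
        for (int way = 0; way < kWays; ++way) {
            while (!storePipeIdle()) { /* wait for an idle cycle */ }
            flushToL3WithEccCorrection(set, way);
        }
}

int main() { purgePass(); }
```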
Abstract:
A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of the first coupled processor. Both homogeneous and non-homogeneous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the next receiving processor. When data is received at the cache of the first processor within the chain, that processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.
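The chained bookkeeping amounts to an extra state plus a next-processor identifier per cache line. A C++ sketch under those assumptions (the state names are invented):

```cpp
#include <optional>

// Invented state names; the protocol defines several intermediate chained
// states, of which two are sketched here.
enum class ChainState {
    Normal,
    ChainedAwaitingData,   // access granted, data not yet received
    ChainedForward         // data owed to the next processor in the chain
};

struct CacheLineTag {
    ChainState state = ChainState::Normal;
    std::optional<int> nextProcessor;  // identifies the next chained requester
};

// Snooped chainable request from `requester`: instead of a retry, record
// the requester and grant future access, saving address-bus bandwidth.
void snoopChainableRequest(CacheLineTag& tag, int requester) {
    tag.nextProcessor = requester;
    tag.state = ChainState::ChainedForward;
}

// Data finally arrives: complete the local operation, then forward the
// line to the recorded next processor and return to a normal state.
void dataArrived(CacheLineTag& tag) {
    // ... perform this processor's operation on (or with) the data ...
    if (tag.nextProcessor) {
        // forward the cache line to *tag.nextProcessor here
        tag.nextProcessor.reset();
    }
    tag.state = ChainState::Normal;
}
```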
Abstract:
A method and data processing system for sequentially coupling successive, homogeneous processor requests for a cache line in a chain before the data is received in the cache of the first processor within the chain. Chained intermediate coherency states are assigned to track the chain of processor requests and the subsequent grant of access permission, prior to receipt of the data at the first processor starting the chain. The chained intermediate coherency state identifies the processor operation, and a directional identifier identifies the processor to which the cache line is to be forwarded. When the data is received at the cache of the first processor within the chain, the first processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chain is immediately stopped when a non-homogeneous operation is snooped by the last-in-chain processor.
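The difference from the preceding abstract is the homogeneity check at the tail of the chain. A short sketch of that rule, with invented operation kinds:

```cpp
// Operation kinds are invented for the sketch; RWITM stands for
// read-with-intent-to-modify.
enum class OpKind { Read, RWITM };

// Bookkeeping at the last-in-chain processor: only operations homogeneous
// with the chain's kind extend it; anything else stops the chain.
struct ChainTail {
    OpKind chainKind;       // the kind the current chain was built from
    bool chainOpen = true;

    // Returns true if the snooped request joined the chain; false when a
    // non-homogeneous operation immediately stops it.
    bool snoop(OpKind snooped) {
        if (!chainOpen || snooped != chainKind) {
            chainOpen = false;   // chain is immediately stopped
            return false;        // requester falls back to conventional retry
        }
        // record the requester as next-in-chain, as in the sketch above
        return true;
    }
};
```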