摘要:
Disclosed is a method of processing instructions in a data processing system. An instruction sequence that includes a memory access instruction is received at a processor in program order. In response to receipt of the memory access instruction a memory access request and a barrier operation are created. The barrier operation is placed on an interconnect after the memory access request is issued to a memory system. After the barrier operation has completed, the memory access request is completed in program order. When the memory access request is a load request, the load request is speculatively issued if a barrier operation is pending. Data returned by the speculatively issued load request is only returned to a register or execution unit of the processor when an acknowledgment is received for the barrier operation.
摘要:
A multiprocessor data processing system comprising a plurality of processing units, a plurality of caches, that is each affiliated with one of the processing units, and processing logic that, responsive to a receipt of a first system bus response to a coherency operation, causes the requesting processor to execute operations utilizing super-coherent data. The data processing system further includes logic eventually returning to coherent operations with other processing units responsive to an occurrence of a pre-determined condition. The coherency protocol of the data processing system includes a first coherency state that indicates that modification of data within a shared cache line of a second cache of a second processor has been snooped on a system bus of the data processing system. When the cache line is in the first coherency state, subsequent requests for the cache line is issued as a Z1 read on a system bus and one of two responses are received. If the response to the Z1 read indicates that the first processor should utilize local data currently available within the cache line, the first coherency state is changed to a second coherency state that indicates to the first processor that subsequent request for the cache line should utilize the data within the local cache and not be issued to the system interconnect. Coherency state transitions to the second coherency state is completed via the coherency protocol of the data processing system. Super-coherent data is provided to the processor from the cache line of the local cache whenever the second coherency state is set for the cache line and a request is received.
摘要:
Disclosed is a multiprocessor data processing system that executes loads transactions out of order with respect to a barrier operation. The data processing system includes a memory and a plurality of processors coupled to an interconnect. At least one of the processors includes an instruction sequencing unit for fetching an instruction sequence in program order for execution. The instruction sequence includes a first and a second load instruction and a barrier instruction, which is between the first and second load instructions in the instruction sequence. Also included in the processor is a load/store unit (LSU), which has a load request queue (LRQ) that temporarily buffers load requests associated with the first and second load instructions. The LRQ is coupled to a load request arbitration unit, which selects an order of issuing the load requests from the LRQ. Then a controller issues a load request associated with the second load instruction to memory before completion of a barrier operation associated with the barrier instruction. Alternatively, load requests are issued out-of-order with respect to the program order before or after the barrier instruction. The load request arbitration unit selects the request associated with the second load instruction before a request associated with the first load instruction, and the controller issues the request associated with the second load instruction before the request associated with the first load instruction and before issuing the barrier operation.
摘要:
A multiprocessor data processing system comprising, in addition to a first and second processor having an respective first and second cache and a main cache directory affiliated with the first processor's cache, a secondary cache directory of the first cache, which contains a subset of cache line addresses from the main cache directory corresponding to cache lines that are in a first or second coherency state, where the second coherency state indicates to the first processor that requests issued from the first processor for a cache line whose address is within the secondary directory should utilize super-coherent data currently available in the first cache and should not be issued on the system interconnect. Additionally, the cache controller logic includes a clear on barrier flag (COBF) associated with the secondary directory, which is set whenever an operation of the first processor is issued to said system interconnect. If a barrier instruction is received by the first processor while the COBF is set, the contents of the secondary directory are immediately flushed and the cache lines are tagged with an invalid state.
摘要:
A method for improving performance of a multiprocessor data processing system comprising snooping a request for data held within a shared cache line on a system bus of the data processing system whose cache contains an updated copy of the shared cache line, and responsive to a snoop of the request by the second processor, issuing a first response on the system bus indicating to the requesting processor that the requesting processor may utilize data currently stored within the shared cache line of a cache of the requesting processor. When the request is snooped by the second processor and the second processor decides to release a lock on the cache line to the requesting processor, the second processor issues a second response on the system bus indicating that the first processor should utilize new/coherent data and then the second processor releases the lock to the first processor.
摘要:
Disclosed herein are a data processing system and method of operating a data processing system that arbitrate between conflicting requests to modify data cached in a shared state and that protect ownership of the cache line granted during such arbitration until modification of the data is complete. The data processing system includes a plurality of agents coupled to an interconnect that supports pipelined transactions. While data associated with a target address are cached at a first agent among the plurality of agents in a shared state, the first agent issues a transaction on the interconnect. In response to snooping the transaction, a second agent provides a snoop response indicating that the second agent has a pending conflicting request and a coherency decision point provides a snoop response granting the first agent ownership of the data. In response to the snoop responses, the first agent is provided with a combined response representing a collective response to the transaction of all of the agents that grants the first agent ownership of the data. In response to the combined response, the first agent is permitted to modify the data.
摘要:
A method of operation within a processor that permits load instructions following barrier instructions in an instruction sequence to be issued speculatively. The barrier instruction is executed and while the barrier operation is pending, a load request associated with the load instruction is speculatively issued. A speculation flag is set to indicate the load instruction was speculatively issued. The flag is reset when an acknowledgment of the barrier operation is received. Data that is returned before the acknowledgment is received is temporarily held, and the data is forwarded to the register and/or execution unit of the processor only after the acknowledgment is received. If a snoop invalidate is detected for the speculatively issued load request before the barrier operation completes, the data is discarded and the load request is re-issued.
摘要:
A method for improving performance of a multiprocessor data processing system having processor groups with shared caches. When a processor within a processor group that shares a cache snoops a modification to a shared cache line in a cache of another processor that is not within the processor group, the coherency state of the shared cache line within the first cache is set to a first coherency state that indicates that the cache line has been modified by a processor not within the processor group and that the cache line has not yet been updated within the group's cache. When a request for the cache line is later issued by a processor, the request is issued to the system bus or interconnect. If a received response to the request indicates that the processor should utilize super-coherent data, the coherency state of the cache line is set to a processor-specific super coherency state. This state indicates that subsequent requests for the cache line by the first processor should be provided said super-coherent data, while a subsequent request for the cache line by a next processor in the processor group that has not yet issued a request for the cache line on the system bus, may still go to the system bus to request the cache line. The individualized, processor-specific super coherency states are individually set but are usually changed to another coherency state (e.g., Modified or Invalid) as a group.
摘要:
Disclosed is a method of operation within a processor that permits load instructions to be issued speculatively. An instruction sequence is received that includes multiple barrier instructions and a load instruction that follows the barrier instructions in the instruction sequence. In response to the multiple barrier instructions, barrier operations are issued on an interconnect coupled to the processor. Also, while the barrier operations are pending, a load request associated with the load instruction is speculatively issued. When the load request is issued, a flag is set to indicate that it was speculatively issued. The flag is reset when acknowledgments of all the barrier operations are received. Data that is returned before the acknowledgments are received is temporarily held and forwarded to the register and/or execution unit of the processor only after the acknowledgments are received. If a snoop invalidate is detected for the speculatively issued load request before completion of the barrier operations, the data is discarded and the load request is re-issued.
摘要:
A cache memory which loads two memory values into two cache lines by receiving separate portions of a first requested memory value from a first data bus over a first time span of successive clock cycles and receiving separate portions of a second requested memory value from a second data bus over a second time span of successive clock cycles which overlaps with the first time span. In the illustrative embodiment a first input line is used for loading both a first byte array of the first cache line and a first byte array of the second cache line, a second input line is used for loading both a second byte array of the first cache line and a second byte array of the second cache line, and the transmission of the separate portions of the first and second memory values is interleaved between the first and second data busses.