摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
摘要:
A processor communication register (PCR) contained in each processor within a multiprocessor cluster network provides enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multi-processing. Each processor has exclusive rights to store to a sector within each PCR within the cluster network and has continuous access to read the contents of its own PCR. Each processor updates its exclusive sector within all of the PCRs via a private protocol or dedicated wireless network, instantly allowing all of the other processors within the cluster network to see the change within the PCR data, and bypassing the cache subsystem. Efficiency is enhanced within the processor cluster network by providing processor communications to be immediately networked and transferred into all processors without momentarily restricting access to the information or forcing all the processors to be continually contending for the same cache line, and thereby overwhelming the interconnect and memory system with an endless stream of load, store and invalidate commands.
摘要:
A processor communication register (PCR) contained in each processor within a multiprocessor cluster network provides enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multi-processing. Each processor has exclusive rights to store to a sector within each PCR within the cluster network and has continuous access to read the contents of its own PCR. Each processor updates its exclusive sector within all of the PCRs via a private protocol or dedicated wireless network, instantly allowing all of the other processors within the cluster network to see the change within the PCR data, and bypassing the cache subsystem. Efficiency is enhanced within the processor cluster network by providing processor communications to be immediately networked and transferred into all processors without momentarily restricting access to the information or forcing all the processors to be continually contending for the same cache line, and thereby overwhelming the interconnect and memory system with an endless stream of load, store and invalidate commands.
摘要:
A processor communication register (PCR) contained within a multiprocessor cluster system provides enhanced processor communication. The PCR stores information that is useful in pipelined or parallel multi-processing. Each processor cluster has exclusive rights to store to a sector within the PCR and has continuous access to read its contents. Each processor cluster updates its exclusive sector within the PCR, instantly allowing all of the other processors within the cluster network to see the change within the PCR data, and bypassing the cache subsystem. Efficiency is enhanced within the processor cluster network by providing processor communications to be immediately networked and transferred into all processors without momentarily restricting access to the information or forcing all the processors to be continually contending for the same cache line, and thereby overwhelming the interconnect and memory system with an endless stream of load, store and invalidate commands.
摘要:
A multiprocessor data processing system includes first and second processors coupled to an interconnect and to a global promotion facility containing a plurality of promotion bit fields. The first processor executes a single acquisition instruction to concurrently acquire a plurality of promotion bit fields exclusive of at least the second processor. In response to execution of the acquisition instruction, the first processor receives an indication of success or failure of the acquisition instruction, wherein the indication indicates success of the acquisition instruction if all of the plurality of promotion bit fields were concurrently acquired by the first processor and indicates failure of the acquisition instruction if fewer than all of the plurality of promotion bit fields were acquired by the first processor.
摘要:
A data processing system includes a global promotion facility and a plurality of processors coupled by an interconnect. In response to execution of an acquisition instruction by a first processor among the plurality of processors, the first processor transmits an address-only operation on the interconnect to acquire a promotion bit field within the global promotion facility exclusive of at least a second processor among the plurality of processors. In response to receipt of a combined response for the address-only operation representing a collective response of others of the plurality of processors to the address-only operation, the first processor determines whether or not acquisition of the promotion bit field was successful by reference to the combined response.
摘要:
A multiprocessor data processing system comprising, in addition to a first and second processor having an respective first and second cache and a main cache directory affiliated with the first processor's cache, a secondary cache directory of the first cache, which contains a subset of cache line addresses from the main cache directory corresponding to cache lines that are in a first or second coherency state, where the second coherency state indicates to the first processor that requests issued from the first processor for a cache line whose address is within the secondary directory should utilize super-coherent data currently available in the first cache and should not be issued on the system interconnect. Additionally, the cache controller logic includes a clear on barrier flag (COBF) associated with the secondary directory, which is set whenever an operation of the first processor is issued to said system interconnect. If a barrier instruction is received by the first processor while the COBF is set, the contents of the secondary directory are immediately flushed and the cache lines are tagged with an invalid state.
摘要:
A method for improving performance of a multiprocessor data processing system comprising snooping a request for data held within a shared cache line on a system bus of the data processing system whose cache contains an updated copy of the shared cache line, and responsive to a snoop of the request by the second processor, issuing a first response on the system bus indicating to the requesting processor that the requesting processor may utilize data currently stored within the shared cache line of a cache of the requesting processor. When the request is snooped by the second processor and the second processor decides to release a lock on the cache line to the requesting processor, the second processor issues a second response on the system bus indicating that the first processor should utilize new/coherent data and then the second processor releases the lock to the first processor.
摘要:
Disclosed herein are a data processing system and method of operating a data processing system that arbitrate between conflicting requests to modify data cached in a shared state and that protect ownership of the cache line granted during such arbitration until modification of the data is complete. The data processing system includes a plurality of agents coupled to an interconnect that supports pipelined transactions. While data associated with a target address are cached at a first agent among the plurality of agents in a shared state, the first agent issues a transaction on the interconnect. In response to snooping the transaction, a second agent provides a snoop response indicating that the second agent has a pending conflicting request and a coherency decision point provides a snoop response granting the first agent ownership of the data. In response to the snoop responses, the first agent is provided with a combined response representing a collective response to the transaction of all of the agents that grants the first agent ownership of the data. In response to the combined response, the first agent is permitted to modify the data.
摘要:
A method of operation within a processor that permits load instructions following barrier instructions in an instruction sequence to be issued speculatively. The barrier instruction is executed and while the barrier operation is pending, a load request associated with the load instruction is speculatively issued. A speculation flag is set to indicate the load instruction was speculatively issued. The flag is reset when an acknowledgment of the barrier operation is received. Data that is returned before the acknowledgment is received is temporarily held, and the data is forwarded to the register and/or execution unit of the processor only after the acknowledgment is received. If a snoop invalidate is detected for the speculatively issued load request before the barrier operation completes, the data is discarded and the load request is re-issued.