Abstract:
A processor communication register (PCR) contained within a multiprocessor cluster system provides enhanced processor communication. The PCR stores information that is useful in pipelined or parallel multiprocessing. Each processor cluster has exclusive rights to store to a sector within the PCR and has continuous access to read its contents. When a processor cluster updates its exclusive sector within the PCR, all of the other processors within the cluster network see the change immediately, bypassing the cache subsystem. Efficiency is enhanced within the processor cluster network because processor communications are networked and transferred to all processors immediately. Access to the information is never momentarily restricted, and the processors are not forced to contend continually for the same cache line, which would overwhelm the interconnect and memory system with an endless stream of load, store, and invalidate commands.
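To make the ownership rule concrete, here is a minimal Python model, assuming the PCR can be treated as an array of sectors in which sector i is owned by cluster i. The class and method names are hypothetical, and the model ignores register width, timing, and how the hardware actually routes stores.

```python
class ProcessorCommunicationRegister:
    """Hypothetical model of a PCR shared by a cluster network."""

    def __init__(self, num_clusters: int) -> None:
        # Assumption: one sector per cluster, sector i owned by cluster i.
        self._sectors = [0] * num_clusters

    def store(self, cluster_id: int, sector: int, value: int) -> None:
        # Only the owning cluster may store to a sector.
        if cluster_id != sector:
            raise PermissionError(f"cluster {cluster_id} does not own sector {sector}")
        self._sectors[sector] = value

    def load(self) -> tuple:
        # Any cluster may read the whole register at any time; no cache line
        # is acquired, so readers never invalidate each other's copies.
        return tuple(self._sectors)
```

On a four-cluster register, `store(2, 2, 0xFF)` followed by `load()` returns `(0, 0, 255, 0)`, while `store(1, 2, 7)` raises `PermissionError` because cluster 1 does not own sector 2.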
Abstract:
A processor communication register (PCR) contained in each processor within a multiprocessor system provides enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multiprocessing. Each processor has exclusive rights to store to a sector within each PCR and has continuous access to read the contents of its own PCR. When a processor updates its exclusive sector within all of the PCRs, all of the other processors see the change immediately, bypassing the cache subsystem. Efficiency is enhanced within the multiprocessor system because processor communications are transferred to all processors immediately. Access to the information is never momentarily restricted, and the processors are not forced to contend continually for the same cache line, which would overwhelm the interconnect and memory system with an endless stream of load, store, and invalidate commands.
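In this per-processor variant, every processor keeps its own PCR copy, and a store is broadcast to the writer's sector in every copy. The sketch below is a hypothetical software model of that update rule, assuming one sector per processor and an instantaneous broadcast.

```python
class ReplicatedPCRs:
    """Hypothetical model: one PCR copy per processor, kept identical."""

    def __init__(self, num_processors: int) -> None:
        # Build independent lists so the copies do not alias each other.
        self._copies = [[0] * num_processors for _ in range(num_processors)]

    def store(self, processor_id: int, value: int) -> None:
        # A processor writes only its own sector, but in every PCR copy.
        for copy in self._copies:
            copy[processor_id] = value

    def load(self, processor_id: int) -> tuple:
        # Reads are served from the reader's local copy.
        return tuple(self._copies[processor_id])
```

The list comprehension matters: writing `[[0] * n] * n` would make every "copy" the same list object, which hides whether the broadcast loop is actually needed.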
Abstract:
A data processing system includes a global promotion facility containing a plurality of promotion bit fields, an interconnect, and a plurality of processing units coupled to the global promotion facility and to the interconnect. A first processing unit includes an instruction sequencing unit, an execution unit that executes an acquisition instruction to acquire a particular promotion bit field within the global promotion facility, and a promotion awareness facility. In response to snooping a request by a second processing unit for the particular promotion bit field, the first processing unit records an association between the second processing unit and the particular promotion bit field in its promotion awareness facility. After requesting and then releasing the particular promotion bit field, the first processing unit checks the promotion awareness facility for an association with that promotion bit field. Responsive to the check, it pushes the particular promotion bit field to the second processing unit using an unsolicited operation on the interconnect, so that no additional request by the second processing unit is required.
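The push-on-release handshake can be modeled in a few lines of Python. This is a sketch under stated assumptions: the global promotion facility is a list recording the holder of each bit field, the awareness facility is a dictionary keyed by (holder, field), only the first snooped requester is remembered, and all names are hypothetical.

```python
class PromotionSystem:
    """Hypothetical model of a global promotion facility with push-on-release."""

    def __init__(self, num_fields: int) -> None:
        self.holder = [None] * num_fields  # global facility: current holder per bit field
        self.awareness = {}                # (holder, field) -> snooped requester
        self.pushed = []                   # log of unsolicited pushes on the interconnect

    def request(self, unit: int, field: int) -> bool:
        """Try to acquire a bit field; return True if it was granted."""
        owner = self.holder[field]
        if owner is None:
            self.holder[field] = unit
            return True
        # The holder snoops the request and remembers the first requester (assumed).
        self.awareness.setdefault((owner, field), unit)
        return False

    def release(self, unit: int, field: int) -> None:
        if self.holder[field] != unit:
            raise RuntimeError(f"unit {unit} does not hold field {field}")
        waiter = self.awareness.pop((unit, field), None)
        if waiter is None:
            self.holder[field] = None          # nobody waiting: free the field
        else:
            self.holder[field] = waiter        # push it without a new request
            self.pushed.append((unit, waiter, field))
```

After unit 0 acquires field 1 and unit 3's request is snooped, `release(0, 1)` hands field 1 straight to unit 3; unit 3 never has to reissue its request.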
Abstract:
Disclosed is a method of processing instructions in a data processing system. An instruction sequence that includes a memory access instruction is received at a processor in program order. In response to receipt of the memory access instruction, a memory access request and a barrier operation are created. The barrier operation is placed on an interconnect after the memory access request is issued to a memory system. After the barrier operation has completed, the memory access request is completed in program order. When the memory access request is a load request, the load request is issued speculatively if a barrier operation is pending. Data returned by the speculatively issued load request is passed to a register or execution unit of the processor only when an acknowledgment is received for the barrier operation.
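A minimal model of holding speculatively loaded data until the barrier is acknowledged is shown below. It assumes memory is a plain dictionary and that loads return immediately; the class and method names are hypothetical.

```python
from collections import deque


class SpeculativeLoadUnit:
    """Hypothetical model: loads issue early, results wait for the barrier ack."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory
        self.registers = {}
        self.barrier_pending = False
        self._held = deque()  # (register, data) returned early, in program order

    def issue_barrier(self) -> None:
        self.barrier_pending = True

    def load(self, reg: str, addr: int) -> None:
        data = self.memory[addr]  # request goes to memory immediately
        if self.barrier_pending:
            self._held.append((reg, data))  # speculative: hold the data
        else:
            self.registers[reg] = data

    def barrier_ack(self) -> None:
        # Acknowledgment received: release held data to registers in order.
        self.barrier_pending = False
        while self._held:
            reg, data = self._held.popleft()
            self.registers[reg] = data
```

The sketch omits one detail a real design would typically need: if the cache line backing a held load is invalidated before the acknowledgment arrives, the held data must be discarded and the load reissued.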
Abstract:
A multiprocessor data processing system comprises a plurality of processing units, a plurality of caches, each affiliated with one of the processing units, and processing logic that, responsive to receipt of a first system bus response to a coherency operation, causes the requesting processor to execute operations utilizing super-coherent data. The data processing system further includes logic for eventually returning to coherent operation with the other processing units responsive to the occurrence of a pre-determined condition. The coherency protocol of the data processing system includes a first coherency state indicating that a modification of data within a shared cache line of a second cache of a second processor has been snooped on a system bus of the data processing system. When the cache line is in the first coherency state, subsequent requests for the cache line are issued as a Z1 read on the system bus, and one of two responses is received. If the response to the Z1 read indicates that the first processor should utilize the local data currently available within the cache line, the first coherency state is changed to a second coherency state. The second state indicates to the first processor that subsequent requests for the cache line should utilize the data within the local cache and not be issued to the system interconnect. Transitions to the second coherency state are completed via the coherency protocol of the data processing system. Super-coherent data is provided to the processor from the cache line of the local cache whenever the second coherency state is set for the cache line and a request is received.
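The state changes described above can be sketched as a small state machine. The state names `Z1` and `Z2`, the `INVALID` state entered when the pre-determined condition occurs, and the convention that the bus callback returns `None` to mean "use local data" are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    SHARED = auto()   # valid, coherent copy
    Z1 = auto()       # remote modification snooped; local copy may be stale
    Z2 = auto()       # bus answered "use local data": serve super-coherent data
    INVALID = auto()  # must fetch coherent data again


class CacheLine:
    """Hypothetical model of one cache line under the super-coherent protocol."""

    def __init__(self, data: int) -> None:
        self.data = data
        self.state = State.SHARED

    def snoop_remote_store(self) -> None:
        if self.state is State.SHARED:
            self.state = State.Z1

    def read(self, bus) -> int:
        """bus(kind) returns fresh data, or None to mean 'use local data'."""
        if self.state in (State.SHARED, State.Z2):
            return self.data  # served locally; nothing goes on the bus
        if self.state is State.Z1:
            fresh = bus("Z1 read")
            if fresh is None:
                self.state = State.Z2
                return self.data
        else:  # INVALID: an ordinary read must return coherent data
            fresh = bus("read")
        self.data, self.state = fresh, State.SHARED
        return self.data

    def condition_met(self) -> None:
        # Pre-determined condition (kind assumed): stop using super-coherent data.
        if self.state in (State.Z1, State.Z2):
            self.state = State.INVALID
```

Once a line reaches `Z2`, repeated reads stay local and generate no bus traffic, which is the saving the abstract describes; `condition_met` is where the model returns to coherent operation.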
Abstract:
Disclosed is a multiprocessor data processing system that executes load transactions out of order with respect to a barrier operation. The data processing system includes a memory and a plurality of processors coupled to an interconnect. At least one of the processors includes an instruction sequencing unit for fetching an instruction sequence in program order for execution. The instruction sequence includes a first and a second load instruction and a barrier instruction that lies between the first and second load instructions. The processor also includes a load/store unit (LSU) with a load request queue (LRQ) that temporarily buffers the load requests associated with the first and second load instructions. The LRQ is coupled to a load request arbitration unit, which selects the order in which load requests are issued from the LRQ. A controller then issues the load request associated with the second load instruction to memory before the barrier operation associated with the barrier instruction completes. Alternatively, load requests are issued out of order with respect to program order, whether they precede or follow the barrier instruction. The load request arbitration unit selects the request associated with the second load instruction before the request associated with the first load instruction, and the controller issues the request associated with the second load instruction before the request associated with the first load instruction and before issuing the barrier operation.
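The key point, that the arbitration unit rather than the barrier's position decides the issue order, can be sketched as follows. The `pick` policy and the single-barrier simplification are assumptions; the function does not model completion or how the barrier's result is used.

```python
def issue_order(program, pick):
    """Return the order in which operations reach the interconnect.

    program: ("load", name) and ("barrier", name) tuples in program order.
    pick: hypothetical arbitration policy returning an index into the LRQ.
    """
    lrq = [name for kind, name in program if kind == "load"]
    barriers = [name for kind, name in program if kind == "barrier"]
    issued = []
    while lrq:
        # The arbitration unit, not the barrier position, chooses the next load.
        issued.append(("load", lrq.pop(pick(lrq))))
    # Single-barrier simplification: the barrier operation follows the loads.
    issued.extend(("barrier", name) for name in barriers)
    return issued
```

With a youngest-first policy, `pick=lambda q: len(q) - 1`, the sequence load A, barrier, load B is issued as load B, load A, barrier, matching the ordering in the abstract.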
Abstract:
An apparatus and method for monitoring an internal communication path, i.e. an internal bus, of an integrated circuit are described. The internal bus operates at a particular frequency, f_b. An image of the internal bus is produced that operates at a lower frequency of operations, f_o, which is more amenable to monitoring by test equipment. Signals are received from and driven to the bus using driver/receiver circuitry. The signals may be input-only, output-only, or bidirectional. The signals to be monitored are tapped in the driver/receiver circuitry. Depending on the placement of the signal taps in the driver/receiver logic, the signals may be “out-of-phase” with respect to one another. A buffer/align unit processes the signals to produce a time-delayed version of them, bringing each of the monitored signals back in phase relative to one another. Encoding circuitry encodes the time-delayed version of the bus in a manner that produces an image of the bus at the lower frequency of operations, f_o. The encoding circuitry considers the values of the monitored signals over an encoding window and produces an encoded value for each signal at the lower frequency of operations, f_o.
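The align and encode steps can be sketched in Python under several assumptions: each tap delay is a known whole number of bus cycles, f_b / f_o is an integer ratio, and the encoding is a logical OR over the window, so a one-cycle pulse is never lost. The abstract does not specify the encoding, so the OR rule is a stand-in, and the function names are hypothetical.

```python
def align(signals, delays):
    """Delay every signal to match the largest tap delay, then trim to equal length."""
    worst = max(delays)
    shifted = [[0] * (worst - d) + list(s) for s, d in zip(signals, delays)]
    n = min(len(s) for s in shifted)
    return [s[:n] for s in shifted]


def encode(signal, ratio):
    """Produce one value per window of `ratio` bus cycles (assumed OR encoding)."""
    return [int(any(signal[i:i + ratio])) for i in range(0, len(signal), ratio)]
```

If signal a sees an event at cycle 1 with tap delay 0 and signal b sees the same event at cycle 2 with tap delay 1, `align` delays a by one cycle so both pulses land on cycle 2; `encode` with a ratio of 4 then turns each 8-cycle trace into `[1, 0]`.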
Abstract:
A register associated with the architected logic queue of a memory-coherent device within a multiprocessor system contains a flag that is set whenever an architected operation (one which might affect the storage hierarchy as perceived by other devices within the system) is posted in the snoop queue of a remote snooping device. The flag remains set and is reset only when a synchronization instruction (such as the “sync” instruction supported by the PowerPC™ family of devices) is received from a local processor. The state of the flag thus provides historical information regarding architected operations that may be pending in other devices within the system after being snooped from the system bus. This historical information is used to determine whether a synchronization operation should be presented on the system bus, allowing unnecessary synchronization operations to be filtered out and freeing system bus cycles for other purposes. When a local processor issues a synchronization instruction to the device managing the architected logic queue, the instruction is generally accepted when the architected logic queue is empty; otherwise, the synchronization instruction is retried back to the local processor until the architected logic queue becomes empty. If the flag is set when the synchronization instruction is accepted from the local processor, the synchronization operation is presented on the system bus. If the flag is not set when the synchronization instruction is received from the local processor, the synchronization operation is unnecessary and is not presented on the system bus.
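The filtering decision reduces to a flag check, sketched here in Python. The class and method names are hypothetical. The sketch assumes the flag is cleared when a synchronization instruction is accepted, not when it is retried, so that a retried sync is still presented once it is finally accepted.

```python
class SyncFilter:
    """Hypothetical model of sync filtering with a history flag."""

    def __init__(self) -> None:
        self.flag = False  # set when a remote snooper posts an architected op
        self.queue = []    # local architected logic queue (pending operations)
        self.bus = []      # operations presented on the system bus

    def snoop_posted_remote(self) -> None:
        # An architected operation was posted in a remote snooper's queue.
        self.flag = True

    def local_sync(self) -> str:
        """Handle a sync from the local processor: 'retry', 'presented' or 'filtered'."""
        if self.queue:
            return "retry"  # queue not empty: retry back to the processor
        if self.flag:
            self.bus.append("sync")
            self.flag = False  # reset on acceptance (assumed)
            return "presented"
        return "filtered"  # no pending history: sync not needed on the bus
```

A sync arriving with no recorded history is filtered, so it costs no bus cycle; only after a remote posting does the next accepted sync reach the bus.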
Abstract:
An apparatus and method for monitoring a PowerPC 60x bus within an integrated circuit are described. The 60x bus operates at a particular frequency, f_b. An image of the 60x bus is produced that operates at a lower frequency of operations, f_o, which is more amenable to monitoring by test equipment. Signals are received from and driven to the bus using driver/receiver circuitry. The signals may be input-only, output-only, or bidirectional. The signals to be monitored are tapped in the driver/receiver circuitry. Masking circuitry within the driver/receiver circuitry masks bidirectional signals, such as ARTRY_ and SHD_, during the pre-charge cycles, when these signals are in an unpredictable state. Depending on the placement of the signal taps in the driver/receiver logic, the signals may be "out-of-phase" with respect to one another. A buffer/align unit delays the monitored signals to bring each of them back in phase relative to one another. Encoding circuitry encodes this time-delayed version of the bus in a manner that produces an image of the bus at the lower frequency of operations, f_o. The encoding circuitry considers the values of the monitored signals over an encoding window and produces an encoded value for each signal at the lower frequency of operations, f_o.
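The masking step can be sketched as below. The sketch assumes, as the trailing underscore suggests, that ARTRY_ and SHD_ are active-low, so a masked sample is forced to the inactive (high) level. It also assumes the caller supplies a per-cycle flag marking pre-charge cycles; the function name is hypothetical.

```python
def mask_precharge(samples, precharge, inactive=1):
    """Force samples taken during pre-charge cycles to the inactive level.

    samples: per-cycle values of one tapped bidirectional signal.
    precharge: per-cycle flags marking pre-charge cycles (supplied by caller).
    inactive: 1 by default, assuming an active-low signal such as ARTRY_.
    """
    if len(samples) != len(precharge):
        raise ValueError("samples and precharge must have the same length")
    return [inactive if pc else s for s, pc in zip(samples, precharge)]
```

Without the mask, a low sample captured during pre-charge would reach the encoder looking like a genuine asserted retry.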
Abstract:
Cache-specific and architecture-specific functions are layered within a controller, simplifying design requirements. Faster performance may be achieved, and individual segments of the overall design may be individually tested and formally verified. Transitions between memory consistency models are also facilitated. Different segments of the overall design may be implemented in distinct integrated circuits, allowing less expensive processes to be employed where suitable.