Abstract:
Disclosed is a method of processing instructions in a data processing system. An instruction sequence that includes a memory access instruction is received at a processor in program order. In response to receipt of the memory access instruction, a memory access request and a barrier operation are created. The barrier operation is placed on an interconnect after the memory access request is issued to a memory system. After the barrier operation has completed, the memory access request is completed in program order. When the memory access request is a load request, the load request is speculatively issued if a barrier operation is pending. Data returned by the speculatively issued load request is forwarded to a register or execution unit of the processor only when an acknowledgment is received for the barrier operation.
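A minimal software sketch of the ordering rule described above, written in C; the structure, the field names, and the issue_load_speculatively/barrier_ack functions are illustrative assumptions, not the claimed hardware:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative software model of the rule above: a load may issue
 * speculatively while a barrier is pending, but its data is only
 * forwarded to the register after the barrier acknowledgment arrives
 * on the interconnect. All names here are hypothetical. */

typedef struct {
    bool barrier_pending;      /* a barrier operation is outstanding     */
    int  speculative_data;     /* data returned by the speculative load  */
    bool data_buffered;        /* data held, not yet given to register   */
    int  register_value;       /* architected register contents          */
} core_state_t;

/* Issue the load even though a barrier is pending (speculation). */
static void issue_load_speculatively(core_state_t *c, int memory_value)
{
    c->speculative_data = memory_value;
    c->data_buffered = true;
}

/* Barrier acknowledgment arrives; only now may buffered data commit. */
static void barrier_ack(core_state_t *c)
{
    c->barrier_pending = false;
    if (c->data_buffered) {
        c->register_value = c->speculative_data; /* commit in order */
        c->data_buffered = false;
    }
}

int main(void)
{
    core_state_t c = { .barrier_pending = true };
    issue_load_speculatively(&c, 42);
    printf("before ack: register=%d (buffered=%d)\n",
           c.register_value, c.data_buffered);
    barrier_ack(&c);
    printf("after  ack: register=%d (buffered=%d)\n",
           c.register_value, c.data_buffered);
    return 0;
}
```

The only point illustrated is the gating rule: speculative load data sits in a buffer and is committed to the register only on the barrier acknowledgment.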
Abstract:
A multiprocessor data processing system comprising a plurality of processing units, a plurality of caches, each affiliated with one of the processing units, and processing logic that, responsive to receipt of a first system bus response to a coherency operation, causes the requesting processor to execute operations utilizing super-coherent data. The data processing system further includes logic that eventually returns to coherent operations with other processing units responsive to an occurrence of a predetermined condition. The coherency protocol of the data processing system includes a first coherency state that indicates that modification of data within a shared cache line of a second cache of a second processor has been snooped on a system bus of the data processing system. When the cache line is in the first coherency state, subsequent requests for the cache line are issued as a Z1 read on the system bus, and one of two responses is received. If the response to the Z1 read indicates that the first processor should utilize local data currently available within the cache line, the first coherency state is changed to a second coherency state that indicates to the first processor that subsequent requests for the cache line should utilize the data within the local cache and not be issued to the system interconnect. Coherency state transitions to the second coherency state are completed via the coherency protocol of the data processing system. Super-coherent data is provided to the processor from the cache line of the local cache whenever the second coherency state is set for the cache line and a request is received.
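A small C sketch of the two-state behavior described above; the state names (STATE_Z1, STATE_SUPER_COHERENT) and response names are hypothetical labels chosen for illustration, not the patent's actual coherency encodings:

```c
#include <stdio.h>

/* Illustrative state machine for the two coherency states described
 * above. State and response names are hypothetical labels. */

typedef enum {
    STATE_SHARED,          /* ordinary shared copy                        */
    STATE_Z1,              /* remote modification snooped; issue Z1 reads */
    STATE_SUPER_COHERENT   /* use local (super-coherent) data, no bus op  */
} coh_state_t;

typedef enum {
    RESP_USE_LOCAL_DATA,   /* bus says: keep using the local copy */
    RESP_GO_TO_BUS         /* bus says: refetch coherent data     */
} z1_response_t;

/* A remote processor's modification of the line is snooped. */
static coh_state_t on_snoop_remote_modify(coh_state_t s)
{
    (void)s;
    return STATE_Z1;
}

/* A subsequent local request while in STATE_Z1 goes out as a Z1 read;
 * the bus response decides whether to switch to the second state. */
static coh_state_t on_z1_response(coh_state_t s, z1_response_t r)
{
    if (s == STATE_Z1 && r == RESP_USE_LOCAL_DATA)
        return STATE_SUPER_COHERENT;   /* later requests stay local */
    return s;
}

int main(void)
{
    coh_state_t s = STATE_SHARED;
    s = on_snoop_remote_modify(s);
    s = on_z1_response(s, RESP_USE_LOCAL_DATA);
    printf("final state: %s\n",
           s == STATE_SUPER_COHERENT ? "SUPER_COHERENT" : "other");
    return 0;
}
```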
Abstract:
Disclosed is a multiprocessor data processing system that executes load transactions out of order with respect to a barrier operation. The data processing system includes a memory and a plurality of processors coupled to an interconnect. At least one of the processors includes an instruction sequencing unit for fetching an instruction sequence in program order for execution. The instruction sequence includes a first and a second load instruction and a barrier instruction positioned between the first and second load instructions. Also included in the processor is a load/store unit (LSU), which has a load request queue (LRQ) that temporarily buffers load requests associated with the first and second load instructions. The LRQ is coupled to a load request arbitration unit, which selects an order of issuing the load requests from the LRQ. A controller then issues a load request associated with the second load instruction to memory before completion of a barrier operation associated with the barrier instruction. Alternatively, load requests are issued out of order with respect to the program order before or after the barrier instruction. The load request arbitration unit selects the request associated with the second load instruction before a request associated with the first load instruction, and the controller issues the request associated with the second load instruction before the request associated with the first load instruction and before issuing the barrier operation.
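A toy C model of the arbitration claim, assuming three LRQ entries in program order (first load, barrier, second load); the entry layout and the particular issue order shown are illustrative assumptions:

```c
#include <stdio.h>

/* Illustrative model of the LRQ arbitration described above: the
 * request for the second (younger) load may be selected and issued
 * before the request for the first load and before the barrier
 * operation. Field names are hypothetical. */

typedef struct {
    int tag;        /* program-order position               */
    int is_barrier; /* nonzero if this entry is the barrier */
} lrq_entry_t;

int main(void)
{
    /* Program order: first load (0), barrier (1), second load (2). */
    lrq_entry_t lrq[] = { {0, 0}, {1, 1}, {2, 0} };

    /* One possible arbitration: issue the younger load (tag 2) first,
     * then the older load (tag 0), then the barrier operation. */
    int issue_order[] = { 2, 0, 1 };

    for (int i = 0; i < 3; i++) {
        lrq_entry_t e = lrq[issue_order[i]];
        printf("issue %s (program-order tag %d)\n",
               e.is_barrier ? "barrier op" : "load request", e.tag);
    }
    return 0;
}
```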
Abstract:
An apparatus and method for monitoring an internal communication path, i.e., an internal bus, of an integrated circuit is described. The internal bus operates at a particular frequency, fb. An image of the internal bus is produced, operating at a lower frequency of operations, fo, which is more amenable to monitoring by test equipment. Signals are received from and driven to the bus using driver/receiver circuitry. The signals may be input-only, output-only, or bi-directional signals. The signals to be monitored are tapped in the driver/receiver circuitry. Depending on the placement of the signal taps in the driver/receiver logic, the signals may be “out-of-phase” with respect to one another. A buffer/align unit processes the signals in order to produce a time-delayed version of the signals, bringing each of the monitored signals back in phase relative to one another. Encoding circuitry encodes the time-delayed version of the bus in a manner that produces an image of the bus at the lower frequency of operations, fo. The encoding circuitry considers the values of the monitored signals over an encoding window, and produces an encoded value for each signal at the lower frequency of operations, fo.
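A brief C sketch of the encoding-window idea, assuming a window of four bus-frequency samples per fo sample and a sticky-OR encoding; the window size and the particular encoding function are assumptions, not the described circuitry:

```c
#include <stdio.h>

/* Illustrative reduction of one bus signal sampled at frequency fb to
 * an encoded image at fo = fb / WINDOW. The encoding used here (report
 * 1 if the signal was asserted anywhere in the window) is only one
 * possible encoding and is an assumption. */

#define WINDOW 4  /* fb-to-fo ratio assumed for the example */

static int encode_window(const int *samples)
{
    int v = 0;
    for (int i = 0; i < WINDOW; i++)
        v |= samples[i];       /* sticky-OR over the encoding window */
    return v;
}

int main(void)
{
    /* 12 samples of one monitored signal at fb -> 3 encoded values at fo. */
    int signal_fb[12] = { 0,0,1,0,  0,0,0,0,  1,1,0,0 };

    for (int w = 0; w < 12 / WINDOW; w++)
        printf("fo sample %d = %d\n", w, encode_window(&signal_fb[w * WINDOW]));
    return 0;
}
```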
Abstract:
A register associated with the architected logic queue of a memory-coherent device within a multiprocessor system contains a flag that is set whenever an architected operation (one which might affect the storage hierarchy as perceived by other devices within the system) is posted in the snoop queue of a remote snooping device. The flag remains set and is reset only when a synchronization instruction (such as the “sync” instruction supported by the PowerPC™ family of devices) is received from a local processor. The state of the flag thus provides historical information regarding architected operations which may be pending in other devices within the system after being snooped from the system bus. This historical information is utilized to determine whether a synchronization operation should be presented on the system bus, allowing unnecessary synchronization operations to be filtered and additional system bus cycles to be made available for other purposes. When a local processor issues a synchronization instruction to the device managing the architected logic queue, the instruction is generally accepted when the architected logic queue is empty; otherwise the synchronization instruction is retried back to the local processor until the architected logic queue becomes empty. If the flag is set when the synchronization instruction is accepted from the local processor, the synchronization operation is presented on the system bus. If the flag is not set when the synchronization instruction is received from the local processor, the synchronization operation is unnecessary and is not presented on the system bus.
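A compact C model of the filtering decision described above; the structure, field names, and result codes are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative model of the filtering rule: a sync from the local
 * processor is accepted only when the architected logic queue is
 * empty, and is driven onto the system bus only if the flag recorded
 * a snooped architected operation since the last sync. */

typedef struct {
    int  queue_depth;    /* entries in the architected logic queue        */
    bool remote_op_flag; /* set when an architected op was posted in a
                            remote device's snoop queue                   */
} sync_filter_t;

typedef enum { SYNC_RETRIED, SYNC_FILTERED, SYNC_ON_BUS } sync_result_t;

static void snoop_architected_op(sync_filter_t *f) { f->remote_op_flag = true; }

static sync_result_t local_sync(sync_filter_t *f)
{
    if (f->queue_depth != 0)
        return SYNC_RETRIED;          /* retry until the queue drains   */
    if (!f->remote_op_flag)
        return SYNC_FILTERED;         /* nothing pending: no bus cycle  */
    f->remote_op_flag = false;        /* reset the flag on accepted sync */
    return SYNC_ON_BUS;               /* present the sync on system bus  */
}

int main(void)
{
    sync_filter_t f = { 0, false };
    printf("sync #1 -> %d (expected %d = filtered)\n",
           local_sync(&f), SYNC_FILTERED);
    snoop_architected_op(&f);
    printf("sync #2 -> %d (expected %d = on bus)\n",
           local_sync(&f), SYNC_ON_BUS);
    return 0;
}
```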
Abstract:
An apparatus and method for monitoring a PowerPC 60x bus within an integrated circuit is described. The 60x bus operates at a particular frequency, fb. An image of the 60x bus is produced, operating at a lower frequency of operations, fo, which is more amenable to monitoring by test equipment. Signals are received from and driven to the bus using driver/receiver circuitry. The signals may be input-only, output-only, or bi-directional signals. The signals to be monitored are tapped in the driver/receiver circuitry. Masking circuitry within the driver/receiver circuitry masks bi-directional signals, such as ARTRY_ and SHD_, during the pre-charge cycles, when these bi-directional signals are in an unpredictable state. Depending on the placement of the signal taps in the driver/receiver logic, the signals may be “out-of-phase” with respect to one another. A buffer/align unit is used to bring each of the monitored signals back in phase relative to one another. Encoding circuitry encodes the time-delayed version of the bus in a manner that produces an image of the bus at the lower frequency of operations, fo. The encoding circuitry considers the values of the monitored signals over an encoding window, and produces an encoded value for each signal at the lower frequency of operations, fo.
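A short C sketch of the masking step alone, assuming the monitored value is simply forced to a constant during pre-charge cycles; the mask value and cycle pattern are assumptions, not the described circuit:

```c
#include <stdio.h>

/* Illustrative masking of a bi-directional 60x bus signal (e.g. ARTRY_)
 * during its pre-charge cycle, when its sampled value is unpredictable.
 * The constant used while masking is an assumption. */

static int monitor_sample(int raw_value, int in_precharge_cycle)
{
    /* Force a constant, known value while the wire is pre-charging. */
    return in_precharge_cycle ? 0 : raw_value;
}

int main(void)
{
    int raw[]       = { 1, 1, 0, 1 };
    int precharge[] = { 0, 1, 0, 0 };  /* cycle 1 is a pre-charge cycle */

    for (int i = 0; i < 4; i++)
        printf("cycle %d: raw=%d monitored=%d\n",
               i, raw[i], monitor_sample(raw[i], precharge[i]));
    return 0;
}
```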
Abstract:
Cache and architecture-specific functions are layered within a controller, simplifying design requirements. Faster performance may be achieved, and individual segments of the overall design may be independently tested and formally verified. Transition between memory consistency models is also facilitated. Different segments of the overall design may be implemented in distinct integrated circuits, allowing less expensive processes to be employed where suitable.
Abstract:
A method and system for controlling access to a shared resource in a data processing system are described. According to the method, a number of requests for access to the resource are generated by a number of requesters that share the resource. Each of the requesters is associated with a priority weight that indicates a probability that the associated requester will be assigned a highest current priority. Each requester is then assigned a current priority that is determined substantially randomly with respect to previous priorities of the requesters. In response to the current priorities of the requesters, a request for access to the resource is granted. In one embodiment, a requester corresponding to a granted request is signaled that its request has been granted, and a requester corresponding to a rejected request is signaled that its request was not granted.
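A minimal C sketch of weight-proportional random selection, one straightforward way to realize the described probability assignment; the weights, the requester count, and the use of rand() are assumptions for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative arbitration: each requester has a priority weight giving
 * its probability of holding the highest current priority, and the
 * choice is made randomly with respect to previous grants. */

#define NUM_REQUESTERS 3

/* Returns the index of the requester granted access this cycle. */
static int arbitrate(const int weight[NUM_REQUESTERS])
{
    int total = 0;
    for (int i = 0; i < NUM_REQUESTERS; i++)
        total += weight[i];

    int pick = rand() % total;         /* random draw over total weight */
    for (int i = 0; i < NUM_REQUESTERS; i++) {
        if (pick < weight[i])
            return i;                  /* this requester wins the grant */
        pick -= weight[i];
    }
    return NUM_REQUESTERS - 1;         /* fallback, not normally reached */
}

int main(void)
{
    int weight[NUM_REQUESTERS] = { 1, 2, 4 };  /* requester 2 favored */
    int grants[NUM_REQUESTERS] = { 0 };

    for (int cycle = 0; cycle < 7000; cycle++)
        grants[arbitrate(weight)]++;

    for (int i = 0; i < NUM_REQUESTERS; i++)
        printf("requester %d granted %d times\n", i, grants[i]);
    return 0;
}
```

The sketch models only the selection itself; in the described system, each requester would additionally be signaled whether its request was granted or rejected.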
Abstract:
A multiprocessor data processing system includes a plurality of processors coupled to an interconnect and to a global promotion facility containing at least one promotion bit field. A first processor executes a high-speed instruction sequence including a load-type instruction to acquire a promotion bit field within the global promotion facility exclusive of at least a second processor. The request may be made visible to all processors coupled to the interconnect. In response to execution of the load-type instruction, a register of the first processor receives a register bit field indicating whether or not the promotion bit field was acquired by execution of the load-type instruction. While the first processor holds the promotion bit field exclusive of the second processor, the second processor is permitted to initiate a request on the interconnect. Advantageously, promotion bit fields are handled separately from data, and the communication of promotion bit fields does not entail the movement of data cache lines.
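As a software analogy for the acquisition semantics (a single operation that attempts to claim the bit and reports success in a result bit), a C11 atomic flag can stand in for the promotion bit field; this is an illustrative analogy only and, unlike the described facility, it does not avoid moving a data cache line:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* The atomic_flag stands in for one promotion bit in the global
 * promotion facility; the result bit tells the requester whether the
 * acquisition succeeded. Names are hypothetical. */

static atomic_flag promotion_bit = ATOMIC_FLAG_INIT;

/* Returns true if the promotion bit was acquired exclusively. */
static bool acquire_promotion_bit(void)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set(&promotion_bit);
}

static void release_promotion_bit(void)
{
    atomic_flag_clear(&promotion_bit);
}

int main(void)
{
    bool got_first  = acquire_promotion_bit();  /* expected: acquired      */
    bool got_second = acquire_promotion_bit();  /* expected: not acquired  */
    printf("first attempt acquired:  %d\n", got_first);
    printf("second attempt acquired: %d\n", got_second);
    release_promotion_bit();
    return 0;
}
```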
Abstract:
A multiprocessor data processing system includes first and second processors coupled to an interconnect and to a global promotion facility containing at least one promotion bit field. The first processor initiates execution of a branch-type instruction to request acquisition of a promotion bit field exclusive of at least the second processor. In response to the branch-type instruction, the first processor issues an access request to acquire the promotion bit field. After the access request, a register of the first processor receives a register bit indicating whether or not the promotion bit field was successfully acquired by the access request. As a part of executing the branch-type instruction, the first processor selects between a first execution path and a second execution path in response to the register bit.
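A similar C analogy for the branch-type form, folding the acquisition attempt and the selection between the two execution paths into one construct; the function names and the atomic_flag stand-in are assumptions:

```c
#include <stdatomic.h>
#include <stdio.h>

/* The acquisition attempt and the branch on its result are combined,
 * mirroring the single branch-type instruction described above. */

static atomic_flag promotion_bit = ATOMIC_FLAG_INIT;

static void protected_path(void)   { puts("acquired: run first execution path"); }
static void alternative_path(void) { puts("not acquired: run second execution path"); }

int main(void)
{
    /* The "register bit" is the result of the acquisition attempt;
     * the branch selects between the two execution paths on it. */
    if (!atomic_flag_test_and_set(&promotion_bit))
        protected_path();
    else
        alternative_path();
    return 0;
}
```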