摘要:
Described is a data processing system and processor that provides full multiprocessor speculation by which all instructions subsequent to barrier operations in a instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued prior to the return of the sync acknowledgment. Data returned is immediately forwarded to the processor's execution units. The returned data and results of subsequent operations are held temporarily in rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative. When a barrier acknowledge is received by the BOP controller, the flag(s) of the corresponding rename register(s) are reset.
摘要:
Disclosed is a method of operation within a processor, that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data which is returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions are discarded when the speculative execution path is determined to be incorrect.
摘要:
Disclosed is a processor that reduces barrier operations during instruction processing. An instruction sequence includes a first barrier instruction and a second barrier instruction with a store instruction in between the first and second barrier instructions. A store request associated with the store instruction is issued prior to a barrier operation associated with the first barrier instruction. A determination is made of when the store request completes before the first barrier instruction has issued. In response, only a single barrier operation is issued for both the first and second barrier instructions. The single barrier operation is issued after the store request has been issued and at the time the second barrier operation is scheduled to be issued.
摘要:
A method for increasing performance optimization in a multiprocessor data processing system. A number of predetermined thresholds are provided within a system controller logic and utilized to trigger specific bandwidth utilization responses. Both an address bus and data bus bandwidth utilization are monitored. Responsive to a fall of a percentage of data bus bandwidth utilization below a first predetermined threshold value, the system controller provides a particular response to a request for a cache line at a snooping processor having the cache line, where the response indicates to a requesting processor that the cache line will be provided. Conversely, if the percentage of data bus bandwidth utilization rises above a second predetermined threshold value, the system controller provides a next response to the request that indicates to any requesting processors that the requesting processor should utilize super-coherent data which is currently within its local cache. Similar operation on the address bus permits the system controller to triggering the issuing of Z1 Read requests for modified data in a shared cache line by processors which still have super-coherent data. The method also comprises enabling a load instruction with a plurality of bits that (1) indicates whether a resulting load request may receive super-coherent data and (2) overrides a coherency state indicating utilization of super-coherent data when said plurality of bits indicates that said load request may not utilize said super-coherent data. Specialized store instructions with appended bits and related functionality are also provided.
摘要:
A method and apparatus for preventing the occurrence of deadlocks from the execution of multiply-initiated multiply-sourced variable delay system bus operations. In general, each snooper excepts a given operation at the same time according to an agreed upon condition. In other words, the snooper in a given cache can accept an operation and begin working on it even while retrying the operation. Furthermore, none of the active snoopers release an operation until all the active snoopers are done with the operation. In other words, execution of a given operation is started by the snoopers at the same time and finished by each of the snoopers at the same time. This prevents the ping-pong deadlock by keeping any one cache from finishing the operation before any of the others.
摘要:
Cache and architectural functions within a cache controller are layered and provided with generic interfaces. Layering cache and architectural operations allows the definition of generic interfaces between controller logic and bus interface units within the controller. The generic interfaces are defined by extracting the essence of supported operations into a generic protocol. The interfaces themselves may be pulsed or held interfaces, depending on the character of the operation. Because the controller logic is isolated from the specific protocols required by a processor or bus architecture, the design may be directly transferred to new controllers for different protocols or processors by modifying the bus interface units appropriately.
摘要:
Cache and architectural functions within a cache controller are layered so that architectural operations may be symmetrically treated regardless of whether initiated by a local processor or by a horizontal processor. The same cache controller logic which handles architectural operations initiated by a horizontal device also handles architectural operations initiated by a local processor. Architectural operations initiated by a local processor are passed to the system bus and self-snooped by the controller. If necessary, the architectural controller changes the operation protocol to conform to the system bus architecture.
摘要:
A method and apparatus for ordering operations and data received by a first bus having a first ordering policy according to a second ordering policy which is different from the first ordering policy, and for transferring the ordered data on a second bus having the second ordering policy. The system includes a plurality of execution units for storing operations and executing the transfer of data between the first and second buses. Each one of the execution units are assigned to a group which represent a class of operations. The apparatus further includes intra prioritizing means, for each group, for prioritizing the stored operations according to the second ordering policy exclusive of the operation stored in the other group. The system also includes inter prioritizing means for determining which one of the prioritized operations can proceed to execute according to the second ordering policy.
摘要:
A method and system for controlling access to a shared resource in a data processing system are described. According to the method, a number of requests for access to the resource are generated by a number of requesters that share the resource. Each of the requesters is assigned a current priority, at least the highest current priority being determined substantially randomly with respect to previous priorities of the requestors. In response to the current priorities of the requestors, a request for access to the resource is granted. In one embodiment, a requester corresponding to a granted request is signaled that its request has been granted, and a requester corresponding to a rejected request is signaled that its request was not granted.
摘要:
Cache and architectural functions within a cache controller are layered and provided with generic interfaces, isolating controller logic from specific architectural complexities. Controller logic may thus be readily duplicated to extend a nonshared cache controller design to a shared cache controller design, with only straightforward modifications required. Throttling of processor-initiated operations handled by the same controller logic resolves operation flow rate issues with acceptable performance trade-offs.