摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
摘要:
Disclosed is a method of processing instructions in a data processing system. An instruction sequence that includes a memory access instruction is received at a processor in program order. In response to receipt of the memory access instruction a memory access request and a barrier operation are created. The barrier operation is placed on an interconnect after the memory access request is issued to a memory system. After the barrier operation has completed, the memory access request is completed in program order. When the memory access request is a load request, the load request is speculatively issued if a barrier operation is pending. Data returned by the speculatively issued load request is only returned to a register or execution unit of the processor when an acknowledgment is received for the barrier operation.
摘要:
A multiprocessor data processing system comprising a plurality of processing units, a plurality of caches, that is each affiliated with one of the processing units, and processing logic that, responsive to a receipt of a first system bus response to a coherency operation, causes the requesting processor to execute operations utilizing super-coherent data. The data processing system further includes logic eventually returning to coherent operations with other processing units responsive to an occurrence of a pre-determined condition. The coherency protocol of the data processing system includes a first coherency state that indicates that modification of data within a shared cache line of a second cache of a second processor has been snooped on a system bus of the data processing system. When the cache line is in the first coherency state, subsequent requests for the cache line is issued as a Z1 read on a system bus and one of two responses are received. If the response to the Z1 read indicates that the first processor should utilize local data currently available within the cache line, the first coherency state is changed to a second coherency state that indicates to the first processor that subsequent request for the cache line should utilize the data within the local cache and not be issued to the system interconnect. Coherency state transitions to the second coherency state is completed via the coherency protocol of the data processing system. Super-coherent data is provided to the processor from the cache line of the local cache whenever the second coherency state is set for the cache line and a request is received.
摘要:
Disclosed is a multiprocessor data processing system that executes loads transactions out of order with respect to a barrier operation. The data processing system includes a memory and a plurality of processors coupled to an interconnect. At least one of the processors includes an instruction sequencing unit for fetching an instruction sequence in program order for execution. The instruction sequence includes a first and a second load instruction and a barrier instruction, which is between the first and second load instructions in the instruction sequence. Also included in the processor is a load/store unit (LSU), which has a load request queue (LRQ) that temporarily buffers load requests associated with the first and second load instructions. The LRQ is coupled to a load request arbitration unit, which selects an order of issuing the load requests from the LRQ. Then a controller issues a load request associated with the second load instruction to memory before completion of a barrier operation associated with the barrier instruction. Alternatively, load requests are issued out-of-order with respect to the program order before or after the barrier instruction. The load request arbitration unit selects the request associated with the second load instruction before a request associated with the first load instruction, and the controller issues the request associated with the second load instruction before the request associated with the first load instruction and before issuing the barrier operation.
摘要:
Described is a data processing system and processor that provides full multiprocessor speculation by which all instructions subsequent to barrier operations in a instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued prior to the return of the sync acknowledgment. Data returned is immediately forwarded to the processor's execution units. The returned data and results of subsequent operations are held temporarily in rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative. When a barrier acknowledge is received by the BOP controller, the flag(s) of the corresponding rename register(s) are reset.
摘要:
Disclosed is a method of operation within a processor, that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data which is returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions are discarded when the speculative execution path is determined to be incorrect.
摘要:
Disclosed is a processor that reduces barrier operations during instruction processing. An instruction sequence includes a first barrier instruction and a second barrier instruction with a store instruction in between the first and second barrier instructions. A store request associated with the store instruction is issued prior to a barrier operation associated with the first barrier instruction. A determination is made of when the store request completes before the first barrier instruction has issued. In response, only a single barrier operation is issued for both the first and second barrier instructions. The single barrier operation is issued after the store request has been issued and at the time the second barrier operation is scheduled to be issued.
摘要:
A method for increasing performance optimization in a multiprocessor data processing system. A number of predetermined thresholds are provided within a system controller logic and utilized to trigger specific bandwidth utilization responses. Both an address bus and data bus bandwidth utilization are monitored. Responsive to a fall of a percentage of data bus bandwidth utilization below a first predetermined threshold value, the system controller provides a particular response to a request for a cache line at a snooping processor having the cache line, where the response indicates to a requesting processor that the cache line will be provided. Conversely, if the percentage of data bus bandwidth utilization rises above a second predetermined threshold value, the system controller provides a next response to the request that indicates to any requesting processors that the requesting processor should utilize super-coherent data which is currently within its local cache. Similar operation on the address bus permits the system controller to triggering the issuing of Z1 Read requests for modified data in a shared cache line by processors which still have super-coherent data. The method also comprises enabling a load instruction with a plurality of bits that (1) indicates whether a resulting load request may receive super-coherent data and (2) overrides a coherency state indicating utilization of super-coherent data when said plurality of bits indicates that said load request may not utilize said super-coherent data. Specialized store instructions with appended bits and related functionality are also provided.