Abstract:
A method and system for managing distributed arbitration of multi-cycle data transfer requests improves performance in a processing system. A multi-cycle request indicator is provided to a slice arbiter, and if a multi-cycle request is present, only one slice (the first slice) is granted its associated bus. The method further blocks requests from other requesting slices that have a lower latency than the first slice until the predetermined cycle counter value plus the latency difference between each such slice and the longest-latency slice has elapsed. The method also blocks further requests from the first slice until the predetermined cycle counter value has elapsed, and blocks requests from slices that have a higher latency than the first slice until the predetermined cycle counter value minus the latency difference between the higher-latency slice and the first slice has elapsed.
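The three blocking rules can be tabulated per slice. The sketch below applies them as the abstract states them, assuming integer slice latencies in cycles and a single counter value. The function name `blocking_windows`, the slice names, clamping negative windows to zero, and treating equal-latency slices like the first slice are all hypothetical choices not taken from the abstract.

```python
def blocking_windows(latencies, granted, counter):
    """Cycles each slice must wait before requesting again after `granted`
    receives a multi-cycle grant (hypothetical helper).

    latencies: dict mapping slice id -> latency in cycles
    granted:   id of the slice that won the multi-cycle grant (the first slice)
    counter:   predetermined cycle counter value
    """
    first = latencies[granted]
    longest = max(latencies.values())
    windows = {}
    for slice_id, latency in latencies.items():
        if latency < first:
            # Lower latency: counter plus the gap to the longest-latency slice.
            wait = counter + (longest - latency)
        elif latency > first:
            # Higher latency: counter less the gap to the first slice.
            wait = counter - (latency - first)
        else:
            # The first slice itself (and, by assumption, equal-latency slices).
            wait = counter
        windows[slice_id] = max(wait, 0)  # assumption: a window is never negative
    return windows


print(blocking_windows({"A": 2, "B": 4, "C": 6}, granted="B", counter=3))
```

With latencies of 2, 4, and 6 cycles, slice B granted, and a counter of 3, this prints `{'A': 7, 'B': 3, 'C': 1}`: slice A, whose data would return soonest, is held off longest, while the slow slice C may request again after a single cycle.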
Abstract:
A method of storing values in a sliced cache provides separate but coordinated reservation units for each cache slice. When a load-with-reserve (larx) operation is issued from the processor core as part of an atomic read-modify-write sequence, a message is broadcast to every cache slice to clear its reservation flag; a reservation flag is also set in the target cache slice, and the memory address associated with the load-with-reserve operation is loaded into the reservation unit of the target cache slice. When a conditional store operation is issued from the core to complete the atomic read-modify-write sequence, a second message is broadcast to every non-target cache slice of the processing unit to clear its reservation flag. The conditional store operation passes if the reservation flag of the target cache slice is still set and the memory address associated with the conditional store operation matches the memory address loaded in the reservation unit of the target cache slice. The broadcast messages keep the reservation units coordinated and facilitate the use of larger sliced caches.
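The larx/stcx protocol across slices can be modeled with one flag and one address per slice. This is a toy sketch, not the disclosed hardware: the class and method names, the 128-byte line size, and the address-to-slice hash are assumptions, since the abstract does not say how the target slice is chosen. Clearing the target flag after every stcx is also an assumption.

```python
class SlicedCacheReservations:
    """Toy model of per-slice reservation units (hypothetical names).

    Assumes the target slice is chosen by hashing the cache-line address
    with 128-byte lines; the real slice-selection function is not given.
    """

    LINE_BYTES = 128  # assumption

    def __init__(self, num_slices):
        self.flags = [False] * num_slices  # reservation flag per slice
        self.addrs = [None] * num_slices   # reserved address per slice

    def target(self, address):
        return (address // self.LINE_BYTES) % len(self.flags)

    def larx(self, address):
        # Broadcast: clear the reservation flag in every slice.
        for s in range(len(self.flags)):
            self.flags[s] = False
        # Set the flag and load the address in the target slice.
        t = self.target(address)
        self.flags[t] = True
        self.addrs[t] = address

    def stcx(self, address):
        t = self.target(address)
        # Broadcast: clear the reservation flags of all non-target slices.
        for s in range(len(self.flags)):
            if s != t:
                self.flags[s] = False
        passed = self.flags[t] and self.addrs[t] == address
        self.flags[t] = False  # assumption: a stcx consumes the reservation
        return passed
```

With two slices, `larx(0x100)` followed by `stcx(0x100)` passes, and a second `stcx(0x100)` fails. A `stcx(0x200)` after `larx(0x100)` also fails: in this model 0x200 maps to the same slice, but its address does not match the reserved one.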
Abstract:
A method and apparatus for transporting store requests between functional units within a processor is disclosed. A data processing system includes a data dispatching unit, a data receiving unit, and a segmented data pipeline and a segmented feedback line, each coupled between the data dispatching unit and the data receiving unit. The segmented data pipeline has multiple latches interconnected between its segments and transfers data systolically from the data dispatching unit to the data receiving unit. The segmented feedback line has multiple control latches interconnected between its segments. Each control latch sends a control signal to a corresponding latch in the segmented data pipeline, directing it to forward its data to the next segment of the segmented data pipeline.
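The interplay between the data latches and the per-segment control signals can be illustrated with a cycle-by-cycle model. This is a simplified valid/ready-style sketch under stated assumptions: a control signal means "the downstream neighbour will have room", and all control signals are evaluated within one cycle from the receiver back toward the dispatcher. It does not model the latching delay of the feedback line itself, and the function name and parameters are hypothetical.

```python
def clock(latches, incoming, receiver_ready):
    """Advance a segmented pipeline by one cycle (simplified model).

    latches:        contents of each data latch, dispatcher side first (None = empty)
    incoming:       item offered by the dispatching unit this cycle, or None
    receiver_ready: whether the receiving unit can accept an item this cycle
    Returns (new_latches, delivered_item_or_None, incoming_accepted).
    """
    n = len(latches)
    forward = [False] * n
    # Control signals, computed from the receiver back toward the dispatcher:
    # a latch forwards if it holds data and its downstream neighbour will have room.
    downstream_free = receiver_ready
    for i in reversed(range(n)):
        forward[i] = latches[i] is not None and downstream_free
        downstream_free = latches[i] is None or forward[i]

    new = list(latches)
    delivered = latches[-1] if forward[-1] else None
    for i in reversed(range(n)):
        if forward[i]:
            if i + 1 < n:
                new[i + 1] = latches[i]
            new[i] = None

    # downstream_free now tells whether latch 0 has room for a new item.
    accepted = incoming is not None and downstream_free
    if accepted:
        new[0] = incoming
    return new, delivered, accepted
```

When the receiver is ready, every occupied latch advances in lock step and one item is delivered per cycle. When the receiver stalls, only the items upstream of an empty latch move forward, filling bubbles without overwriting data.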
Abstract:
The present invention provides a system and method for efficiently executing load reserve (LARX) and store conditional (STCX) instructions in a superscalar processor. The system comprises a data cache (Dcache) that receives the LARX instruction. The data cache includes a decoder means for setting and resetting the validation of the load reserve instruction, and an internal cache that receives address information and provides data. The system also includes a register means for receiving the LARX instruction and a controller means for providing a physical address based on the address information. When the LARX instruction hits in the internal data cache, the validation is completed in one cycle.
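The timing benefit on a hit can be shown with a small behavioral model. In this sketch the reservation is validated in the same cycle the data is returned on a hit, and only after a line fill on a miss. The class name `LarxUnit`, the `MISS_PENALTY` value, and the 128-byte reservation granule are hypothetical; the abstract gives none of them.

```python
MISS_PENALTY = 10  # hypothetical: cycles to fill a line from the next cache level


class LarxUnit:
    """Toy model of LARX/STCX reservation validation (hypothetical names)."""

    def __init__(self, cached_lines, line_size=128):
        self.line_size = line_size               # assumed reservation granule
        self.lines = set(cached_lines)           # line addresses in the internal cache
        self.valid = False                       # reservation validation bit
        self.addr = None                         # reserved line address

    def _line(self, address):
        return address - address % self.line_size

    def larx(self, address):
        """Return the cycles until the reservation is valid."""
        line = self._line(address)
        if line in self.lines:
            cycles = 1                           # hit: validated in one cycle
        else:
            self.lines.add(line)                 # miss: fill the line first
            cycles = 1 + MISS_PENALTY
        self.valid = True                        # decoder sets the validation
        self.addr = line
        return cycles

    def stcx(self, address):
        ok = self.valid and self.addr == self._line(address)
        self.valid = False                       # decoder resets the validation
        return ok
```

A `larx` to a cached line reports 1 cycle, while a `larx` to an uncached line reports `1 + MISS_PENALTY` and then hits on the next access.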
Abstract:
A processor communication register (PCR) contained in each processor within a multiprocessor cluster network provides enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multiprocessing. Each processor has exclusive rights to store to one sector within every PCR in the cluster network and has continuous access to read the contents of its own PCR. Each processor updates its exclusive sector within all of the PCRs via a private protocol or a dedicated wireless network, instantly allowing all of the other processors within the cluster network to see the change in the PCR data while bypassing the cache subsystem. Efficiency within the processor cluster network is enhanced because processor communications are immediately networked and transferred into all processors without momentarily restricting access to the information, and without forcing all the processors to contend continually for the same cache line, which would overwhelm the interconnect and memory system with an endless stream of load, store, and invalidate commands.
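The ownership and replication rules can be sketched as follows. The model keeps one PCR copy per processor, lets processor `p` write only sector `p`, and copies each store into every PCR. It is a minimal sketch under these assumptions: the class and method names and the one-byte sector size are hypothetical, and the loop over copies stands in for the private protocol or dedicated network, which is not modeled.

```python
class PCRCluster:
    """One PCR copy per processor; processor p owns sector p in every copy.
    Sector size and class/method names are hypothetical."""

    def __init__(self, num_procs, sector_bytes=1):
        self.sector_bytes = sector_bytes
        self.pcrs = [bytearray(num_procs * sector_bytes) for _ in range(num_procs)]

    def store(self, proc, value):
        """Processor `proc` writes its own sector; the update reaches every PCR."""
        if len(value) != self.sector_bytes:
            raise ValueError("value must fill exactly one sector")
        start = proc * self.sector_bytes
        for pcr in self.pcrs:  # stands in for the private protocol / network
            pcr[start:start + self.sector_bytes] = value

    def load(self, proc):
        """Processor `proc` reads its own local copy; no cache traffic occurs."""
        return bytes(self.pcrs[proc])


cluster = PCRCluster(4)
cluster.store(2, b"\x01")
print(cluster.load(0))
```

After processor 2 stores `0x01`, processor 0 reads `b'\x00\x00\x01\x00'` from its own copy, and every other copy holds the same bytes. No processor had to acquire a shared cache line to see the change.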
Abstract:
A processor communication register (PCR) contained within a multiprocessor cluster system provides enhanced processor communication. The PCR stores information that is useful in pipelined or parallel multiprocessing. Each processor cluster has exclusive rights to store to one sector within the PCR and has continuous access to read its contents. Each processor cluster updates its exclusive sector within the PCR, instantly allowing all of the other processors within the cluster network to see the change in the PCR data while bypassing the cache subsystem. Efficiency within the processor cluster network is enhanced because processor communications are immediately networked and transferred into all processors without momentarily restricting access to the information, and without forcing all the processors to contend continually for the same cache line, which would overwhelm the interconnect and memory system with an endless stream of load, store, and invalidate commands.
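A single shared PCR with one writer per sector supports simple synchronization among clusters. The sketch below uses it as a barrier. Each cluster publishes a generation number in its own sector, and any processor can check whether all clusters have arrived by reading the register. The barrier use, the class name, and the function name are hypothetical illustrations, not part of the disclosure.

```python
class SharedPCR:
    """Single PCR for a cluster system; one sector per processor cluster
    (hypothetical model)."""

    def __init__(self, num_clusters):
        self.sectors = [0] * num_clusters

    def store(self, cluster, value):
        self.sectors[cluster] = value  # each cluster writes only its own sector

    def read(self):
        return tuple(self.sectors)     # any processor may read the whole register


def arrived_at_barrier(pcr, generation):
    """Hypothetical barrier check: every cluster has published `generation`."""
    return all(v >= generation for v in pcr.read())
```

Because each sector has exactly one writer, no two clusters ever contend to update the same location, and readers never need to acquire a lock.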
Abstract:
A method for optimally issuing instructions that depend on a first instruction in a data processing system is disclosed. The processing system includes a primary cache and a secondary cache. The method and system comprise speculatively indicating a hit of the first instruction in the secondary cache and releasing the dependent instructions. The method and system also include determining whether the first instruction is within the secondary cache, and further include providing data related to the first instruction from the secondary cache to the primary cache when the instruction is within the secondary cache. A method and system in accordance with the present invention cause instructions that create dependencies (such as a load instruction) to signal an issue queue (which is responsible for issuing instructions with resolved conflicts) in advance that the instruction will complete in a predetermined number of cycles. In one embodiment, a core interface unit (CIU) signals an execution unit such as the Load Store Unit (LSU) that the instruction is assumed to hit in the L2 cache. The issue queue uses this signal to issue dependent instructions at the optimal time. If the instruction misses in the L2 cache, the cache hierarchy causes the instructions to be abandoned and re-executed when the data is available.
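The issue-timing arithmetic can be made concrete. If a dependent instruction needs a fixed number of cycles to travel from issue to execute, the issue queue releases it early enough that it reaches execute exactly when L2 hit data would arrive. On a miss, the dependent is issued again so that it executes when the data actually arrives. Both latency constants and the function name are hypothetical values chosen for illustration.

```python
L2_HIT_LATENCY = 12   # hypothetical: cycles from load issue until L2 hit data returns
ISSUE_TO_EXECUTE = 2  # hypothetical: cycles from issue until the execute stage


def dependent_issue_cycles(load_issue, l2_hit, data_ready=None):
    """Cycles at which a dependent instruction is issued.

    The first entry is the speculative issue that assumes an L2 hit, timed so
    the dependent reaches execute exactly when hit data would arrive. On a
    miss the dependent is abandoned and re-issued so that it executes when
    the data actually becomes available (data_ready).
    """
    speculative = load_issue + L2_HIT_LATENCY - ISSUE_TO_EXECUTE
    if l2_hit:
        return [speculative]
    if data_ready is None:
        raise ValueError("a miss needs the cycle at which data becomes available")
    return [speculative, data_ready - ISSUE_TO_EXECUTE]
```

For a load issued at cycle 100, the dependent is issued speculatively at cycle 110. If the load misses and its data becomes available at cycle 150, the dependent is re-issued at cycle 148.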