摘要:
A system and method for re-ordering store operations from a processor core to a store queue. When a store queue receives a new processor-issued store operation from the processor core, a store queue controller allocates a new entry in the store queue. In response to allocating the new entry in the store queue, the store queue controller determines whether or not the new entry is dependent on at least one other valid entry in the store queue. In response to determining the new entry is dependent on at least one other valid entry in the store queue, the store queue controller inhibits requesting of the new entry to the RC dispatch logic until each valid entry on which the new entry is dependent has been successfully dispatched to an RC machine by the RC dispatch logic.
摘要:
A processing unit for a multiprocessor data processing system includes a processor core including a store-through upper level cache, an instruction sequencing unit that fetches instructions for execution, a data register, and at least one instruction execution unit coupled to the instruction sequencing unit that concurrently executes multiple threads of instructions. The processor core, responsive to the at least one instruction execution unit executing a load-reserve instruction in a first thread that binds to a load target address in the store-through upper level cache during a reservation hazard window associated with a conflicting store-conditional operation of a second thread, causes a subsequent store-conditional operation of the first thread to a store target address matching the load target address to fail if the store-conditional operation of the second thread succeeds.
摘要:
A processor communication register (PCR) contained within a multiprocessor cluster system provides enhanced processor communication. The PCR stores information that is useful in pipelined or parallel multi-processing. Each processor cluster has exclusive rights to store to a sector within the PCR and has continuous access to read its contents. Each processor cluster updates its exclusive sector within the PCR, instantly allowing all of the other processors within the cluster network to see the change within the PCR data, and bypassing the cache subsystem. Efficiency is enhanced within the processor cluster network by providing processor communications to be immediately networked and transferred into all processors without momentarily restricting access to the information or forcing all the processors to be continually contending for the same cache line, and thereby overwhelming the interconnect and memory system with an endless stream of load, store and invalidate commands.
摘要:
In response to receiving an initialization operation from an associated processor core that indicates a target memory block to be initialized, a cache memory determines a coherency state of the target memory block. In response to a determination that the target memory block has a data-invalid coherency state with respect to the cache memory, the cache memory issues on a interconnect a corresponding initialization request indicating the target memory block. In response to the initialization request, the target memory block is initialized within a memory of the data processing system to an initialization value. The target memory block may thus be initialized without the cache memory holding a valid copy of the target memory block.
摘要:
A processor communication register (PCR) contained in each processor within a multiprocessor system provides enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multi-processing. Each processor has exclusive rights to store to a sector within each PCR and has continuous access to read the contents of its own PCR. Each processor updates its exclusive sector within all of the PCRs, instantly allowing all of the other processors to see the change within the PCR data, and bypassing the cache subsystem. Efficiency is enhanced within the multiprocessor system by providing processor communications to be immediately transferred into all processors without momentarily restricting access to the information or forcing all the processors to be continually contending for the same cache line, and thereby overwhelming the interconnect and memory system with an endless stream of load, store and invalidate commands.
摘要:
A method and processor system that substantially enhances the store gathering capabilities of a store queue entry to enable gathering of a maximum number of proximate-in-time store operations before the entry is selected for dispatch. A counter is provided for each entry to track a time since a last gather to the entry. When a new gather does not occur before the counter reaches a threshold saturation point, the entry is signaled ready for dispatch. By defining an optimum threshold saturation point before the counter expires, sufficient time is provided for the entry to gather a proximate-in-time store operation. The entry may be deemed eligible for selection when certain conditions occur, including the entry becoming full, issuance of a barrier operation, and saturation of the counter. The use of the counter increases the ability of a store queue entry to complete gathering of enough store operations to update an entire cache line before that entry is dispatched to an RC machine.
摘要:
A data processing system includes a plurality of requestors and a memory controller for a system memory. In response to receiving from the requestor a read-type request targeting a memory block in the system memory, the memory controller protects the memory block from modification, and in response to an indication that the memory controller is responsible for servicing the read-type request, the memory controller transmits the memory block to the requestor. Prior to receipt of the memory block by the requestor, the memory controller ends protection of the memory block from modification, and the requestor begins protection of the memory block from modification. In response to receipt of the memory block, the requestor ends its protection of the memory block from modification.
摘要:
A processing unit for a multiprocessor data processing system includes a processor core and a lower level cache including a reservation logic that records reservations of the processor core. The reservation logic passes or fails store-conditional operations received from the processor core based upon whether the processor core has reservations for target store addresses of the store-conditional operations. The processor core includes a store-through upper level cache, a reservation register, and sequencer logic that, by reference to the reservation register, fails a store-conditional operation without communication with said reservation logic.
摘要:
A method, system, and processor chip design for reducing the latency between completing a LARX operation and receiving the associated STCX operation to complete the update to the cache line. Each entry of the store queue of the issuing processor is provided an additional tracking bit (priority bit). The priority bit is set whenever a STCX operation is placed within the entry. During selection of an entry for dispatch by the arbitration logic, the arbitration logic scans the value of the priority bits of each eligible entry. An entry with the priority bit set is given priority in the selection process within architectural rules. That entry is then selected for dispatch as early as is possible within the established rules.
摘要:
A data processing system includes at least first and second coherency domains coupled by an interconnect fabric. A memory coupled to the interconnect fabric includes an address translation table having a translation table entry utilized to translate virtual memory addresses to real memory addresses. The translation table entry also includes scope information for broadcast operations targeting addresses within a memory region associated with the translation table entry. Scope prediction logic within the first coherency domain predictively selects a scope of broadcast of an operation on an interconnect fabric of the data processing system by reference to the scope information within the address translation table entry.