摘要:
A data processing system includes a plurality of processing units, including at least a local master and a local hub, which are coupled for communication via a communication link. The local master includes a master capable of initiating an operation, a snooper capable of receiving an operation, and interconnect logic coupled to a communication link coupling the local master to the local hub. The interconnect logic includes request logic that synchronizes internal transmission of a request of the master to the snooper with transmission, via the communication link, of the request to the local hub.
摘要:
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. Information, such as the address, of the snoop request is stored in a queue of the stall/reorder unit. The stall/reorder unit forwards the snoop request to a selector which also receives a request from a processor. An arbitration mechanism selects either the snoop request or the request from the processor. If the snoop request is denied by the arbitration mechanism, information, e.g., address, about the snoop request may be maintained in the stall/reorder unit. The request may be later resent to the selector. This process may be repeated up to “n” clock cycles. By providing the snoop request additional opportunities (n clock cycles) to be accepted by the arbitration mechanism, fewer snoop requests may ultimately be denied.
摘要:
A cache coherent data processing system includes at least first and second coherency domains. In a first cache memory within the first coherency domain of the data processing system, a coherency state field associated with a storage location and an address tag is set to a first data-invalid coherency state that indicates that the address tag is valid and that the storage location does not contain valid data. In response to snooping an exclusive access operation, the exclusive access request specifying a target address matching the address tag and indicating a relative domain location of a requestor that initiated the exclusive access operation, the first cache memory updates the coherency state field from the first data-invalid coherency state to a second data-invalid coherency state that indicates that the address tag is valid, that the storage location does not contain valid data, and whether a target memory block associated with the address tag is cached within the first coherency domain upon successful completion of the exclusive access operation based upon the relative location of the requestor.
摘要:
A processing unit for a multiprocessor data processing system includes a processor core and a cache hierarchy coupled to the processor core to provide low latency data access. The cache hierarchy includes an upper level cache coupled to the processor core and a lower level victim cache coupled to the upper level cache. In response to a prefetch request of the processor core that misses in the upper level cache, the lower level victim cache determines whether the prefetch request misses in the directory of the lower level victim cache and, if so, allocates a state machine in the lower level victim cache that services the prefetch request by issuing the prefetch request to at least one other processing unit of the multiprocessor data processing system.
摘要:
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. Information, such as the address, of the snoop request is stored in a queue of the stall/reorder unit. The stall/reorder unit forwards the snoop request to a selector which also receives a request from a processor. An arbitration mechanism selects either the snoop request or the request from the processor. If the snoop request is denied by the arbitration mechanism, information, e.g., address, about the snoop request may be maintained in the stall/reorder unit. The request may be later resent to the selector. This process may be repeated up to “n” clock cycles. By providing the snoop request additional opportunities (n clock cycles) to be accepted by the arbitration mechanism, fewer snoop requests may ultimately be denied.
摘要:
A data processing system includes a plurality of communication links and a plurality of processing units including a local master processing unit. The local master processing unit includes interconnect logic that couples the processing unit to one or more of the plurality of communication links and an originating master coupled to the interconnect logic. The originating master originates an operation by issuing a write-type request on at least one of the one or more communication links, receives from a snooper in the data processing system a destination tag identifying a route to the snooper, and, responsive to receipt of the combined response and the destination tag, initiates a data transfer including a data payload and a data tag identifying the route provided within the destination tag.
摘要:
A data processing system includes an interconnect fabric, a system memory coupled to the interconnect fabric and including a virtual barrier synchronization region allocated to storage of virtual barrier synchronization registers (VBSRs), and a plurality of processing units coupled to the interconnect fabric and operable to access the virtual barrier synchronization region of the system memory. Each of the plurality of processing units includes a processor core and a cache memory including a cache array that caches VBSR lines from the virtual barrier synchronization region of the system memory and a cache controller. The cache controller, responsive to a store request from the processor core to update a particular VBSR line, performs a non-blocking update of the cache array in each other of the plurality of processing units contemporaneously holding a copy of the particular VBSR line by transmitting a VBSR update command on the interconnect fabric.
摘要:
A data processing system includes an interconnect fabric, a system memory coupled to the interconnect fabric and including a virtual barrier synchronization region allocated to storage of virtual barrier synchronization registers (VBSRs), and a plurality of processing units coupled to the interconnect fabric and operable to access the virtual barrier synchronization region of the system memory. Each of the plurality of processing units includes a processor core and a cache memory including a cache array that caches VBSR lines from the virtual barrier synchronization region of the system memory and a cache controller. The cache controller, responsive to a store request from the processor core to update a particular VBSR line, performs a non-blocking update of the cache array in each other of the plurality of processing units contemporaneously holding a copy of the particular VBSR line by transmitting a VBSR update command on the interconnect fabric.
摘要:
A data processing system includes at least a first through third processing nodes coupled by an interconnect fabric. The first processing node includes a master, a plurality of snoopers capable of participating in interconnect operations, and a node interface that receives a request of the master and transmits the request of the master to the second processing unit with a nodal scope of transmission limited to the second processing node. The second processing node includes a node interface having a directory. The node interface of the second processing node permits the request to proceed with the nodal scope of transmission if the directory does not indicate that a target memory block of the request is cached other than in the second processing node and prevents the request from succeeding if the directory indicates that the target memory block of the request is cached other than in the second processing node.
摘要:
A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.