Abstract:
A coherent data processing system includes a system fabric communicatively coupling a plurality of coherence participants and fabric control logic. The fabric control logic quantifies congestion on the system fabric based on coherence messages associated with commands issued on the system fabric. Based on the congestion on the system fabric, the fabric control logic determines a rate of request issuance applicable to a set of coherence participants among the plurality of coherence participants. The fabric control logic issues at least one rate command to set a rate of request issuance to the system fabric of the set of coherence participants.
Abstract:
In one embodiment, a set-associative cache memory has a plurality of congruence classes each including multiple entries for storing cache lines of data. The cache memory includes a bank of counters, which includes a respective one of a plurality of counters for each cache line stored in the plurality of congruence classes. The cache memory selects victim cache lines for eviction from the cache memory by reference to counter values of counters within the bank of counters. A dynamic distribution of counter values of counters within the bank of counters is determined. In response, an amount counter values of counters within the bank of counters are adjusted on a cache miss is adjusted based on the dynamic distribution of the counter values.
Abstract:
A processor core of a data processing system receives a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread. In response to receiving the push instruction, the processor core executes the push instruction of the sending thread. In response to executing the push instruction, the processor core initiates transmission of the message payload to the mailbox of the receiving thread. In one embodiment, the processor core initiates transmission of the message payload by transmitting a co-processor request to a switch of the data processing system via an interconnect fabric.
Abstract:
In at least some embodiments, a processor core executes a sending thread including a first push instruction and a second push instruction subsequent to the first push instruction in a program order. Each of the first and second push instructions requests that a respective message payload be pushed to a mailbox of a receiving thread. In response to executing the first and second push instructions, the processor core transmits respective first and second co-processor requests to a switch in the data processing system via an interconnect fabric of the data processing system. The processor core transmits the second co-processor request to the switch without regard to acceptance of the first co-processor request by the switch.
Abstract:
A data processing system includes system memory and a plurality of processor cores each supported by a respective one of a plurality of vertical cache hierarchies. A first vertical cache hierarchy records information indicating communication of cache lines between the first vertical cache hierarchy and others of the plurality of vertical cache hierarchies. Based on selection of a victim cache line for eviction, the first vertical cache hierarchy determines, based on the recorded information, whether to perform a lateral castout of the victim cache line to another of the plurality of vertical cache hierarchies rather than to system memory and selects, based on the recorded information, a second vertical cache hierarchy among the plurality of vertical cache hierarchies as a recipient of the victim cache line via a lateral castout. Based on the determination, the first vertical cache hierarchy performs a castout of the victim cache line.
Abstract:
A data processing system includes a plurality of snoopers, a processing unit including master, and a system fabric communicatively coupling the master and the plurality of snoopers. The master sets a retry operating mode for an interconnect operation in one of alternative first and second operating modes. The first operating mode is associated with a first type of snooper, and the second operating mode is associated with a different second type of snooper. The master issues a memory access request of the interconnect operation on the system fabric of the data processing system. Based on receipt of a combined response representing a systemwide coherence response to the request, the master delays an interval having a duration dependent on the retry operating mode and thereafter reissues the memory access request on the system fabric.
Abstract:
A technique for operating a data processing system that implements a split transaction coherency protocol that has an address tenure and a data tenure includes receiving, at a data source, a command (that includes an address tenure for requested data) that is issued from a data sink. The data source issues a response that indicates data associated with the address tenure is available to be transferred to the data sink during a data tenure. In response to determining that the data is available subsequent to issuing the response, the data source issues a first data packet to the data sink that includes the data during the data tenure. In response to determining that the data is not available subsequent to issuing the response, the data source issues a second data packet to the data sink that includes a data header that indicates the data is unavailable.
Abstract:
A multiprocessor data processing system includes multiple vertical cache hierarchies supporting a plurality of processor cores, a system memory, and an interconnect fabric coupled to the system memory and the multiple vertical cache hierarchies. Based on a request of a requesting processor core among the plurality of processor cores, a master in the multiprocessor data processing system issues, via the interconnect fabric, a read-type memory access request. The master receives via the interconnect fabric at least one beat of conditional data issued speculatively on the interconnect fabric by a controller of the system memory prior to receipt by the controller of a systemwide coherence response for the read-type memory access request. The master forwards the at least one beat of conditional data to the requesting processor core.
Abstract:
A set-associative cache memory includes a bank of counters including a respective one of a plurality of counters for each cache line stored in a plurality of congruence classes of the cache memory. Prior to receiving a memory access request that maps to a particular congruence class of the cache memory, the cache memory pre-selects a first victim cache line stored in a particular entry of a particular congruence class for eviction based on at least a counter value of the victim cache line. In response to receiving a memory access request that maps to the particular congruence class and that misses, the cache memory evicts the pre-selected first victim cache line from the particular entry, installs a new cache line in the particular entry, and pre-selects a second victim cache line from the particular congruence class based on at least a counter value of the second victim cache line.
Abstract:
A processor core of a data processing system receives a push instruction of a sending thread that requests that a message payload identified by at least one operand of the push instruction be pushed to a mailbox of a receiving thread. In response to receiving the push instruction, the processor core executes the push instruction of the sending thread. In response to executing the push instruction, the processor core initiates transmission of the message payload to the mailbox of the receiving thread. In one embodiment, the processor core initiates transmission of the message payload by transmitting a co-processor request to a switch of the data processing system via an interconnect fabric.