Abstract:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the data storage and the execution resources, that supplies instructions within the data storage to the execution resources. The execution resources include a plurality of load-store units that each process only instructions that access data having associated addresses within a respective one of a plurality of subsets of an address space. The load-store units can have diverse hardware such that a maximum number of instructions that can be concurrently executed is different for different load-store units or such that some of the load-store units are restricted to executing certain classes of instructions.
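For illustration, such routing might reduce to selecting a unit from address interleave bits. The C sketch below assumes a hypothetical four-unit partition interleaved at cache-line granularity; the unit count, bit positions, and function names are assumptions, not details from the abstract.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical sketch: route each memory access to the load-store
     * unit responsible for its address subset, here by interleaving on
     * bits 6-7 (cache-line granularity). Unit count and bit choice are
     * assumptions, not taken from the abstract. */
    #define NUM_LSU 4

    static unsigned lsu_for_address(uint64_t addr)
    {
        return (addr >> 6) & (NUM_LSU - 1);  /* subset = interleave bits */
    }

    int main(void)
    {
        uint64_t addrs[] = { 0x1000, 0x1040, 0x1080, 0x10C0 };
        for (int i = 0; i < 4; i++)
            printf("addr 0x%llx -> LSU %u\n",
                   (unsigned long long)addrs[i], lsu_for_address(addrs[i]));
        return 0;
    }

Under this assumed interleave, consecutive cache lines map to different units, which is one way diverse units could each see only their own address subset.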
Abstract:
A multiprocessor data processing system requires careful management to maintain cache coherency. Conventional systems using a MESI approach sacrifice some performance through inefficient lock-acquisition and lock-retention techniques. The disclosed system provides additional cache states, indicator bits, and lock-acquisition routines to improve cache performance. The additional cache states allow cache state transition sequences to be optimized. In particular, the claimed system and method provide that a given processor, after acquiring a lock or reservation on a given cache line, will keep the lock to make successive modifications to the cache line, instead of releasing it to other processors after making only one modification. By doing so, the overhead typically required to acquire a lock before making any cache line modification is eliminated for successive modifications.
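The saving can be seen in a toy comparison. In the C sketch below, acquire_line(), release_line(), and store_to_line() are invented stand-ins for the hardware lock/reservation mechanism; only the count of acquisitions matters.

    #include <stdio.h>

    /* Illustrative sketch only: contrast per-store lock acquisition with
     * the retained-lock pattern the abstract describes. */
    static int acquisitions;
    static void acquire_line(void) { acquisitions++; }
    static void release_line(void) { }
    static void store_to_line(int v) { (void)v; }

    static void per_store_locking(int n)
    {
        for (int i = 0; i < n; i++) {
            acquire_line();          /* overhead paid on every store */
            store_to_line(i);
            release_line();
        }
    }

    static void retained_locking(int n)
    {
        acquire_line();              /* overhead paid once */
        for (int i = 0; i < n; i++)
            store_to_line(i);
        release_line();
    }

    int main(void)
    {
        acquisitions = 0; per_store_locking(8);
        printf("per-store: %d acquisitions\n", acquisitions);
        acquisitions = 0; retained_locking(8);
        printf("retained:  %d acquisition(s)\n", acquisitions);
        return 0;
    }

For eight successive stores, the per-store pattern pays the acquisition overhead eight times; the retained pattern pays it once.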
Abstract:
Disclosed are a method and memory subsystem that allow for speculative issuance of reads to a DRAM array to provide efficient utilization of the data out bus and faster read response for accesses to a single DRAM array. Two read requests are issued simultaneously to a first and second DRAM in the memory subsystem, respectively. Data issued from the first DRAM is immediately placed on the data out bus, while data issued from the second DRAM is held in an associated buffer. The processor or memory controller then generates a release signal if the second read is not speculative or is correctly speculated. The release signal is sent to the second DRAM after the first issued data is placed on the bus. The release signal releases the data held in the buffer associated with the second DRAM from the buffer to the data out bus. Because the data has already been issued when the release signal is received, no loss of time is incurred in issuing the data from the DRAM and only a small clock cycle delay occurs between the first issued data and the second issued data on the data out bus.
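A minimal C model of the buffered-release sequence follows; the read_buffer_t structure, data values, and function names are assumptions made for illustration.

    #include <stdio.h>
    #include <stdbool.h>

    /* Sketch of the buffered-release idea: two reads issue together;
     * DRAM0's data goes straight to the bus, DRAM1's data waits in its
     * buffer until a release arrives (or is discarded if mis-speculated). */
    typedef struct { int data; bool valid; } read_buffer_t;

    static void issue_reads(int *bus, read_buffer_t *buf1)
    {
        *bus = 0xA0;            /* DRAM0 data placed on the data out bus */
        buf1->data = 0xB1;      /* DRAM1 data held in its buffer */
        buf1->valid = true;
    }

    static void release(int *bus, read_buffer_t *buf1, bool correct)
    {
        if (correct && buf1->valid)
            *bus = buf1->data;  /* already issued, so only a small delay */
        buf1->valid = false;    /* drop the data if mis-speculated */
    }

    int main(void)
    {
        int bus = 0; read_buffer_t buf1 = {0};
        issue_reads(&bus, &buf1);
        printf("bus after issue:   0x%X\n", bus);
        release(&bus, &buf1, true);
        printf("bus after release: 0x%X\n", bus);
        return 0;
    }

Because the second read's data is already sitting in the buffer when the release arrives, the model transfers it immediately, mirroring the small back-to-back delay the abstract describes.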
Abstract:
A multiprocessor system bus protocol system and method for processing and handling a processor request within a multiprocessor system having a number of bus accessible memory devices that are snooping on at least one bus line. Snoop response groups, which are groups of different types of snoop responses from the bus accessible memory devices, are provided. Different transfer types are provided within each of the snoop response groups. A bus master device that provides a bus master signal is designated. The bus master device receives the processor request. One of the snoop response groups and one of the transfer types are appropriately designated based on the processor request. The bus master signal is formulated from a snoop response group, a transfer type, a valid request signal, and a cache line address. The bus master signal is sent to all of the bus accessible memory devices on the cache bus line and to a combined response logic system. All of the bus accessible memory devices on the cache bus line send snoop responses in response to the bus master signal based on the designated snoop response group. The snoop responses are sent to the combined response logic system. A combined response is determined by the combined response logic system based on the combined response encoding logic selected by the designated and latched snoop response group. The combined response is sent to all of the bus accessible memory devices on the cache bus line.
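A rough C sketch of the signal formulation and combining step follows; the particular groups, response types, and priority ordering are invented for illustration and are not taken from the abstract.

    #include <stdio.h>

    /* Hypothetical encoding sketch: a bus master signal carries a snoop
     * response group and transfer type alongside a valid bit and cache
     * line address; the combined response logic picks the highest-
     * priority snoop response under the latched group's encoding. */
    enum group { GRP_READ, GRP_RWITM };
    enum resp  { RSP_NULL, RSP_SHARED, RSP_MODIFIED, RSP_RETRY };

    struct master_signal {
        enum group    grp;
        int           transfer_type;
        int           valid;
        unsigned long line_addr;
    };

    /* Combined response = highest-priority response for this group. */
    static enum resp combine(enum group grp, const enum resp *snoops, int n)
    {
        (void)grp;              /* encoding table selected by the group */
        enum resp best = RSP_NULL;
        for (int i = 0; i < n; i++)
            if (snoops[i] > best)
                best = snoops[i];
        return best;
    }

    int main(void)
    {
        struct master_signal sig = { GRP_READ, 0, 1, 0x2000 };
        enum resp snoops[] = { RSP_NULL, RSP_SHARED, RSP_MODIFIED };
        printf("combined response for group %d: %d\n",
               sig.grp, combine(sig.grp, snoops, 3));
        return 0;
    }

The key point the sketch captures is that the group latched from the bus master signal selects which encoding logic combines the individual snoop responses.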
Abstract:
A method for transmitting ordered packets on a bus within a data processing system is disclosed. A data processing system includes a bus connected between a bus master and a bus slave. The bus master consecutively issues multiple packets, such as command packets, to the bus slave on the bus. The packets include order-sensitive packets and non-order-sensitive packets. In response to a temporary inability of the bus slave to process a particular one of the order-sensitive packets due to a lack of resources, the bus slave keeps retrying the particular order-sensitive packet. When resources become available, the bus slave processes the retried order-sensitive packets in order while allowing the retried non-order-sensitive packets to be processed in any order.
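The retry discipline can be modeled compactly. In the C sketch below, the packet fields, resource counter, and acceptance rule are assumptions chosen to illustrate the in-order versus any-order distinction.

    #include <stdio.h>
    #include <stdbool.h>

    /* Sketch of the retry discipline: once resources return, retried
     * order-sensitive packets are accepted strictly in sequence, while
     * non-order-sensitive ones are taken whenever they reappear. */
    typedef struct { int seq; bool ordered; } packet_t;

    static int next_ordered_seq = 0; /* next in-order packet to accept */
    static int free_resources   = 0;

    static bool slave_accepts(packet_t p)
    {
        if (free_resources == 0)
            return false;                       /* retry: no resources */
        if (p.ordered && p.seq != next_ordered_seq)
            return false;                       /* retry: out of order */
        free_resources--;
        if (p.ordered)
            next_ordered_seq++;
        return true;
    }

    int main(void)
    {
        packet_t pkts[] = { {0, true}, {1, true}, {2, false} };
        free_resources = 3;
        /* Deliberately present the retried packets out of order. */
        printf("pkt2 (unordered): %s\n", slave_accepts(pkts[2]) ? "ok" : "retry");
        printf("pkt1 (ordered):   %s\n", slave_accepts(pkts[1]) ? "ok" : "retry");
        printf("pkt0 (ordered):   %s\n", slave_accepts(pkts[0]) ? "ok" : "retry");
        printf("pkt1 (ordered):   %s\n", slave_accepts(pkts[1]) ? "ok" : "retry");
        return 0;
    }

The non-order-sensitive packet is accepted immediately, while the order-sensitive packet presented early is retried until its predecessor has been processed.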
Abstract:
A method of maintaining coherency in a multiprocessor computer system wherein each processing unit's cache has sectored cache lines. A first cache coherency state is assigned to one of the sectors of a particular cache line, and a second cache coherency state, different from the first cache coherency state, is assigned to the overall cache line while maintaining the first cache coherency state for the first sector. The first cache coherency state may provide an indication that the first sector contains a valid value which is not shared with any other cache (i.e., an exclusive or modified state), and the second cache coherency state may provide an indication that at least one of the sectors in the cache line contains a valid value which is shared with at least one other cache (a shared, recently-read, or tagged state). Other coherency states may be applied to other sectors in the same cache line. Partial intervention may be achieved by issuing a request to retrieve an entire cache line, and sourcing only a first sector of the cache line in response to the request. A second sector of the same cache line may be sourced from a third cache. Other sectors may also be sourced from a system memory device of the computer system. Appropriate system bus codes are utilized to transmit cache operations to the system bus and indicate which sectors of the cache line are targets of the cache operation.
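A simple C rendering of such a sectored directory entry follows; the two-sector layout is an assumption, and the state set (MESI plus the recently-read and tagged states the abstract mentions by name) is filled in for illustration.

    #include <stdio.h>

    /* Sketch of a sectored directory entry: each sector keeps its own
     * coherency state while the line carries an overall state. */
    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED,
                   RECENT, TAGGED } cstate_t;

    #define SECTORS 2

    typedef struct {
        cstate_t sector[SECTORS]; /* per-sector coherency states */
        cstate_t line;            /* overall, line-level state   */
    } dir_entry_t;

    int main(void)
    {
        /* Sector 0 holds an unshared valid value; the line overall is
         * marked shared because sector 1 is shared with another cache. */
        dir_entry_t e = { { MODIFIED, SHARED }, SHARED };
        for (int s = 0; s < SECTORS; s++)
            printf("sector %d state: %d\n", s, e.sector[s]);
        printf("line state: %d\n", e.line);
        return 0;
    }

The point of the structure is that the modified state of sector 0 survives even though the line-level state says some sector is shared, which is exactly the first-state/second-state split the abstract describes.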
Abstract:
A data processing system includes at least first and second nodes and a segmented interconnect having coupled first and second segments. The first node includes the first segment and first and second agents coupled to the first segment, and the second node includes the second segment and a third agent coupled to the second segment. The first node further includes cancellation logic that, in response to the first agent issuing a request on the segmented interconnect that propagates from the first segment to the second segment and the second agent indicating ability to service the request, sends a cancellation message to the third agent instructing the third agent to ignore the request.
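The cancellation decision can be sketched in a few lines of C; the two-segment topology, agent structure, and function names below are assumptions for illustration.

    #include <stdio.h>
    #include <stdbool.h>

    /* Sketch of the cancellation idea: if an agent in the requesting
     * node can service the request locally, cancellation logic tells
     * agents on the remote segment to ignore it. */
    typedef struct { int id; bool can_service; } agent_t;

    static void issue_request(agent_t *local_peer, agent_t *remote)
    {
        printf("request propagates from segment 1 to segment 2\n");
        if (local_peer->can_service) {
            /* The second agent indicated it can service the request. */
            printf("cancellation message -> agent %d: ignore request\n",
                   remote->id);
        } else {
            printf("agent %d must service the request\n", remote->id);
        }
    }

    int main(void)
    {
        agent_t second = { 2, true };  /* same node as the requester */
        agent_t third  = { 3, false }; /* on the remote segment      */
        issue_request(&second, &third);
        return 0;
    }

The benefit the sketch illustrates is that remote agents are freed from servicing requests that the requester's own node can satisfy.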
Abstract:
In response to a need to initiate a global operation, a bus master within a multiprocessor system issues a combined token and operation request on a bus coupled to the bus master. The combined token and operation request solicits one of a plurality of tokens required to complete the global operation and identifies the global operation to be processed with the token, if granted. Bus snoopers contain a number of snooper queues for global operations equal to the number of global operation tokens employed within the multiprocessor system. A bus snooper, upon detecting a combined token and operation request, begins speculatively processing the operation if the snooper is not already busy. Before completing the operation, the snooper watches for a combined response with a token number acknowledging either the combined request or a subsequent token request from the same processor, which indicates that the originating bus master has been granted a token for completing a global operation. Otherwise, a combined response acknowledging an operation request containing the token number implies release of the granted token.
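A C sketch of the speculative snooper behavior follows; the queue structure, token count, and commit/discard logic are assumptions illustrating the speculate-then-confirm pattern.

    #include <stdio.h>
    #include <stdbool.h>

    /* Sketch: a free snooper queue starts a global operation
     * speculatively on seeing a combined token-and-operation request,
     * and commits only once a combined response grants the token. */
    #define NUM_TOKENS 2  /* one snooper queue per global-op token */

    typedef struct { bool busy; int op; } snooper_queue_t;

    static snooper_queue_t queues[NUM_TOKENS];

    static int snoop_combined_request(int op)
    {
        for (int t = 0; t < NUM_TOKENS; t++)
            if (!queues[t].busy) {
                queues[t].busy = true;
                queues[t].op = op;  /* begin speculative processing */
                return t;
            }
        return -1;                  /* all queues busy */
    }

    static void combined_response(int token, bool granted)
    {
        if (token < 0) return;
        if (granted)
            printf("token %d granted: complete op %d\n",
                   token, queues[token].op);
        else
            printf("token %d not granted: discard speculative work\n",
                   token);
        queues[token].busy = false;
    }

    int main(void)
    {
        int t = snoop_combined_request(42);
        combined_response(t, true);
        return 0;
    }

Matching the snooper queue count to the token count, as the abstract specifies, guarantees a free queue exists for every operation that can actually hold a token.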
Abstract:
In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs a horizontal storage device at the same level as the storage device initiating the combined operation to allocate and store either the cast out or target data. A horizontal storage device having available space—i.e., an invalid or modified data element in a congruence class for the victim—stores either the target or the cast out data for subsequent access by an intervention. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.
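The allocation test can be illustrated in C; the four-way congruence class and slot states below are assumptions, with "available" meaning an invalid or modified element, as the abstract states.

    #include <stdio.h>

    /* Sketch of cast out cancellation: if a horizontal (same-level)
     * device has available space in the victim's congruence class, the
     * combined response logic directs it to absorb the cast out or
     * target data, deferring the write to system memory. */
    #define WAYS 4
    typedef enum { SLOT_INVALID, SLOT_VALID, SLOT_MODIFIED } slot_t;

    static int find_available_way(const slot_t cls[WAYS])
    {
        for (int w = 0; w < WAYS; w++)
            if (cls[w] == SLOT_INVALID || cls[w] == SLOT_MODIFIED)
                return w;   /* space the abstract calls "available" */
        return -1;
    }

    int main(void)
    {
        slot_t horizontal_class[WAYS] = { SLOT_VALID, SLOT_INVALID,
                                          SLOT_VALID, SLOT_VALID };
        int w = find_available_way(horizontal_class);
        if (w >= 0)
            printf("cancel cast out: horizontal device stores data "
                   "in way %d\n", w);
        else
            printf("no space: cast out proceeds to system memory\n");
        return 0;
    }

When an available way exists, the data stays at the same storage level for later intervention rather than paying the latency of a write to system memory.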
Abstract:
According to the present invention, a data processing system includes a cache having a cache directory. A status indication indicative of the status of at least one of a plurality of data entries in the cache is stored in the cache directory. In response to receipt of a cache operation request, a determination is made whether to update the status indication. In response to the determination that the status indication is to be updated, the status indication is copied into a shadow register and updated. The status indication is then written back into the cache directory at a later time. The shadow register thus serves as a virtual cache controller queue that dynamically mimics a cache directory entry without functional latency.
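A small C model of the shadow-register path follows; the structures and function names are assumptions illustrating the copy, update, and deferred write-back steps.

    #include <stdio.h>
    #include <stdbool.h>

    /* Sketch: on a cache operation request that changes a directory
     * entry's status, the status is copied into a shadow register and
     * updated there, then written back to the directory later, so the
     * register acts as a virtual queue mimicking the directory entry. */
    typedef struct { int status; } dir_entry_t;
    typedef struct { int status; bool pending; } shadow_reg_t;

    static void begin_update(dir_entry_t *e, shadow_reg_t *s, int new_status)
    {
        s->status  = e->status;   /* copy current status into the shadow */
        s->status  = new_status;  /* update the shadow, not the directory */
        s->pending = true;
    }

    static void write_back(dir_entry_t *e, shadow_reg_t *s)
    {
        if (s->pending) {
            e->status  = s->status; /* directory updated at a later time */
            s->pending = false;
        }
    }

    int main(void)
    {
        dir_entry_t entry  = { 1 };
        shadow_reg_t shadow = { 0, false };
        begin_update(&entry, &shadow, 3);
        printf("directory still %d, shadow %d\n", entry.status, shadow.status);
        write_back(&entry, &shadow);
        printf("directory now %d\n", entry.status);
        return 0;
    }

Between begin_update() and write_back(), reads of the pending shadow see the new status while the directory itself is untouched, which is the "dynamically mimics a cache directory entry" behavior the abstract describes.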