Abstract:
A distributed system structure for a large-way, multi-bus, multiprocessor system using a bus-based cache-coherence protocol is provided. The distributed system structure contains an address switch, multiple memory subsystems, and multiple master devices, either processors, I/O agents, or coherent memory adapters, organized into a set of nodes supported by a node controller. The node controller receives commands from a master device, communicates with a master device as another master device or as a slave device, and queues commands received from a master device. The system allows for the implementation of a bus protocol that reports the state of a cache line to a master device along with the first beat of data delivery for a cacheable coherent Read. Since the achievement of coherency is distributed in time and space, the issue of data integrity is addressed through a variety of actions. In one implementation, the node controller helps to maintain cache coherency for commands by blocking a master device from receiving certain transactions so as to prevent Read-Read deadlocks. In another implementation, the master devices use a bus protocol that prevents Read-Read deadlocks in a distributed, multi-bus, multiprocessor system.
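The blocking rule can be pictured with a minimal C sketch, assuming a per-master table of outstanding Reads; every name and size below is illustrative rather than taken from the patent:

```c
/* Hypothetical sketch of the node-controller blocking rule described
 * above: a snooped Read is withheld from a master that itself has an
 * outstanding Read for the same cache line, so two masters cannot
 * stall each other (a Read-Read deadlock). All names are invented. */
#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8

typedef struct {
    uint64_t line_addr[MAX_OUTSTANDING]; /* lines with Reads in flight */
    int      count;
} master_state_t;

/* Record a Read command queued on behalf of a master. */
static void queue_read(master_state_t *m, uint64_t line)
{
    if (m->count < MAX_OUTSTANDING)
        m->line_addr[m->count++] = line;
}

/* Decide whether a snooped Read may be forwarded to this master. */
static bool may_forward_read(const master_state_t *m, uint64_t line)
{
    for (int i = 0; i < m->count; i++)
        if (m->line_addr[i] == line)
            return false; /* block: a Read to this line is outstanding */
    return true;
}
```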
Abstract:
Within a data processing system implementing L1 and L2 caches and stream filters and buffers, prefetching of cache lines is performed in a progressive manner. In one mode, no data is prefetched. In a second mode, two cache lines are prefetched, wherein one line is prefetched into the L1 cache and the next line is prefetched into a stream buffer. In a third mode, more than two cache lines are prefetched at a time. As a result, additional cache lines are progressively prefetched into the data cache as the sequentiality of cache-line accesses in memory is demonstrated through sequential addressing requests along a data stream. Furthermore, the stream is physically distributed; in other words, at least one line, but not all lines, of the stream is placed within the cache.
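A toy model of the progressive escalation, under the assumption that each sequential hit advances the policy by one mode; the mode names and escalation step are inventions for illustration, not details from the abstract:

```c
/* Illustrative model (not the patented circuit) of the progressive
 * policy described above: no prefetch until sequential access is
 * detected, then one line into L1 plus one into a stream buffer,
 * then deeper prefetch once the stream is confirmed. */
#include <stdint.h>

typedef enum { MODE_NONE, MODE_PAIR, MODE_STREAM } pf_mode_t;

typedef struct {
    uint64_t  next_line;  /* next sequential line expected */
    pf_mode_t mode;
} stream_t;

static void on_access(stream_t *s, uint64_t line)
{
    if (line == s->next_line) {
        /* Sequentiality demonstrated: escalate one mode per hit. */
        if (s->mode == MODE_NONE)      s->mode = MODE_PAIR;
        else if (s->mode == MODE_PAIR) s->mode = MODE_STREAM;
        switch (s->mode) {
        case MODE_PAIR:   /* prefetch line+1 to L1, line+2 to buffer */
            break;
        case MODE_STREAM: /* prefetch several lines ahead of line */
            break;
        default:
            break;
        }
    } else {
        s->mode = MODE_NONE;           /* stream broken: restart */
    }
    s->next_line = line + 1;
}
```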
Abstract:
A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines into each of the first, second, and third level caches is performed by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are issued over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor.
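As a rough sketch of this separation, the private request bus can be modeled as a small FIFO that carries only prefetch requests, with data returning on the ordinary bus; the types, names, and depth below are assumptions:

```c
/* Hedged sketch of the separation described above: prefetch requests
 * travel on a dedicated request channel, distinct from the data bus
 * that returns lines to the processor. Names are invented. */
#include <stdint.h>

typedef enum { TARGET_L2, TARGET_L3 } pf_target_t;

typedef struct {
    uint64_t    line_addr;
    pf_target_t target;
} pf_request_t;

/* Private prefetch request channel, modeled as a small FIFO. */
#define PF_BUS_DEPTH 4
static pf_request_t pf_bus[PF_BUS_DEPTH];
static int pf_head, pf_tail;

static int pf_bus_send(uint64_t line, pf_target_t tgt)
{
    int next = (pf_tail + 1) % PF_BUS_DEPTH;
    if (next == pf_head)
        return -1;                 /* channel busy: retried later */
    pf_bus[pf_tail] = (pf_request_t){ line, tgt };
    pf_tail = next;
    return 0;                      /* data returns on the normal bus */
}
```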
Abstract:
A data processing system and method for prefetching data in a multi-level cache subsystem. The data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines is performed concurrently into each of the first, second, and third level caches by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are issued over a private or dedicated prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. A software instruction or hint may be used to accelerate the prefetch process by overriding the normal functionality of the hardware prefetch engine.
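A hedged sketch of how such a hint might bypass the engine's confirmation phase; the engine fields and the four-line default depth are assumptions for illustration, not details from the abstract:

```c
/* Sketch of the software-hint path described above: a hint supplies
 * a start line and depth directly, letting the engine skip its usual
 * stream-confirmation phase. Purely illustrative; names invented. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t next_line;
    int      depth;       /* how many lines ahead to run */
    bool     confirmed;   /* hardware normally sets this after hits */
} pf_engine_t;

/* Normal hardware path: confirmation is needed before deep prefetch. */
static void pf_on_miss(pf_engine_t *e, uint64_t line)
{
    e->confirmed = (line == e->next_line);
    e->next_line = line + 1;
    e->depth = e->confirmed ? 4 : 0;   /* assumed default depth */
}

/* Software hint: override the engine and start prefetching at once. */
static void pf_hint(pf_engine_t *e, uint64_t start_line, int depth)
{
    e->next_line = start_line;
    e->depth     = depth;
    e->confirmed = true;   /* treat the stream as already established */
}
```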
Abstract:
Interfacing logic is implemented in one or more processors and a memory controller in a multiprocessor system. The interfacing logic enables all processors to receive snoops and snoop responses at substantially the same time by delaying data transmitted over faster busses before that data is provided to local logic at the receiving end of the faster busses. The interfacing logic comprises two or more paths, each consisting of a multiplexer component connected to a storage component; the storage components are in turn connected to a further multiplexer component that selects one of the paths. Preferably, bus control logic at the receiving end determines how much delay to apply to compensate for the delay differences between data busses.
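The delay-matching paths can be modeled as a selectable shift register: the bus control logic picks how many storage stages a fast bus's data traverses before reaching local logic. This is an illustrative software model with invented names, not the patented circuit:

```c
/* Data from a fast bus passes through a configurable number of
 * storage stages so it arrives at the local logic in the same cycle
 * as data from slower busses. Sizes and names are assumed. */
#include <stdint.h>

#define MAX_STAGES 4

typedef struct {
    uint64_t stage[MAX_STAGES];  /* storage elements, one per cycle */
    int      delay;              /* stages chosen by bus control logic */
} delay_path_t;

/* Advance one bus cycle: shift in new data, return the delayed word. */
static uint64_t delay_clock(delay_path_t *p, uint64_t in)
{
    if (p->delay == 0)
        return in;                       /* path selected: no delay */
    uint64_t out = p->stage[p->delay - 1];
    for (int i = p->delay - 1; i > 0; i--)
        p->stage[i] = p->stage[i - 1];   /* shift along the stages */
    p->stage[0] = in;
    return out;
}
```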
Abstract:
In a multiprocessor system using snooping protocols, system command conflicts are prevented by comparing processor commands with prior snoops within a specified, time-defined window. A determination is then made as to whether a command issued by a given processor is likely to cause a system conflict with another command issued within that window. If so, execution of any snoop command determined to be likely to cause a system conflict is delayed. This approach uses address bus arbitration rules to prevent system livelocks due to both coherency and resource conflicts.
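A minimal sketch of the window check, assuming a small log of recent snoops and an invented 16-cycle window; the structure and names are illustrative only:

```c
/* An outgoing command is compared against snoops seen within the
 * last WINDOW cycles, and its issue is delayed when a conflict is
 * likely. Window size and log depth are assumptions. */
#include <stdbool.h>
#include <stdint.h>

#define WINDOW 16   /* cycles in the time-defined window (assumed) */
#define LOG_SZ  8

typedef struct {
    uint64_t line_addr[LOG_SZ];  /* recently snooped cache lines */
    uint64_t cycle[LOG_SZ];      /* cycle each snoop was observed */
    int      n;
} snoop_log_t;

/* Returns true if the command must wait (conflict likely). */
static bool must_delay(const snoop_log_t *log, uint64_t line, uint64_t now)
{
    for (int i = 0; i < log->n; i++)
        if (log->line_addr[i] == line && now - log->cycle[i] <= WINDOW)
            return true;   /* same line snooped inside the window */
    return false;
}
```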
Abstract:
A method and apparatus for mapping certain software prefetch instructions in a microprocessor system to a modified set of hardware prefetch instructions, and for executing each software prefetch by invoking the corresponding modified hardware prefetch instruction. By mapping common software prefetch access patterns to the hardware engine, improved prefetching can be achieved without the need for additional hardware.
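The mapping itself can be pictured as a small translation table; the software and hardware opcode names here are hypothetical, chosen only to make the idea concrete:

```c
/* A recognized software prefetch pattern is translated to one of a
 * small set of hardware prefetch commands instead of being executed
 * directly. The opcodes and table are illustrative assumptions. */
#include <stdint.h>

typedef enum { SW_PF_READ, SW_PF_WRITE, SW_PF_STREAM } sw_prefetch_t;
typedef enum { HW_PF_LINE, HW_PF_LINE_EXCL, HW_PF_STREAM_START } hw_prefetch_t;

/* Static mapping from software prefetch forms to hardware commands. */
static const hw_prefetch_t pf_map[] = {
    [SW_PF_READ]   = HW_PF_LINE,
    [SW_PF_WRITE]  = HW_PF_LINE_EXCL,   /* fetch with intent to modify */
    [SW_PF_STREAM] = HW_PF_STREAM_START,
};

static hw_prefetch_t translate(sw_prefetch_t op)
{
    return pf_map[op];   /* common patterns ride the hardware engine */
}
```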
Abstract:
A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines into each of the first, second, and third level caches is performed by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are issued over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. A software instruction is used to accelerate the prefetch process by overriding the normal functionality of the hardware prefetch engine. The instruction also limits the amount of data to be prefetched.
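A sketch of the count-limited behavior, assuming the accelerating instruction supplies a start line and a line count; the structure and names are illustrative:

```c
/* The instruction carries a line count, and the engine stops once
 * that many lines have been prefetched. Names are invented. */
#include <stdint.h>

typedef struct {
    uint64_t next_line;
    uint32_t remaining;   /* lines left to prefetch (0 = engine idle) */
} limited_pf_t;

/* Software-initiated start: set the stream and its cap at once. */
static void pf_start(limited_pf_t *e, uint64_t line, uint32_t count)
{
    e->next_line = line;
    e->remaining = count;  /* software-supplied limit on prefetched data */
}

/* Issue one prefetch per call until the limit is exhausted. */
static int pf_step(limited_pf_t *e, uint64_t *line_out)
{
    if (e->remaining == 0)
        return 0;          /* limit reached: no further prefetch */
    *line_out = e->next_line++;
    e->remaining--;
    return 1;
}
```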
Abstract:
An apparatus for fetching data from a main memory into a primary cache memory of a processor. Instruction fetch requests are generated by the processor and assigned a priority level according to the predicted accuracy of each fetch request. The priority levels of different fetch requests are compared, and the highest-priority fetch request is serviced first. The instruction cache line at address N+1 is prefetched if there is a cache miss in the primary cache memory on address N.
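Selecting the highest-priority request can be sketched as a simple comparison over the pending requests; the request structure below is an assumption made for illustration:

```c
/* Pending fetch requests carry a priority derived from predicted
 * accuracy, and the highest-priority request is serviced first.
 * All names are assumed. */
#include <stdint.h>

typedef struct {
    uint64_t addr;
    int      priority;   /* higher = prediction more likely correct */
} fetch_req_t;

/* Pick the highest-priority pending request; returns its index. */
static int select_request(const fetch_req_t *reqs, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (reqs[i].priority > reqs[best].priority)
            best = i;
    return best;
}
```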
Abstract:
Within a data processing system implementing L1 and L2 caches and stream filters and buffers, prefetching of cache lines is performed in a progressive manner. In one mode, no data is prefetched. In a second mode, two cache lines are prefetched, wherein one line is prefetched into the L1 cache and the next line is prefetched into a stream buffer. In a third mode, more than two cache lines are prefetched at a time. In the third mode, cache lines may be prefetched into the L1 cache and not the L2 cache, so that inclusion between the L1 and L2 caches no longer holds. A directory field entry indicates whether or not a particular cache line in the L1 cache is also included in the L2 cache.
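A minimal sketch of that directory field, assuming a single inclusion bit per L1 directory entry; the names are illustrative, not from the patent:

```c
/* Each L1 directory entry records whether the line is also present
 * in L2, since inclusion is no longer guaranteed in the third mode. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     in_l2;   /* directory field: line also resides in L2 */
} l1_dir_entry_t;

/* On a writeback or snoop, consult the bit instead of probing L2. */
static bool line_also_in_l2(const l1_dir_entry_t *e)
{
    return e->valid && e->in_l2;
}
```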