Abstract:
A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines into each of the first, second, and third level caches is performed by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are performed over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. A software instruction is used to accelerate the prefetch process by overriding the normal functionality of the hardware prefetch engine. The instruction also limits the amount of data to be prefetched.
Abstract:
A data processing system and method for prefetching data in a multi-level cache subsystem. The data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines into each of the first, second, and third level caches is performed concurrently by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are performed over a private or dedicated prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. A software instruction or hint may be used to accelerate the prefetch process by overriding the normal functionality of the hardware prefetch engine.
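
As a concrete illustration of such a software hint, the sketch below uses GCC's __builtin_prefetch (the specific instruction claimed in the abstract is not named here, so the builtin stands in for it) to request data a bounded distance ahead of a streaming read, which both accelerates the prefetch and limits how much data is fetched ahead of use; the look-ahead distance is an assumed parameter.

    /* Minimal sketch, assuming a GCC-style compiler; PREFETCH_DISTANCE is an
     * illustrative look-ahead, not a value taken from the abstract. */
    #include <stddef.h>

    #define PREFETCH_DISTANCE 8   /* elements to run ahead of the consumer */

    long sum_with_hints(const long *data, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            /* Hint the upcoming element only while it stays inside the array,
             * so the amount of data prefetched ahead of use is limited. */
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
            sum += data[i];
        }
        return sum;
    }
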
Abstract:
A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines into each of the first, second, and third level caches is performed by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are performed over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor. The prefetch request may include a signal notifying the third level cache to also prefetch.
Abstract:
A data processing system includes a processor having a first level cache and a prefetch engine. Coupled to the processor are a second level cache, a third level cache, and a system memory. Prefetching of cache lines into each of the first, second, and third level caches is performed by the prefetch engine. Prefetch requests from the prefetch engine to the second and third level caches are performed over a private prefetch request bus, which is separate from the bus system that transfers data from the various cache levels to the processor.
Abstract:
A method and apparatus for mapping some software prefetch instructions in a microprocessor system to a modified set of hardware prefetch instructions and executing the software prefetch by invoking the corresponding modified hardware prefetch instruction. For common software prefetch access patterns, mapping the software prefetches to hardware achieves improved prefetching without the need for additional hardware.
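
A rough sketch of the mapping step follows; the request fields and hint names are hypothetical, chosen only to show how a software prefetch could be translated into a request for an existing hardware prefetch engine rather than into dedicated prefetch hardware.

    /* Minimal sketch with hypothetical types: a software prefetch (address
     * plus access-pattern hint) is mapped onto a hardware prefetch-engine
     * request that reuses the existing stream machinery. */
    #include <stdint.h>

    typedef enum { HINT_SINGLE_LINE, HINT_ASCENDING_STREAM, HINT_DESCENDING_STREAM } sw_hint_t;

    typedef struct {            /* hypothetical prefetch-engine request  */
        uint64_t start_line;    /* first cache line to fetch             */
        int      direction;     /* +1 ascending, -1 descending           */
        int      depth;         /* how many lines to run ahead of demand */
    } hw_prefetch_req_t;

    hw_prefetch_req_t map_sw_prefetch(uint64_t addr, sw_hint_t hint)
    {
        hw_prefetch_req_t req = { addr >> 7, +1, 1 };   /* 128-byte lines assumed */
        switch (hint) {
        case HINT_ASCENDING_STREAM:  req.direction = +1; req.depth = 4; break;
        case HINT_DESCENDING_STREAM: req.direction = -1; req.depth = 4; break;
        case HINT_SINGLE_LINE:       /* one line, no stream */           break;
        }
        return req;
    }
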
Abstract:
Within a data processing system implementing primary and secondary caches and stream filters and buffers, prefetching of cache lines is performed in a progressive manner. In one mode, data may not be prefetched. In a second mode, two cache lines are prefetched wherein one line is prefetched into the L1 cache and the next line is prefetched into a stream buffer. In a third mode, more than two cache lines are prefetched at a time. Prefetching may be performed on cache misses or hits. Cache misses on successive cache lines may allocate a stream of cache lines to the stream buffers. Control circuitry, coupled to a stream filter circuit, selectively controls fetching and prefetching of data from system memory to the primary and secondary caches associated with a processor and to a stream buffer circuit.
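
The progressive ramp-up can be pictured with the small controller below; the stream bookkeeping and the per-mode line counts are assumptions for illustration, not values from the abstract.

    /* Minimal sketch, assuming hypothetical per-stream state: no prefetch at
     * first, then a line into L1 plus a line into a stream buffer, then more
     * than two lines at a time once the stream keeps confirming. */
    #include <stdint.h>

    typedef enum { MODE_NONE, MODE_PAIR, MODE_DEEP } pf_mode_t;

    typedef struct {
        uint64_t  next_line;   /* next cache line expected in the stream */
        pf_mode_t mode;
    } stream_t;

    /* Called on each demand access; 'line' is the cache-line index accessed.
     * Returns how many lines the controller asks to prefetch this time.    */
    int lines_to_prefetch(stream_t *s, uint64_t line)
    {
        if (line != s->next_line) {      /* not sequential: restart the stream */
            s->next_line = line + 1;
            s->mode = MODE_NONE;
            return 0;                    /* first mode: nothing prefetched     */
        }
        s->next_line = line + 1;
        switch (s->mode) {
        case MODE_NONE: s->mode = MODE_PAIR; return 2; /* L1 line + buffer line */
        case MODE_PAIR: s->mode = MODE_DEEP; return 4; /* ramp past two lines   */
        default:                             return 4; /* keep running ahead    */
        }
    }
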
Abstract:
A data processing system includes a processor, a unit that includes a multi-level cache, a prefetch system and a memory. The data processing system can operate in a first mode and a second mode. The prefetch system can change behavior in response to a desired power consumption policy set by an external agent or automatically via hardware based on on-chip power/performance thresholds.
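
One way to read the mode switch is as a policy input that scales prefetch depth; the sketch below is hypothetical (policy names and depths are illustrative) and simply shows the prefetch system consulting the currently selected power/performance policy.

    /* Minimal sketch, assuming illustrative policy levels and depths. */
    typedef enum { POLICY_PERFORMANCE, POLICY_BALANCED, POLICY_LOW_POWER } power_policy_t;

    int prefetch_depth_for(power_policy_t policy)
    {
        switch (policy) {
        case POLICY_PERFORMANCE: return 8; /* run far ahead of demand accesses */
        case POLICY_BALANCED:    return 4;
        default:                 return 0; /* low power: prefetch disabled     */
        }
    }
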
Abstract:
A pipeline processor has circuits to detect the presence of a register access instruction in an issue stage of the pipeline. A load miss occurring at a later stage may cause the register access instruction to be marked with an associated bit. The register access instruction progresses down the pipeline, and when the flush stage is reached, the processor checks the associated bit and flushes the register access instruction if the bit is set.
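
A toy model of the marking and flush check is sketched below; the structure fields are hypothetical and only mirror the sequence described in the abstract.

    /* Minimal sketch, assuming a simplified pipeline-entry record. */
    #include <stdbool.h>

    typedef struct {
        bool is_reg_access;    /* detected at the issue stage              */
        bool flush_pending;    /* the "associated bit", set on a load miss */
    } pipe_entry_t;

    /* A load miss detected at a later stage marks the register access op. */
    void on_load_miss(pipe_entry_t *reg_access_op)
    {
        if (reg_access_op->is_reg_access)
            reg_access_op->flush_pending = true;
    }

    /* At the flush stage the bit is checked to decide whether to flush. */
    bool should_flush(const pipe_entry_t *e)
    {
        return e->is_reg_access && e->flush_pending;
    }
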
Abstract:
A method, an apparatus, and a computer program product are provided for detecting load/store dependency in a memory system by dynamically changing the address width for comparison. An incoming load/store operation must be compared to the operations in the pipeline and the queues to avoid address conflicts. Overall, the present invention introduces a cache hit or cache miss input into the load/store dependency logic. If the incoming load operation is a cache hit, then the quadword boundary address value is used for detection. If the incoming load operation is a cache miss, then the cacheline boundary address value is used for detection. This invention enhances the performance of LHS and LHR operations in a memory system.
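
The dynamic width change amounts to masking the compared addresses differently on a hit than on a miss; in the sketch below the 16-byte quadword and 128-byte cache line sizes are assumptions, not figures taken from the abstract.

    /* Minimal sketch, assuming 16-byte quadwords and 128-byte cache lines. */
    #include <stdbool.h>
    #include <stdint.h>

    #define QW_MASK   (~(uint64_t)0xF)    /* quadword boundary, used on a hit  */
    #define LINE_MASK (~(uint64_t)0x7F)   /* cache-line boundary, used on miss */

    bool addresses_conflict(uint64_t incoming, uint64_t queued, bool incoming_hit)
    {
        uint64_t mask = incoming_hit ? QW_MASK : LINE_MASK;
        return (incoming & mask) == (queued & mask);
    }

The wider mask on a miss reflects that a miss is serviced at cache-line granularity, so conflicts anywhere in the line must be detected.
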
Abstract:
A method is disclosed for executing a load instruction. Address information of the load instruction is used to generate an address of needed data, and the address is used to search a cache memory for the needed data. If the needed data is found in the cache memory, a cache hit signal is generated. At least a portion of the address is used to search a queue for a previous load instruction specifying the same address. If a previous load instruction specifying the same address is found, the cache hit signal is ignored and the load instruction is stored in the queue. A load/store unit, and a processor implementing the method, are also described.
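
The sequence can be modeled with a small queue check; the queue layout below is hypothetical and only illustrates ignoring the cache hit when an older load to the same address is still queued.

    /* Minimal sketch, assuming a fixed-size load queue of hypothetical shape. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define QUEUE_SLOTS 16

    typedef struct {
        uint64_t addr[QUEUE_SLOTS];
        bool     valid[QUEUE_SLOTS];
    } load_queue_t;

    static bool queue_has_addr(const load_queue_t *q, uint64_t addr)
    {
        for (size_t i = 0; i < QUEUE_SLOTS; i++)
            if (q->valid[i] && q->addr[i] == addr)
                return true;
        return false;
    }

    /* Returns true if the load completes from the cache now; false if it is
     * queued instead (a miss, or a hit shadowed by an older queued load).  */
    bool execute_load(load_queue_t *q, uint64_t addr, bool cache_hit)
    {
        if (cache_hit && !queue_has_addr(q, addr))
            return true;                    /* use the cache hit immediately */

        for (size_t i = 0; i < QUEUE_SLOTS; i++) {  /* otherwise enqueue     */
            if (!q->valid[i]) {
                q->addr[i]  = addr;
                q->valid[i] = true;
                break;
            }
        }
        return false;
    }
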