摘要:
A technique for performing stream detection and prefetching within a cache memory simplifies stream detection and prefetching. A bit in a cache directory or cache entry indicates that a cache line has not been accessed since being prefetched and another bit indicates the direction of a stream associated with the cache line. A next cache line is prefetched when a previously prefetched cache line is accessed, so that the cache always attempts to prefetch one cache line ahead of accesses, in the direction of a detected stream. Stream detection is performed in response to load misses tracked in the load miss queue (LMQ). The LMQ stores an offset indicating a first miss at the offset within a cache line. A next miss to the line sets a direction bit based on the difference between the first and second offsets and causes prefetch of the next line for the stream.
摘要:
A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.
摘要:
In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, a processor retrieves a partial cache line of data into the processor from the memory hierarchy in response to the load instruction.
摘要:
A method, processor, and data processing system for enabling utilization of a single prefetch stream to access data across a memory page boundary. A prefetch engine includes an active streams table in which information for one or more scheduled prefetch streams are stored. The prefetch engine also includes a victim table for storing a previously active stream whose next prefetch crosses a memory page boundary. The scheduling logic issues a prefetch request with a real address to fetch data from the lower level memory. Then, responsive to detecting that the real address of the stream's next sequential prefetch crosses the memory page boundary, the prefetch engine determines when the first prefetch stream can continue across the page boundary of the first memory page (via an effective address comparison). The PE automatically reinserts the first prefetch stream into the active stream table to jump start prefetching across the page boundary.
摘要:
According to a method of data processing, a memory controller receives a prefetch load request from a processor core of a data processing system. The prefetch load request specifies a requested line of data. In response to receipt of the prefetch load request, the memory controller determines by reference to a stream of demand requests how much data is to be supplied to the processor core in response to the prefetch load request. In response to the memory controller determining to provide less than all of the requested line of data, the memory controller provides less than all of the requested line of data to the processor core.
摘要:
A method, processor, and data processing system for enabling utilization of a single prefetch stream to access data across a memory page boundary. A prefetch engine includes an active streams table in which information for one or more scheduled prefetch streams are stored. The prefetch engine also includes a victim table for storing a previously active stream whose next prefetch crosses a memory page boundary. The scheduling logic issues a prefetch request with a real address to fetch data from the lower level memory. Then, responsive to detecting that the real address of the stream's next sequential prefetch crosses the memory page boundary, the prefetch engine determines when the first prefetch stream can continue across the page boundary of the first memory page (via an effective address comparison). The PE automatically reinserts the first prefetch stream into the active stream table to jump start prefetching across the page boundary.
摘要:
A technique for performing data prefetching using multi-level indirect data prefetching includes determining a first memory address of a pointer associated with a data prefetch instruction. Content that is included in a first data block (e.g., a first cache line of a memory) at the first memory address is then fetched. A second memory address is then determined based on the content at the first memory address. Content that is included in a second data block (e.g., a second cache line) at the second memory address is then fetched (e.g., from the memory or another memory). A third memory address is then determined based on the content at the second memory address. Finally, a third data block (e.g., a third cache line) that includes another pointer or data at the third memory address is fetched (e.g., from the memory or the another memory).
摘要:
A technique for data prefetching using indirect addressing includes monitoring data pointer values, associated with an array, in an access stream to a memory. The technique determines whether a pattern exists in the data pointer values. A prefetch table is then populated with respective entries that correspond to respective array address/data pointer pairs based on a predicted pattern in the data pointer values. Respective data blocks (e.g., respective cache lines) are then prefetched (e.g., from the memory or another memory) based on the respective entries in the prefetch table.
摘要:
A technique for performing data prefetching using indirect addressing includes determining a first memory address of a pointer associated with a data prefetch instruction. Content, that is included in a first data block (e.g., a first cache line) of a memory, at the first memory address is then fetched. An offset is then added to the content of the memory at the first memory address to provide a first offset memory address. A second memory address is then determined based on the first offset memory address. A second data block (e.g., a second cache line) that includes data at the second memory address is then fetched (e.g., from the memory or another memory). A data prefetch instruction may be indicated by a unique operational code (opcode), a unique extended opcode, or a field (including one or more bits) in an instruction.
摘要:
A mechanism for assigning memory to on-chip cache coherence domains assigns caches within a processing unit to coherence domains. The mechanism assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller.