摘要:
A mechanism for assigning memory to on-chip cache coherence domains assigns caches within a processing unit to coherence domains. The mechanism assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller.
摘要:
A technique for data prefetching using indirect addressing includes monitoring data pointer values, associated with an array, in an access stream to a memory. The technique determines whether a pattern exists in the data pointer values. A prefetch table is then populated with respective entries that correspond to respective array address/data pointer pairs based on a predicted pattern in the data pointer values. Respective data blocks (e.g., respective cache lines) are then prefetched (e.g., from the memory or another memory) based on the respective entries in the prefetch table.
摘要:
A technique for performing data prefetching using indirect addressing includes determining a first memory address of a pointer associated with a data prefetch instruction. Content, that is included in a first data block (e.g., a first cache line) of a memory, at the first memory address is then fetched. An offset is then added to the content of the memory at the first memory address to provide a first offset memory address. A second memory address is then determined based on the first offset memory address. A second data block (e.g., a second cache line) that includes data at the second memory address is then fetched (e.g., from the memory or another memory). A data prefetch instruction may be indicated by a unique operational code (opcode), a unique extended opcode, or a field (including one or more bits) in an instruction.
摘要:
A technique for sharing a fabric to facilitate off-chip communication for on-chip units includes dynamically assigning a first unit that implements a first communication protocol to a first portion of the fabric when private fabrics are indicated for the on-chip units. The technique also includes dynamically assigning a second unit that implements a second communication protocol to a second portion of the fabric when the private fabrics are indicated for the on-chip units. In this case, the first and second units are integrated in a same chip and the first and second protocols are different. The technique further includes dynamically assigning, based on off-chip traffic requirements of the first and second units, the first unit or the second unit to the first and second portions of the fabric when the private fabrics are not indicated for the on-chip units.
摘要:
A mechanism is provided for assigning memory to on-chip cache coherence domains. The mechanism assigns caches within a processing unit to coherence domains. The mechanism then assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller. If a cache line is not found within the coherence domain, the memory controller accesses the memory to retrieve the cache line.
摘要:
A mechanism is provided within a 3D stacked memory organization to spread or stripe cache lines across multiple layers. In an example organization, a 128B cache line takes eight cycles on a 16B-wide bus. Each layer may provide 32B. The first layer uses the first two of the eight transfer cycles to send the first 32B. The next layer sends the next 32B using the next two cycles of the eight transfer cycles, and so forth. The mechanism provides a uniform memory access.
摘要:
Mechanisms are provided for inhibiting precharging of memory cells of a dynamic random access memory (DRAM) structure. The mechanisms receive a command for accessing memory cells of the DRAM structure. The mechanisms further determine, based on the command, if precharging the memory cells following accessing the memory cells is to be inhibited. Moreover, the mechanisms send, in response to the determination indicating that precharging the memory cells is to be inhibited, a command to blocking logic of the DRAM structure to block precharging of the memory cells following accessing the memory cells.
摘要:
A modular processing module is provided. The modular processing module comprises a set of processing module sides. Each processing module side comprises a circuit board, a plurality of connectors coupled to the circuit board, and a plurality of processing nodes coupled to the circuit board. Each processing module side in the set of processing module sides couples to another processing module side using at least one connector in the plurality of connectors such that, when all of the set of processing module sides are coupled together, the modular processing module is formed. The modular processing module comprises an exterior connection to a power source and a communication system.
摘要:
A method, processor, and data processing system for dynamically adjusting a prefetch stream priority based on the consumption rate of the data by the processor. The method includes a prefetch engine issuing a prefetch request of a first prefetch stream to fetch one or more data from the memory subsystem. The first prefetch stream has a first assigned priority that determines a relative order for scheduling prefetch requests of the first prefetch stream relative to other prefetch requests of other prefetch streams. Based on the receipt of a processor demand for the data before the data returns to the cache or return of the data along time before the receiving the processor demand, logic of the prefetch engine dynamically changes the first assigned priority to a second higher or lower priority, which priority is subsequently utilized to schedule and issue a next prefetch request of the first prefetch stream.
摘要:
A processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a level one branch target address cache (BTAC) and a level two BTAC each having a respective plurality of entries each associating at least a tag with a predicted branch target address. The branch logic accesses the level one and level two BTACs in parallel with a tag portion of a first instruction fetch address to obtain a first predicted branch target address from the level one BTAC for use as a second instruction fetch address in a first processor clock cycle and a second predicted branch target address from the level two BTAC for use as a third instruction fetch address in a later second processor clock cycle.