摘要:
A processor having a multistage pipeline includes a TLB and a TLB controller. In response to a TLB miss signal, the TLB controller initiates a TLB reload, requesting address translation information from either a memory or a higher-level TLB, and placing that information into the TLB. The processor flushes the instruction having the missing virtual address, and refetches the instruction, resulting in re-insertion of the instruction at an initial stage of the pipeline above the TLB access point. The initiation of the TLB reload, and the flush/refetch of the instruction, are performed substantially in parallel, and without immediately stalling the pipeline. The refetched instruction is held at a point in the pipeline above the TLB access point until the TLB reload is complete, so that the refetched instruction generates a “hit” in the TLB upon its next access.
摘要:
An apparatus includes a memory configured to store data, a lower level TLB, an upper level TLB, and a TLB controller. The lower level TLB and the upper level TLB are configured to store a plurality of entries, each of the entries containing an address translation information that allows a virtual address to be translated into a corresponding physical address. The TLB controller retrieves from a page table in the memory an address translation information for a desired virtual address, if the desired virtual address generates a TLB miss from the lower level TLB and from the upper level TLB, Using a single TLB write instruction, the TLB controller updates both the lower level TLB and the upper level TLB by writing the address translation information, retrieved from the page table, into the lower level TLB as well as into the upper level TLB.
摘要:
A Block Normal Cache Allocation (BNCA) mode is defined for a processor. In BNCA mode, cache entries may only be allocated by predetermined instructions. Normal memory access instructions (for example, as part of interrupt code) may execute and will retrieve data from main memory in the event of a cache miss; however, these instructions are not allowed to allocate entries in the cache. Only the predetermined instructions (for example, those used to establish locked cache entries) may allocate entries in the cache. When the locked entries are established, the processor exits BNCA mode, and any memory access instruction may allocate cache entries. BNCA mode may be indicated by setting a bit in a configuration register.
摘要:
A processor includes a memory configured to store data in a plurality of pages, a TLB, and a TLB controller. The TLB is configured to search, when accessed by an instruction having a virtual address, for address translation information that allows the virtual address to be translated into a physical address of one of the plurality of pages, and to provide the address translation information if the address translation information is found within the TLB. The TLB controller is configured to determine whether a current instruction and a subsequent instruction seek access to a same page within the plurality of pages, and if so, to prevent TLB access by the subsequent instruction, and to utilize the results of the TLB access of a previous instruction for the current instruction.
摘要:
A system for optimizing translation lookaside buffer entries is provided. The system includes a translation lookaside buffer configured to store a number of entries, each entry having a size attribute, each entry referencing a corresponding page, and control logic configured to modify the size attribute of an existing entry in the translation lookaside buffer if a new page is contiguous with an existing page referenced by the existing entry. The existing entry after having had its size attribute modified references a consolidated page comprising the existing page and the new page.
摘要:
In an instruction execution pipeline, the misalignment of memory access instructions is predicted. Based on the prediction, an additional micro-operation is generated in the pipeline prior to the effective address generation of the memory access instruction. The additional micro-operation accesses the memory falling across a predetermined address boundary. Predicting the misalignment and generating a micro-operation early in the pipeline ensures that sufficient pipeline control resources are available to generate and track the additional micro-operation, avoiding a pipeline flush if the resources are not available at the time of effective address generation. The misalignment prediction may employ known conditional branch prediction techniques, such as a flag, a bimodal counter, a local predictor, a global predictor, and combined predictors. A misalignment predictor may be enabled or biased by a memory access instruction flag or misaligned instruction type.
摘要:
A method of managing cache partitions provides a first pointer for higher priority writes and a second pointer for lower priority writes, and uses the first pointer to delimit the lower priority writes. For example, locked writes have greater priority than unlocked writes, and a first pointer may be used for locked writes, and a second pointer may be used for unlocked writes. The first pointer is advanced responsive to making locked writes, and its advancement thus defines a locked region and an unlocked region. The second pointer is advanced responsive to making unlocked writes. The second pointer also is advanced (or retreated) as needed to prevent it from pointing to locations already traversed by the first pointer. Thus, the pointer delimits the unlocked region and allows the locked region to grow at the expense of the unlocked region.
摘要:
A processor includes a hierarchical Translation Lookaside Buffer (TLB) comprising a Level-1 TLB and a small, high-speed Level-0 TLB. Entries in the L0 TLB replicate entries in the L1 TLB. The processor first accesses the L0 TLB in an address translation, and access the L1 TLB if a virtual address misses in the L0 TLB. When the virtual address hits in the L1 TLB, the virtual address, physical address, and page attributes are written to the L0 TLB, replacing an existing entry if the L0 TLB is full. The entry may be locked against replacement in the L0 TLB in response to an L0 Lock (L0L) indicator in the L1 TLB entry. Similarly, in a hardware-managed L1 TLB, entries may be locked against replacement in response to an L1 Lock (L1L) indicator in the corresponding page table entry.
摘要:
A processor includes a conditional branch instruction prediction mechanism that generates weighted branch prediction values. For weakly weighted predictions, which tend to be less accurate than strongly weighted predictions, the power associating with speculatively filling and subsequently flushing the cache is saved by halting instruction prefetching. Instruction fetching continues when the branch condition is evaluated in the pipeline and the actual next address is known. Alternatively, prefetching may continue out of a cache. To avoid displacing good cache data with instructions prefetched based on a mispredicted branch, prefetching may be halted in response to a weakly weighted prediction in the event of a cache miss.
摘要:
One or more architected registers in a processor are fractional-word writable, and data from plural misaligned memory access operations are assembled directly in an architected register, without first assembling the data in a fractional-word writable, non-architected register and then transferring it to the architected register. In embodiments where a general-purpose register file utilizes register renaming or a reorder buffer, data from plural misaligned memory access operations are assembled directly in a fractional-word writable architected register, without the need to fully exception check both misaligned memory access operations before performing the first memory access operation.