Abstract:
One or more architected registers in a processor are fractional-word writable, and data from plural misaligned memory access operations are assembled directly in an architected register, without first assembling the data in a fractional-word writable, non-architected register and then transferring it to the architected register. In embodiments where a general-purpose register file utilizes register renaming or a reorder buffer, data from plural misaligned memory access operations are assembled directly in a fractional-word writable architected register, without the need to fully exception check both misaligned memory access operations before performing the first memory access operation.
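The assembly step described above can be illustrated with a minimal behavioral sketch. It assumes a 4-byte word, little-endian byte order, and byte-granular write enables; the names `ByteWritableRegister` and `misaligned_load` are hypothetical and do not come from the abstract.

```python
WORD_BYTES = 4  # assumed word size for this sketch


class ByteWritableRegister:
    """Models an architected register whose byte lanes can be written separately."""

    def __init__(self):
        self.lanes = bytearray(WORD_BYTES)

    def write_lanes(self, start_lane, data):
        # Fractional-word write: only the addressed lanes change.
        self.lanes[start_lane:start_lane + len(data)] = data

    def value(self):
        return int.from_bytes(self.lanes, "little")


def misaligned_load(memory, addr, reg):
    """Load one word from addr, using two accesses if it crosses a word boundary."""
    offset = addr % WORD_BYTES
    first_len = WORD_BYTES - offset
    # First access: bytes from addr up to the next word boundary.
    reg.write_lanes(0, memory[addr:addr + first_len])
    if offset:
        # Second access: remaining bytes, written straight into the same
        # register with no intermediate non-architected staging register.
        boundary = addr + first_len
        reg.write_lanes(first_len, memory[boundary:boundary + offset])
    return reg.value()
```

For example, with `memory = bytes(range(16))`, a load from address 3 gathers bytes 3 through 6 in two accesses and yields `0x06050403`.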
Abstract:
Data from a source domain (311) operating at a first data rate is transferred to a FIFO (319) in another domain (313) operating at a different data rate. The FIFO (319) buffers data before transfer to a sink for further processing or storage. A source side counter (325) tracks space available in the FIFO. In disclosed examples, the initial counter value corresponds to FIFO depth. The counter (325) decrements in response to a data ready signal from the source domain (311), without delay. The counter (325) increments in response to signaling from the sink domain (313) of a read of data off the FIFO (319). Hence, incrementing is subject to the signaling latency between domains. The source (315) may send one more beat of data when the counter (325) indicates the FIFO (319) is full. The last beat of data is continuously sent from the source until it is indicated that a FIFO position became available, effectively providing one more FIFO position.
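The counter behavior can be sketched as a small cycle-level simulation. This is an illustrative model only: `simulate`, its parameters, and the fixed `latency` value are assumptions, and the extra held beat that the abstract describes is not modeled.

```python
from collections import deque


def simulate(depth, latency, sink_read_cycles, total_cycles):
    """Return (max FIFO occupancy, final counter value, final FIFO occupancy)."""
    counter = depth          # initial counter value corresponds to FIFO depth
    fifo = deque()
    in_flight = deque()      # cycles at which read notifications reach the source
    max_occupancy = 0
    for cycle in range(total_cycles):
        # Increment only when the sink's read signal has crossed the domain boundary.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            counter += 1
        # Sink reads a beat; the notification arrives `latency` cycles later.
        if cycle in sink_read_cycles and fifo:
            fifo.popleft()
            in_flight.append(cycle + latency)
        # Source sends when it believes space is available; decrement is immediate.
        if counter > 0:
            fifo.append(cycle)
            counter -= 1
        max_occupancy = max(max_occupancy, len(fifo))
    return max_occupancy, counter, len(fifo)
```

Because decrements are immediate and increments are delayed, the counter never overstates the free space, so occupancy stays within the FIFO depth in this model.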
Abstract:
In an instruction execution pipeline, the misalignment of memory access instructions is predicted. Based on the prediction, an additional micro-operation is generated in the pipeline prior to the effective address generation of the memory access instruction. The additional micro-operation accesses the memory falling across a predetermined address boundary. Predicting the misalignment and generating a micro-operation early in the pipeline ensures that sufficient pipeline control resources are available to generate and track the additional micro-operation, avoiding a pipeline flush if the resources are not available at the time of effective address generation. The misalignment prediction may employ known conditional branch prediction techniques, such as a flag, a bimodal counter, a local predictor, a global predictor, and combined predictors. A misalignment predictor may be enabled or biased by a memory access instruction flag or misaligned instruction type.
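One of the predictor styles named above, a bimodal counter, might look like the following sketch. The table size, the PC hashing, the 8-byte boundary, and the names `BimodalMisalignPredictor` and `crosses_boundary` are assumptions made for illustration.

```python
class BimodalMisalignPredictor:
    """Table of 2-bit saturating counters indexed by instruction address."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [1] * entries  # start at "weakly aligned"

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 predict a misaligned access.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, was_misaligned):
        i = self._index(pc)
        if was_misaligned:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def crosses_boundary(addr, size, boundary=8):
    """True if the access [addr, addr + size) spans a boundary-aligned block edge."""
    return addr // boundary != (addr + size - 1) // boundary
```

In a pipeline model, `predict` would be consulted before effective address generation to decide whether to create the extra micro-operation, and `update` would be called once `crosses_boundary` is known for the actual address.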
Abstract:
Data from a source domain operating at a first data rate is transferred to a FIFO in another domain operating at a different data rate. The FIFO buffers data before transfer to a sink for further processing or storage. A source side counter tracks space available in the FIFO. In disclosed examples, the initial counter value corresponds to FIFO depth. The counter decrements in response to a data ready signal from the source domain, without delay. The counter increments in response to signaling from the sink domain of a read of data off the FIFO. Hence, incrementing is subject to the signaling latency between domains. The source may send one more beat of data when the counter indicates the FIFO is full. The last beat of data is continuously sent from the source until it is indicated that a FIFO position became available, effectively providing one more FIFO position.
Abstract:
A processor includes a cache memory having at least one entry managed according to a copy-back algorithm. A global modified indicator (GMI) indicates whether any copy-back entry in the cache contains modified data. On a cache miss, if the GMI indicates that no copy-back entry in the cache contains modified data, data fetched from memory are written to the selected entry without first reading the entry. In a banked cache, two or more bank-GMIs may be associated with two or more banks. In an n-way set associative cache, n set-GMIs may be associated with the n sets. Suppressing the read to determine if the copy-back cache entry contains modified data improves processor performance and reduces power consumption.
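The effect of the GMI on miss handling can be sketched for a direct-mapped cache. The class and method names are hypothetical, and the sketch assumes the GMI is set on any store and is never cleared, since the abstract does not say when it is reset.

```python
class CopyBackCache:
    """Direct-mapped copy-back cache with a global modified indicator (GMI)."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries  # each entry: (tag, data, dirty)
        self.gmi = False        # True once any entry may hold modified data
        self.victim_reads = 0   # counts reads of the entry being replaced

    def store(self, index, data):
        # Write hit: mark the entry dirty and set the GMI.
        tag, _, _ = self.entries[index]
        self.entries[index] = (tag, data, True)
        self.gmi = True

    def fill_on_miss(self, index, tag, data, memory):
        if self.gmi:
            # Some entry may be dirty, so the victim must be read and checked.
            self.victim_reads += 1
            victim = self.entries[index]
            if victim is not None and victim[2]:
                memory[(victim[0], index)] = victim[1]  # copy back
        # With the GMI clear, the fill overwrites the entry without reading it.
        self.entries[index] = (tag, data, False)
```

The `victim_reads` counter makes the saving visible: fills performed while the GMI is clear do not increment it.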
Abstract:
A processor includes a hierarchical Translation Lookaside Buffer (TLB) comprising a Level-1 TLB and a small, high-speed Level-0 TLB. Entries in the L0 TLB replicate entries in the L1 TLB. The processor first accesses the L0 TLB in an address translation, and accesses the L1 TLB if a virtual address misses in the L0 TLB. When the virtual address hits in the L1 TLB, the virtual address, physical address, and page attributes are written to the L0 TLB, replacing an existing entry if the L0 TLB is full. The entry may be locked against replacement in the L0 TLB in response to an L0 Lock (L0L) indicator in the L1 TLB entry. Similarly, in a hardware-managed L1 TLB, entries may be locked against replacement in response to an L1 Lock (L1L) indicator in the corresponding page table entry.
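The L0 lookup, fill, and lock behavior can be sketched as follows. The replacement policy (evict the oldest unlocked entry, skip the fill if every entry is locked) and all names are assumptions for illustration; the abstract does not specify them.

```python
class TwoLevelTLB:
    """Small L0 TLB in front of an L1 TLB, with per-entry L0 lock bits."""

    def __init__(self, l0_capacity):
        self.l0_capacity = l0_capacity
        self.l0 = {}  # vpn -> (ppn, attrs, locked); dict keeps insertion order
        self.l1 = {}  # vpn -> (ppn, attrs, l0_lock)

    def translate(self, vpn):
        if vpn in self.l0:
            return self.l0[vpn][0], "L0"
        if vpn in self.l1:
            ppn, attrs, l0_lock = self.l1[vpn]
            self._fill_l0(vpn, ppn, attrs, l0_lock)
            return ppn, "L1"
        return None, "miss"

    def _fill_l0(self, vpn, ppn, attrs, locked):
        if len(self.l0) >= self.l0_capacity:
            # Choose the oldest entry that is not locked against replacement.
            victim = next((v for v, e in self.l0.items() if not e[2]), None)
            if victim is None:
                return  # assumed behavior: all entries locked, skip the fill
            del self.l0[victim]
        self.l0[vpn] = (ppn, attrs, locked)
```

With a two-entry L0 and a locked translation loaded first, later fills replace only the unlocked entry, so the locked translation keeps hitting in L0.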
Abstract:
A system for optimizing translation lookaside buffer entries is provided. The system includes a translation lookaside buffer configured to store a number of entries, each entry having a size attribute, each entry referencing a corresponding page, and control logic configured to modify the size attribute of an existing entry in the translation lookaside buffer if a new page is contiguous with an existing page referenced by the existing entry. The existing entry after having had its size attribute modified references a consolidated page comprising the existing page and the new page.
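The consolidation rule can be sketched in a few lines. This sketch assumes "contiguous" means adjacent in both the virtual and the physical address space, measures size in base pages, and ignores the power-of-two size and alignment constraints that real TLBs typically impose; `TLBEntry` and `insert` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TLBEntry:
    vpn: int   # first virtual page number covered by the entry
    ppn: int   # first physical page number covered by the entry
    size: int  # size attribute, in base pages


def insert(tlb, vpn, ppn):
    """Add a mapping, growing an existing entry if the new page is contiguous."""
    for entry in tlb:
        if vpn == entry.vpn + entry.size and ppn == entry.ppn + entry.size:
            entry.size += 1  # entry now references the consolidated page
            return entry
    entry = TLBEntry(vpn, ppn, 1)
    tlb.append(entry)
    return entry
```

Inserting the mappings 10→100 and then 11→101 leaves a single entry of size 2, while a non-contiguous mapping such as 20→300 gets its own entry.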
Abstract:
An apparatus includes a memory configured to store data, a lower level TLB, an upper level TLB, and a TLB controller. The lower level TLB and the upper level TLB are configured to store a plurality of entries, each of the entries containing address translation information that allows a virtual address to be translated into a corresponding physical address. If a desired virtual address generates a TLB miss from both the lower level TLB and the upper level TLB, the TLB controller retrieves the address translation information for the desired virtual address from a page table in the memory. Using a single TLB write instruction, the TLB controller updates both the lower level TLB and the upper level TLB by writing the address translation information, retrieved from the page table, into the lower level TLB as well as into the upper level TLB.
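The miss path can be sketched as below, with the single TLB write instruction modeled as one method that updates both levels. The class, the dictionary-based page table, and the `table_walks` counter are illustrative assumptions.

```python
class DualTLB:
    """Lower and upper level TLBs refilled together by one write operation."""

    def __init__(self, page_table):
        self.page_table = page_table  # vpn -> ppn, stands in for the in-memory table
        self.lower = {}
        self.upper = {}
        self.table_walks = 0

    def tlb_write(self, vpn, ppn):
        # Models the single TLB write instruction: both levels are updated at once.
        self.lower[vpn] = ppn
        self.upper[vpn] = ppn

    def translate(self, vpn):
        if vpn in self.lower:
            return self.lower[vpn]
        if vpn in self.upper:
            return self.upper[vpn]
        # Miss in both levels: fetch the translation from the page table.
        self.table_walks += 1
        ppn = self.page_table[vpn]
        self.tlb_write(vpn, ppn)
        return ppn
```

After the first translation of a page, both levels hold the mapping, so a repeated lookup needs no further page table access.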