Abstract:
This invention hides the page miss translation latency for program fetches. In this invention, whenever an access requested by the CPU crosses a memory page boundary, the L1I cache controller requests the next page translation along with the current page translation. This pipelines requests to the μTLB without waiting for the L1I cache controller to begin processing the next page request, making the second page translation request a deterministic prefetch. The translation information for the second page is stored locally in the L1I cache controller and used when the access crosses the next page boundary.
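As a rough illustration only (not the patented implementation), the following C sketch models the described behavior: when a requested access crosses a page boundary, the controller issues the next page translation request immediately after the current one and holds the result locally. The 4 KB page size, the structure names, and the utlb_lookup() helper are hypothetical stand-ins for the μTLB request port.

/* Minimal sketch of the next-page translation prefetch (assumptions noted above). */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12u                       /* assume 4 KB pages */

typedef struct { uint64_t vpage; uint64_t ppage; bool valid; } translation_t;

/* Hypothetical stand-in for the uTLB request port. */
extern translation_t utlb_lookup(uint64_t vpage);

static translation_t next_page_xlat = { .valid = false };   /* held in the L1I controller */

void l1i_fetch(uint64_t vaddr, uint32_t bytes)               /* bytes >= 1 */
{
    uint64_t first_page = vaddr >> PAGE_SHIFT;
    uint64_t last_page  = (vaddr + bytes - 1) >> PAGE_SHIFT;

    translation_t cur = utlb_lookup(first_page);

    if (last_page != first_page) {
        /* Access crosses a page boundary: pipeline the second request
         * to the uTLB right away (a deterministic prefetch). */
        next_page_xlat = utlb_lookup(first_page + 1);
    }
    (void)cur;
    /* ... next_page_xlat is consulted when fetches roll into the next page ... */
}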
Abstract:
With the increasing demand for improved processor performance, memory systems have been growing increasingly larger to keep up with this demand. Caches, which dictate the performance of memory systems, are often the focus of efforts to improve memory system performance, and the most common techniques used to increase cache performance are increased size and associativity. Unfortunately, these methods yield increased static and dynamic power consumption. In this invention, a technique is shown that reduces the power consumption in associative caches with some improvement in cache performance. The architecture achieves these power savings by reducing the number of ways queried on each cache access, using a simple hash function and no additional storage, while skipping some pipeline stages for improved performance. Up to a 90% reduction in power consumption with a 4.6% performance improvement was observed.
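A minimal C sketch of the way-subset idea follows; the cache geometry, the XOR-fold hash, and the group size are illustrative assumptions rather than the architecture's actual parameters. The point it shows is that a hash of the tag selects a small group of ways, and only that group is powered up and compared on each access.

/* Sketch: query only a hashed subset of ways per access (assumptions noted above). */
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS     256u
#define NUM_WAYS     8u
#define WAYS_QUERIED 2u      /* assumed subset size per access */

typedef struct { uint32_t tag; bool valid; } line_t;
static line_t cache[NUM_SETS][NUM_WAYS];

/* Simple XOR-fold hash of the tag; requires no additional storage.
 * Fills must use the same hash so a line only ever resides in its group. */
static uint32_t way_group(uint32_t tag)
{
    return (tag ^ (tag >> 3) ^ (tag >> 7)) % (NUM_WAYS / WAYS_QUERIED);
}

bool lookup(uint32_t set, uint32_t tag)
{
    /* Only the hashed group of ways is queried and compared. */
    uint32_t g = way_group(tag);
    for (uint32_t w = g * WAYS_QUERIED; w < (g + 1) * WAYS_QUERIED; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;      /* hit */
    return false;             /* miss within the selected ways */
}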
Abstract:
This invention provides a current page translation register storing virtual to physical address translation data, and optionally access permission data, for a single current page for program accesses. If an accessed address is within the current page, the address translation and permission data are accessed from the current page translation register. This register provides an additional level of caching of this data above the typical translation look-aside buffer and micro translation look-aside buffer. The smaller size of the current page translation register provides faster page hit/miss determination and faster data access using less power than the typical architecture. This is helpful for program accesses, which generally hit the current page more frequently than data accesses.
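The following C sketch illustrates how such a single-entry register might sit in front of the μTLB; the 4 KB page size, the field names, and the utlb_lookup() helper are assumptions for illustration only.

/* Sketch: single-entry current page translation register checked before the uTLB. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12u       /* assume 4 KB pages */

typedef struct {
    uint64_t vpage;          /* virtual page number */
    uint64_t ppage;          /* physical page number */
    uint32_t perms;          /* optional access permission bits */
    bool     valid;
} cur_page_reg_t;

static cur_page_reg_t cpr = { .valid = false };

extern cur_page_reg_t utlb_lookup(uint64_t vpage);   /* hypothetical uTLB port */

uint64_t translate_program_fetch(uint64_t vaddr)
{
    uint64_t vpage  = vaddr >> PAGE_SHIFT;
    uint64_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

    if (cpr.valid && cpr.vpage == vpage) {
        /* Hit in the current page translation register: fast, low power. */
        return (cpr.ppage << PAGE_SHIFT) | offset;
    }

    /* Miss: fall back to the uTLB and refill the register. */
    cpr = utlb_lookup(vpage);
    return (cpr.ppage << PAGE_SHIFT) | offset;
}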
Abstract:
This invention hides the page miss translation latency for program fetches. In this invention, whenever an access is requested by the CPU, the L1I cache controller performs an a priori lookup of whether the virtual address plus the fetch packet count of expected program fetches crosses a page boundary. If the access crosses a page boundary, the L1I cache controller requests a second page translation along with the first page translation. This pipelines requests to the μTLB without waiting for the L1I cache controller to begin processing the second page request, making the second page translation request a deterministic prefetch. The translation information for the second page is stored locally in the L1I cache controller and used when the access crosses the page boundary.
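A short C sketch of the a priori page-crossing check is given below; the page size and fetch packet size are assumed values, not those of any particular device. When the predicate is true, the controller would also issue the second page translation request as described above.

/* Sketch: a priori check of whether the expected fetch stream crosses a page. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT        12u    /* assume 4 KB pages */
#define FETCH_PACKET_SIZE 32u    /* assumed bytes per fetch packet */

bool crosses_page_boundary(uint64_t vaddr, uint32_t fetch_packet_count)
{
    uint64_t first_page = vaddr >> PAGE_SHIFT;
    uint64_t last_byte  = vaddr + (uint64_t)fetch_packet_count * FETCH_PACKET_SIZE - 1;

    /* True when the expected program fetches spill into the next page,
     * in which case the second page translation is requested up front. */
    return (last_byte >> PAGE_SHIFT) != first_page;
}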
Abstract:
This invention provides a current page translation register storing virtual to physical address translation data, and optionally access permission data, for a current page for program accesses. If an accessed address is within the current page, the address translation and permission data are accessed from the current page translation register. This register provides an additional level of caching of this data above the typical translation look-aside buffer and micro translation look-aside buffer. The smaller size of the current page translation register provides faster page hit/miss determination and faster data access using less power than the typical architecture. This is helpful for program accesses, which generally hit the current page more frequently than data accesses.
Abstract:
A method is shown that eliminates the need for a dedicated reorder buffer register bank or memory space in a multi-level cache system. As data requested from the L2 cache may be returned out of order, the L1 cache uses its own cache memory to buffer the out-of-order data and provides the data to the requesting processor in the correct order from that buffer.
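The following C sketch illustrates one way such in-order delivery without a dedicated reorder buffer could work; the queue depth and the l1_write_line()/cpu_deliver() helpers are hypothetical. Returned data is written directly into L1 storage, and a small pointer queue forwards responses to the CPU strictly in request order.

/* Sketch: in-order delivery using L1 storage as the reorder buffer. */
#include <stdint.h>
#include <stdbool.h>

#define MAX_OUTSTANDING 8u

typedef struct {
    uint64_t addr;       /* miss address; identifies the L1 line used as the buffer */
    bool     returned;   /* data for this request already written into L1 */
} pending_t;

static pending_t pending[MAX_OUTSTANDING];
static uint32_t  head, tail;                 /* request order: head = oldest */

extern void l1_write_line(uint64_t addr, const uint8_t *data);   /* hypothetical */
extern void cpu_deliver(uint64_t addr);                          /* hypothetical */

/* Called when an L1 miss is sent to L2 (queue-full handling omitted). */
void on_l1_miss(uint64_t addr)
{
    pending[tail] = (pending_t){ .addr = addr, .returned = false };
    tail = (tail + 1) % MAX_OUTSTANDING;
}

/* Called when L2 returns data, possibly out of order. */
void on_l2_return(uint64_t addr, const uint8_t *data)
{
    l1_write_line(addr, data);               /* L1 storage doubles as the buffer */
    for (uint32_t i = head; i != tail; i = (i + 1) % MAX_OUTSTANDING)
        if (pending[i].addr == addr)
            pending[i].returned = true;

    /* Forward to the CPU strictly in the original request order. */
    while (head != tail && pending[head].returned) {
        cpu_deliver(pending[head].addr);
        head = (head + 1) % MAX_OUTSTANDING;
    }
}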
Abstract:
This invention involves a cache system in a digital data processing apparatus including: a central processing unit core; a level one instruction cache; and a level two cache. The cache lines in the level two cache are twice the size of the cache lines in the level one instruction cache. The central processing unit core requests additional instructions when needed via a request address. Upon a miss in the level one instruction cache that results in a hit in the upper half of a level two cache line, the level two cache supplies the upper half of that cache line to the level one instruction cache. On a following level two cache memory cycle, the level two cache supplies the lower half of the cache line to the level one instruction cache. This cache technique thus prefetches the lower half of the level two cache line employing fewer resources than an ordinary prefetch.
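A minimal C sketch of this two-cycle fill is shown below, assuming 64-byte L1I lines and 128-byte L2 lines; the l2_read_line() and l1i_fill() helpers are hypothetical. It models the described case: a demand hit in the upper half returns that half first, with the lower half following on the next L2 cycle as the implicit prefetch.

/* Sketch: two-cycle L2 response where the L2 line is twice the L1I line size. */
#include <stdint.h>

#define L1I_LINE_BYTES 64u
#define L2_LINE_BYTES  (2u * L1I_LINE_BYTES)

extern const uint8_t *l2_read_line(uint64_t addr);          /* hypothetical L2 access */
extern void l1i_fill(uint64_t addr, const uint8_t *half);   /* hypothetical L1I fill */

void l2_service_l1i_miss(uint64_t req_addr)
{
    uint64_t l2_line_addr = req_addr & ~(uint64_t)(L2_LINE_BYTES - 1);
    const uint8_t *line   = l2_read_line(l2_line_addr);
    int in_upper_half     = (req_addr & L1I_LINE_BYTES) != 0;

    if (in_upper_half) {
        /* Cycle 1: supply the demanded upper half. */
        l1i_fill(l2_line_addr + L1I_LINE_BYTES, line + L1I_LINE_BYTES);
        /* Cycle 2: supply the lower half as a low-cost prefetch. */
        l1i_fill(l2_line_addr, line);
    } else {
        /* Demand hit in the lower half: supply it directly. */
        l1i_fill(l2_line_addr, line);
    }
}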
Abstract:
Disclosed embodiments relate to a dNap architecture that accurately transitions cache lines to the full power state before an access to them. This ensures that there are no additional delays due to waking up drowsy lines. Only cache lines that are determined by the DMC to be accessed in the immediate future are fully powered, while others are put in drowsy mode. As a result, we are able to significantly reduce leakage power with no cache performance degradation and minimal hardware overhead, especially at higher associativities. Up to 92% static/leakage power savings are accomplished with minimal hardware overhead and no performance tradeoff.
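As an illustration only, the C sketch below models the wake-before-access policy; the cache geometry, the per-line power state array, and the dmc_will_access_soon() lookahead query are assumptions standing in for the controller's actual prediction mechanism.

/* Sketch: keep only lines predicted to be accessed soon at full power. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256u
#define NUM_WAYS 4u

typedef enum { DROWSY, FULL_POWER } power_state_t;
static power_state_t state[NUM_SETS][NUM_WAYS];

/* Hypothetical query: does the in-flight pipeline (the "immediate future")
 * contain an access that maps to this set and way? */
extern bool dmc_will_access_soon(uint32_t set, uint32_t way);

void dnap_power_update(void)
{
    for (uint32_t s = 0; s < NUM_SETS; s++)
        for (uint32_t w = 0; w < NUM_WAYS; w++)
            /* Wake a line just before its access so no wake-up stall is
             * visible; otherwise keep it drowsy to save leakage power. */
            state[s][w] = dmc_will_access_soon(s, w) ? FULL_POWER : DROWSY;
}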