Abstract:
Embodiments of the invention provide methods and apparatus for selectively bypassing cache levels when processing non-reusable transient data in a cache coherent system. To selectively bypass cache levels, a page table entry (PTE) mechanism may be employed. To limit the number of PTE bits, the PTE may carry, among its other attribute bits, a 2-bit “bypass type” field that indexes which bits of a Special Purpose Register (SPR) identify the cache levels to be bypassed.
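A minimal C sketch of the lookup this describes, assuming illustrative bit positions; the location of the 2-bit field, a three-level cache, and the SPR layout are assumptions, not taken from the abstract:

```c
/* Sketch only: field positions, the 3-level cache, and the SPR layout
 * below are assumptions, not taken from the abstract. */
#include <stdint.h>
#include <stdio.h>

#define BYPASS_TYPE_SHIFT 5   /* assumed position of the 2-bit PTE field */
#define BYPASS_TYPE_MASK  0x3u
#define LEVELS_PER_TYPE   3   /* assume three cache levels: L1, L2, L3 */

/* SPR holding one 3-bit bypass mask per bypass type (4 types, 12 bits). */
static uint32_t spr_bypass;

/* Return which cache levels to bypass for a page:
 * bit 0 = bypass L1, bit 1 = bypass L2, bit 2 = bypass L3. */
static uint32_t bypass_levels(uint32_t pte)
{
    uint32_t type = (pte >> BYPASS_TYPE_SHIFT) & BYPASS_TYPE_MASK;
    return (spr_bypass >> (type * LEVELS_PER_TYPE)) & 0x7u;
}

int main(void)
{
    /* Type 0: no bypass; type 1: bypass L1; type 2: bypass L1+L2;
     * type 3: bypass all levels (non-reusable transient data). */
    spr_bypass = (0x0u << 0) | (0x1u << 3) | (0x3u << 6) | (0x7u << 9);

    uint32_t pte = 3u << BYPASS_TYPE_SHIFT;   /* page marked as type 3 */
    printf("bypass mask: 0x%x\n", (unsigned)bypass_levels(pte)); /* 0x7 */
    return 0;
}
```

Because the PTE stores only a 2-bit index, the SPR can be reprogrammed to change which levels each type bypasses without touching page tables.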
Abstract:
An enhanced mechanism loads entries into a translation lookaside buffer (TLB) in hardware via indirect TLB entries. In one embodiment, when a given virtual address must be translated and no direct TLB entry associated with that address is found in the TLB, the TLB is checked for an indirect TLB entry associated with the address. Each indirect TLB entry provides the real address of a page table that is associated with a specified range of virtual addresses and comprises an array of page table entries. If a matching indirect TLB entry is found, a computed address is generated by combining the real address field from the indirect TLB entry with bits from the given virtual address, a page table entry (PTE) is obtained by reading a word from memory at the computed address, and the PTE is loaded into the TLB as a direct TLB entry.
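The reload path can be sketched in software as follows; the page size, the number of PTEs an indirect entry covers, and the entry layouts are illustrative assumptions:

```c
/* Minimal sketch of the hardware TLB reload via an indirect entry.
 * Sizes and layouts are assumptions, not taken from the abstract. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                 /* assume 4 KiB pages */
#define PT_ENTRIES 512                /* one indirect entry covers 512 pages */

typedef struct { uint64_t va_base; uint64_t pt_real; int valid; } itlb_entry;

static uint64_t memory[1024];         /* toy "real memory", 8-byte words */
static uint64_t dtlb_pte[PT_ENTRIES]; /* direct TLB, indexed naively here */

/* On a direct-TLB miss, use the matching indirect entry to compute the
 * PTE address, read one word from memory, and install a direct entry. */
static uint64_t tlb_reload(const itlb_entry *ie, uint64_t va)
{
    uint64_t idx = (va >> PAGE_SHIFT) & (PT_ENTRIES - 1); /* VA bits */
    uint64_t pte_addr = ie->pt_real + idx * sizeof(uint64_t);
    uint64_t pte = memory[pte_addr / sizeof(uint64_t)];   /* memory read */
    dtlb_pte[idx] = pte;                                  /* install */
    return pte;
}

int main(void)
{
    itlb_entry ie = { .va_base = 0, .pt_real = 0x100, .valid = 1 };
    memory[0x100 / 8 + 5] = 0xABCD000ULL | 1; /* PTE for page 5, valid bit */

    uint64_t pte = tlb_reload(&ie, 5ULL << PAGE_SHIFT);
    printf("loaded PTE: 0x%llx\n", (unsigned long long)pte);
    return 0;
}
```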
Abstract:
Embodiments of the invention are directed to optimizing the performance of a split disk cache. In one embodiment, a disk cache includes a primary region having a read portion and a write portion, and one or more smaller sample regions, each also including a read portion and a write portion. The primary region and each sample region have an independently adjustable read-to-write ratio. Cached reads are distributed among the read portions of the primary and sample regions, while cached writes are distributed among their write portions. The performance of the primary region and of each sample region is tracked, such as by obtaining a hit rate for each region during a predefined interval. The read/write ratio of the primary region is then selectively adjusted according to the performance of the one or more sample regions.
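A minimal sketch of the sample-driven adjustment, assuming a simple best-sample policy and a fixed adjustment step; neither is specified by the abstract:

```c
/* Sketch only: region sizes, the adjustment step, and the end-of-interval
 * policy below are assumptions, not taken from the abstract. */
#include <stdio.h>

typedef struct {
    double read_frac;       /* fraction of the region devoted to reads */
    long   hits, accesses;  /* counters for the current interval */
} region;

static double hit_rate(const region *r)
{
    return r->accesses ? (double)r->hits / (double)r->accesses : 0.0;
}

/* At the end of an interval, move the primary region's read/write split
 * toward the sample region that performed best, then reset the counters. */
static void adjust_primary(region *primary, region *samples, int n, double step)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (hit_rate(&samples[i]) > hit_rate(&samples[best]))
            best = i;
    if (hit_rate(&samples[best]) > hit_rate(primary)) {
        if (samples[best].read_frac > primary->read_frac)
            primary->read_frac += step;
        else if (samples[best].read_frac < primary->read_frac)
            primary->read_frac -= step;
    }
    primary->hits = primary->accesses = 0;
    for (int i = 0; i < n; i++)
        samples[i].hits = samples[i].accesses = 0;
}

int main(void)
{
    region primary = { .read_frac = 0.50, .hits = 40, .accesses = 100 };
    region samples[2] = {
        { .read_frac = 0.60, .hits = 9, .accesses = 10 }, /* more read cache */
        { .read_frac = 0.40, .hits = 5, .accesses = 10 }, /* more write cache */
    };
    adjust_primary(&primary, samples, 2, 0.05);
    printf("new primary read fraction: %.2f\n", primary.read_frac); /* 0.55 */
    return 0;
}
```

The small sample regions act as cheap experiments: only the primary region's ratio is moved, and only toward a split that demonstrably outperformed it during the interval.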
Abstract:
A memory hierarchy in a computer includes levels of cache. The computer also includes a processor operatively coupled through two or more levels of cache to a main random access memory. Caches closer to the processor in the hierarchy are characterized as higher in the hierarchy. Memory management among the levels of cache includes identifying a line in a first cache that should preferably be retained in the first cache, where the first cache is backed by at least one cache lower in the memory hierarchy and the lower cache implements an LRU-type cache line replacement policy. Memory management also includes updating the LRU information for the lower cache to indicate that the line has been recently accessed.
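Refreshing the lower cache's LRU state keeps the backing copy of the retained line from being selected as an eviction victim. A minimal sketch, assuming a 4-way set with an age-based LRU encoding (the abstract does not specify the organization):

```c
/* Sketch only: the set/way organization and LRU encoding are assumptions. */
#include <stdio.h>
#include <stdint.h>

#define WAYS 4

typedef struct {
    uint64_t tag[WAYS];
    unsigned age[WAYS];   /* 0 = most recently used, WAYS-1 = LRU victim */
} cache_set;

/* Mark the way holding 'tag' as most recently used, aging the others.
 * This is the LRU update requested on behalf of the retained line. */
static void lru_touch(cache_set *set, uint64_t tag)
{
    for (int w = 0; w < WAYS; w++) {
        if (set->tag[w] != tag)
            continue;
        unsigned old = set->age[w];
        for (int v = 0; v < WAYS; v++)   /* age entries younger than w */
            if (set->age[v] < old)
                set->age[v]++;
        set->age[w] = 0;                 /* line is now MRU */
        return;
    }
}

int main(void)
{
    cache_set s = { .tag = {10, 11, 12, 13}, .age = {3, 2, 1, 0} };
    lru_touch(&s, 10); /* higher cache asks lower cache to retain line 10 */
    for (int w = 0; w < WAYS; w++)
        printf("tag %llu age %u\n", (unsigned long long)s.tag[w], s.age[w]);
    return 0;
}
```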
Abstract:
A pattern matching accelerator (PMA) assists software threads in finding the presence and location of strings in an input data stream that match a given pattern. The patterns to be searched are defined by the user as a set of regular expressions and are grouped into pattern context sets. The regular expressions defining each pattern context set are compiled into a rules structure that the PMA hardware subsequently processes. The rules are compiled before search run time and stored in main memory, in rule cache memory within the PMA, or in a combination thereof. For each input character, the PMA executes the search and returns the search results.
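A minimal software sketch of per-character rule processing, assuming the compiled rules take the form of a simple DFA transition table for a single illustrative pattern ("ab+c"); the actual rule format is not specified by the abstract:

```c
/* Sketch only: the rule encoding (a DFA table for "ab+c") is an
 * illustrative assumption, not the PMA's rule format. */
#include <stdio.h>
#include <string.h>

#define NSTATES 4
#define MATCH_STATE 3

/* "Compiled rules": next-state table indexed by [state][input byte]. */
static unsigned char rules[NSTATES][256];

static void compile_rules(void)   /* stands in for the offline compiler */
{
    memset(rules, 0, sizeof rules);   /* default: fall back to state 0 */
    rules[0]['a'] = 1;                /* saw 'a'  */
    rules[1]['a'] = 1;                /* restart on another 'a' */
    rules[1]['b'] = 2;                /* saw "ab" */
    rules[2]['a'] = 1;
    rules[2]['b'] = 2;                /* "ab+"    */
    rules[2]['c'] = MATCH_STATE;      /* "ab+c"   */
}

/* Process the stream one character at a time; report each match offset. */
static void pma_search(const char *input)
{
    unsigned state = 0;
    for (size_t i = 0; input[i]; i++) {
        state = rules[state][(unsigned char)input[i]];
        if (state == MATCH_STATE) {
            printf("match ending at offset %zu\n", i);
            state = 0;                /* resume scanning */
        }
    }
}

int main(void)
{
    compile_rules();
    pma_search("xxabbbcyyabc");   /* matches end at offsets 6 and 11 */
    return 0;
}
```

Compiling the rules ahead of time is what lets the search loop stay trivial: per input character, one table lookup decides the next state.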