Abstract:
A novel on-chip cache memory and method of operation are provided which increase microprocessor performance. The cache design allows two cache requests to be processed simultaneously (dual-ported) and multiple cache requests to be in flight concurrently (pipelined). The design allocates a first clock cycle to cache tag and data access and a second clock cycle to data manipulation. The memory array circuit design is simplified because the circuits are synchronized to the main processor clock and do not need self-timed circuits. The overall control logic scheme is simplified because distinct cycles are allocated to distinct cache functions.
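The two-cycle split can be pictured with a minimal C sketch. Everything below (set geometry, latch fields, function names) is illustrative rather than taken from the patent: cycle one is modeled as a combined tag compare and speculative data read, and cycle two as the data manipulation that selects the matching way. The dual-ported behavior amounts to invoking cycle1_access once per port in the same clock.

    /* Hypothetical model of the two-cycle, dual-ported access: cycle 1
     * reads tag and data in parallel, cycle 2 manipulates the data. */
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_WAYS 4

    typedef struct {
        uint32_t tag[NUM_WAYS];
        bool     valid[NUM_WAYS];
        uint64_t data[NUM_WAYS];
    } CacheSet;

    typedef struct {            /* pipeline latch between the two cycles */
        uint64_t data[NUM_WAYS];
        bool     match[NUM_WAYS];
    } Stage1Out;

    /* Cycle 1: tag compare and data read happen in the same clock. */
    static Stage1Out cycle1_access(const CacheSet *set, uint32_t addr_tag)
    {
        Stage1Out out = { {0}, {0} };
        for (int w = 0; w < NUM_WAYS; w++) {
            out.match[w] = set->valid[w] && set->tag[w] == addr_tag;
            out.data[w]  = set->data[w];  /* speculative read of every way */
        }
        return out;
    }

    /* Cycle 2: data manipulation, here just selecting the matching way. */
    static bool cycle2_select(const Stage1Out *s1, uint64_t *result)
    {
        for (int w = 0; w < NUM_WAYS; w++)
            if (s1->match[w]) { *result = s1->data[w]; return true; }
        return false;  /* miss */
    }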
Abstract:
A fully-associative translation lookaside buffer (TLB) structure for a computer system includes a first-level TLB0 memory having a plurality of entries and a second-level TLB1 memory operatively coupled to the first-level TLB0 memory. The second-level TLB1 memory also has a plurality of entries. Entries are placed in the TLB0 and TLB1 structures as a result of software-controlled translation register operations and hardware-controlled translation cache operations. Logic controlling TLB0 treats both kinds of operations the same way and uses a hardware replacement algorithm to determine the entry index. Logic controlling TLB1 uses a hardware replacement algorithm to determine the entry index for translation cache entries, and uses an index provided within the insertion instruction to determine the entry index for translation register operations.
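The index-selection rule can be sketched in C as follows. The entry counts, the round-robin replacement policy, and every identifier are assumptions for illustration; the abstract does not specify which hardware replacement algorithm is used.

    /* Hypothetical sketch of entry-index selection for the two TLB levels. */
    #include <stdint.h>

    #define TLB0_ENTRIES 32
    #define TLB1_ENTRIES 128

    typedef enum { INSERT_TC, INSERT_TR } InsertKind; /* cache vs. register op */

    typedef struct { uint64_t vpn, pfn; int valid; } TlbEntry;

    static TlbEntry tlb0[TLB0_ENTRIES], tlb1[TLB1_ENTRIES];
    static unsigned tlb0_rr, tlb1_rr;  /* round-robin pointers: one possible
                                          hardware replacement algorithm   */

    /* TLB0 treats TR and TC insertions identically: hardware picks the slot. */
    static unsigned tlb0_pick_index(void)
    {
        return tlb0_rr++ % TLB0_ENTRIES;
    }

    /* TLB1 uses hardware replacement for TC entries, but honors the index
     * encoded in the insertion instruction for TR entries.               */
    static unsigned tlb1_pick_index(InsertKind kind, unsigned instr_index)
    {
        if (kind == INSERT_TR)
            return instr_index % TLB1_ENTRIES;
        return tlb1_rr++ % TLB1_ENTRIES;
    }

    static void insert(InsertKind kind, unsigned instr_index,
                       uint64_t vpn, uint64_t pfn)
    {
        TlbEntry e = { vpn, pfn, 1 };
        tlb0[tlb0_pick_index()] = e;                 /* both ops, same path */
        tlb1[tlb1_pick_index(kind, instr_index)] = e;
    }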
Abstract:
A novel on-chip cache memory and method of operation are provided which increase microprocessor performance. The on-chip cache memory has two levels. The first level is optimized for low latency and the second level is optimized for capacity. Both levels of cache are pipelined and can support simultaneous dual-port accesses. A queuing structure, which is also dual-ported, is provided between the first and second levels of cache to decouple the faster first-level cache from the slower second-level cache. Both levels of cache support non-blocking behavior: when there is a cache miss at one level, both caches can continue to process other cache hits and misses. The first-level cache is optimized for integer data; the second-level cache can store any data type, including floating point. The novel two-level cache system of the present invention provides high performance with an emphasis on throughput.
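One way to picture the decoupling queue is as a dual-ported ring buffer, as in the hypothetical C sketch below. The depth, the field names, and the two-enqueues-per-clock interface are illustrative assumptions.

    /* Hypothetical decoupling queue between the fast first-level and slower
     * second-level cache: accepts up to two L1 misses per L1 clock and
     * drains at the L2's own rate.                                       */
    #include <stdint.h>
    #include <stdbool.h>

    #define QDEPTH 16  /* power of two, so the index mask below works */

    typedef struct { uint64_t addr; bool is_fp; } MissReq;

    typedef struct {
        MissReq  slot[QDEPTH];
        unsigned head, tail;   /* tail - head = occupancy */
    } L1L2Queue;

    static bool q_full(const L1L2Queue *q)  { return q->tail - q->head == QDEPTH; }
    static bool q_empty(const L1L2Queue *q) { return q->tail == q->head; }

    /* Dual-ported insert: both L1 ports can enqueue a miss in one clock. */
    static int enqueue2(L1L2Queue *q, const MissReq *a, const MissReq *b)
    {
        int accepted = 0;
        if (a && !q_full(q)) { q->slot[q->tail++ & (QDEPTH - 1)] = *a; accepted++; }
        if (b && !q_full(q)) { q->slot[q->tail++ & (QDEPTH - 1)] = *b; accepted++; }
        return accepted;  /* L1 keeps servicing hits regardless (non-blocking) */
    }

    /* The L2 pulls a request whenever it has a free cycle. */
    static bool dequeue(L1L2Queue *q, MissReq *out)
    {
        if (q_empty(q)) return false;
        *out = q->slot[q->head++ & (QDEPTH - 1)];
        return true;
    }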
Abstract:
An NRU (not-recently-used) algorithm is used to track lines in each region of a memory array such that the corresponding NRU bits are reset on a region-by-region basis. That is, the NRU bits of a given region are reset when all of the bits in that region indicate that their corresponding lines have recently been used, and likewise for every other region. Resetting the NRU bits in one region, however, does not affect the NRU bits in another region. An LRU (least-recently-used) algorithm is used to track the regions of the array such that each region has a single corresponding entry in an LRU table. That is, all the lines in a single region collectively correspond to a single LRU entry. A region is elevated to most-recently-used status in the LRU table once the NRU bits of the region are reset.
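The region-local NRU reset and the region-level LRU promotion can be modeled directly, as in the C sketch below. The region and line counts, the rank-based LRU table, and the victim-selection fallback are all illustrative assumptions; the abstract does not describe how a victim is chosen.

    /* Hypothetical model of the region-based NRU/LRU scheme. */
    #include <stdbool.h>

    #define REGIONS        8
    #define LINES_PER_REG 16

    static bool     nru[REGIONS][LINES_PER_REG]; /* true = recently used   */
    static unsigned lru_rank[REGIONS];           /* 0 = least recently used */

    /* Promote a region to most-recently-used in the region LRU table. */
    static void lru_touch(unsigned r)
    {
        unsigned old = lru_rank[r];
        for (unsigned i = 0; i < REGIONS; i++)
            if (lru_rank[i] > old) lru_rank[i]--;  /* close the gap   */
        lru_rank[r] = REGIONS - 1;                 /* MRU position    */
    }

    /* Mark a line used; once every line in the region is marked, reset the
     * region's NRU bits (other regions are untouched) and promote the
     * region to MRU, as the abstract describes.                          */
    static void touch_line(unsigned r, unsigned l)
    {
        nru[r][l] = true;
        for (unsigned i = 0; i < LINES_PER_REG; i++)
            if (!nru[r][i]) return;    /* some line not yet recently used */
        for (unsigned i = 0; i < LINES_PER_REG; i++)
            nru[r][i] = false;         /* region-local reset */
        lru_touch(r);
    }

    /* One plausible victim choice: the LRU region, then any line in it
     * whose NRU bit is clear.                                          */
    static void pick_victim(unsigned *r_out, unsigned *l_out)
    {
        unsigned r = 0;
        for (unsigned i = 1; i < REGIONS; i++)
            if (lru_rank[i] < lru_rank[r]) r = i;
        for (unsigned l = 0; l < LINES_PER_REG; l++)
            if (!nru[r][l]) { *r_out = r; *l_out = l; return; }
        *r_out = r; *l_out = 0;        /* all bits set: fall back */
    }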
Abstract:
The inventive cache uses a queuing structure which provides out-of-order cache memory access support for multiple accesses, as well as support for managing bank conflicts and address conflicts. The inventive cache can support four data accesses that are hits per clock, one access that misses the L1 cache per clock, and one instruction access per clock. The responses are interspersed in the pipeline so that conflicts in the queue are minimized. Non-conflicting accesses are not inhibited; conflicting accesses, however, are held up until the conflict clears. The inventive cache provides out-of-order support after the retirement stage of a pipeline.
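The conflict handling can be approximated with a per-clock selection pass over the queue, sketched below in C. The bank and line hashes, the queue depth, and the oldest-first ordering are illustrative assumptions; only the four-hits-per-clock limit is taken from the abstract.

    /* Hypothetical selection pass: entries whose bank or address conflicts
     * with an already-selected entry are held; non-conflicting entries
     * issue out of order.                                                */
    #include <stdint.h>
    #include <stdbool.h>

    #define QSIZE      16
    #define MAX_ISSUE   4                       /* four data hits per clock */
    #define BANK(addr) ((addr) >> 4 & 0x7)      /* illustrative bank hash   */
    #define LINE(addr) ((addr) >> 6)            /* illustrative line index  */

    typedef struct { uint64_t addr; bool valid; } QEntry;

    /* Select up to MAX_ISSUE entries, oldest first (index order is assumed
     * to be age order), skipping any that conflict with an entry already
     * chosen this clock. Returns the number selected.                    */
    static int select_issues(const QEntry q[QSIZE], int chosen[MAX_ISSUE])
    {
        int n = 0;
        for (int i = 0; i < QSIZE && n < MAX_ISSUE; i++) {
            if (!q[i].valid) continue;
            bool conflict = false;
            for (int j = 0; j < n; j++) {
                uint64_t a = q[i].addr, b = q[chosen[j]].addr;
                if (BANK(a) == BANK(b) || LINE(a) == LINE(b))
                    conflict = true;  /* held until the conflict clears */
            }
            if (!conflict) chosen[n++] = i;  /* later entries may still issue */
        }
        return n;
    }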
Abstract:
A multi-level cache structure and associated method of operating the cache structure are disclosed. The cache structure uses a queue for holding address information for a plurality of memory access requests as a plurality of entries. The queue includes issuing logic for determining which entries should be issued. The issuing logic further comprises find-first logic for determining which entries meet a predetermined criterion and selecting a plurality of those entries as issuing entries. The issuing logic also comprises lost logic that delays the issuing of a selected entry for a predetermined time period based upon a delay criterion. The delay criterion may, for example, comprise a conflict between issuing resources, such as ports. Thus, when a resource needed by an issuing entry is oversubscribed, the issuing of that entry may be delayed for a predetermined time period (e.g., one clock cycle) to allow the resource conflict to clear.
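A hypothetical rendering of the two pieces of issuing logic is given below in C: find_first scans a ready mask for candidate entries, and the delay behavior holds an entry for a clock when its port is already taken. The port model and all identifiers are assumptions, not details from the patent.

    /* Hypothetical find-first and delay logic over a ready bitmask. */
    #include <stdint.h>

    #define QSIZE 16
    #define PORTS  2   /* issuing resources that can conflict */

    /* Find-first: index of the lowest set bit of the ready mask, or -1. */
    static int find_first(uint32_t ready)
    {
        for (int i = 0; i < QSIZE; i++)
            if (ready & (1u << i)) return i;
        return -1;
    }

    /* One clock of issue: pick entries first-to-last; an entry whose port
     * (port[i] < PORTS) is already taken this clock is delayed, i.e. it
     * stays ready for the next clock.                                   */
    static uint32_t issue_one_clock(uint32_t ready, const uint8_t port[QSIZE])
    {
        int port_busy[PORTS] = {0};
        uint32_t remaining = ready;
        int i;
        while ((i = find_first(remaining)) >= 0) {
            remaining &= ~(1u << i);            /* considered this clock   */
            if (port_busy[port[i]]) continue;   /* oversubscribed: delayed */
            port_busy[port[i]] = 1;
            ready &= ~(1u << i);                /* entry issues            */
        }
        return ready;  /* bits still set = entries delayed one clock */
    }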
Abstract:
A simplified semaphore method and apparatus for simultaneous execution of multiple semaphore instructions and for enforcement of necessary ordering. A central processing unit having an instruction pipeline is coupled with a data cache arrangement including a semaphore buffer, a data cache, and a semaphore execution unit. An initial semaphore instruction having one or more operands, along with a semaphore address, is transmitted from the instruction pipeline to the semaphore buffer and, in turn, from the semaphore buffer to the semaphore execution unit. The semaphore address of the initial semaphore instruction is transmitted from the instruction pipeline to the data cache to retrieve initial semaphore data stored within the data cache at a location in a data line identified by the semaphore address. The semaphore instruction is executed within the semaphore execution unit by operating upon the initial semaphore data and the one or more semaphore operands so as to produce processed semaphore data, which is then stored within the data cache. Because the semaphore buffer provides entries for multiple semaphore instructions, it can initiate simultaneous execution of multiple semaphore instructions as needed.
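The execute step can be sketched as a buffered fetch-and-operate, as below in C. The particular operation set, the buffer depth, and the toy memory standing in for the data cache are illustrative assumptions, not details from the patent.

    /* Hypothetical execution of one buffered semaphore instruction. */
    #include <stdint.h>

    typedef enum { SEM_FETCH_ADD, SEM_EXCHANGE, SEM_CMP_EXCHANGE } SemOp;

    typedef struct {
        SemOp    op;
        uint64_t addr;        /* semaphore address             */
        uint64_t operand[2];  /* one or two semaphore operands */
    } SemInstr;

    #define SEM_BUF_DEPTH 4
    static SemInstr sem_buf[SEM_BUF_DEPTH];  /* entries for multiple in-flight
                                                semaphore instructions       */

    /* Toy memory standing in for the data-cache line named by the address. */
    static uint64_t mem[1024];
    static uint64_t cache_read(uint64_t a)              { return mem[a % 1024]; }
    static void     cache_write(uint64_t a, uint64_t v) { mem[a % 1024] = v;    }

    /* Read the initial semaphore data, operate on it with the operands,
     * store the processed data back, and return the old value.         */
    static uint64_t sem_execute(const SemInstr *s)
    {
        uint64_t initial = cache_read(s->addr);   /* initial semaphore data */
        uint64_t result  = initial;
        switch (s->op) {
        case SEM_FETCH_ADD:    result = initial + s->operand[0]; break;
        case SEM_EXCHANGE:     result = s->operand[0];           break;
        case SEM_CMP_EXCHANGE: if (initial == s->operand[0])
                                   result = s->operand[1];       break;
        }
        cache_write(s->addr, result);              /* processed data stored */
        return initial;
    }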
Abstract:
A method of generating an issue pointer for issuing data structures from a queue. A signal is generated that indicates where, within the queue, the data structures that desire to issue are located. The signal is then checked at the queue location pointed to by the issue pointer. If the issue pointer points to the location that issued on the previous queue issue, the pointer is incremented when no data structure has shifted into that location since the previous issue, and is held in place when a data structure has shifted into that location since the previous issue.
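The increment-or-hold rule reduces to a small decision function, sketched below in C. The queue size, the bitmask encoding of the signal, and the function names are illustrative assumptions.

    /* Hypothetical issue-pointer update for a shifting (collapsing) queue. */
    #include <stdbool.h>
    #include <stdint.h>

    #define QSIZE 8

    /* The "signal": one bit per queue slot, set where an entry that
     * desires to issue is located.                                 */
    static uint32_t request_mask(const bool wants_issue[QSIZE])
    {
        uint32_t m = 0;
        for (unsigned i = 0; i < QSIZE; i++)
            if (wants_issue[i]) m |= 1u << i;
        return m;
    }

    /* Check the signal at the slot the issue pointer points to. */
    static bool slot_requests(uint32_t mask, unsigned ptr)
    {
        return (mask >> ptr) & 1u;
    }

    /* If the pointed-to slot issued on the previous queue issue, increment
     * unless an entry has since shifted into that slot; in that case hold,
     * so the shifted-in entry is considered next.                        */
    static unsigned next_issue_ptr(unsigned ptr,
                                   bool issued_from_ptr,
                                   bool shifted_into_ptr)
    {
        if (issued_from_ptr && !shifted_into_ptr)
            return (ptr + 1) % QSIZE;
        return ptr;
    }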
Abstract:
The inventive cache manages address conflicts and maintains program order without using a store buffer. The cache utilizes an issue algorithm to ensure that accesses issued in the same clock are actually issued in an order consistent with program order. This is enabled by performing address comparisons prior to insertion of the accesses into the queue. Additionally, when accesses are separated by one or more clocks, address comparisons are performed, and accesses that would read data from the cache memory array before a prior update has actually been written into the array are canceled. This guarantees that program order is maintained, as an access is not allowed to complete until it is assured that the most recent data will be received upon access of the array.
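The two comparison points can be sketched as predicates in C: one applied before queue insertion to order same-clock accesses, and one applied across clocks to cancel a read that would beat an older write into the array. The line granularity, the age encoding, and all names are illustrative assumptions.

    /* Hypothetical address-comparison predicates for the two cases above. */
    #include <stdint.h>
    #include <stdbool.h>

    #define LINE(addr) ((addr) >> 6)  /* illustrative line granularity */

    typedef struct { uint64_t addr; bool is_store; unsigned age; } Access;

    /* Same-clock check, done before insertion into the queue: true if
     * 'younger' must be ordered behind 'older' to preserve program order. */
    static bool must_order(const Access *older, const Access *younger)
    {
        return LINE(older->addr) == LINE(younger->addr) &&
               (older->is_store || younger->is_store);
    }

    /* Later-clock check: cancel a load that would read the array before an
     * older store to the same line has actually updated it. Lower age
     * means older in program order.                                      */
    static bool must_cancel(const Access *pending_store, const Access *load,
                            bool store_has_updated_array)
    {
        return !load->is_store &&
               LINE(pending_store->addr) == LINE(load->addr) &&
               pending_store->age < load->age &&
               !store_has_updated_array;
    }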
Abstract:
A system and method are disclosed which determine in parallel, for multiple levels of a multi-level cache, whether any one of such levels is capable of satisfying a memory access request. Tags for multiple levels of a multi-level cache are accessed in parallel to determine whether the address for a memory access request is contained within any of the multiple levels. For instance, in a preferred embodiment, the tags for the first level of cache and the tags for the second level of cache are accessed in parallel. Additional levels of cache tags, up to N levels, may also be accessed in parallel with the first-level cache tags. Thus, by the end of the access of the first-level cache tags, it is known whether a memory access request can be satisfied by the first level, the second level, or any of the additional N levels of cache accessed in parallel. Additionally, in a preferred embodiment, the multi-level cache is arranged such that the data array of a level of cache is accessed only if it is determined that that level of cache is capable of satisfying a received memory access request. Further, in a preferred embodiment, the multi-level cache is partitioned into N ways of associativity, and only a single way of the data array is accessed to satisfy a memory access request, saving power and leaving the remaining ways of the data array free to satisfy other requests.
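The parallel tag probe and single-way data access can be sketched as below in C. The geometry, the sequential stand-in for what would be simultaneous hardware probes, and the stubbed data-array port are illustrative assumptions.

    /* Hypothetical parallel tag probe across two cache levels. */
    #include <stdint.h>

    #define WAYS 4

    typedef struct {
        uint32_t tag[WAYS];
        uint8_t  valid[WAYS];
    } TagSet;

    /* Returns the hitting way within one level's tag set, or -1 on a miss. */
    static int probe(const TagSet *set, uint32_t tag)
    {
        for (int w = 0; w < WAYS; w++)
            if (set->valid[w] && set->tag[w] == tag) return w;
        return -1;
    }

    /* Stand-in for the data-array port: reads one way of one level only. */
    static uint64_t read_way(int level, int way, uint32_t index)
    {
        (void)level; (void)way; (void)index;
        return 0;
    }

    /* Both tag probes are issued in the same step (neither depends on the
     * other's result), and the data array is touched only for the hitting
     * level, in the single hitting way. Returns the level that satisfied
     * the request, or 0 on a miss in both probed levels.                 */
    static int lookup(const TagSet *l1, const TagSet *l2,
                      uint32_t tag, uint32_t index, uint64_t *data)
    {
        int w1 = probe(l1, tag);
        int w2 = probe(l2, tag);
        if (w1 >= 0) { *data = read_way(1, w1, index); return 1; }
        if (w2 >= 0) { *data = read_way(2, w2, index); return 2; }
        return 0;
    }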