Abstract:
According to one embodiment, the present disclosure generally provides a method for improving the performance of a cache of a processor. The method may include storing a plurality of data in a data Random Access Memory (RAM). The method may further include holding information for all outstanding requests forwarded to a next-level memory subsystem. The method may also include clearing information associated with a serviced request after the request has been fulfilled. The method may additionally include determining if a subsequent request matches an address supplied to one or more requests already in-flight to the next-level memory subsystem. The method may further include matching fulfilled requests serviced by the next-level memory subsystem to at least one requestor who issued requests while an original request was in-flight to the next-level memory subsystem. The method may also include storing information specific to each request, the information including a set attribute and a way attribute, the set and way attributes configured to identify where the returned data should be held in the data RAM once the data is returned, the information specific to each request further including at least one of thread ID, instruction queue position and color. The method may additionally include scheduling hit and miss data returns. Of course, various alternative embodiments are also within the scope of the present disclosure.
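Taken together, these steps resemble a miss-tracking table placed in front of the next-level memory. A minimal C++ sketch of that idea follows; the names (MissTracker, MissEntry, Requestor) and the merge-on-match policy are illustrative assumptions, not details from the disclosure.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

// One requestor that issued a request while the original miss was in flight.
struct Requestor {
    int threadId;    // thread ID
    int iqPosition;  // instruction queue position
    int color;       // color tag
};

// Information held for each outstanding request to the next-level memory.
struct MissEntry {
    int set;  // where the returned data will be held in the data RAM
    int way;
    std::vector<Requestor> waiters;  // everyone to service on data return
};

class MissTracker {
    std::unordered_map<uint64_t, MissEntry> inflight_;  // keyed by block address
public:
    // Returns true if the request merged with a miss already in flight.
    bool onMiss(uint64_t blockAddr, int set, int way, Requestor r) {
        auto it = inflight_.find(blockAddr);
        if (it != inflight_.end()) {          // subsequent request matches an
            it->second.waiters.push_back(r);  // in-flight address: merge it
            return true;
        }
        inflight_.emplace(blockAddr, MissEntry{set, way, {r}});
        return false;
    }

    // Data return from the next-level memory: match the fill to every
    // requestor that queued on the miss, then clear the entry.
    void onFill(uint64_t blockAddr) {
        auto it = inflight_.find(blockAddr);
        if (it == inflight_.end()) return;
        for (const Requestor& r : it->second.waiters)
            std::cout << "wake thread " << r.threadId << " (set "
                      << it->second.set << ", way " << it->second.way << ")\n";
        inflight_.erase(it);  // clear info for the serviced request
    }
};

int main() {
    MissTracker t;
    t.onMiss(0x1000, 3, 1, {0, 5, 0});  // original miss goes out
    t.onMiss(0x1000, 3, 1, {1, 2, 1});  // same address while in flight: merged
    t.onFill(0x1000);                   // data return services both requestors
}
```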
Abstract:
A computer system includes a data cache supported by a copy-back buffer and pre-allocation request stack. A programmable trigger mechanism inspects each store operation made by the processor to the data cache to see if a next cache line should be pre-allocated. If the store operation's memory address falls within a range defined by START and END programmable registers, then the next cache line, at the offset defined by a programmable STRIDE register, is requested for pre-allocation. Bunches of pre-allocation requests are organized and scheduled by the pre-allocation request stack, and take their turns so that the cache lines being replaced can be processed through the copy-back buffer. By the time the processor performs the store operation in the next cache line, that cache line has already been pre-allocated and there will be a cache hit, thus saving stall cycles.
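As a rough illustration of the trigger check, here is a short C++ sketch; the register layout, the 64-byte line size, and the deque standing in for the pre-allocation request stack are all assumptions made for the example.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>

// Programmable trigger registers (values here are illustrative).
struct TriggerRegs {
    uint64_t start;   // START: low bound of the watched address range
    uint64_t end;     // END: high bound of the watched address range
    uint64_t stride;  // STRIDE: offset to the next line to pre-allocate
};

constexpr uint64_t kLineSize = 64;  // assumed cache-line size

// Pre-allocation request stack: requests wait their turn so that the
// lines being replaced can drain through the copy-back buffer.
std::deque<uint64_t> preallocStack;

// Inspect each store; if it falls in [START, END], request pre-allocation
// of the line STRIDE bytes ahead.
void onStore(uint64_t addr, const TriggerRegs& regs) {
    if (addr >= regs.start && addr <= regs.end) {
        uint64_t nextLine = (addr + regs.stride) & ~(kLineSize - 1);
        preallocStack.push_back(nextLine);
    }
}

int main() {
    TriggerRegs regs{0x10000, 0x20000, 256};
    onStore(0x10040, regs);  // store inside the watched range
    while (!preallocStack.empty()) {
        std::cout << "pre-allocate line 0x" << std::hex
                  << preallocStack.front() << "\n";
        preallocStack.pop_front();  // takes its turn behind the copy-back buffer
    }
}
```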
Abstract:
A device (10, 11) and a method (400, 500) for fetching an information unit, the method (400, 500) includes: receiving (410) a request to execute a write-through cacheable operation of the information unit; emptying (440) a fetch unit of data, wherein the fetch unit is connected to a cache module and to a high-level memory unit; determining (450), when the fetch unit is empty, whether the cache module stores an older version of the information unit; and selectively writing (460) the information unit to the cache module in response to the determination.
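The ordering of the steps (drain the fetch unit, then check for an older cached version, then selectively write) can be sketched as follows in C++; the maps modeling the cache module and high-level memory are simplifying assumptions made for the example.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

std::unordered_map<uint64_t, uint32_t> cache;   // cache module
std::unordered_map<uint64_t, uint32_t> memory;  // high-level memory unit
std::vector<uint64_t> fetchUnit;                // addresses still being fetched

// Empty the fetch unit of data before the write proceeds.
void drainFetchUnit() {
    for (uint64_t a : fetchUnit) cache[a] = memory[a];
    fetchUnit.clear();
}

// Write-through cacheable operation for one information unit.
void writeThrough(uint64_t addr, uint32_t data) {
    drainFetchUnit();                                  // step 440
    bool olderVersionCached = cache.count(addr) != 0;  // step 450
    if (olderVersionCached)                            // step 460: update the
        cache[addr] = data;                            // cache only selectively
    memory[addr] = data;                               // write through
}

int main() {
    memory[0xA0] = 1;
    fetchUnit.push_back(0xA0);  // a fetch of the same line is in flight
    writeThrough(0xA0, 7);      // drained first, so the cache is updated
    std::cout << cache[0xA0] << " " << memory[0xA0] << "\n";  // prints: 7 7
}
```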
Abstract:
Techniques for use in CDMA-based products and services, including cache memory allocation replacement so as to maximize residency of a plurality of set ways following a tag-miss allocation. Herein, steps form a first-in, first-out (FIFO) replacement listing of victim ways for the cache memory, wherein the depth of the FIFO replacement listing approximately equals the number of ways in the cache set. The method and system place a victim way on the FIFO replacement listing only in the event that a tag-miss results in a tag-miss allocation; the victim way is placed at the tail of the FIFO replacement listing after any previously selected victim way. Use of a victim way on the FIFO replacement listing is prevented in the event of an incomplete prior allocation of the victim way by, for example, stalling a reuse request until such initial allocation of the victim way completes or replaying a reuse request until such initial allocation of the victim way completes.
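A minimal C++ sketch of the FIFO replacement listing follows, with the stall-until-allocated guard modeled as a sentinel return value; the Victim structure and the function names are hypothetical.

```cpp
#include <deque>
#include <iostream>

struct Victim {
    int way;
    bool allocationComplete;  // has the initial allocation finished?
};

// FIFO replacement listing; its depth stays roughly equal to the number
// of ways in the cache set, since each way appears at most once.
std::deque<Victim> fifo;

// On a tag-miss that results in a tag-miss allocation, place the victim
// way at the tail, behind any previously selected victim way.
void onTagMissAllocation(int way) {
    fifo.push_back({way, false});
}

// Mark the initial allocation of a way as complete.
void allocationDone(int way) {
    for (Victim& v : fifo)
        if (v.way == way) v.allocationComplete = true;
}

// Take the next victim from the head; return -1 (stall/replay the reuse
// request) while the head's prior allocation is still incomplete.
int nextVictim() {
    if (fifo.empty() || !fifo.front().allocationComplete) return -1;
    int way = fifo.front().way;
    fifo.pop_front();
    return way;
}

int main() {
    onTagMissAllocation(2);
    std::cout << nextVictim() << "\n";  // -1: allocation of way 2 incomplete
    allocationDone(2);
    std::cout << nextVictim() << "\n";  // 2: safe to reuse now
}
```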
Abstract:
A system (300) and method for performing a simultaneous external read operation during internal programming of a memory device (301) is described. The memory device is configured to store data randomly and includes a source location (305), a destination location (303), a data register (307), and a cache register (309). The data register (307) is configured to simultaneously write data to the destination (303) and to the cache register (309). The system (300) further includes a processing device (107) (e.g., a microprocessor or microcontroller) for verifying the accuracy of any data received through electrical communication with the memory device. The processing device (107) is additionally configured to provide for error correction if the received data are inaccurate, add random data to the data, if required, and then transfer the error-corrected and/or random-data-modified data back to the destination location (303).
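Purely as an illustration of the data flow, the C++ sketch below models the four locations as pages and stands in a checksum for real ECC; none of the names or the checksum scheme come from the disclosure.

```cpp
#include <array>
#include <cstdint>
#include <iostream>
#include <numeric>

constexpr int kPageSize = 8;
using Page = std::array<uint8_t, kPageSize>;

// Memory-device locations from the abstract, modeled as plain pages.
Page source{}, destination{}, dataRegister{}, cacheRegister{};

// The data register writes to the destination and to the cache register
// simultaneously, so an external read can proceed during programming.
void copyBackProgram() {
    dataRegister = source;
    destination = dataRegister;    // internal programming
    cacheRegister = dataRegister;  // simultaneously exposed for external read
}

// External processing device: verify the data read from the cache register
// via a simple checksum (a stand-in for real ECC), correct it if needed,
// and transfer the corrected data back to the destination location.
void externalVerify(uint8_t expectedSum) {
    Page p = cacheRegister;  // simultaneous external read
    uint8_t sum = std::accumulate(p.begin(), p.end(), uint8_t{0});
    if (sum != expectedSum) {
        p = source;       // error correction (illustrative: known-good copy)
        destination = p;  // transfer corrected data back to the destination
    }
}

int main() {
    source = {1, 2, 3, 4, 5, 6, 7, 8};
    copyBackProgram();
    destination[3] ^= 0xFF;    // simulate a programming error...
    cacheRegister[3] ^= 0xFF;  // ...visible in the externally read data
    externalVerify(36);        // checksum of 1..8 detects the error
    std::cout << int(destination[3]) << "\n";  // 4: corrected data written back
}
```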
Abstract:
Apparatus (100) and method (500) for providing information to a cache module, the apparatus includes: (i) at least one processor, connected to the cache module (200), for initiating first and second requests to retrieve, from the cache module (200), a first and a second data unit; (ii) logic (210), adapted to receive the requests and determine if the first and second data units are mandatory data units; and (iii) a controller (212), connected to the cache module, adapted to initiate a single fetch burst if a memory space retrievable during the single fetch burst comprises the first and second mandatory data units, and adapted to initiate multiple fetch bursts if a memory space retrievable during a single fetch burst does not comprise the first and the second mandatory data units.
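The controller's burst decision can be sketched as a simple address-distance test; the 32-byte burst size is an assumption, and alignment effects are ignored in this C++ illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

constexpr uint64_t kBurstBytes = 32;  // memory space retrievable in one burst

// Decide whether one fetch burst can return both mandatory data units;
// returns the number of bursts the controller would initiate.
int planFetch(uint64_t firstAddr, uint64_t secondAddr,
              bool firstMandatory, bool secondMandatory) {
    if (!(firstMandatory && secondMandatory))
        return 1;  // only one mandatory unit needs an immediate burst
    uint64_t lo = std::min(firstAddr, secondAddr);
    uint64_t hi = std::max(firstAddr, secondAddr);
    // Single burst only if both units fall within one burst-sized window.
    return (hi - lo < kBurstBytes) ? 1 : 2;
}

int main() {
    std::cout << planFetch(0x100, 0x110, true, true) << "\n";  // 1: one burst
    std::cout << planFetch(0x100, 0x200, true, true) << "\n";  // 2: too far apart
}
```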
Abstract:
A dynamic power controller is provided that identifies a clock frequency requirement of a processor and determines a voltage requirement to support the clock frequency requirement. The dynamic power controller transitions the processor to a power state defined by the clock frequency requirement and the voltage requirement. In particular, a voltage level indicated by the voltage requirement is supplied to the processor, and the frequency distribution indicated by the frequency requirement is provided to the clock signals of the processor.
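A toy C++ model of such a transition is shown below. The voltage table is invented, and the raise-voltage-before-frequency ordering is standard DVFS practice rather than something stated in the abstract.

```cpp
#include <iostream>

struct PowerState {
    int freqMHz;    // clock frequency requirement
    int voltageMv;  // voltage requirement supporting that frequency
};

// Determine the voltage needed to support a clock frequency requirement
// (table values are purely illustrative).
int voltageFor(int freqMHz) {
    if (freqMHz <= 400) return 800;
    if (freqMHz <= 800) return 900;
    return 1100;
}

// Transition the processor to the power state defined by the frequency
// requirement and the derived voltage requirement. When raising frequency,
// voltage is raised first; when lowering, frequency drops first.
PowerState transition(PowerState cur, int newFreqMHz) {
    PowerState next{newFreqMHz, voltageFor(newFreqMHz)};
    if (next.voltageMv > cur.voltageMv) {
        std::cout << "raise voltage to " << next.voltageMv << " mV\n";
        std::cout << "set clocks to " << next.freqMHz << " MHz\n";
    } else {
        std::cout << "set clocks to " << next.freqMHz << " MHz\n";
        std::cout << "lower voltage to " << next.voltageMv << " mV\n";
    }
    return next;
}

int main() {
    PowerState s{400, 800};
    s = transition(s, 1000);  // scale up: voltage first, then frequency
    s = transition(s, 400);   // scale down: frequency first, then voltage
}
```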
Abstract:
A computer system includes a group of CPUs, each having a private cache which communicates with its CPU to receive requests for information blocks and to service such requests, and a CPU bus coupled to all the private caches and to a shared cache. Each private cache includes a cache memory and a cache controller having: a processor directory for identifying information blocks resident in the cache memory; logic for identifying cache misses on requests from the CPU; a cache miss output buffer for storing the identifications of a missed block and of a block to be moved out of cache memory to make room for the requested block, and for selectively sending the identifications onto the CPU bus; a cache miss input buffer stack for storing the identifications of all recently missed blocks and blocks to be swapped from all the CPUs in the group; a comparator for comparing the identifications in the cache miss output buffer with the identifications in the cache miss input buffer stack; and control logic, responsive to the comparator sensing a compare (indicating a request by another CPU for the block being swapped), for inhibiting the broadcast of the swap requirement onto the CPU bus and converting the swap operation to a "siphon" operation to service the request of the other CPU.
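The compare-and-convert step can be sketched as below in C++; the set standing in for the cache miss input buffer stack and the block identifiers are illustrative only.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_set>

// Identifications of recently missed / to-be-swapped blocks seen on the
// CPU bus from the other CPUs in the group (the input buffer stack).
std::unordered_set<uint64_t> missInputStack;

// Before broadcasting a swap of blockId onto the CPU bus, compare it
// against the input stack. On a compare (another CPU wants this block),
// inhibit the broadcast and convert the swap into a "siphon" that
// services the other CPU directly.
void handleSwap(uint64_t blockId) {
    if (missInputStack.count(blockId)) {
        std::cout << "compare: siphon block " << blockId
                  << " to the requesting CPU (swap broadcast inhibited)\n";
    } else {
        std::cout << "broadcast swap of block " << blockId
                  << " onto the CPU bus\n";
    }
}

int main() {
    missInputStack.insert(42);  // another CPU recently missed block 42
    handleSwap(42);             // converted to a siphon
    handleSwap(7);              // normal swap broadcast
}
```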
Abstract:
A memory cache sequencer circuit (10) manages the operation of a memory cache (12, 14, 16, 18) and cache buffer (30) so as to efficiently forward memory contents being delivered to the memory cache (12, 14, 16, 18) via the cache buffer (30), to a multithreading processor (6) awaiting return of those memory contents. The sequencer circuit (10) predicts the location of the memory contents that the processor (6) is awaiting, and speculatively forwards memory contents from either the cache buffer (30) or memory cache (12, 14, 16, 18), while simultaneously verifying that the speculatively forwarded memory contents were correctly forwarded. If the memory contents were incorrectly forwarded, the sequencer circuit (10) issues a signal to the processor (6) receiving the speculatively forwarded memory contents to ignore the forwarded memory contents. This speculative forwarding process may be performed, for example, when a memory access request is received from the processor (6), or whenever memory contents are delivered to the cache buffer (30) after a cache miss. The sequencer circuit (10) includes a plurality of sequencers (50), each storing information for managing the return of data in response to one of the potentially multiple misses and resulting cache linefills which can be generated by the multiple threads being executed by the processor (6).
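A compact C++ sketch of the speculate-then-verify flow follows; the prediction input, the two maps, and the printed ignore signal are all stand-ins for the hardware described.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

std::unordered_map<uint64_t, uint32_t> cacheRam;    // the memory cache
std::unordered_map<uint64_t, uint32_t> lineBuffer;  // the cache buffer

enum class Source { Buffer, Cache };

// Speculatively forward the data from the predicted location while, in
// parallel, verifying that the prediction was right. On a misprediction,
// signal the waiting thread to ignore what it was sent.
void forward(uint64_t addr, Source predicted) {
    Source actual = lineBuffer.count(addr) ? Source::Buffer : Source::Cache;
    uint32_t value = (predicted == Source::Buffer) ? lineBuffer[addr]
                                                   : cacheRam[addr];
    std::cout << "forward 0x" << std::hex << value << std::dec;
    if (predicted != actual)
        std::cout << "  [ignore signal: speculation was wrong]";
    std::cout << "\n";
}

int main() {
    cacheRam[0x40] = 0xCAFE;    // line already resident in the cache
    lineBuffer[0x80] = 0xBEEF;  // linefill still sitting in the buffer
    forward(0x80, Source::Buffer);  // correct prediction: data stands
    forward(0x40, Source::Buffer);  // misprediction: processor told to ignore
}
```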