Abstract:
A computer system with flash memory in the main memory hierarchy is disclosed. In an embodiment, the computer system includes at least one processor, a memory management unit coupled to the at least one processor, and a random access memory (RAM) coupled to the memory management unit. The computer system may also include a flash memory coupled to the memory management unit, wherein the computer system is configured to store at least a subset of a plurality of pages in the flash memory during operation. Responsive to a page fault, the memory management unit may determine, without invoking an I/O driver, whether a requested page associated with the page fault is stored in the flash memory and, if so, transfer the page into RAM.
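The fault-handling path this abstract claims can be illustrated with a short simulation. The following Python sketch is not from the patent; the class name, the eviction policy, and the page store are assumptions chosen only to show the idea of servicing a page fault directly from flash, without invoking an I/O driver.

```python
# Minimal sketch (not the patent's implementation) of the page-fault path
# described above: on a fault, the memory manager first checks a
# flash-resident page store and copies the page into RAM directly, with no
# I/O driver involved. All names and policies here are illustrative.

class FlashBackedMemory:
    def __init__(self, ram_frames):
        self.ram = {}                  # page number -> page data held in RAM
        self.flash = {}                # page number -> page data held in flash
        self.ram_frames = ram_frames   # RAM capacity, in pages

    def store_in_flash(self, page_no, data):
        self.flash[page_no] = data

    def access(self, page_no):
        if page_no in self.ram:        # hit: no fault
            return self.ram[page_no]
        return self.handle_page_fault(page_no)

    def handle_page_fault(self, page_no):
        # The fast path: consult flash directly, no I/O driver.
        if page_no in self.flash:
            if len(self.ram) >= self.ram_frames:
                evicted, data = self.ram.popitem()   # naive eviction policy
                self.flash[evicted] = data           # write victim back to flash
            self.ram[page_no] = self.flash.pop(page_no)
            return self.ram[page_no]
        # Slow path: page only on disk, so a driver would be invoked here.
        raise LookupError(f"page {page_no} not in flash; invoke I/O driver")

mem = FlashBackedMemory(ram_frames=2)
mem.store_in_flash(7, b"payload")
assert mem.access(7) == b"payload"   # fault serviced from flash, now in RAM
```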
Abstract:
System and method for using flash memory in a memory hierarchy. A computer system includes a processor coupled to a memory hierarchy via a memory controller. The memory hierarchy includes a cache memory, a first memory region of random access memory coupled to the memory controller via a first buffer, and an auxiliary memory region of flash memory coupled to the memory controller via a flash controller. The first buffer and the flash controller are coupled to the memory controller via a single interface. The memory controller receives a request to access a particular page in the first memory region. The processor detects a page fault corresponding to the request and in response, invalidates cache lines in the cache memory that correspond to the particular page, flushes the invalid cache lines, and swaps a page from the auxiliary memory region to the first memory region.
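The abstract describes a three-step response to the fault: invalidate the affected cache lines, flush them, then swap the page in from the auxiliary (flash) region. The sketch below is a hedged model of that sequence; the class, the line/page sizes, and the dictionaries standing in for memory regions are all assumptions.

```python
# Illustrative model of the fault sequence: invalidate the victim page's
# cache lines, flush them, then swap pages between the auxiliary (flash)
# region and the first (RAM) region. Names and sizes are assumptions.

LINE_SIZE = 64
PAGE_SIZE = 4096

class Hierarchy:
    def __init__(self):
        self.cache = {}         # line address -> (data, valid)
        self.first_region = {}  # page number -> page data (RAM)
        self.aux_region = {}    # page number -> page data (flash)

    def lines_of(self, page_no):
        base = page_no * PAGE_SIZE
        return range(base, base + PAGE_SIZE, LINE_SIZE)

    def on_page_fault(self, page_no, victim_page):
        # 1. Invalidate every cache line belonging to the victim page.
        for addr in self.lines_of(victim_page):
            if addr in self.cache:
                data, _ = self.cache[addr]
                self.cache[addr] = (data, False)
        # 2. Flush the invalidated lines out of the cache.
        for addr in self.lines_of(victim_page):
            self.cache.pop(addr, None)
        # 3. Swap: victim goes to the auxiliary region, requested page to RAM.
        if victim_page in self.first_region:
            self.aux_region[victim_page] = self.first_region.pop(victim_page)
        self.first_region[page_no] = self.aux_region.pop(page_no)

h = Hierarchy()
h.aux_region[1] = b"\x00" * PAGE_SIZE
h.first_region[2] = b"\xff" * PAGE_SIZE
h.on_page_fault(page_no=1, victim_page=2)
assert 1 in h.first_region and 2 in h.aux_region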
Abstract:
In one embodiment, a node comprises at least one memory control unit configured to couple to an industry standard memory interface for coupling to a memory; and at least one coherence unit configured to transmit and receive coherence messages to and from other nodes to maintain coherent memory among the nodes. The coherence messages are conveyed on a second interface to which the coherence unit is coupled, wherein the second interface includes at least a physical layer as specified by the industry standard memory interface.
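At the message level, the coherence traffic on the second interface might look like fixed-size frames carried over a memory-style physical link. The frame layout, the opcodes, and the field widths below are invented purely for illustration; the patent abstract does not specify them.

```python
# Hedged sketch of node-to-node coherence messaging: a request is packed
# into a fixed-size frame suitable for a memory-interface physical layer.
# The layout and opcode values are assumptions, not the patent's encoding.

import struct

MSG_READ_SHARED, MSG_READ_EXCLUSIVE, MSG_INVALIDATE, MSG_ACK = range(4)

def pack_coherence_msg(opcode, node_id, address):
    # 1-byte opcode, 1-byte source node, 6 reserved bytes, 8-byte address.
    return struct.pack(">BB6xQ", opcode, node_id, address)

def unpack_coherence_msg(frame):
    opcode, node_id, address = struct.unpack(">BB6xQ", frame)
    return opcode, node_id, address

frame = pack_coherence_msg(MSG_INVALIDATE, node_id=3, address=0x1000)
assert unpack_coherence_msg(frame) == (MSG_INVALIDATE, 3, 0x1000)
```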
Abstract:
An apparatus and method for fine-grained multithreading in a multipipelined processor core. According to one embodiment, a processor may include instruction fetch logic configured to assign a given one of a plurality of threads to a corresponding one of a plurality of thread groups, where each of the plurality of thread groups may comprise a subset of the plurality of threads, to issue a first instruction from one of the plurality of threads during one execution cycle, and to issue a second instruction from another one of the plurality of threads during a successive execution cycle. The processor may further include a plurality of execution units, each configured to execute instructions issued from a respective thread group.
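The issue discipline described here, one instruction from a different thread of each group on successive cycles, can be modeled in a few lines. The partitioning rule and the round-robin picker below are assumptions for demonstration, not the patent's fetch logic.

```python
# Minimal model of fine-grained issue: threads are partitioned into
# groups, one group per execution unit, and each group issues from a
# different one of its threads on successive cycles. All assumed.

from itertools import cycle

def make_thread_groups(threads, num_groups):
    # Partition threads round-robin into groups.
    groups = [[] for _ in range(num_groups)]
    for i, t in enumerate(threads):
        groups[i % num_groups].append(t)
    return groups

threads = ["T0", "T1", "T2", "T3", "T4", "T5", "T6", "T7"]
groups = make_thread_groups(threads, num_groups=2)
pickers = [cycle(g) for g in groups]   # per-group round-robin thread picker

for cyc in range(4):
    issued = [next(p) for p in pickers]  # one instruction per group per cycle
    print(f"cycle {cyc}: groups issue from {issued}")
```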
Abstract:
In a digital signal processing system, such as a computer system, an apparatus communicates digital signals in a plurality of operating domains. The first domain has first timing and control signals synchronized to a first clock. In response to an event, the apparatus dynamically transitions the operation of a synchronous memory to a second domain having second timing and control signals synchronized to a second clock. The first timing and control signals differ in frequency, shape, and protocol from the second timing and control signals. The first clock can be a processor clock that synchronizes communication of address and data signals with a processor, and the second clock can be a system clock that synchronizes communication of address and data signals with an asynchronous data processing device such as random access memory.
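A small state machine conveys the dynamic transition the abstract describes. The class, the clock frequencies, the protocol labels, and the triggering events below are assumptions chosen to illustrate switching between a processor-clocked and a system-clocked domain.

```python
# Illustrative sketch of the dynamic domain switch: on an event, the
# interface stops using the first clock/protocol and retimes subsequent
# transfers to the second clock. Names and triggers are assumptions.

class DualDomainPort:
    def __init__(self, proc_clock_mhz, sys_clock_mhz):
        self.domains = {
            "processor": {"clock_mhz": proc_clock_mhz, "protocol": "proc-sync"},
            "system":    {"clock_mhz": sys_clock_mhz,  "protocol": "sys-sync"},
        }
        self.active = "processor"   # first domain: synchronized to CPU clock

    def on_event(self, event):
        # Dynamically transition between domains in response to an event.
        if event == "access_async_device" and self.active == "processor":
            self.active = "system"
        elif event == "access_complete" and self.active == "system":
            self.active = "processor"

    def transfer(self, address, data):
        d = self.domains[self.active]
        return (self.active, d["clock_mhz"], d["protocol"], address, data)

port = DualDomainPort(proc_clock_mhz=400, sys_clock_mhz=100)
port.on_event("access_async_device")
print(port.transfer(0x2000, b"\x00"))   # now timed by the system clock
```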
Abstract:
A data cache unit associated with a processor is disclosed, the data cache unit including a first non-blocking cache that receives a data access from a device in the processor. A second non-blocking cache is coupled to the first non-blocking cache to service misses in the first non-blocking cache. A data return path coupled to the second non-blocking cache delivers data returning from the second non-blocking cache to both the first non-blocking cache and the device that generated the access to the first non-blocking cache.
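The distinctive piece here is the return path: data coming back from the second cache fills the first cache and reaches the requesting device in the same step. The sketch below is a hedged functional model; the dictionary-based caches and names are assumptions.

```python
# Hedged model of the return path: an L1 miss is serviced by L2, and the
# returning data fills L1 and is handed to the device at once. The
# structure and names are illustrative assumptions.

class TwoLevelCache:
    def __init__(self, backing):
        self.l1 = {}        # first non-blocking cache (address -> data)
        self.l2 = {}        # second non-blocking cache, services L1 misses
        self.backing = backing

    def read(self, address):
        if address in self.l1:
            return self.l1[address]            # L1 hit
        if address in self.l2:
            data = self.l2[address]            # L1 miss serviced by L2
        else:
            data = self.backing[address]       # both miss: go to memory
            self.l2[address] = data
        self.l1[address] = data   # return path fills L1 ...
        return data               # ... and hands data to the device at once

memory = {0x40: "hello"}
cache = TwoLevelCache(memory)
assert cache.read(0x40) == "hello"   # fill both levels
assert 0x40 in cache.l1              # subsequent reads hit in L1
```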
Abstract:
A system and method for thermal overload detection and protection for a processor, which allows the processor to run at near its maximum potential for the vast majority of its execution life. This is effectuated by circuitry that detects when the processor has exceeded its thermal thresholds and then causes the processor to automatically reduce the clock rate to a fraction of the nominal clock while execution continues. When the thermal condition has stabilized, the clock may be raised in a stepwise fashion back to the nominal clock rate. Throughout the period of cycling the clock frequency from nominal to minimum and back, the program continues to be executed. Also provided are a queue activity rise-time detector and a method to control the rate of acceleration of a functional unit from idle to full throttle by a localized stall mechanism at the boundary of each stage in the pipe. This mechanism can detect when an idle queue is suddenly overwhelmed with input such that, over a short period of approximately 10-20 machine cycles, the queue activity rate has increased from idle to near the stall threshold.
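The throttle-and-recover behavior reduces to a simple control loop: drop to a fraction of nominal on a thermal trip, then step back up once the temperature stabilizes, with execution continuing throughout. The thresholds, fraction, and step size below are assumed values, not the patent's parameters.

```python
# Minimal control-loop sketch of the described behavior: throttle the
# clock to a fraction of nominal when hot, then raise it stepwise back to
# nominal as the temperature stabilizes. All parameters are assumptions.

NOMINAL_MHZ = 1000
MIN_FRACTION = 0.25          # throttled clock as a fraction of nominal
STEP_MHZ = 125               # stepwise recovery increment
HOT_C, SAFE_C = 95.0, 80.0   # illustrative thermal thresholds

def next_clock(current_mhz, temperature_c):
    if temperature_c >= HOT_C:
        return int(NOMINAL_MHZ * MIN_FRACTION)   # immediate throttle
    if temperature_c <= SAFE_C and current_mhz < NOMINAL_MHZ:
        return min(current_mhz + STEP_MHZ, NOMINAL_MHZ)  # stepwise recovery
    return current_mhz                           # hold while stabilizing

clock = NOMINAL_MHZ
for temp in [70, 96, 90, 85, 79, 78, 77, 76, 75, 74]:
    clock = next_clock(clock, temp)
    print(f"temp={temp:5.1f}C clock={clock}MHz")  # execution never stops
```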
Abstract:
A data cache unit associated with a processor is disclosed, the data cache unit including a multi-ported non-blocking cache that receives a data access request from a lower-level device in the processor. A memory scheduling window includes at least one row of entries, wherein each entry includes an address field holding an address of the access request. A conflict map field within at least some of the entries is coupled to a conflict checking unit. The conflict checking unit responds to the address fields by setting bits in the conflict map fields to indicate intra-row conflicts between entries. A picker coupled to the memory scheduling window responds to the conflict map fields so as to identify groups of non-conflicting entries to launch in parallel at the multi-ported non-blocking cache.
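The conflict-map-plus-picker mechanism can be modeled concretely: set a bit for every pair of entries that collide, then greedily select a set of entries whose conflict bits miss everything already picked. The bank-based conflict rule and the greedy picker below are assumptions for illustration.

```python
# Illustrative model of the scheduling window: entries carry an address,
# a conflict checker sets bits marking intra-row conflicts (here, same
# cache bank), and a picker launches a group of mutually non-conflicting
# entries in parallel. The conflict rule and names are assumed.

NUM_BANKS = 4

def bank_of(address):
    return (address // 64) % NUM_BANKS   # assumed bank-selection rule

def build_conflict_maps(entries):
    # conflict_map[i] has bit j set when entries i and j collide on a bank.
    maps = [0] * len(entries)
    for i, a in enumerate(entries):
        for j, b in enumerate(entries):
            if i != j and bank_of(a) == bank_of(b):
                maps[i] |= 1 << j
    return maps

def pick_parallel_group(entries, maps):
    # Greedily pick entries whose conflict bits miss everything picked so far.
    picked_mask, group = 0, []
    for i in range(len(entries)):
        if maps[i] & picked_mask == 0:
            group.append(i)
            picked_mask |= 1 << i
    return group

entries = [0x000, 0x040, 0x080, 0x100]   # 0x000 and 0x100 share bank 0
maps = build_conflict_maps(entries)
print(pick_parallel_group(entries, maps))  # [0, 1, 2]: launched together
```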
Abstract:
A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents that are stored in the second cache are limited to have read-only attributes so that if any copies of the cache contents in the second cache exist in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available, if required, in the first cache by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, rendering the first cache available for write access. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing silicon real estate used when placed on-chip to improve cache bandwidth.
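The exchange the abstract describes, evictions demoted to a read-only second cache and promoted back to the first cache before any write, can be sketched functionally. The class, the victim-selection policy, and the single-writable-flag representation below are assumptions; they stand in for the single status bit per second-cache tag that the abstract attributes to the read-only restriction.

```python
# Hedged sketch of the victim-style exchange described above: lines
# evicted from the first cache land in a read-only second cache, and a
# hit there moves the line back into the first cache (displacing another
# line) before a write is allowed. Names and policies are assumptions.

class NonInclusiveCaches:
    def __init__(self, l1_capacity):
        self.l1 = {}            # address -> (data, writable=True)
        self.l2 = {}            # address -> data; read-only by construction
        self.l1_capacity = l1_capacity

    def _evict_from_l1(self):
        victim, (data, _) = self.l1.popitem()   # naive victim selection
        self.l2[victim] = data                  # demote as read-only

    def fill(self, address, data):
        if len(self.l1) >= self.l1_capacity:
            self._evict_from_l1()
        self.l1[address] = (data, True)

    def write(self, address, data):
        if address in self.l1:
            self.l1[address] = (data, True)     # L1 lines are writable
            return
        if address in self.l2:
            # Hit in read-only L2: promote to L1 before allowing the write.
            promoted = self.l2.pop(address)
            self.fill(address, promoted)
            self.l1[address] = (data, True)
            return
        self.fill(address, data)                # miss: bring line into L1

c = NonInclusiveCaches(l1_capacity=1)
c.fill(0xA, "a")
c.fill(0xB, "b")          # evicts 0xA into the read-only second cache
c.write(0xA, "a2")        # L2 hit: 0xA promoted back to L1, then written
assert c.l1[0xA] == ("a2", True) and 0xA not in c.l2
```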