Abstract:
A processor (e.g., a co-processor) comprising a decoder coupled to a pre-decoder, in which the decoder decodes a current instruction in parallel with the pre-decoder pre-decoding a subsequent instruction. In particular, the pre-decoder examines at least five bytecodes in parallel with the decoder decoding the current instruction and determines whether the subsequent instruction contains a prefix. If a prefix is detected in at least one of the five bytecodes, the program counter skips the prefix, and the prefix changes the behavior of the decoder during decoding of the subsequent instruction.
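As an illustration only (not part of the abstract), a minimal C sketch of the parallel pre-decode step is given below; the prefix encoding (PREFIX_BYTE), the cpu_t structure and the fixed five-byte window are assumptions made for the example, not details taken from the patent.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define PREFIX_BYTE 0xFEu          /* hypothetical prefix encoding */

    typedef struct {
        const uint8_t *code;           /* bytecode stream                        */
        size_t         pc;             /* program counter                        */
        bool           prefixed;       /* set by pre-decode, consumed by decode  */
    } cpu_t;

    /* Pre-decoder sketch: while the decoder works on the current instruction,
     * examine the five-byte window holding the next instruction.  If its
     * leading byte is a prefix, skip it and flag the decoder so that the next
     * decode is modified.                                                      */
    static void predecode_next(cpu_t *cpu, size_t next_pc)
    {
        uint8_t window[5];
        for (int i = 0; i < 5; i++)            /* fetch five bytecodes          */
            window[i] = cpu->code[next_pc + i];

        if (window[0] == PREFIX_BYTE) {
            cpu->prefixed = true;              /* decoder behaviour changes     */
            cpu->pc       = next_pc + 1;       /* program counter skips prefix  */
        } else {
            cpu->prefixed = false;
            cpu->pc       = next_pc;
        }
    }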
Abstract:
A cache architecture (16) for use with a processing core includes a RAM set cache for caching a contiguous block of main memory (20). The RAM set cache can be used in conjunction with other cache types, such as a set associative cache or a direct mapped cache. A register (32) defines a starting address for the contiguous block of main memory (20). The data array (38) associated with the RAM set may be filled on a line-by-line basis, as lines are requested by the processing core, or on a set-fill basis, which fills the data array (38) when the starting address is loaded into the register (32). As addresses are received from the processing core, hit/miss logic (46) uses the starting address register (32), a global valid bit (34), line valid bits (37) and control bits (24, 26) to determine whether the data is present in the RAM set or whether the data must be loaded from main memory (20). The hit/miss logic (46) also determines whether a line should be loaded into the RAM set data array (38) or into the associated cache.
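A rough C sketch of the described hit/miss decision follows, for illustration only; the line count, line size, field names and the set-fill control flag are assumed values, not the patent's actual register layout.

    #include <stdint.h>
    #include <stdbool.h>

    #define RAMSET_LINES 256u              /* illustrative RAM set size */
    #define LINE_BYTES   16u               /* illustrative line size    */

    typedef struct {
        uint32_t start_addr;               /* register 32: block start address    */
        bool     set_fill_mode;            /* control bit: fill whole set at load */
        bool     global_valid;             /* bit 34: set fill completed          */
        bool     line_valid[RAMSET_LINES]; /* bits 37: per-line fills             */
    } ramset_t;

    typedef enum { RAMSET_HIT, RAMSET_FILL_LINE, USE_OTHER_CACHE } lookup_t;

    /* Hit/miss logic: decide whether an address falls inside the RAM set's
     * contiguous block and whether the data is already present.             */
    static lookup_t ramset_lookup(const ramset_t *rs, uint32_t addr)
    {
        uint32_t offset = addr - rs->start_addr;
        uint32_t limit  = RAMSET_LINES * LINE_BYTES;

        if (addr < rs->start_addr || offset >= limit)
            return USE_OTHER_CACHE;        /* outside the mapped block          */

        uint32_t line    = offset / LINE_BYTES;
        bool     present = rs->set_fill_mode ? rs->global_valid
                                             : rs->line_valid[line];
        return present ? RAMSET_HIT        /* data already in the RAM set       */
                       : RAMSET_FILL_LINE; /* load the line from main memory    */
    }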
Abstract:
A digital system is provided with several processors, a private level one (L1) cache associated with each processor, a shared level two (L2) cache having several segments per entry, and a level three (L3) physical memory. The shared L2 cache architecture is embodied with 4-way associativity, four segments per entry and four valid and dirty bits. Detection circuitry responds to several cache access requests concurrently. Multiple ports in the cache service multiple requesters concurrently if concurrent hits are determined by the detection circuitry.
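The hit-detection step can be sketched in C as below, purely as an illustration; the set count, the field names and the single-requester function are assumptions, and in the described hardware this detection would run for several requesters in the same cycle.

    #include <stdint.h>
    #include <stdbool.h>

    #define WAYS     4u
    #define SEGMENTS 4u
    #define SETS     512u                       /* illustrative set count */

    typedef struct {
        uint32_t tag[WAYS];
        bool     valid[WAYS][SEGMENTS];         /* four valid bits per entry */
        bool     dirty[WAYS][SEGMENTS];         /* four dirty bits per entry */
    } l2_entry_t;

    /* Detection logic for one requester: report a hit (and the hitting way)
     * if the addressed segment is valid in any of the four ways of the
     * indexed entry.                                                        */
    static bool l2_detect_hit(const l2_entry_t sets[SETS],
                              uint32_t set, uint32_t tag, uint32_t seg,
                              uint32_t *way_out)
    {
        for (uint32_t way = 0; way < WAYS; way++) {
            if (sets[set].tag[way] == tag && sets[set].valid[way][seg]) {
                *way_out = way;
                return true;
            }
        }
        return false;
    }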
Abstract:
Methods and apparatuses are disclosed for implementing a processor with a split stack. In some embodiments, the processor includes a main stack and a micro-stack. The micro-stack preferably is implemented in the core of the processor, whereas the main stack may be implemented in areas that are external to the core of the processor. Operands are preferably provided to an arithmetic logic unit (ALU) by the micro-stack, and in the case of underflow (micro-stack empty), operands may be fetched from the main stack. Operands are written to the main stack during overflow (micro-stack full) or by explicit flushing of the micro-stack. By optimizing the size of the micro-stack, the number of operands fetched from the main stack may be reduced, and consequently the processor's power consumption may be reduced.
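A small C sketch of the split-stack behaviour described above follows, for illustration; the micro-stack depth, the circular-buffer layout and the names are assumptions rather than the patent's implementation.

    #include <stdint.h>

    #define USTACK_DEPTH 8u                 /* illustrative micro-stack size */

    typedef struct {
        int32_t  buf[USTACK_DEPTH];         /* micro-stack held in the core  */
        unsigned top;                       /* next free slot (mod DEPTH)    */
        unsigned count;                     /* operands in the micro-stack   */
        int32_t *main_sp;                   /* main stack in external memory */
    } split_stack_t;

    /* Push: on overflow (micro-stack full), the oldest operand is spilled
     * to the main stack before the new operand is stored in the core.     */
    static void push_operand(split_stack_t *s, int32_t v)
    {
        if (s->count == USTACK_DEPTH) {
            unsigned bottom = (s->top + USTACK_DEPTH - s->count) % USTACK_DEPTH;
            *s->main_sp++ = s->buf[bottom];
            s->count--;
        }
        s->buf[s->top] = v;
        s->top = (s->top + 1) % USTACK_DEPTH;
        s->count++;
    }

    /* Pop: on underflow (micro-stack empty), the operand is fetched from
     * the main stack instead.                                             */
    static int32_t pop_operand(split_stack_t *s)
    {
        if (s->count == 0)
            return *--s->main_sp;
        s->top = (s->top + USTACK_DEPTH - 1) % USTACK_DEPTH;
        s->count--;
        return s->buf[s->top];
    }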
Abstract:
A digital system and method of operation is provided in which the digital system has at least one processor with an associated multi-segment cache memory circuit (506(n)). Validity circuitry (VI) is connected to the memory circuit and is operable to indicate whether each segment of the plurality of segments holds valid data. Dirty bit circuitry (DI) is connected to the memory circuit for indicating whether data within the cache is incoherent with a secondary back-up memory. DMA circuitry can transfer (1652) blocks of data/instructions (1660) between the cache and a secondary memory (1602). A transfer mode circuit (1681) controls how DMA operations are affected by the dirty bits. If the transfer mode circuit is in a first mode, a DMA operation transfers only segments (1661) indicated as dirty (1685). If the transfer mode circuit is in a second mode, a DMA operation transfers an entire block of data (1660) without regard to the dirty indicators (1686). DMA transfers from the cache to secondary memory are thereby configured to be responsive to the dirty bits. A dirty bit mode circuit (1680) controls how DMA transfers affect the dirty bits. When the mode circuit is in a first mode, DMA transfers set the affected dirty bits to a clean state. When the dirty bit mode circuitry is in an alternate mode, DMA transfers set the affected dirty bits to a dirty state. A cache clean operation will thus copy data provided by a DMA transfer and indicated as dirty into the backup secondary memory.
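For illustration, a C sketch of the two DMA modes follows; the segment count, the enum names and the block layout are assumptions, not the patent's circuits.

    #include <stdint.h>
    #include <stdbool.h>

    #define BLOCK_SEGMENTS 4u           /* illustrative segments per block */

    typedef enum { XFER_DIRTY_ONLY, XFER_WHOLE_BLOCK } xfer_mode_t;
    typedef enum { MARK_CLEAN, MARK_DIRTY } dirty_mode_t;

    typedef struct {
        uint32_t data[BLOCK_SEGMENTS];
        bool     dirty[BLOCK_SEGMENTS];
    } cache_block_t;

    /* DMA from cache to secondary memory: in the first mode only dirty
     * segments move; in the second mode the whole block moves regardless
     * of the dirty indicators.                                            */
    static void dma_cache_to_memory(const cache_block_t *blk, uint32_t *mem,
                                    xfer_mode_t mode)
    {
        for (unsigned seg = 0; seg < BLOCK_SEGMENTS; seg++)
            if (mode == XFER_WHOLE_BLOCK || blk->dirty[seg])
                mem[seg] = blk->data[seg];
    }

    /* DMA from secondary memory into the cache: the dirty-bit mode decides
     * whether the written segments are marked clean or dirty, so a later
     * cache-clean operation can copy DMA-provided data back out.          */
    static void dma_memory_to_cache(cache_block_t *blk, const uint32_t *mem,
                                    dirty_mode_t mode)
    {
        for (unsigned seg = 0; seg < BLOCK_SEGMENTS; seg++) {
            blk->data[seg]  = mem[seg];
            blk->dirty[seg] = (mode == MARK_DIRTY);
        }
    }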
Abstract:
A digital system and method of operation is provided in which the digital system has at least one processor with an associated multi-segment cache memory circuit (1806(n)). Validity circuitry (VI) and dirty bit circuitry (DI) are connected to the memory circuit and are operable to indicate whether each segment of the plurality of segments holds valid data and whether that data is incoherent with secondary memory. Block circuitry (700, 702) is connected to the set of valid bits and dirty bits and is operable to invalidate a selected range of lines in response to a directive from the first processor. The block circuitry has a start register (700) and an end register (702), each separately loadable by the processor. The block circuitry can invalidate either a single line or a block of lines in response to an operation command from the processor, depending on whether the end register is loaded or not. Likewise, the block circuitry can clean a single line or a block of lines in response to an operation command from the processor.
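A C sketch of the single-line versus block invalidation described above follows, for illustration; the line count, the end_loaded flag and the register names are assumptions (the analogous clean operation would walk the same range and write back dirty lines instead).

    #include <stdint.h>
    #include <stdbool.h>

    #define CACHE_LINES 1024u              /* illustrative line count */

    typedef struct {
        bool     valid[CACHE_LINES];
        bool     dirty[CACHE_LINES];
        uint32_t start_reg;                /* register 700 */
        uint32_t end_reg;                  /* register 702 */
        bool     end_loaded;               /* end register written since last op */
    } block_ctl_t;

    /* Invalidate either a single line or a block of lines, depending on
     * whether the end register has been loaded by the processor.          */
    static void block_invalidate(block_ctl_t *c)
    {
        uint32_t last = c->end_loaded ? c->end_reg : c->start_reg;
        for (uint32_t line = c->start_reg;
             line <= last && line < CACHE_LINES; line++)
            c->valid[line] = false;
        c->end_loaded = false;             /* range is consumed by this operation */
    }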
Abstract:
An on-screen display system in which a CPU generates windows in a working memory space also provides for real-time calculation of window addresses in the working memory space. This can eliminate the need for a separate frame buffer memory.
Abstract:
A multi-processor system (8) includes multiple processing devices, including DSPs (10), microprocessor units (MPUs) (21), co-processors (30) and DMA channels (31). Some of the devices may include internal MMUs (19, 32), which allow a device (10, 21, 30, 31) to work with a large virtual address space mapped to an external shared memory (20). The MMUs (19, 32) may perform the translation between a virtual address and the physical address associated with the external shared memory (20). Access to the shared memory (20) is controlled using a unified memory management system.
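As a purely illustrative sketch (assuming a single-level page table, which the abstract does not specify), the address translation performed by such an MMU could look like the following C fragment; the page size, table size and field names are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SHIFT 12u                  /* illustrative 4 KiB pages */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define NUM_PAGES  1024u                /* illustrative table size  */

    typedef struct {
        uint32_t phys_page;                 /* frame in the shared memory */
        bool     present;
    } pte_t;

    /* Per-device MMU sketch: translate a device-local virtual address to a
     * physical address in the external shared memory.  A real MMU would use
     * multi-level tables and a TLB; a single-level table keeps the idea
     * visible.                                                              */
    static bool mmu_translate(const pte_t table[NUM_PAGES],
                              uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t vpage = vaddr >> PAGE_SHIFT;
        if (vpage >= NUM_PAGES || !table[vpage].present)
            return false;                   /* fault: handled by the unified
                                               memory management system     */
        *paddr = (table[vpage].phys_page << PAGE_SHIFT)
               | (vaddr & (PAGE_SIZE - 1));
        return true;
    }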
Abstract:
A processor (e.g., a co-processor) comprising a decoder adapted to decode instructions from a first instruction set in a first mode and a second instruction set in a second mode. A pre-decoder, coupled to the decoder and operating in parallel with it, determines the mode of operation of the decode logic for subsequent instructions. In particular, the decode logic operates in a current mode while the pre-decoder concurrently detects a predetermined prefix, which indicates that a subsequent instruction is a system command. Upon detection of this predetermined prefix, the decoder decodes the system command accordingly.
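A minimal C sketch of the mode-determination step follows, for illustration; the prefix value, the two-mode enum and the flag name are assumptions made for the example.

    #include <stdint.h>
    #include <stdbool.h>

    #define SYSCMD_PREFIX 0xFFu             /* hypothetical prefix value */

    typedef enum { MODE_ISA_FIRST, MODE_ISA_SECOND } isa_mode_t;

    typedef struct {
        isa_mode_t mode;                    /* instruction set currently decoded */
        bool       next_is_syscmd;          /* set by the pre-decoder            */
    } decoder_t;

    /* Pre-decoder runs on the following instruction while the decoder is busy
     * with the current one: a predetermined prefix marks the next instruction
     * as a system command.                                                     */
    static void predecode(decoder_t *d, uint8_t next_byte)
    {
        d->next_is_syscmd = (next_byte == SYSCMD_PREFIX);
    }

    static void decode(decoder_t *d, uint8_t insn)
    {
        (void)insn;                         /* decode detail omitted in sketch */
        if (d->next_is_syscmd) {
            /* decode as a system command, e.g. switch instruction-set modes  */
            d->mode = (d->mode == MODE_ISA_FIRST) ? MODE_ISA_SECOND
                                                  : MODE_ISA_FIRST;
            d->next_is_syscmd = false;
        }
        /* otherwise decode according to the current mode */
    }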
Abstract:
A digital system has at least one processor with an associated multi-segment cache memory circuit. A single global validity circuit (VIG) is connected to the memory circuit and is operable to indicate whether any segment of the multiple segments holds valid data. Block circuitry is operable to transfer data from a pre-selected region of a secondary memory to a particular segment of the plurality of segments and to assert the global valid bit at the completion of a block transfer. Direct memory access (DMA) circuitry is connected to the cache memory for transferring data between the cache memory and a selectable region of the secondary memory and is also operable to assert the global valid bit at the completion of a DMA block transfer.
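For illustration, a C sketch of asserting the single global valid bit at the completion of a block fill follows; the segment count and field names are assumed.

    #include <stdint.h>
    #include <stdbool.h>

    #define SEGMENTS 4u                     /* illustrative segments per entry */

    typedef struct {
        uint32_t seg[SEGMENTS];
        bool     global_valid;              /* VIG: one bit for the whole entry */
    } mcache_entry_t;

    /* Block (or DMA) transfer: copy a pre-selected region of secondary memory
     * into the entry's segments and assert the single global valid bit only
     * once the whole transfer has completed.                                  */
    static void block_fill(mcache_entry_t *e, const uint32_t *secondary)
    {
        e->global_valid = false;            /* entry invalid while filling */
        for (unsigned i = 0; i < SEGMENTS; i++)
            e->seg[i] = secondary[i];
        e->global_valid = true;             /* asserted at completion      */
    }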