Abstract:
A cache architecture (16) for use in a processing device includes a RAM set cache for caching a contiguous block of main memory (20). The RAM set cache can be used in conjunction with other cache types, such as a set associative cache or a direct mapped cache. A register (32) defines a starting address for the contiguous block of main memory (20). The data array (38) associated with the RAM set may be filled on a line-by-line basis, as lines are requested by the processing core, or on a set-fill basis which fills the data array (38) when the starting address is loaded into the register (32). As addresses are received from the processing core, hit/miss logic (46) uses the starting address register (32), a global valid bit (34), line valid bits (37) and control bits (24, 26) to determine whether the data is present in the RAM set or whether the data must be loaded from main memory (20). The hit/miss logic (46) also determines whether a line should be loaded into the RAM set data array (38) or into the associated cache.
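As a reading aid only, not the patented circuit, the following C sketch models the hit/miss decision described above. The structure name ram_set, the function ram_set_lookup, the line count and the line size are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define RAM_SET_LINES 256u            /* assumed number of lines in the RAM set */
    #define LINE_SHIFT    5u              /* assumed 32-byte lines                  */

    /* Illustrative model of the RAM set control state. */
    struct ram_set {
        uint32_t start_addr;                /* starting address register (32) */
        bool     global_valid;              /* global valid bit (34)          */
        bool     line_valid[RAM_SET_LINES]; /* line valid bits (37)           */
    };

    enum lookup_result { RAM_SET_HIT, RAM_SET_FILL_LINE, USE_OTHER_CACHE };

    /* Decide how a processor address should be serviced. */
    enum lookup_result ram_set_lookup(const struct ram_set *rs, uint32_t addr)
    {
        uint32_t block_size = RAM_SET_LINES << LINE_SHIFT;

        /* Address falls outside the contiguous block: use the set associative
           or direct mapped cache instead. */
        if (!rs->global_valid || addr < rs->start_addr ||
            addr >= rs->start_addr + block_size)
            return USE_OTHER_CACHE;

        uint32_t line = (addr - rs->start_addr) >> LINE_SHIFT;

        /* Inside the block: hit if the line has already been filled,
           otherwise fetch that line from main memory into the data array. */
        return rs->line_valid[line] ? RAM_SET_HIT : RAM_SET_FILL_LINE;
    }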
Abstract:
A digital system is provided with several processors, a private level one (L1) cache associated with each processor, a shared level two (L2) cache having several segments per entry, and a level three (L3) physical memory. The shared L2 cache architecture is embodied with 4-way associativity with corresponding tag arrays (502(n)), four segments per entry, and four valid and dirty bits. Each tag entry (1236) includes a task-ID qualifier field (522) and a resource-ID qualifier field (520). Data is loaded into various lines (506) in the cache in response to cache access requests when a given cache access request misses. After loading data into the cache in response to a miss, a tag (1236) associated with the data line is set to a valid state (526). In addition to setting a tag to a valid state, qualifier values are stored in the qualifier fields (520, 522) in the tag. Each qualifier value specifies a usage characteristic of data stored in an associated data line of the cache. In response to an operation command (1251), each tag in the array of tags that contains a specified qualifier value is modified (1258) in accordance with the operation command. Various types of operation commands can be included in an embodiment of the invention, such as clean, flush, clean-flush, lock, and unlock, for example.
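A purely illustrative C model of the qualifier-driven operation commands is sketched below. The tag layout, the array size and the function apply_op_by_task are assumptions; the hardware applies the command across the tag array rather than in a software loop, and only clean and flush effects are modeled here.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_TAGS 1024u

    enum cache_op { OP_CLEAN, OP_FLUSH, OP_CLEAN_FLUSH };

    /* Simplified tag entry: valid/dirty state plus the two qualifier fields. */
    struct tag_entry {
        bool    valid;
        bool    dirty;
        uint8_t task_id;        /* task-ID qualifier field (522)     */
        uint8_t resource_id;    /* resource-ID qualifier field (520) */
    };

    /* Apply an operation command to every tag whose task-ID matches. */
    void apply_op_by_task(struct tag_entry tags[NUM_TAGS],
                          uint8_t task_id, enum cache_op op)
    {
        for (unsigned i = 0; i < NUM_TAGS; i++) {
            if (!tags[i].valid || tags[i].task_id != task_id)
                continue;

            if (op == OP_CLEAN || op == OP_CLEAN_FLUSH)
                tags[i].dirty = false;      /* models write-back of dirty data */
            if (op == OP_FLUSH || op == OP_CLEAN_FLUSH)
                tags[i].valid = false;      /* invalidates the entry           */
        }
    }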
Abstract:
A multiprocessor system (20, 102, 110) running multiple operating systems or a single operating system uses μTLBs (36) and a shared TLB subsystem (48) to provide efficient and flexible translation of virtual addresses to physical addresses. Upon misses in the μTLB and shared TLB, access to a translation table in external memory (54) can be made using either a hardware mechanism (100) or a software function. The translation can be flexibly based on a number of criteria, such as a resource identifier and a task identifier. Slave processors, such as coprocessors (34) and DMA processors (24), can access the shared TLB (48) without master processor interaction for more efficient operation.
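The two-level lookup can be pictured with the hedged C sketch below: a miss in the micro-TLB falls through to the shared TLB and then to a translation-table access. The entry layout, the table sizes and the helper table_walk (declared only, standing in for the hardware mechanism or software function) are assumptions, and TLB refill after a walk is omitted for brevity.

    #include <stdbool.h>
    #include <stdint.h>

    #define UTLB_ENTRIES  8u
    #define STLB_ENTRIES 64u
    #define PAGE_SHIFT   12u

    struct tlb_entry {
        bool     valid;
        uint8_t  task_id;     /* task identifier qualifier     */
        uint8_t  res_id;      /* resource identifier qualifier */
        uint32_t vpn;         /* virtual page number           */
        uint32_t ppn;         /* physical page number          */
    };

    /* Assumed external translation-table access; returns false on fault. */
    bool table_walk(uint32_t vpn, uint8_t task_id, uint32_t *ppn);

    static bool probe(const struct tlb_entry *t, unsigned n, uint32_t vpn,
                      uint8_t task_id, uint8_t res_id, uint32_t *ppn)
    {
        for (unsigned i = 0; i < n; i++)
            if (t[i].valid && t[i].vpn == vpn &&
                t[i].task_id == task_id && t[i].res_id == res_id) {
                *ppn = t[i].ppn;
                return true;
            }
        return false;
    }

    /* Translate: try the per-processor micro-TLB, then the shared TLB, then walk. */
    bool translate(struct tlb_entry utlb[UTLB_ENTRIES],
                   struct tlb_entry stlb[STLB_ENTRIES],
                   uint32_t vaddr, uint8_t task_id, uint8_t res_id, uint32_t *paddr)
    {
        uint32_t vpn = vaddr >> PAGE_SHIFT, ppn;

        if (!probe(utlb, UTLB_ENTRIES, vpn, task_id, res_id, &ppn) &&
            !probe(stlb, STLB_ENTRIES, vpn, task_id, res_id, &ppn) &&
            !table_walk(vpn, task_id, &ppn))
            return false;                     /* translation fault */

        *paddr = (ppn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
        return true;
    }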
Abstract:
A digital system and method of operation is provided in which several processors (1400, 1402, 1404) are connected to a shared resource (1432). Each processor has an access priority register (1410) that is loaded with an access priority value by software executing on the processor. Arbitration circuitry (1430) is connected to receive a request signal from each processor along with the access priority value from each access priority register. The arbitration circuitry is operable to schedule access to the shared resource according to the access priority values provided by the processors. A software priority state is established during execution of an instruction module on each of the several processors. An instruction is executed on each processor to form an access request to the shared resource. An access priority value is provided with each access request that is responsive to the software priority state of the respective processor. The sequence of instructions is part of a task and the software state is established by defining a task priority for the task and setting the software state in accordance with the task priority. The software priority state is saved during a context switch.
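A minimal C sketch of the arbitration step follows: it simply grants the pending request whose access priority register holds the best value. The convention that a lower value wins, and the names requester and arbitrate, are assumptions rather than details taken from the embodiment.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_PROCESSORS 3u

    /* One requester: its request line and the software-loaded access
       priority register (lower value = higher priority in this sketch). */
    struct requester {
        bool    request;
        uint8_t access_priority;
    };

    /* Grant the pending request with the best access priority.
       Returns the processor index, or -1 when nothing is pending. */
    int arbitrate(const struct requester req[NUM_PROCESSORS])
    {
        int winner = -1;
        for (unsigned i = 0; i < NUM_PROCESSORS; i++) {
            if (!req[i].request)
                continue;
            if (winner < 0 || req[i].access_priority < req[winner].access_priority)
                winner = (int)i;
        }
        return winner;
    }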
Abstract:
A VIVT (virtual index, virtual tag) cache (18) uses an interruptible hardware clean function to clean dirty entries in the cache during a context switch. A MAX counter (82) and a MIN register (84) define a range of cache locations which are dirty. During the hardware clean function, the MAX counter (82) counts downward while the cache entry at the address given by the MAX counter (82) is written to main memory (16) if it is marked as dirty. If an interrupt occurs, the MAX counter is disabled until a subsequent clean request is issued after the interrupt is serviced.
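The interruptible clean loop can be modeled in software as below. This is a sketch under assumed names (clean_range, write_back, interrupt_pending) and does not reproduce the hardware state machine; the counter simply stops where it is on an interrupt so that a subsequent clean request resumes from the same address.

    #include <stdbool.h>

    #define CACHE_LINES 512u

    struct cache_line { bool valid, dirty; /* data omitted */ };

    volatile bool interrupt_pending;    /* set by an interrupt in this sketch */

    /* Software model of the clean loop: MAX counts down toward MIN, writing
       back dirty lines, and stops early if an interrupt arrives.
       Returns true when the whole range has been cleaned. */
    bool clean_range(struct cache_line cache[CACHE_LINES],
                     unsigned *max_counter, unsigned min_register,
                     void (*write_back)(unsigned index))
    {
        while (*max_counter >= min_register) {
            if (interrupt_pending)
                return false;             /* resume from *max_counter later */

            unsigned i = *max_counter;
            if (cache[i].valid && cache[i].dirty) {
                write_back(i);            /* copy the line to main memory   */
                cache[i].dirty = false;
            }

            if (*max_counter == 0)        /* avoid unsigned wrap at zero    */
                break;
            (*max_counter)--;
        }
        return true;
    }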
Abstract:
Methods and apparatuses are disclosed for managing memory write back. In some embodiments, the method may include examining current and future instructions operating on a stack that exists in memory, determining stack trend information from the instructions, and utilizing the trend information to reduce data traffic between various levels of the memory. As stacked data are written to a cache line in a first level of memory, if future instructions indicate that additional cache lines are required for subsequent write operations within the stack, then the cache line may be written back to a second level of memory. If, however, the future instructions indicate that no additional cache lines are required for subsequent write operations within the stack, then the first level of memory may avoid writing back the cache line and also may keep it marked as dirty.
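One possible decision rule, expressed as a hedged C sketch, is to write the line back early only when the trend analysis says the stack keeps growing and the line is full, and otherwise to keep it dirty in the first level. The enum values and the function should_write_back are invented for illustration, not taken from the disclosed embodiments.

    #include <stdbool.h>

    /* Direction in which the stack is trending, derived from decoded
       current and upcoming instructions (hypothetical analysis stage). */
    enum stack_trend { TREND_GROWING, TREND_SHRINKING, TREND_FLAT };

    /* Decide whether a just-written stack cache line in the first-level
       memory should be written back to the second level now. */
    bool should_write_back(enum stack_trend trend, bool line_full)
    {
        /* More stack writes are coming and this line is full: write it back
           so the line can be reallocated without a later eviction stall. */
        if (trend == TREND_GROWING && line_full)
            return true;

        /* No further lines are needed: keep the line in the first level,
           still marked dirty, and avoid the extra traffic to the second level. */
        return false;
    }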
Abstract:
A mobile device (10) manages tasks (18) using a scheduler (20) for scheduling tasks on multiple processors (12). To conserve energy, the set of tasks to be scheduled is divided into two (or more) subsets, which are scheduled according to different procedures. In a specific embodiment, the first subset contains tasks with the highest energy consumption deviation based on the processor that executes the task. This subset is scheduled according to a power-aware procedure for scheduling tasks primarily based on energy consumption criteria. If there is no failure, the second subset is scheduled according to a real-time constrained procedure that schedules tasks primarily based on the deadlines associated with the various tasks in the second subset. If there is a failure in either procedure, one or more tasks with the lowest energy consumption deviation are moved from the first subset to the second subset and the scheduling is repeated.
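The outer scheduling loop might look like the C sketch below, assuming the tasks are pre-sorted by descending energy-consumption deviation and that schedule_power_aware and schedule_realtime are externally supplied procedures returning false on failure. The names and the single-task move per retry are assumptions.

    #include <stdbool.h>
    #include <stddef.h>

    struct task { int id; double energy_deviation; double deadline; };

    /* Hypothetical scheduling procedures; each returns false on failure. */
    bool schedule_power_aware(const struct task *set, size_t n);
    bool schedule_realtime(const struct task *set, size_t n);

    /* Tasks are assumed sorted by descending energy_deviation, so the first
       `split` tasks form the high-deviation subset.  On a failure the split
       point moves, handing the lowest-deviation task of the first subset over
       to the deadline-driven second subset, and both procedures are retried. */
    bool schedule_all(const struct task *tasks, size_t n, size_t split)
    {
        while (1) {
            if (schedule_power_aware(tasks, split) &&
                schedule_realtime(tasks + split, n - split))
                return true;              /* both subsets scheduled         */

            if (split == 0)
                return false;             /* nothing left to move           */
            split--;                      /* shrink the power-aware subset  */
        }
    }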
Abstract:
A method for generating program code for translating high level code into instructions for one of a plurality of target processors comprises first determining a desired program code characteristic corresponding to a target processor. One or more predefined program code modules are then selected from a plurality of available program code modules in accordance with the desired program code characteristic, and program code for translating high level code into instructions for the target processor is generated from the selected one or more predefined program code modules. Preferably, the method comprises forming agglomerated program code from a plurality of program code modules in accordance with the desired program code characteristic.
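As a rough illustration only, the C sketch below selects the modules that match a desired characteristic and concatenates their generator source into agglomerated code. The struct code_module and the function agglomerate are hypothetical, and a real generator would do far more than string concatenation.

    #include <stddef.h>
    #include <string.h>

    /* A predefined program code module and the characteristic it serves
       (e.g. code size, speed, target word width); fields are illustrative. */
    struct code_module {
        const char *characteristic;
        const char *source;          /* fragment of generator code */
    };

    /* Concatenate every module that matches the desired characteristic
       into an agglomerated generator program.  Returns bytes written. */
    size_t agglomerate(const struct code_module *mods, size_t n,
                       const char *desired, char *out, size_t out_size)
    {
        if (out_size == 0)
            return 0;

        size_t used = 0;
        out[0] = '\0';
        for (size_t i = 0; i < n; i++) {
            if (strcmp(mods[i].characteristic, desired) != 0)
                continue;
            size_t len = strlen(mods[i].source);
            if (used + len + 1 >= out_size)
                break;                   /* output buffer exhausted */
            memcpy(out + used, mods[i].source, len);
            used += len;
            out[used++] = '\n';
            out[used] = '\0';
        }
        return used;
    }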
Abstract:
A processor (e.g., a co-processor) executes a stack-based instruction set and another instruction set in a way that accelerates the execution of the stack-based instruction set, although code acceleration is not required under the scope of this disclosure. In accordance with at least some embodiments of the invention, the processor may comprise a multi-entry stack usable in at least a stack-based instruction set, logic coupled to and managing the stack, and a plurality of registers coupled to the logic and addressable through a second instruction set that provides register-based and memory-based operations.
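A compact C model of such dual-view storage is sketched below: push and pop serve the stack-based instruction set, while read_reg and write_reg show how a second, register-based instruction set could address the same entries. The names, the entry count and the overflow/underflow handling are assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define STACK_ENTRIES 8u

    /* Minimal model of the co-processor's multi-entry stack. */
    struct op_stack {
        uint32_t entry[STACK_ENTRIES];
        unsigned top;                 /* number of valid entries */
    };

    bool push(struct op_stack *s, uint32_t v)
    {
        if (s->top == STACK_ENTRIES)
            return false;             /* overflow: a full design would spill to memory */
        s->entry[s->top++] = v;
        return true;
    }

    bool pop(struct op_stack *s, uint32_t *v)
    {
        if (s->top == 0)
            return false;             /* underflow: a full design would refill from memory */
        *v = s->entry[--s->top];
        return true;
    }

    /* Register-style access used by the second instruction set; the index
       is wrapped onto the stack entries for simplicity. */
    uint32_t read_reg(const struct op_stack *s, unsigned r)
    {
        return s->entry[r % STACK_ENTRIES];
    }

    void write_reg(struct op_stack *s, unsigned r, uint32_t v)
    {
        s->entry[r % STACK_ENTRIES] = v;
    }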