摘要:
A processor (e.g., a co-processor) comprising a decoder coupled to a pre-decoder, in which the decoder decodes a current instruction in parallel with the pre-decoder pre-decoding a subsequent instruction. In particular, the pre-decoder examines at least five Bytecodes in parallel with the decoder decoding a current instruction. The pre-decoder determines if a subsequent instruction contains a prefix. If a prefix is detected in at least one of the five Bytecodes, a program counter skips the prefix and changes the behavior of the decoder during the decoding of the subsequent instruction.
摘要:
In some embodiments, a storage medium comprises application software that performs one or more operations and that directly manages a device. The application software comprises instructions that initialize an application data structure (e.g., an object or array) usable by the application software to manage the device and also comprises instructions that map the application data structure to a memory associated with the device without the use of a device driver. In other embodiments, a method comprises initializing an application data structure to manage a hardware device and mapping the application data structure to a memory associated with the hardware device without the use of a device driver. The application data structure may store a single dimensional data structure or a multi-dimensional data structure. In some embodiments, the device being managed by the application software may comprise a display and the application software may comprise Java code.
摘要:
A technique comprises receiving an instruction and dynamically changing the instruction's semantic based on programmable information that is separate from the instruction. The change in semantic may comprise the inclusion of monitoring code that determines a performance characteristic associated with the instruction or a change in the instruction's operation (e.g., the inclusion of read or write barrier operations to support a garbage collector).
摘要:
A cache architecture (16) for use in a processing includes a RAM set cache for caching a contiguous block of main memory (20). The RAM set cache can be used in conjunction with other cache types, such as a set associative cache or a direct mapped cache. A register (32) defines a starting address for the contiguous block of main memory (20). The data array (38) associated with the RAM set may be filled on a line-by-line basis, as lines are requested by the processing core, or on a set-fill basis which fills the data array (38) when the starting address is loaded into the register (32). As addresses are received from the processing core, hit/miss logic (46) the starting address register (32), a global valid bit (34), line valid bits (37) and control bits (24, 26) are used to determine whether the data is present in the RAM set or whether the data must be loaded from main memory (20). The hit/miss logic (46) also determines whether a line should be loaded into the RAM set data array (38) or in the associated cache.
摘要:
A digital system is provided with a several processors, a private level one (L1) cache associated with each processor, a shared level two (L2) cache having several segments per entry, and a level three (L3) physical memory. The shared L2 cache architecture is embodied with 4-way associativity, four segments per entry and four valid and dirty bits. When the L2-cache misses, the penalty to access to data within the L3 memory is high. The system supports miss under miss to let a second miss interrupt a segment prefetch being done in response to a first miss. Thus, an interruptible SDRAM to L2-cache prefetch system with miss under miss support is provided. A shared translation look-aside buffer (TLB) is provided for L2 accesses, while a private TLB is associated with each processor. A micro TLB (&mgr;TLB) is associated with each resource that can initiate a memory transfer. The L2 cache, along with all of the TLBs and &mgr;TLBs have resource ID fields and task ID fields associated with each entry to allow flushing and cleaning based on resource or task. Configuration circuitry is provided to allow the digital system to be configured on a task by task basis in order to reduce power consumption.
摘要:
A digital system is provided with a several processors, a private level one (L1) cache associated with each processor, a shared level two (L2) cache having several segments per entry, and a level three (L3) physical memory. The shared L2 cache architecture is embodied with 4-way associativity, four segments per entry and four valid and dirty bits. Multiple detection circuitry responds to several cache access requests concurrently. Multiple ports in the cache service multiple requesters concurrently if concurrent hits are determined by the detection circuitry.
摘要:
A processor may execute a test and skip instruction that includes or otherwise specifies at least two operands that are used in a comparison operation. Based on the results of the comparison, the instruction that follows the test and skip instruction is “skipped.” The test and skip instruction may specify that the operands used in the comparison include (1) the contents of two registers, (2) the contents of one register and the contents of a memory location, or (3) the contents of one register and a stack value. In the second mode (an operand being from memory), a register is specified in the test and skip instruction that contains a value from which a pointer may be calculated. The calculated pointer preferably points to the memory location. If a stack value is used in the execution of the test and skip instruction, the instruction may include a reference to a register that points to the top of the stack. Further, the stack pointer may be adjusted automatically if the stack is used to provide an operand for the instruction. Embodiments may include apparatus and methods.
摘要:
Methods and apparatuses are disclosed for implementing a processor with a split stack. In some embodiments, the processor includes a main stack and a micro-stack. The micro-stack preferably is implemented in the core of the processor, whereas the main stack may be implemented in areas that are external to the core of the processor. Operands are preferably provided to an arithmetic logic unit (ALU) by the micro-stack, and in the case of underflow (micro-stack empty), operands may be fetched from the main stack. Operands are written to the main stack during overflow (micro-stack full) or by explicit flushing of the micro-stack. By optimizing the size of the micro-stack, the number of operands fetched from the main stack may be reduced, and consequently the processor's power consumption may be reduced.
摘要:
A digital system and method of operation is provided in which a method is provided for cleaning a range of addresses in a storage region specified by a start parameter and an end parameter. An interruptible clean instruction (802) can be executed in a sequence of instructions (800) in accordance with a program counter. If an interrupt (804) is received during execution of the clean instruction, execution of the clean instruction is suspended before it is completed. After performing a context switch (810), the interrupt is serviced (820). Upon returning from the interrupt service routine (830, 834), execution of the clean instruction is resumed by comparing the start parameter and the end parameter provided by the clean instruction with a current content of a respective start register and end register used during execution of the clean instruction. If the same, execution of the clean instruction is resumed using the current content of the start register and end register. If different, execution of the clean instruction is restarted by storing the start parameter provided by clean instruction in the start register and by storing the end parameter in the end register. In this manner, no additional context information needs to be saved during a context switch in order to allow the clean instruction to be interruptible. If the interrupt occurred during a non-interruptible instruction, then the instruction is completed before the context switch and a return (830, 832) after the interrupt service routine begins execution at the next instruction (803). Other instructions that perform a sequence of operations can also be made interruptible in a similar manner.
摘要:
A digital system is provided with a memory (506) shared by several initiator resources (540-550), wherein a portion of the initiator resources are big endian and another portion of the initiator resources are little endian. The memory is segregated into a set of regions by a memory management unit (MMU) (500-510) and an endianism attribute bit is defined for each region. For each memory request to the memory, the endianism attribute bit for the selected region is provided by the MMU. Each memory transaction request is completed in accordance with the endianism attribute of the selected region. Depending on the capability of a given initiator resource, the memory request address is adjusted to agree with the endianism attribute of the selected region, or an access fault is generated (530) if the endianism of the initiating resource does not match the endianism attribute of the selected memory region. A resource identification value (R-ID) provided by each of the initiator resources is used to identify the endianism of each of the initiator resources.