Abstract:
In a multi-tasking computing system environment, one program is halted and context switched out so that a processor may context switch in a subsequent program for execution. Processor state information exists which reflects the state of the program being context switched out. Storing this processor state information permits successful resumption of the context switched out program. When the context switched out program is subsequently context switched in, the stored processor state information is loaded in preparation for resuming the program at the point at which execution was previously halted. Although large areas of memory can be allocated to processor state information storage, only a portion of this information may need to be preserved across a context switch to successfully save and resume the context switched out program. Unnecessarily saving and loading all available processor state information can be noticeably inefficient, particularly where relatively large amounts of processor state information exist. In one embodiment, a processor requests a co-processor to context switch out the currently executing program. At a predetermined appropriate point in the executing program, the co-processor responds by halting program execution and saving only the minimal amount of processor state information necessary for successful restoration of the program. The appropriate point is chosen by the application programmer at a location in the executing program that requires preserving only a minimal portion of the processor state information across a context switch. By saving only a minimal amount of processor state information, processor time savings are accumulated across context save and restore operations.
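The difference between saving the full processor state and saving only the live subset at a chosen switch point can be illustrated with a short C sketch. The register counts, structure layouts, and function names below (coproc_state_t, saved_context_t, context_save_minimal, context_restore_minimal) are hypothetical assumptions, not the patent's interface; they only show the idea of preserving a minimal portion of the state at a programmer-chosen point.

#include <string.h>
#include <stdint.h>

#define FULL_REGS   128   /* hypothetical full co-processor register file */
#define LIVE_REGS     8   /* registers assumed live at the chosen switch point */

typedef struct {
    uint64_t regs[FULL_REGS];
    uint64_t pc;
} coproc_state_t;

typedef struct {
    uint64_t live[LIVE_REGS]; /* only the state needed to resume */
    uint64_t pc;
} saved_context_t;

/* Save only the minimal state at the predetermined switch point. */
void context_save_minimal(const coproc_state_t *hw, saved_context_t *out)
{
    memcpy(out->live, hw->regs, sizeof(out->live));
    out->pc = hw->pc;
}

/* Restore the minimal state before resuming the switched-out program. */
void context_restore_minimal(coproc_state_t *hw, const saved_context_t *in)
{
    memcpy(hw->regs, in->live, sizeof(in->live));
    hw->pc = in->pc;
}

With the assumed 128-register file and only 8 live registers at the switch point, each save and restore moves roughly one sixteenth of the full state, which is the source of the accumulated processor time savings the abstract describes.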
Abstract:
An N-byte vector processor is provided which can emulate 2N-byte processor operations by executing two N-byte operations sequentially. By using an N-byte architecture to process 2N-byte wide data, chip size and cost are reduced. One embodiment implements 64-byte operations with a 32-byte vector processor by executing a 32-byte instruction on the first 32 bytes of data and then executing a 32-byte instruction on the second 32 bytes of data. Registers and instructions for 64-byte operation are emulated using two 32-byte registers and instructions, respectively, with some instructions requiring modification to accommodate 64-byte operations between adjacent elements, operations requiring specific element locations, operations that shift elements into and out of registers, and operations specifying addresses exceeding 32 bytes.
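A minimal C sketch of the two-halves approach follows. The vec32_t and vec64_t types and the element-wise add are illustrative assumptions rather than the patent's instruction set; they show a 64-byte add emulated as two sequential 32-byte adds on the low and high halves of a 64-byte register pair.

#include <stdint.h>

#define VLEN 32

typedef struct { uint8_t b[VLEN]; } vec32_t;
typedef struct { vec32_t lo, hi; } vec64_t;   /* 64-byte register as two 32-byte halves */

/* One native 32-byte element-wise add. */
vec32_t vadd32(vec32_t a, vec32_t b)
{
    vec32_t r;
    for (int i = 0; i < VLEN; i++)
        r.b[i] = (uint8_t)(a.b[i] + b.b[i]);
    return r;
}

/* Emulate a 64-byte add with two sequential 32-byte operations. */
vec64_t vadd64(vec64_t a, vec64_t b)
{
    vec64_t r;
    r.lo = vadd32(a.lo, b.lo);   /* first 32 bytes */
    r.hi = vadd32(a.hi, b.hi);   /* second 32 bytes */
    return r;
}

Operations that cross the 32-byte boundary, such as shifting elements between halves or combining adjacent elements, would need additional handling beyond this simple split, which is the kind of modification the abstract refers to.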
Abstract:
A superscalar processor includes an execution unit that executes load/store instructions and an execution unit that executes arithmetic instructions. The execution pipelines for both execution units include a decode stage, a read stage that identifies and reads source operands for the instructions, and one or more execution stages performed in the execution units. For store instructions, reading store data from a register file is deferred until the store data is required for transfer to a memory system. This allows store instructions to be decoded simultaneously with earlier instructions that generate the store data. A simple antidependency interlock uses a list of register numbers identifying the registers that hold store data for pending store instructions. These register numbers are compared to the register numbers of the destination operands of instructions, and instructions whose destination operands match a source of store data are stalled in the read stage to prevent them from destroying the store data before the earlier store instruction completes.
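The described antidependency interlock can be sketched in a few lines of C. The list size and the structure and function names below are assumptions made only for illustration; the check itself mirrors the abstract: an instruction in the read stage stalls if its destination register number matches a register recorded as holding store data for a pending store.

#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING_STORES 4

typedef struct {
    uint8_t store_src_reg[MAX_PENDING_STORES]; /* registers holding store data */
    int     count;                             /* number of pending stores */
} pending_stores_t;

/* Returns true if the instruction must stall in the read stage because its
 * destination register matches the data source of an incomplete store. */
bool must_stall(const pending_stores_t *p, uint8_t dest_reg)
{
    for (int i = 0; i < p->count; i++)
        if (p->store_src_reg[i] == dest_reg)
            return true;
    return false;
}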
Abstract:
The present invention provides a method and system for using hardware to assist software in emulating guest instructions. The method and system comprise an emulation assist unit (EAU) which efficiently maps a guest instruction to a unique tag, an index, and the address of the corresponding semantic routine. The index determines where in a cache a plurality of tags are stored. A separate cache within the EAU stores each tag in association with the address the first time the corresponding guest instruction is emulated. Thus, the emulation assist unit also dynamically responds to the set of guest instructions being emulated. The first time a guest instruction is emulated, the EAU determines the address and stores it in the cache in association with the tag. When the guest instruction is emulated again, the EAU uses the tag to access the stored address of the corresponding semantic routine.
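A C sketch of this lookup path is given below. The direct-mapped cache organization, the tag and index functions, and all identifiers (eau_lookup, resolve_semantic_routine, and so on) are assumptions for illustration only and do not come from the patent. On the first emulation of a guest instruction the resolved semantic-routine address is stored with the tag; on later emulations the tag match yields the stored address directly.

#include <stdint.h>
#include <stddef.h>

#define EAU_CACHE_SIZE 256

typedef void (*semantic_routine_t)(uint32_t guest_insn);

typedef struct {
    uint32_t           tag;    /* tag derived from the guest instruction */
    semantic_routine_t addr;   /* address of the semantic routine */
    int                valid;
} eau_entry_t;

static eau_entry_t eau_cache[EAU_CACHE_SIZE];

/* Illustrative mappings of a guest instruction to a tag and a cache index. */
static uint32_t make_tag(uint32_t insn)   { return insn; }
static size_t   make_index(uint32_t insn) { return insn % EAU_CACHE_SIZE; }

/* Stand-in semantic routine and software address lookup, used only so the
 * sketch is self-contained. */
static void default_routine(uint32_t insn) { (void)insn; }
static semantic_routine_t resolve_semantic_routine(uint32_t insn)
{
    (void)insn;
    return default_routine;
}

semantic_routine_t eau_lookup(uint32_t insn)
{
    size_t       idx = make_index(insn);
    uint32_t     tag = make_tag(insn);
    eau_entry_t *e   = &eau_cache[idx];

    if (!e->valid || e->tag != tag) {        /* first emulation: fill the cache */
        e->tag   = tag;
        e->addr  = resolve_semantic_routine(insn);
        e->valid = 1;
    }
    return e->addr;                          /* subsequent emulations hit here */
}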