摘要:
A wide data width processor has an execution unit including an aligner that aligns data for load/store instructions and shifts or rotates data for arithmetic logic instructions. Use of the same circuitry and execution unit for these different types of instructions reduces overall circuit size because alignment circuitry need not be repeated, once in a load/store unit and once in an arithmetic logic unit.
摘要:
A vector processor includes two banks of vector registers where each vector register can stored multiple data elements and a control register with a field indicating a default bank. An instruction set for the vector processor includes instructions which use a register number to identify a vector registers in the default bank, uses a register number to identify a double-size vector register including a register from the first bank and a register from the second bank, and instructions which include a bank bit and a register number to access a vector register from either bank.
摘要:
In a multi-tasking computing system environment, one program is halted and context switched out so that a processor may context switch in a subsequent program for execution. Processor state information exists which reflects the state of the program being context switched out. Storage of this processor state information permits successful resumption of the context switched out program. When the context switched out program is subsequently context switched in, the stored processor information is loaded in preparation for successfully resuming the program at the point in which execution was previously halted. Although, large areas of memory can be allocated to processor state information storage, only a portion of this may need to be preserved across a context switch for successfully saving and resuming the context switched out program. Unnecessarily saving and loading all available processor state information can be noticeably inefficient particularly where relatively large amounts of processor state information exists. In one embodiment, a processor requests a co-processor to context switch out the currently executing program. At a predetermined appropriate point in the executing program, the co-processor responds by halting program execution and saving only the minimal amount of processor state information necessary for successful restoration of the program. The appropriate point is chosen by the application programmer at a location in the executing program that requires preserving a minimal portion of the processor information across a context switch. By saving only a minimal amount of processor information, processor time savings are accumulated across context save and restoration operations.
摘要:
A N-byte vector processor is provided which can emulate 2N-byte processor operations by executing two N-byte operations sequentially. By using N-byte architecture to process 2N-byte wide data, chip size and costs are reduced. One embodiment allows 64-byte operations to be implemented with a 32-byte vector processor by executing a 32-byte instruction on the first 32-bytes of data and then executing a 32-byte instruction on the second 32-bytes of data. Registers and instructions for 64-byte operation are emulated using two 32-byte registers and instructions, respectively, with some instructions requiring modification to accommodate 64-byte operations between adjacent elements, operations requiring specific element locations, operations shifting elements in and out of registers, and operations specifying addresses exceeding 32 bytes.
摘要:
A superscalar processor includes an execution unit that executes load/store instructions and an execution unit that executes arithmetic instruction. Execution pipelines for both execution units include a decode stage, a read stage that identify and read source operands for the instructions and an execution stage or stages performed in the execution units. For store instructions, reading store data from a register file is deferred until the store data is required for transfer to a memory system. This allows the store instructions to be decoded simultaneously with earlier instructions that generate the store data. A simple antidependency interlock uses a list of the register numbers identifying registers holding store data for pending store instructions. These register number are compared to the register numbers of destination operands of instructions, and instructions having destination operands matching a source of store data are stalled in the read stage to prevent the instruction from destroying store data before an earlier store instruction is complete.
摘要:
A virtual core management system including a physical core and a first virtual core including a collection of logical states associated with execution of a first program. The first virtual core is mapped to the physical core. The virtual core management system further includes a second virtual core including a collection of logical states associated with execution of a second program, and a virtual core management component configured to unmap the first virtual core from the physical core and map the second virtual core to the physical core in response to the virtual core management component detecting that the physical core is idle.
摘要:
The present disclosure provides methods and systems adapted for use with a processor having one or more physical cores. The methods and systems include a virtual core management component adapted to map one or more virtual cores to at least one of the physical cores to enable execution of one or more programs by the at least one physical core. The one or more virtual cores include one or more logical states associated with the execution of the one or more programs. The methods and systems may include a memory component adapted to store the one or more virtual cores. The virtual core management component may be adapted to transfer the one or more virtual cores from the memory component to the at least one physical core.
摘要:
A virtual core management system including one or more physical cores and one or more virtual cores. Each virtual core respectively includes a collection of logical states associated with execution of a corresponding program. The virtual core management system further includes one or more interrupt controllers configured to send one or more interrupt signals to interrupt execution of a corresponding program associated with at least one of the one or more virtual cores, and a virtual core management component configured to map the at least one virtual core to one of the one or more physical cores and route the one or more interrupt signals to the corresponding physical core.
摘要:
Software hints embedded in branch instructions direct selection of one of a plurality of branch predictors to use when processing the branch instructions, leading to improved branch prediction (i.e. fewer mis-predictions) over conventional schemes. A software agent assembles branch instructions having associated respective branch predictor control fields compatible with a branch predictor selector and a plurality of branch predictors. Each branch predictor control field is used to perform branch predictor selection, branch predictor control, or both. Branch predictor selection enables selective branch prediction according to an appropriate one of the branch predictors as determined by the software agent by examining context surrounding the branch instruction. Branch predictor control enables control of operation of one or more of the branch predictors. For example, a history-based branch predictor may be instructed to provide branch prediction according to a history-depth specified by the branch predictor control.
摘要:
Managing speculative execution via groups of one or more actions corresponding to atomic traces enables efficient processing of flag-related actions, as atomic traces advantageously enable single checkpoints of flag values at atomic trace boundaries. Checkpointing flags during atomic trace renaming in a processor system uses a flag checkpoint table to store a plurality of flag checkpoints, each corresponding to an atomic trace. The table is selectively accessed to provide flag information to restore speculative flags when an atomic trace is aborted. A corresponding flag checkpoint is stored when an atomic trace is renamed. An action that updates flags updates all entries in the table corresponding to younger atomic traces. If the atomic trace is aborted, then the corresponding flag checkpoint is used for restoration of flag state.