Abstract:
The invention relates to a method for managing the concurrent execution, in an integrated-circuit device intended to be embedded within the body of a portable object, of, on the one hand, a program comprising one or more instructions, one of said instructions calling one or more routines and, where applicable, one or more subroutines, and, on the other hand, an instruction Ix calling one or more routines and, where applicable, one or more subroutines, according to which method an address of the routine, or an address of the subroutine, is stored in a first return-address save memory. According to the invention, for the execution of the program, an address ADD of an instruction, an address of a routine or an address of a subroutine is stored in a second return-address save memory, the first return-address save memory being distinct from the second return-address save memory, and a save function is executed which stores an address of a routine in the second memory and reinitializes the first return-address save memory. The invention applies, in particular, to subscriber identity modules (SIM).
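The separation into two return-address save memories, with a save function that records an address in the second memory while reinitializing the first, can be illustrated with a minimal sketch. This is not the patented implementation; the class and function names are assumptions made for illustration only.

```python
class ReturnAddressMemory:
    """A simple LIFO return-address save memory (illustrative model)."""
    def __init__(self):
        self._stack = []

    def push(self, address):
        self._stack.append(address)

    def pop(self):
        return self._stack.pop()

    def reset(self):
        # Reinitialize: discard all saved return addresses.
        self._stack.clear()

    def __len__(self):
        return len(self._stack)


def save_and_switch(first_mem, second_mem, routine_address):
    """Hypothetical save function in the spirit of the abstract: record a
    routine address in the second save memory and reinitialize the first,
    so the concurrently executed instruction Ix cannot clobber or reuse
    the program's own return addresses."""
    second_mem.push(routine_address)
    first_mem.reset()
```

A usage sketch: the program pushes its return addresses into the first memory; on switching to the concurrent instruction, `save_and_switch` isolates the two contexts.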
Abstract:
An improved memory model and implementation is disclosed. The memory model includes a Total Store Ordering (TSO) and a Partial Store Ordering (PSO) memory model to provide a partial order for the memory operations issued by multiple processors. The TSO memory model includes a FIFO Store Buffer for Store and Atomic Load-Store operations. Load operations are not placed in the FIFO Store Buffer. A Load operation checks for a value stored at the same location in the FIFO Store Buffer; if no such value is found, the requested value is returned from memory. The PSO model also includes a Store Buffer for Store and Atomic Load-Store operations. However, unlike the TSO model, the Store Buffer in the PSO model is not FIFO. The processors in the PSO model may issue Store and Atomic Load-Store operations in a certain order; however, such operations may be executed by memory out of the order issued by the processors. The execution order is assured only by address matching and the STBAR operation. Two Store operations separated by a STBAR operation are guaranteed to be executed by memory in the order issued by the processors. Load operations in the PSO model are not placed in the Store Buffer. A Load operation first checks for a value stored at the same location in the Store Buffer; if no such value is found, the requested value is returned from memory.
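The TSO store-buffer behavior described above — stores drain to memory in FIFO order, while a load first checks the buffer for a store to the same address and falls back to memory — can be sketched as follows. This is an illustrative model only, not the disclosed hardware; under PSO the buffer would additionally be allowed to drain out of order except across a STBAR.

```python
class FIFOStoreBuffer:
    """Toy model of a TSO-style FIFO store buffer."""
    def __init__(self, memory):
        self.memory = memory        # backing memory: dict of address -> value
        self.pending = []           # buffered (address, value) pairs, FIFO order

    def store(self, addr, value):
        # Store operations are placed in the buffer, not sent straight to memory.
        self.pending.append((addr, value))

    def load(self, addr):
        # Check the buffer youngest-first for a store to the same location.
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        # No buffered value: the requested value is returned from memory.
        return self.memory.get(addr, 0)

    def drain_one(self):
        # FIFO: the oldest buffered store reaches memory first.
        a, v = self.pending.pop(0)
        self.memory[a] = v
```

Forwarding the youngest matching store models the address matching the abstract relies on; draining with `pop(0)` is what makes the buffer FIFO, and hence TSO rather than PSO.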
Abstract:
A high-performance RISC core microprocessor architecture comprising an instruction fetch unit for fetching sets of instructions from an instruction memory and an execution unit that executes several instructions simultaneously through a parallel arrangement of functional units. The fetch unit maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and several functional units performing the functional operations specified by the instructions. A unified instruction scheduler within the instruction selection unit initiates the processing of instructions by the functional units when the instructions are determined to be available for execution and when at least one of the functional units providing the required computation function is available. Unified scheduling is provided across multiple execution data paths, where each execution data path, together with its corresponding functional units, is generally optimized for the type of computation to be performed on the data: integer, floating-point and Boolean operations. The number, type and specific computation elements of the functional units within each data path, as well as between data paths, are independent of one another.
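The dispatch condition above — an instruction issues only when it is available and a functional unit of the required type is free — can be sketched with a toy scheduler. The representation (instructions as `(name, unit_type)` pairs, one-cycle units) is an assumption for illustration, not the disclosed design.

```python
def schedule(instructions, units):
    """Toy unified scheduler sketch: each cycle, dispatch every buffered
    instruction for which a functional unit of the required type is still
    free; instructions that cannot be dispatched wait for a later cycle.

    instructions: list of (name, unit_type) pairs, in buffer order
    units: dict mapping a unit type ("int", "fp", "bool") to the number
           of functional units available in that execution data path
    """
    cycles = []
    pending = list(instructions)
    while pending:
        free = dict(units)          # all functional units are free each cycle
        issued, rest = [], []
        for name, kind in pending:
            if free.get(kind, 0) > 0:
                free[kind] -= 1     # claim a unit in the matching data path
                issued.append(name)
            else:
                rest.append((name, kind))
        cycles.append(issued)
        pending = rest
    return cycles
```

With one integer and one floating-point unit, two integer instructions serialize while a floating-point instruction issues alongside the first of them.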
Abstract:
A computer system which facilitates the execution of nested subroutines and interrupts is disclosed. As each branch transfer within the program is executed by control area logic, a microcommand initiates the transfer of the return address, derived from the address in the present routine, to a first register of a push-down stack. The microcommand also pushes down one level the contents of all of the registers in the stack containing previously stored return addresses, providing a sequential return to unfinished routines or subroutines. When the subroutine or hardware interrupt service routine is completed, a code in the address field enables the return address of the routine previously branched from or interrupted to be retrieved from the first register of the push-down stack and provided as the address of the next instruction to be executed. Retrieving the return address from the push-down stack also pops all other stored return addresses up one level in the stack. In addition to providing multiple levels of subroutine and interrupt nesting, any number of subroutines or hardware interrupts may be partially completed, since the last operating subroutine or hardware interrupt service routine is always the first one to be completed. Logic is also provided to detect the occurrence of a hardware interrupt during a return sequence, so that the requirement to simultaneously push and pop the stack is properly handled.
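The push-down stack behavior — a branch pushes the return address into the first register and shifts older entries down one level, while a return pops the first register and shifts the rest back up — can be modeled in a few lines. This is a behavioral sketch with an assumed fixed depth, not the hardware register implementation.

```python
class PushDownStack:
    """Behavioral model of a fixed-depth return-address push-down stack."""
    def __init__(self, depth):
        self.regs = [None] * depth      # regs[0] is the first register

    def push(self, return_address):
        # A branch: the new return address enters the first register and
        # every previously stored address moves down one level.
        self.regs = [return_address] + self.regs[:-1]

    def pop(self):
        # A return: the first register supplies the next instruction
        # address and all other entries move up one level.
        top = self.regs[0]
        self.regs = self.regs[1:] + [None]
        return top
```

Last-in/first-out order is what guarantees that the most recently entered subroutine or interrupt service routine is always the first one to be completed.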
Abstract:
A high-performance CPU of the RISC (reduced instruction set) type employs a standardized, fixed instruction size, and permits only simplified memory access data widths and addressing modes. The instruction set is limited to register-to-register operations and register load/store operations. Byte manipulation instructions, included to permit use of previously-established data structures, provide the facility for doing in-register byte extract, insert and masking, along with non-aligned load and store instructions. The provision of load/locked and store/conditional instructions permits the implementation of atomic byte writes. By providing a conditional move instruction, many short branches can be eliminated altogether. A conditional move instruction tests a register and moves a second register to a third if the condition is met; this function can be substituted for short branches and thus maintain the sequentiality of the instruction stream. Performance can be sped up by predicting the target of a branch and prefetching the new instruction based upon this prediction; a branch prediction rule is followed that requires all forward branches to be predicted not-taken and all backward branches (as is common for loops) to be predicted as taken. Another performance improvement makes use of unused bits in the standard-sized instruction to provide a hint of the expected target address for jump and jump to subroutine instructions or the like. The target can thus be prefetched before the actual address has been calculated and placed in a register. In addition, the unused displacement part of the jump instruction can contain a field to define the actual type of jump, i.e., jump, jump to subroutine, return from subroutine, and thus place a predicted target address in a stack to allow prefetching before the instruction has been executed.
The processor can employ a variable memory page size, so that the entries in a translation buffer for implementing virtual addressing can be optimally used. A granularity hint is added to the page table entry to define the page size for that entry. A further feature is a prefetch instruction which moves a block of data to a faster-access cache in the memory hierarchy before the data block is to be used.
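Two of the mechanisms above have simple functional semantics that are easy to sketch: the conditional move (test one register, move a second into a third if the condition holds) and the static branch prediction rule (backward branches predicted taken, forward branches predicted not-taken). The function names and the choice of "nonzero means condition met" are assumptions for illustration.

```python
def cmov(condition_reg, src_reg, dst_reg):
    """Conditional-move semantics: if the tested register satisfies the
    condition (here, is nonzero), the destination takes the source value;
    otherwise it keeps its old value. Replaces a short branch while
    keeping the instruction stream sequential."""
    return src_reg if condition_reg != 0 else dst_reg


def predict_taken(branch_pc, target_pc):
    """Static branch-prediction rule from the abstract: backward branches
    (target at or before the branch, as in loops) are predicted taken;
    forward branches are predicted not-taken."""
    return target_pc <= branch_pc
```

For example, a loop-closing branch at address 100 targeting address 40 is predicted taken, while a forward skip to 200 is predicted not-taken.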
Abstract:
A digital data processor acts on a branch and return on address (BAROA) instruction having an operation code field, a memory entry address field and a memory exit address field. The operation code field of the branch and return on address instruction is loaded into an instruction register, the memory exit address field is loaded into an address register, and the memory entry address field is loaded into the program counter. The next sequential address following the address of the current BAROA instruction is then stored in a register stack, and a sequence of instructions starting with the instruction residing at the memory entry address provided by the branch and return on address instruction is fetched and executed. The program counter is incremented each time an instruction is executed; in this manner, the program counter provides the memory addresses of the instructions to be fetched. The memory address in the program counter is compared with the exit address in the address register, and a return instruction operation code is loaded into the instruction register when the memory address in the program counter becomes equal to the exit address in the address register; this return instruction operation code, in turn, causes the address stored in the register stack to be loaded into the program counter.
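The BAROA control flow — branch to the entry address, execute sequentially while comparing the program counter against the saved exit address, and return through the register stack on a match — can be traced with a small sketch. The program representation (a dict of address to instruction name) is an assumption for illustration.

```python
def run_baroa(program, entry, exit_addr, return_to):
    """Trace branch-and-return-on-address (BAROA) control flow.

    program:   dict mapping an address to an instruction (illustrative)
    entry:     memory entry address field of the BAROA instruction
    exit_addr: memory exit address field, held in the address register
    return_to: next sequential address after the BAROA instruction
    """
    register_stack = [return_to]      # saved before branching to the entry
    pc = entry                        # entry address loaded into the PC
    trace = []
    while pc != exit_addr:            # PC compared with the exit address
        trace.append(program[pc])     # fetch and execute
        pc += 1                       # PC increments per executed instruction
    pc = register_stack.pop()         # synthesized return on the match
    return trace, pc
```

When the program counter reaches the exit address, no further instruction is fetched from the called sequence; the saved return address is loaded into the program counter instead.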
Abstract:
In a data processing apparatus, a system for executing branches in single entry-single exit (SESE) basic blocks (BBs) contained within a program has means, receiving the program, for determining a branch instruction within each basic block and for adding firing time information to the branch instruction. The firing time information identifies a time of execution of the branch instruction which is a variable number of instruction cycles prior to the time of execution of the last-to-be-executed instruction of the basic block. The system also has a processor operative on the received non-branch instructions in each basic block for processing those instructions, and means operative on the received branch instruction in the basic block, in response to the firing time information, for completing the execution of the branch instruction no later than the time at which the processor is processing the last-to-be-executed non-branch instruction, so that the execution of the branch instruction occurs in parallel with the execution of the non-branch instructions, thereby speeding the overall processing of the program by the system.
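The firing-time idea — tag the branch that ends a basic block with an execution time some cycles before the last non-branch instruction completes — can be sketched as a small tagging pass. The representation (a list of instruction names ending in the branch, one instruction per cycle) is an assumption for illustration, not the disclosed apparatus.

```python
def assign_firing_time(basic_block, lead_cycles):
    """Tag the branch ending a SESE basic block with a firing time.

    basic_block: list of instruction names; the branch is last
    lead_cycles: how many cycles before the last non-branch instruction
                 the branch should fire (variable, per the abstract)

    Returns the non-branch body and (branch, firing_time), where cycle i
    is when body[i] executes under the one-instruction-per-cycle assumption.
    """
    *body, branch = basic_block
    last_cycle = len(body) - 1                 # cycle of the last non-branch op
    firing_time = max(0, last_cycle - lead_cycles)
    return body, (branch, firing_time)
```

Firing the branch at or before `last_cycle` means its resolution overlaps the remaining non-branch work instead of following it.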
Abstract:
A language-specific microprocessor for the computer language known as FORTH is disclosed. The microprocessor includes four main registers, each for holding a parameter: an L or instruction latch register for decoding instructions and activating microprocessor operation; an I or return index register for tracking returns; an N or next parameter register for operation with an arithmetic logic unit (ALU); and a T or top of parameter stack register with an appended ALU. A return stack port is connected to the I register and a parameter stack port is connected to the N register circuit, each having last-in/first-out (LIFO) memory stacks for reads and writes to isolated, independent memory islands external to the microprocessor. The respective I, T and N registers are connected in series by paired bus connections for swapping parameters between adjacent registers. A first split 16-bit multiplexer J/K controls the LIFO stacks for the I and N registers on paired 8-bit address stacks; a second 16-bit multiplexer designates the pointer to main memory, with 65K addresses and an adjoining 65K for data. This addressing multiplexer receives selective input from a program counter P, the return index register I, the top of the parameter stack T and/or the instruction latch L. Movement to a subroutine is handled in a single cycle, with returns being handled at the end of any designated cycle. Asynchronous microprocessor operation is provided, with the address multiplexer being simultaneously set up with an address for a future machine step, appropriate data or an instruction for the next machine step being unloaded from memory, and the current machine step being executed asynchronously. A two-phase clock latches data as valid on a rising edge and moves to a new memory location on a falling edge. This two-phase clock is given a pulse width sufficient for all asynchronous cycles of microprocessor operation to settle.
The microprocessor's assembler language is FORTH, and the stack and main memory port architecture uniquely complements FORTH to produce a small (17,000 gates), fast (40 MIPS) microprocessor operable on extant FORTH programs. Provision is made for an additional G port which enables the current operating state of the microprocessor to be mapped, for addressing of up to 21 bits, as well as for the ability to operate the microprocessor in tandem with similar microprocessors.
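The two-stack organization the abstract describes — a parameter stack whose top two elements (T and N) feed the ALU, plus a separate return stack reached through the I register — is the classic FORTH machine model and can be sketched as a tiny software model. The word set below is a small illustrative subset, not the patented instruction set.

```python
class ForthMachine:
    """Minimal two-stack FORTH-style machine model (illustrative only)."""
    def __init__(self):
        self.param = []    # parameter stack; param[-1] plays T, param[-2] plays N
        self.ret = []      # return stack, fed through the I register in hardware

    def push(self, value):
        self.param.append(value)

    def plus(self):
        # T <- N + T, as the ALU appended to the T register would compute.
        t, n = self.param.pop(), self.param.pop()
        self.param.append(n + t)

    def swap(self):
        # Exchange T and N, mirroring the paired bus connections between
        # adjacent registers.
        self.param[-1], self.param[-2] = self.param[-2], self.param[-1]

    def to_r(self):
        # >R : move T onto the return stack.
        self.ret.append(self.param.pop())

    def r_from(self):
        # R> : move the top of the return stack back to T.
        self.param.append(self.ret.pop())
```

Keeping the return stack separate from the parameter stack is what lets subroutine calls and returns proceed without disturbing the operands in flight — the property the single-cycle call of the abstract depends on.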
Abstract:
Address translation for instruction fetching can be obviated for sequences of instruction instances that reside on the same page. Obviating address translation reduces power consumption and increases pipeline efficiency, since accessing of an address translation buffer can be avoided. Certain events, such as branch mis-predictions and exceptions, can be designated as page boundary crossing events. In addition, carry over at a particular bit position when computing a branch target or a next instruction instance fetch target can also be designated as a page boundary crossing event. An address translation buffer is accessed to translate an address representation of a first instruction instance. However, until a page boundary crossing event occurs, the address representations of subsequent instruction instances are not translated. Instead, the translated portion of the address representation for the first instruction instance is recycled for the subsequent instruction instances.
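The recycling idea can be sketched as follows: translate the first fetch through the translation buffer, then reuse the translated page portion for subsequent fetches until the caller signals a page-boundary-crossing event (branch mispredict, exception, or carry into the page-number bits). The 4 KiB page size and class interface are assumptions for illustration.

```python
PAGE_BITS = 12     # assumed 4 KiB pages for this sketch


class RecyclingTranslator:
    """Sketch of fetch-address-translation recycling."""
    def __init__(self, tlb):
        self.tlb = tlb              # dict: virtual page number -> physical page
        self.cached_ppage = None    # recycled translated page portion
        self.tlb_accesses = 0       # counts actual translation-buffer lookups

    def translate(self, vaddr, boundary_event=False):
        vpage = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if boundary_event or self.cached_ppage is None:
            # Only the first fetch, or a fetch after a designated page
            # boundary crossing event, touches the translation buffer.
            self.tlb_accesses += 1
            self.cached_ppage = self.tlb[vpage]
        # Recycle the translated page portion; only the offset changes.
        return (self.cached_ppage << PAGE_BITS) | offset
```

Note the event-driven design: sequential fetches never compare page numbers or consult the buffer at all, which is exactly where the power and pipeline savings come from; correctness relies on the carry-out and control-flow events being designated as boundary crossings.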
Abstract:
A method is described. The method includes receiving an instruction; accessing a return cache to load a predicted return target address upon determining that the instruction is a return instruction; searching a lookup table for executable binary code upon determining that the predicted return target address is incorrect; and executing the executable binary code to perform a binary translation.
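The flow of those steps can be approximated with a small sketch: pop a prediction from the return cache; if it matches the actual return target, proceed; otherwise consult a lookup table for translated code for that target, translating on demand when no entry exists. All names here are assumptions, and the on-demand `translate` callback is a simplification of the abstract's "executing the executable binary code to perform a binary translation."

```python
def handle_return(return_cache, lookup_table, actual_target, translate):
    """Sketch of the return-handling flow described in the abstract.

    return_cache: list used as a stack of predicted return target addresses
    lookup_table: dict mapping a target address to translated binary code
    actual_target: the real return target address
    translate: callback producing translated code for an address (assumed)

    Returns (result, mispredicted).
    """
    predicted = return_cache.pop() if return_cache else None
    if predicted == actual_target:
        return predicted, False            # prediction correct: no fixup needed
    if actual_target not in lookup_table:  # miss: translate on demand
        lookup_table[actual_target] = translate(actual_target)
    return lookup_table[actual_target], True
```

Once a target has been translated, subsequent mispredictions to the same address hit the lookup table and skip retranslation.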