Method and system for managing registers in a data processing system supports out-of-order and speculative instruction execution
    1.
    Type: Granted invention patent
    Status: Expired

    Publication number: US06356918B1

    Publication date: 2002-03-12

    Application number: US08507542

    Application date: 1995-07-26

    Abstract: A method and a system in a data processing system for managing registers in a register array wherein the data processing system has M architected registers and the register array has greater than M registers. A first physical register address is selected from a group of available physical register addresses in a rename table in response to dispatching a register-modifying instruction that specifies an architected target register address. The architected target register address is then associated with the first physical register address, and a result of executing the register-modifying instruction is stored in a physical register pointed to by the first physical register address. In response to completing the register-modifying instruction, the first physical address in the rename table is exchanged with a second physical address in a completion rename table, wherein the second physical address is located in the completion rename table at a location pointed to by the architected target register address. Therefore, upon instruction completion, the completion rename table contains pointers that map architected register addresses to physical register addresses. The rename table maps architected register addresses to physical register addresses for instructions currently being executed, or for instructions that have “finished” and have not yet been “completed.” Bits indicating the validity of an association between an architected register address and a physical register address are stored before instructions are speculatively executed following an unresolved conditional branch.
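
    The dispatch, finish, and complete steps described in this abstract can be sketched in Python as a toy model. This is only an illustration under stated assumptions: the class name `RenameUnit`, its method names, and the separate `free` queue standing in for the "group of available physical register addresses" are hypothetical, and the validity bits saved for speculative branches are not modelled.

```python
from collections import deque


class RenameUnit:
    """Toy model: M architected registers backed by more than M physical ones."""

    def __init__(self, num_arch, num_phys):
        assert num_phys > num_arch
        # Completion rename table: committed architected -> physical mapping.
        self.completion = list(range(num_arch))
        # Physical addresses that do not currently hold committed state.
        self.free = deque(range(num_arch, num_phys))
        # Rename table: mappings for dispatched but not yet completed
        # instructions, kept in program order.
        self.rename = deque()
        self.regs = [0] * num_phys

    def dispatch(self, arch_target):
        """Select an available physical address and tie it to the target."""
        phys = self.free.popleft()
        self.rename.append((arch_target, phys))
        return phys

    def finish(self, phys, value):
        """Store the execution result in the selected physical register."""
        self.regs[phys] = value

    def complete(self):
        """Exchange the oldest in-flight address with the committed one."""
        arch_target, phys = self.rename.popleft()
        displaced = self.completion[arch_target]
        self.completion[arch_target] = phys
        self.free.append(displaced)  # the old address becomes available again
        return arch_target

    def read_committed(self, arch):
        """Read an architected register through the completion rename table."""
        return self.regs[self.completion[arch]]
```

    In this sketch a result written by `finish` stays invisible to `read_committed` until `complete` swaps the addresses, which mirrors the distinction the abstract draws between "finished" and "completed" instructions.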

    Method and apparatus for reconstructing the address of the next instruction to be completed in a pipelined processor
    2.
    Type: Granted invention patent
    Status: Expired

    Publication number: US06185674B2

    Publication date: 2001-02-06

    Application number: US08417421

    Application date: 1995-04-05

    CPC classification number: G06F9/322

    Abstract: A computer processing unit is provided that includes an apparatus for generating an address of the next instruction to be completed. The apparatus includes a first table for storing a plurality of entries each corresponding to a dispatched instruction, each entry comprising an identifier that identifies the corresponding instruction and a status bit that indicates if the corresponding instruction is completed; a second table for storing a plurality of entries each corresponding to a dispatched branch instruction, each entry comprising the same identifier stored in the first table, a target address of the dispatched branch instruction and a resolution status field that indicates at least if the corresponding branch instruction has been resolved taken or has been resolved not taken; and program counter update logic that, in each machine cycle, updates a program counter to store and output the address of the next instruction to be completed according to the entries stored in the first table and the second table. Because the first and second tables employ efficient identification tags to identify instructions that modify the control flow of the execution pipeline and the target address of such instructions, the computer processing unit of the present invention need not store the full address of each instruction in the execution pipeline to update the program counter as is conventional, and thus saves real estate that may be used for other circuitry.
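
    The interplay of the two tables can be sketched roughly as follows. The sketch assumes fixed 4-byte instructions and in-order completion, and the names `update_program_counter`, `completion_table`, and `branch_table` are hypothetical; the abstract does not specify how the update logic is organised.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed-length instructions


def update_program_counter(pc, completion_table, branch_table):
    """Advance pc past every leading completed entry.

    completion_table: list of {"tag": int, "completed": bool} in program order
                      (the first table).
    branch_table:     dict mapping tag -> {"target": int, "resolution": str}
                      for dispatched branches only (the second table).
    Returns (new_pc, entries_still_in_flight).
    """
    remaining = list(completion_table)
    while remaining and remaining[0]["completed"]:
        entry = remaining.pop(0)
        branch = branch_table.get(entry["tag"])
        if branch is not None and branch["resolution"] == "taken":
            pc = branch["target"]      # a taken branch redirects the flow
        else:
            pc += INSTRUCTION_BYTES    # sequential instruction or branch not taken
    return pc, remaining
```

    Only branch entries carry an address here; every other instruction is represented by its tag and a status bit, which is the storage saving the abstract points to.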

    Fast multiple operands adder/subtracter based on shifting
    3.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5777918A

    Publication date: 1998-07-07

    Application number: US600691

    Application date: 1996-02-13

    CPC classification number: G06F7/509 G06F7/5057 G06F7/4991

    Abstract: A fast adder/subtracter using a decoder and shifting function instead of conventional full-adders is disclosed. The circuit is optimized for the addition of multiple operands up to 4-5 binary bits in magnitude. Using this method, a subtraction operation can be performed at no added cost with respect to addition (compared to the conventional method requiring complementing one of the operands). Addition and subtraction of multiple operands are implemented by simple multiple shift operations. The multiple shift operations can be implemented as a chain of series NMOS pulldown devices with a precharged load, providing considerable speed advantage over conventional solutions. Fast overflow detection may be implemented by ORing the higher order bits in the shifter.
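
    One plausible reading of the decoder-and-shift idea is sketched below: the first operand is decoded into a one-hot word, each further operand shifts that word up (add) or down (subtract), and the result is the position of the set bit. The function names and the `width` parameter are hypothetical, and the sketch reports any intermediate underflow as an error, which the abstract does not discuss.

```python
def decode(value):
    """Decoder: turn a small non-negative integer into a one-hot word."""
    return 1 << value


def encode(one_hot):
    """Encoder: recover the integer from the position of the single set bit."""
    return one_hot.bit_length() - 1


def shift_add(first, operands, width=5):
    """Add or subtract small operands by shifting a one-hot word.

    operands is a list of (magnitude, is_subtract) pairs. width is the assumed
    number of result bits, so legal results are 0 .. 2**width - 1.
    Returns (result, error); result is None when error is True.
    """
    limit = 1 << width  # number of legal one-hot positions
    word = decode(first)
    for magnitude, is_subtract in operands:
        # Addition shifts toward higher positions, subtraction toward lower.
        word = word >> magnitude if is_subtract else word << magnitude
    if word == 0:
        # The set bit fell off the low end: a partial result went negative.
        return None, True
    overflow = (word >> limit) != 0  # OR of every bit above the legal range
    return (None if overflow else encode(word)), overflow
```

    For example, `shift_add(7, [(5, False), (3, True)])` computes 7 + 5 - 3 with two shifts and no complementing step, which is the cost symmetry between addition and subtraction that the abstract claims.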

    Apparatus for concurrent multiple instruction decode in variable length instruction set computer
    4.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5371864A

    Publication date: 1994-12-06

    Application number: US866766

    Application date: 1992-04-09

    Inventor: Chiao-Mei Chuang

    CPC classification number: G06F9/3822 G06F9/30149 G06F9/3814

    Abstract: A data processing apparatus for simultaneously reading out groups of two or more contiguous, variable length instructions from memory, and for decoding the group of variable length instructions in parallel. The data processing apparatus has a memory containing at least first, second, and third contiguous instructions, and at least first, second, and third read ports for receiving starting addresses and for reading out the instructions from the memory. A next instruction pointer supplies the starting address of the first instruction to the first read port, receives the first instruction, decodes the length of the first instruction, determines the starting address of the second instruction, supplies the starting address of the second instruction to the first read port, receives the second instruction, decodes the length of the second instruction, and determines the starting address of the third instruction. All of these operations are performed in one cycle time. An instruction pointer queue receives and stores the starting addresses of at least the second and third instructions, and supplies the starting addresses to the second and third read ports for simultaneously reading out the second and third instructions from the memory. First and second instruction decoders receive and simultaneously decode the second and third instructions.
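
    The two phases described above, chaining length decodes to find start addresses and then reading the instructions independently, can be sketched as follows. The 2/4/6-byte length coding in `instruction_length` is an assumption chosen for illustration (it resembles the IBM System/370 convention), and `fetch_group` is a hypothetical name; software can only model the single-cycle hardware behaviour sequentially.

```python
def instruction_length(first_byte):
    """Assumed encoding: the top two bits of the first byte give the length."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[first_byte >> 6]


def fetch_group(memory, start, group_size=3):
    """Locate and read group_size contiguous variable-length instructions.

    Returns (start_addresses, instruction_bytes, address_after_group).
    """
    # Phase 1: the next-instruction pointer decodes each length in turn to
    # derive the following start address, filling the instruction pointer queue.
    starts = []
    address = start
    for _ in range(group_size):
        starts.append(address)
        address += instruction_length(memory[address])
    # Phase 2: with every start address known, the instructions no longer
    # depend on one another and could be read through separate ports at once.
    group = [bytes(memory[a:a + instruction_length(memory[a])]) for a in starts]
    return starts, group, address
```

    The point of the split is that only phase 1 is inherently serial; once the queue holds the start addresses, reading and decoding can proceed in parallel.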

    Backing Register File for processors
    5.
    Type: Granted invention patent
    Status: Active

    Publication number: US07206925B1

    Publication date: 2007-04-17

    Application number: US09643895

    Application date: 2000-08-18

    Abstract: A processor is defined by a new architectural feature called a Backing Register File, where a Backing Register File is a set of randomly accessible registers that are capable of holding values and are directly connected to the processor's register files. The processor's register files are in turn connected to the processor's execution units. A Backing Register File is visible to and controllable by users, allowing them to use a larger local address space and thereby increase execution unit throughput without changing the size of the processor's register files themselves.

    Executing speculative parallel instructions threads with forking and inter-thread communication
    6.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5812811A

    Publication date: 1998-09-22

    Application number: US383331

    Application date: 1995-02-03

    CPC classification number: G06F9/3009 G06F9/3842 G06F9/3851

    Abstract: A central processing unit (CPU) in a computer that permits speculative parallel execution of more than one instruction thread. The CPU uses Fork-Suspend instructions that are added to the instruction set of the CPU, and are inserted in a program prior to run-time to delineate potential future threads for parallel execution. The CPU has an instruction cache with one or more instruction cache ports, a bank of one or more program counters, a bank of one or more dispatchers, a thread management unit that handles inter-thread communications and discards future threads that violate dependencies, a set of architectural registers common to all threads, and a scheduler that schedules parallel execution of the instructions on one or more functional units in the CPU.

    Explicitly clustered register file and execution unit architecture
    7.
    Type: Granted invention patent
    Status: Active

    Publication number: US06757807B1

    Publication date: 2004-06-29

    Application number: US09642075

    Application date: 2000-08-18

    Abstract: A processor comprising a new architectural feature called a Register Domain, where a Register Domain has a register file, at least one execution unit, and coupling circuitry between the two. A processor will typically have a plurality of Register Domains, and Register Domains may have different types of execution units within them. Individual Register Domains will be visible to a user.

    General purpose memory access scheme using register-indirect mode
    8.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5367648A

    Publication date: 1994-11-22

    Application number: US659717

    Application date: 1991-02-20

    CPC classification number: G06F9/30043 G06F13/1615 G06F9/3824

    Abstract: A memory access scheme achieved using a memory address register and a register-indirect memory accessing mode eliminates write-back collisions and long cycle times and enhances system performance. During memory address generation operations, an arithmetic-logic unit (ALU) generates memory addresses from data in a general purpose register (GPR). Then, the memory addresses are written back to the GPR and a memory address register (MAR). During memory access operations, the MAR is accessed for the memory addresses to access a memory. Two approaches are provided. In a first approach, use of the MAR during the memory access operations is explicit. In a second approach, use of the MAR during the memory access operations is transparent. According to the second approach, a controller is provided to validate the MAR during the memory access operations.
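
    A toy model of the second (transparent) approach might look like the following. The `Machine` class, its method names, and the `mar_source` tracking field are hypothetical, and the validation rule shown (reload the MAR when it no longer mirrors the addressing register) is an assumption about what "validate" means, not a detail taken from the patent.

```python
class Machine:
    """Toy register-indirect machine with a memory address register (MAR)."""

    def __init__(self, num_gprs=8, mem_size=64):
        self.gpr = [0] * num_gprs
        self.mem = [0] * mem_size
        self.mar = 0
        self.mar_source = None  # GPR index the MAR mirrors, or None if stale

    def gen_address(self, dest, base, offset):
        """Address generation: the ALU result goes to both the GPR and the MAR."""
        address = self.gpr[base] + offset
        self.gpr[dest] = address
        self.mar = address
        self.mar_source = dest

    def write_gpr(self, reg, value):
        """Ordinary GPR write; marks the MAR stale if it mirrored that register."""
        self.gpr[reg] = value
        if self.mar_source == reg:
            self.mar_source = None

    def load(self, dest, addr_reg):
        """Register-indirect load that always addresses memory through the MAR."""
        if self.mar_source != addr_reg:
            # Controller step (assumed): refresh the MAR from the GPR.
            self.mar = self.gpr[addr_reg]
            self.mar_source = addr_reg
        self.write_gpr(dest, self.mem[self.mar])
```

    In the common case where a load follows its own address generation, the MAR already holds the address and the GPR does not need to be read again on the memory access path.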

    Functional cache memory chip architecture for improved cache access
    9.
    Type: Granted invention patent
    Status: Expired

    Publication number: US4905188A

    Publication date: 1990-02-27

    Application number: US158964

    Application date: 1988-02-22

    CPC classification number: G06F12/0804 G06F12/0859 G06F12/0864

    Abstract: An on-chip VLSI cache architecture including a single-port, late-select, cache array organized as an n-way set-associative cache (having n congruence classes) including a plurality of functionally integrated units on-chip in addition to the cache array and including a normal read/write CPU access function which provides an architectural organization for allowing the chip to be used in (1) a fast, "late-select" operation which may be provided with any desired degree of set-associativity while achieving an effective one-cycle write operation, and (2) a cache reload function which provides a highly parallel store-back and reload operation to substantially reduce the reload time, particularly for a store-in cache organization. The cache chip organization and architecture provide a late-select cache having a nearly transparent, multiple word reload by incorporating a Cache-Reload Buffer, a store-back buffer and a load-through function all included on the cache array chip for reloading, and a delayed write-enable for achieving an effective one-cycle write operation. Two separate decoder functions are integrated on the chip, one for cache access for normal read/write operations to and from the CPU and one for cache reload which also provides interim access to data which has been transferred out of main memory to the chip but not yet reloaded into the cache array. These two decoders provide for different accessing modes as required by the CPU or main memory operations.

    Performance enhancement scheme for a RISC type VLSI processor using dual execution units for parallel instruction processing
    10.
    Type: Granted invention patent
    Status: Expired

    Publication number: US4766566A

    Publication date: 1988-08-23

    Application number: US896156

    Application date: 1986-08-18

    Inventor: Chiao-Mei Chuang

    CPC classification number: G06F9/3885

    Abstract: Performance of a VLSI processor of the reduced instruction set computer (RISC) type is enhanced by executing two instructions simultaneously in the two execution units of the processor. There is very little increase in the cost of hardware. Three embodiments are presented with different cost and performance capabilities. The first embodiment has an instruction input to an instruction buffer (10) and two sets of control ROSs (40 and 42) and control registers (64 and 65). Which control ROS and control register are chosen depends on which instruction execution unit is to execute the instruction. Data input to the execution units comes from a register file (48) which has an additional pair of outputs (51) and (53) that provide the data paths for simultaneous execution of instructions by the execution units. Execution unit I has an arithmetic and logic unit (ALU) (24), while execution unit II has a rotate (26) and mask generator (31). Load balancing between the two execution units can be performed by adding a multiplier (60) and divider (62) to execution unit II. In the second embodiment, additionally, load balancing is achieved by incorporating an adder (78) into execution unit II. The adder (78) is used to perform address calculations to speed up the load, store and branch instructions. In the third embodiment, an additional ALU (90) is added to execution unit II to allow the instruction processing to be further balanced between the two execution units.
