Method and apparatus for out-of-order processing of packets using linked lists
    1.
    Granted invention patent
    Status: Active

    Publication number: US07349399B1

    Publication date: 2008-03-25

    Application number: US10327555

    Filing date: 2002-12-20

    CPC classification number: H04L49/9094 H04L49/90

    Abstract: A method and apparatus for out-of-order processing of packets using linked lists is described. In one embodiment, the method includes receiving packets in a global order, the packets being designated for different ones of a plurality of reorder contexts. The method also includes storing information regarding each of the packets in a shared reorder buffer. The method also includes, for each of the plurality of reorder contexts, maintaining a reorder context linked list that records the order in which those of the packets that were designated for that reorder context and that are currently stored in the shared reorder buffer were received relative to the global order. The method also includes completing processing of at least certain of the packets out of the global order and retiring the packets from the shared reorder buffer out of the global order for at least certain of the packets.
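
The mechanism in this abstract can be illustrated with a small sketch. It is not the patented implementation: the class and method names (`SharedReorderBuffer`, `receive`, `complete`) are hypothetical, and the rule that packets retire in arrival order within each reorder context is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    packet_id: int
    context: int
    done: bool = False
    next: Optional[int] = None  # index of the next slot in the same context


class SharedReorderBuffer:
    """Shared slots plus one linked list (head/tail) per reorder context."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Slot]] = [None] * size
        self.free: List[int] = list(range(size))
        self.head: Dict[int, int] = {}
        self.tail: Dict[int, int] = {}

    def receive(self, packet_id: int, context: int) -> int:
        """Store a packet and append it to its context's linked list."""
        idx = self.free.pop(0)  # raises IndexError when the buffer is full
        self.slots[idx] = Slot(packet_id, context)
        if context in self.tail:
            self.slots[self.tail[context]].next = idx
        else:
            self.head[context] = idx
        self.tail[context] = idx
        return idx

    def complete(self, idx: int) -> List[int]:
        """Mark a packet done, then retire finished packets at the head
        of its context (assumed policy: in order within a context)."""
        slot = self.slots[idx]
        slot.done = True
        context = slot.context
        retired: List[int] = []
        while context in self.head and self.slots[self.head[context]].done:
            h = self.head[context]
            s = self.slots[h]
            retired.append(s.packet_id)
            self.slots[h] = None
            self.free.append(h)
            if s.next is None:
                del self.head[context]
                del self.tail[context]
            else:
                self.head[context] = s.next
        return retired


rob = SharedReorderBuffer(4)
a = rob.receive(1, context=0)  # global order: packets 1, 2, 3
b = rob.receive(2, context=1)
c = rob.receive(3, context=0)
print(rob.complete(b))  # packet 2 retires before packet 1
print(rob.complete(c))  # packet 3 waits behind packet 1 in context 0
print(rob.complete(a))  # packets 1 and 3 retire together
```

Because each context has its own list, a slow packet in one context does not block retirement in another, even though all packets share the same buffer.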

    Microprocessor including multiple register files mapped to the same logical storage and inhibiting synchronization between the register files responsive to inclusion of an instruction in an instruction sequence
    2.
    Granted invention patent
    Status: Expired

    Publication number: US06237083B1

    Publication date: 2001-05-22

    Application number: US09110518

    Filing date: 1998-07-06

    Applicant: John G. Favor

    Inventor: John G. Favor

    Abstract: A microprocessor includes a first register file including a plurality of multimedia registers defined to store operands for multimedia instructions and a second register file including a plurality of floating point registers defined to store operands for floating point instructions. The multimedia registers and floating point registers are mapped to the same logical storage according to the instruction set employed by the microprocessor. In order to maintain predefined behavior when a floating point instruction reads a register most recently updated by a multimedia instruction or vice versa, the microprocessor provides for synchronization of the first and second register files between executing a set of one or more multimedia instructions and a set of one or more floating point instructions (where either set may be prior to the other in program order and the order affects which direction copying of the contents is performed, i.e. first register file to second register file or vice versa). The predefined behavior in the above mentioned circumstances is thereby maintained. The microprocessor supports an empty state instruction. If the empty state instruction is included between the set of one or more multimedia instructions and the set of one or more floating point instructions in a code sequence, the microprocessor inhibits the register file synchronization. In one embodiment including the x86 instruction set, the empty state instruction performs the same set of actions as the EMMS instruction in addition to the above mentioned features.
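
A toy model may help make the synchronization rule concrete. Everything here is an illustrative assumption: the class `DualRegisterFiles`, the whole-file copy on each switch, and the single `skip_sync` flag are simplifications, and the real EMMS semantics (which concern the floating point tag word) are not modeled.

```python
from typing import Dict, List


class DualRegisterFiles:
    """Two physical register files backing one logical register set."""

    def __init__(self, n: int = 8) -> None:
        self.files: Dict[str, List[int]] = {"mm": [0] * n, "fp": [0] * n}
        self.active = "fp"      # file used by the most recent instruction
        self.skip_sync = False  # set by the empty-state instruction
        self.copies = 0         # number of synchronizations performed

    def _switch_to(self, kind: str) -> None:
        if kind != self.active:
            if not self.skip_sync:
                # Copy direction follows program order: old file -> new file.
                self.files[kind] = list(self.files[self.active])
                self.copies += 1
            self.active = kind
        self.skip_sync = False

    def write(self, kind: str, reg: int, value: int) -> None:
        self._switch_to(kind)
        self.files[kind][reg] = value

    def read(self, kind: str, reg: int) -> int:
        self._switch_to(kind)
        return self.files[kind][reg]

    def empty_state(self) -> None:
        """Declare that the next instruction set does not need the old contents."""
        self.skip_sync = True


rf = DualRegisterFiles()
rf.write("mm", 0, 7)
print(rf.read("fp", 0))   # synchronized: the fp file sees the multimedia write
rf.write("mm", 1, 9)
rf.empty_state()
print(rf.read("fp", 1))   # synchronization inhibited: the fp file keeps its old value
```

The saving is the skipped copy: code that places the empty state instruction between the two instruction sets promises not to depend on the other file's contents.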

    Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle
    3.
    Granted invention patent
    Status: Expired

    Publication number: US6061521A

    Publication date: 2000-05-09

    Application number: US759042

    Filing date: 1996-12-02

    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU may be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers may be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands. In one embodiment, an arithmetic logic unit may be partitioned into at least two logic portions. A first logic portion may be coupled to receive a first operand from a fixed slot of a first register and a second operand from any slot of a second register. A second logic portion may be coupled to receive a third operand from a fixed slot of the first register and a fourth operand from any slot of the second register. The first logic portion may perform an arithmetic operation dissimilar from the second logic portion.
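
The partitioned ALU of the last embodiment can be sketched as follows. The function `partitioned_alu` and its parameters are hypothetical names, and registers are modeled as plain lists of slot values rather than packed bit fields.

```python
import operator
from typing import Callable, List, Sequence


def partitioned_alu(
    first_reg: Sequence[int],
    second_reg: Sequence[int],
    fixed_slots: Sequence[int],
    routed_slots: Sequence[int],
    ops: Sequence[Callable[[int, int], int]],
) -> List[int]:
    """Logic portion i computes ops[i](first_reg[fixed_slots[i]],
    second_reg[routed_slots[i]]): one operand from a fixed slot,
    the other routed from any slot of the second register."""
    return [
        op(first_reg[f], second_reg[r])
        for op, f, r in zip(ops, fixed_slots, routed_slots)
    ]


result = partitioned_alu(
    first_reg=[10, 20],
    second_reg=[1, 2, 3, 4],
    fixed_slots=[0, 1],      # wired per logic portion
    routed_slots=[3, 0],     # chosen by the operand routing unit
    ops=[operator.add, operator.sub],  # dissimilar operations per portion
)
print(result)  # [14, 19]
```

Both portions produce their results in the same call, which stands in for the single instruction cycle described in the title.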

    Apparatus for routing one operand to an arithmetic logic unit from a fixed register slot and another operand from any register slot
    4.
    Granted invention patent
    Status: Active

    Publication number: US06047372A

    Publication date: 2000-04-04

    Application number: US290837

    Filing date: 1999-04-13

    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands. In one embodiment multiple ALUs may each receive one operand from a fixed source register slot location, where the fixed slot location may be different for each ALU. The operand routing may provide another operand from any source register slot location for another input to each respective ALU.

    Instruction decoder including two-way emulation code branching
    5.
    Granted invention patent
    Status: Expired

    Publication number: US5920713A

    Publication date: 1999-07-06

    Application number: US649984

    Filing date: 1996-05-16

    Applicant: John G. Favor

    Inventor: John G. Favor

    Abstract: An instruction decoder includes an emulation code sequencer and emulation code ROM for handling various instructions. The emulation code ROM includes a sequence of operations (Op) and an operation sequencing control code (OpSeq). Branch instructions such as conditional branch instructions may be encoded into the emulation code ROM so that a second branch, in combination with the branching operation controlled by the OpSeq code, is applied to an operation code sequence. Two-way branching permits flexible branching to locations within the emulation code ROM so that memory capacity is conserved. A superscalar microprocessor includes an instruction decoder having an emulation code control circuit and an emulation ROM which emulates the function of a logic instruction decoder. The emulation code ROM is arranged as a matrix of multiple-operation (Op) units with each multiple-Op unit including a control field that points to a next location in the emulation code ROM. In one embodiment, the emulation code ROM is arranged to include a plurality of four-Op units, called Op quads, with each Op quad including a sequencing control field, called an OpSeq field.
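
A rough sketch of Op quad sequencing with two-way branching follows. The abstract does not specify how a branch Op and the OpSeq field interact, so this model assumes that a taken branch overrides the OpSeq target; the names `OpQuad` and `run_emulation`, the string-valued Ops, and the flag dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class OpQuad:
    ops: Tuple[str, str, str, str]            # four operations per ROM entry
    opseq: Optional[int]                      # next ROM address; None returns to the decoder
    branch: Optional[Tuple[str, int]] = None  # (condition flag, branch target address)


def run_emulation(rom: Dict[int, OpQuad], entry: int, flags: Dict[str, bool]) -> List[str]:
    """Follow OpSeq links from `entry`, letting a taken branch Op pick the other path."""
    trace: List[str] = []
    addr: Optional[int] = entry
    while addr is not None:
        quad = rom[addr]
        trace.extend(op for op in quad.ops if op != "nop")
        if quad.branch is not None and flags.get(quad.branch[0], False):
            addr = quad.branch[1]  # branch taken: second target
        else:
            addr = quad.opseq      # default: OpSeq target
    return trace


rom = {
    0: OpQuad(("ld", "cmp", "nop", "nop"), opseq=1, branch=("zero", 2)),
    1: OpQuad(("add", "st", "nop", "nop"), opseq=None),
    2: OpQuad(("sub", "st", "nop", "nop"), opseq=None),
}
print(run_emulation(rom, 0, {"zero": False}))  # ['ld', 'cmp', 'add', 'st']
print(run_emulation(rom, 0, {"zero": True}))   # ['ld', 'cmp', 'sub', 'st']
```

Since every quad names its own successor, sequences can share tails anywhere in the ROM instead of being laid out contiguously, which is one way the described layout could conserve memory capacity.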

    Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles
    6.
    Granted invention patent
    Status: Expired

    Publication number: US5801975A

    Publication date: 1998-09-01

    Application number: US759045

    Filing date: 1996-12-02

    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local central processing unit (CPU) bus to a conventional processor. The MEU employs vector registers, a vector arithmetic logic unit (ALU), and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.

    Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
    7.
    Granted invention patent
    Status: Expired

    Publication number: US5649137A

    Publication date: 1997-07-15

    Application number: US582294

    Filing date: 1996-01-03

    CPC classification number: G06F9/3812 G06F9/3844

    Abstract: The present invention provides for the updating of both the instructions in a branch prediction cache and instructions recently provided to an instruction pipeline from the cache when an instruction being executed attempts to change such instructions ("Store-Into-Instruction-Stream"). The branch prediction cache (BPC) includes a tag identifying the address of instructions causing a branch, a record of the target address which was branched to on the last occurrence of each branch instruction, and a copy of the first several instructions beginning at this target address. A separate instruction cache is provided for normal execution of instructions, and all of the instructions written into the branch prediction cache from the system bus must also be stored in the instruction cache. The instruction cache monitors the system bus for attempts to write to the address of an instruction contained in the instruction cache. Upon such a detection, that entry in the instruction cache is invalidated, and the corresponding entry in the branch prediction cache is invalidated. A subsequent attempt to use an instruction in the branch prediction cache which has been invalidated will detect that it is not valid, and will instead go to main memory to fetch the instruction, where it has been modified.
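
The invalidation flow can be approximated in a few lines. This is a simplified sketch under stated assumptions: one instruction per address, dictionaries in place of real cache arrays, and hypothetical names (`SnoopingCaches`, `record_branch`, `snoop_write`, `target_instructions`).

```python
from typing import Dict, List, Tuple


class SnoopingCaches:
    def __init__(self, memory: Dict[int, str]) -> None:
        self.memory = memory
        self.icache: Dict[int, str] = {}
        # branch address -> (target address, copies of the first target instructions)
        self.bpc: Dict[int, Tuple[int, List[str]]] = {}

    def record_branch(self, branch_addr: int, target: int, count: int = 2) -> None:
        addrs = range(target, target + count)
        for a in addrs:
            # Inclusion: everything placed in the BPC is also in the instruction cache.
            self.icache[a] = self.memory[a]
        self.bpc[branch_addr] = (target, [self.memory[a] for a in addrs])

    def snoop_write(self, addr: int, value: str) -> None:
        """A bus write is observed; invalidate matching entries in both caches."""
        self.memory[addr] = value
        if addr in self.icache:
            del self.icache[addr]
            stale = [b for b, (t, insts) in self.bpc.items()
                     if t <= addr < t + len(insts)]
            for b in stale:
                del self.bpc[b]

    def target_instructions(self, branch_addr: int, target: int, count: int = 2) -> List[str]:
        if branch_addr in self.bpc:
            return self.bpc[branch_addr][1]  # valid BPC entry
        return [self.memory[a] for a in range(target, target + count)]  # go to memory


caches = SnoopingCaches({100: "jmp 200", 200: "add", 201: "sub"})
caches.record_branch(100, 200)
print(caches.target_instructions(100, 200))  # ['add', 'sub'] from the BPC
caches.snoop_write(201, "mul")               # store into the instruction stream
print(caches.target_instructions(100, 200))  # ['add', 'mul'] refetched from memory
```

Only the instruction cache needs to watch the bus here; the inclusion property is what lets its hit double as the signal to invalidate the BPC entry.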

    Two-level branch prediction cache
    8.
    Granted invention patent
    Status: Expired

    Publication number: US5327547A

    Publication date: 1994-07-05

    Application number: US954441

    Filing date: 1992-09-30

    CPC classification number: G06F9/3806 G06F9/3804 G06F9/3848

    Abstract: An improved branch prediction cache (BPC) scheme utilizes a hybrid cache structure. The BPC provides two levels of branch information caching. The fully associative first-level BPC is a shallow but wide structure (36 32-byte entries), which caches full prediction information for a limited number of branch instructions. The direct-mapped second-level BPC is a deep but narrow structure (256 2-byte entries), which caches only partial prediction information, but does so for a much larger number of branch instructions. As each branch instruction is fetched and decoded, its address is used to perform parallel look-ups in the two branch prediction caches.
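
The two-level look-up might be modeled as below. The entry counts come from the abstract; the rest is assumed for illustration: the class name `TwoLevelBPC`, LRU replacement in the first level, a tag plus taken bit as the "partial" second-level contents, and the rule that a first-level hit takes priority.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

L1_ENTRIES = 36    # fully associative, full prediction information
L2_ENTRIES = 256   # direct mapped, partial prediction information


class TwoLevelBPC:
    def __init__(self) -> None:
        self.l1: "OrderedDict[int, Tuple[int, bool]]" = OrderedDict()  # addr -> (target, taken)
        self.l2: List[Optional[Tuple[int, bool]]] = [None] * L2_ENTRIES  # (tag, taken)

    def update(self, addr: int, target: int, taken: bool) -> None:
        self.l1[addr] = (target, taken)
        self.l1.move_to_end(addr)
        if len(self.l1) > L1_ENTRIES:
            self.l1.popitem(last=False)  # evict least recently updated (assumed policy)
        self.l2[addr % L2_ENTRIES] = (addr // L2_ENTRIES, taken)

    def lookup(self, addr: int) -> Tuple[str, Optional[bool], Optional[int]]:
        # Both levels are probed with the same branch address.
        full = self.l1.get(addr)
        entry = self.l2[addr % L2_ENTRIES]
        if full is not None:
            return ("L1", full[1], full[0])
        if entry is not None and entry[0] == addr // L2_ENTRIES:
            return ("L2", entry[1], None)  # direction only, no target
        return ("miss", None, None)


bpc = TwoLevelBPC()
for i in range(37):                 # one more branch than the first level holds
    bpc.update(addr=4 * i, target=1000 + i, taken=True)
print(bpc.lookup(0))     # evicted from L1, still predicted by L2
print(bpc.lookup(4))     # full information from L1
print(bpc.lookup(1000))  # never seen
```

The second level trades detail for reach: it cannot supply a target, but it still remembers the direction of branches that have fallen out of the small first level.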

    Register-based redundancy circuit and method for built-in self-repair in a semiconductor memory device
    10.
    Granted invention patent
    Status: Expired

    Publication number: US5920515A

    Publication date: 1999-07-06

    Application number: US938062

    Filing date: 1997-09-26

    CPC classification number: G11C29/84 G11C29/844

    Abstract: A semiconductor memory array with Built-in Self-Repair (BISR) includes redundancy circuits associated with failed row address stores to drive redundant row word lines, thereby obviating the supply and normal decoding of a substitute address. NOT comparator logic compares a failed row address generated and stored by BISR circuits to a row address supplied to the memory array. A TRUE comparator configured in parallel with the NOT comparator simultaneously compares the defective row address signal to the supplied row address. Since NOT comparison is performed quickly in dynamic logic without setup and hold time constraints, the timing impact on a normal (non-redundant) row decode path is negligible. Since TRUE comparison, though potentially slower than NOT comparison, itself identifies a redundant row address and therefore need not employ an N-bit-address-to-selected-word-line decode, redundant row addressing is rapid and does not degrade performance of a self-repaired semiconductor memory array. By providing redundancy handling at the predecode circuit level, rather than at a preliminary address substitution stage, access times to a BISR memory array in accordance with the present invention are improved.
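
At the logic level, the parallel NOT and TRUE comparisons reduce to the following sketch. It captures only the selection behavior, not the dynamic-logic timing the abstract emphasizes; the function name `select_word_line` and the list of failed-row registers are hypothetical.

```python
from typing import List, Tuple


def select_word_line(row_addr: int, failed_rows: List[int]) -> Tuple[str, int]:
    """Return which word line is driven for a supplied row address.

    failed_rows[i] is the failed row address held in register i, which is
    associated with redundant row i."""
    # TRUE comparison: one match signal per stored failed address.
    true_match = [row_addr == failed for failed in failed_rows]
    # NOT comparison: asserted when the address differs from every failed row,
    # which enables the normal decode path.
    not_match = all(row_addr != failed for failed in failed_rows)
    if not_match:
        return ("normal", row_addr)
    # The matching register directly selects its redundant word line,
    # so no substitute address is generated or decoded.
    return ("redundant", true_match.index(True))


print(select_word_line(5, [3, 9]))  # ('normal', 5)
print(select_word_line(9, [3, 9]))  # ('redundant', 1)
```

In hardware both comparisons run at the same time; the sequential `if` here is only a software convenience.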
