Computer system including a microprocessor having a reorder buffer employing last in buffer and last in line indications
    31.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US6032251A

    Publication date: 2000-02-29

    Application No.: US78213

    Filing date: 1998-05-13

    Abstract: A computer system including a microprocessor employing a reorder buffer is provided which stores a last in buffer (LIB) indication corresponding to each instruction. The last in buffer indication indicates whether or not the corresponding instruction is last, in program order, of the instructions within the buffer to update the storage location defined as the destination of that instruction. The LIB indication is included in the dependency checking comparisons. A dependency is indicated for a given source operand and a destination operand within the reorder buffer if the operand specifiers match and the corresponding LIB indication indicates that the instruction corresponding to the destination operand is last to update the corresponding storage location. At most one of the dependency comparisons for a given source operand can indicate dependency. According to one embodiment, the reorder buffer employs a line-oriented configuration. Concurrently decoded instructions are stored into a line of storage, and the concurrently decoded instructions are retired as a unit. A last in line (LIL) indication is stored for each instruction in the line. The LIL indication indicates whether or not the instruction is last within the line storing that instruction to update the storage location defined as the destination of that instruction. The LIL indications for a line can be used as write enables for the register file.
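    The LIB and LIL bookkeeping described in this abstract can be sketched as a small software model. This is a minimal illustration, not the patented circuit: the names `Entry`, `ReorderBuffer`, `allocate_line`, `find_dependency`, and `retire_line` are hypothetical, and the model assumes register specifiers are plain strings and that each line is given in program order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    dest: str          # destination register specifier
    lib: bool = True   # last in buffer: newest writer of `dest` in the whole buffer
    lil: bool = True   # last in line: newest writer of `dest` within its own line


class ReorderBuffer:
    """Hypothetical line-oriented reorder buffer model (illustration only)."""

    def __init__(self) -> None:
        self.lines: List[List[Entry]] = []

    def allocate_line(self, dests: List[str]) -> None:
        """Store one line of concurrently decoded instructions, in program order."""
        line = [Entry(d) for d in dests]
        # Within the new line, only the last writer of each register keeps LIL and LIB.
        for i, entry in enumerate(line):
            if any(later.dest == entry.dest for later in line[i + 1:]):
                entry.lil = False
                entry.lib = False
        # Older entries that write the same registers are no longer last in buffer.
        new_dests = set(dests)
        for old_line in self.lines:
            for entry in old_line:
                if entry.dest in new_dests:
                    entry.lib = False
        self.lines.append(line)

    def find_dependency(self, src: str) -> Optional[Entry]:
        """A dependency needs a specifier match AND a set LIB bit."""
        hits = [e for line in self.lines for e in line if e.dest == src and e.lib]
        assert len(hits) <= 1  # at most one comparison can indicate a dependency
        return hits[0] if hits else None

    def retire_line(self) -> List[str]:
        """Retire the oldest line as a unit; LIL bits serve as write enables."""
        line = self.lines.pop(0)
        return [e.dest for e in line if e.lil]
```

    Because the LIB bit is folded into the comparison, no priority logic is needed to pick the newest of several matching entries: at most one match survives.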

    Way prediction logic for cache array
    32.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US6016533A

    Publication date: 2000-01-18

    Application No.: US991846

    Filing date: 1997-12-16

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A set-associative cache memory configured to use multiple portions of a requested address in parallel to quickly access data from a data array based upon stored way predictions. The cache memory comprises a plurality of memory locations, a plurality of storage locations configured to store way predictions, a decoder, a plurality of pass transistors, and a sense amp unit. A subset of the storage locations is selected according to a first portion of a requested address. The decoder is configured to receive and decode a second portion of the requested address. The decoded portion of the address is used to select a particular subset of the data array based upon the way predictions stored within the selected subset of storage locations. The pass transistors are configured to select a second subset of the data array according to a third portion of the requested address. The sense amp unit then reads a cache line from the intersection of the first subset and second subset within the data array.

    Number of pipeline stages and loop length related counter differential based end-loop prediction
    33.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06003128A

    Publication date: 1999-12-14

    Application No.: US846656

    Filing date: 1997-05-01

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F9/325 G06F9/3844

    Abstract: An apparatus for prediction of loop instructions. Loop instructions decrement the value in a counter register and branch to a target address (specified by an instruction operand) if the decremented value of the counter register is greater than zero. The apparatus comprises a loop detection unit that detects the presence of a loop instruction in the instruction stream. An indication of the loop instruction is conveyed to a reorder buffer which stores speculative register values. If the apparatus is not currently processing the loop instruction, a compare value corresponding to the counter register prior to execution of the loop instruction is conveyed to a loop prediction unit. The loop prediction unit also increments a counter value upon receiving each indication of the loop instruction. This counter value is then compared to the compare value conveyed from the reorder buffer. If the counter value is one less than the compare value, a signal is asserted that indicates that the loop instruction should be predicted not-taken upon a next iteration of the loop. In this manner, loop prediction accuracy may be increased by correctly predicting the loop instruction not-taken. Because loops are commonly found in a variety of applications, increasing the accuracy of loop prediction, even slightly, may have a beneficial effect on performance. The loop operation is particularly important in scientific applications where it may be used to perform various digital signal processing routines and to traverse arrays.
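    The counter comparison in this abstract can be traced with a short simulation. This is a sketch under stated assumptions: the class `LoopPredictor` and the function `simulate` are hypothetical names, the loop instruction is assumed to be predicted taken unless the signal was asserted on the previous iteration, and the counter register is assumed to start at 1 or more.

```python
from typing import List, Tuple


class LoopPredictor:
    """Hypothetical model of the loop prediction unit (illustration only)."""

    def __init__(self) -> None:
        self.active = False      # True while a loop instruction is being tracked
        self.compare_value = 0   # counter register value before the first execution
        self.count = 0           # number of loop-instruction indications seen

    def on_loop_instruction(self, counter_register: int) -> bool:
        """Return True if the NEXT iteration should be predicted not-taken."""
        if not self.active:
            # Not currently processing this loop: latch the compare value.
            self.active = True
            self.compare_value = counter_register
            self.count = 0
        self.count += 1
        signal = self.count == self.compare_value - 1
        if self.count >= self.compare_value:
            self.active = False  # final iteration seen; stop tracking
        return signal


def simulate(n: int) -> List[Tuple[bool, bool]]:
    """Run a loop whose counter register starts at n (assumed n >= 1).

    Returns one (predicted_taken, actually_taken) pair per iteration.
    """
    predictor = LoopPredictor()
    register = n
    predict_not_taken = False
    results = []
    while True:
        predicted_taken = not predict_not_taken
        predict_not_taken = predictor.on_loop_instruction(register)
        register -= 1              # the loop instruction decrements the counter
        taken = register > 0       # and branches while the result is above zero
        results.append((predicted_taken, taken))
        if not taken:
            return results
```

    With a starting count of 3, the signal is asserted on the second iteration, so the third and final iteration is predicted not-taken and every prediction matches the outcome.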

    Branch prediction mechanism employing branch selectors to select a branch prediction

    Publication No.: US5995749A

    Publication date: 1999-11-30

    Application No.: US752691

    Filing date: 1996-11-19

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F9/3806 G06F9/3844

    Abstract: A branch prediction apparatus is provided which stores multiple branch selectors corresponding to instruction bytes within a cache line of instructions or portion thereof. The branch selectors identify a branch prediction to be selected if the corresponding instruction byte is the byte indicated by the offset of the fetch address used to fetch the cache line. Instead of comparing pointers to the branch instructions with the offset of the fetch address, the branch prediction is selected simply by decoding the offset of the fetch address and choosing the corresponding branch selector. The branch prediction apparatus may operate at higher frequencies (i.e. shorter clock cycles) than if the pointers to the branch instructions and the fetch address were compared (a greater-than or less-than comparison). The branch selectors directly determine which branch prediction is appropriate according to the instructions being fetched, thereby decreasing the amount of logic employed to select the branch prediction.
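    The selection-by-offset idea can be illustrated with one selector per instruction byte. This is a simplified sketch, not the encoding used in the patent: the functions `build_selectors` and `select_prediction` are hypothetical, selector 0 is assumed to mean "no predicted-taken branch, fetch sequentially", and selector k is assumed to name the k-th stored branch prediction for the line.

```python
from typing import List, Optional


def build_selectors(line_size: int, taken_branch_end_bytes: List[int]) -> List[int]:
    """Assign one selector to each byte of a cache line (done at update time).

    A byte gets the selector of the first predicted-taken branch that ends at
    or after it; bytes past the last such branch get 0 (sequential fetch).
    """
    ends = sorted(taken_branch_end_bytes)
    selectors = []
    for byte in range(line_size):
        selector = 0
        for k, end in enumerate(ends, start=1):
            if byte <= end:
                selector = k
                break
        selectors.append(selector)
    return selectors


def select_prediction(selectors: List[int],
                      predictions: List[int],
                      fetch_offset: int) -> Optional[int]:
    """Pick a prediction at fetch time by indexing with the fetch offset.

    No magnitude comparison against branch pointers is needed here: the
    offset is simply decoded into an index.
    """
    selector = selectors[fetch_offset]
    return None if selector == 0 else predictions[selector - 1]
```

    For an 8-byte line with predicted-taken branches ending at bytes 2 and 5, a fetch at offset 4 indexes straight to the second prediction, while a fetch at offset 7 falls through to sequential fetch.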

    Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions

    Publication No.: US5978906A

    Publication date: 1999-11-02

    Application No.: US957596

    Filing date: 1997-10-24

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F9/30054 G06F9/3806 G06F9/3844

    Abstract: A branch prediction unit stores a set of branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. Each branch selector identifies the branch prediction to be selected if a fetch address corresponding to that branch selector is presented. In order to minimize the number of branch selectors stored for a group of contiguous instruction bytes, the group is divided into multiple byte ranges. The largest byte range may include a number of bytes comprising the shortest branch instruction in the instruction set (exclusive of the return instruction). For example, the shortest branch instruction may be two bytes in one embodiment. Therefore, the largest byte range is two bytes in the example. Since the branch selectors as a group change value (i.e. indicate a different branch instruction) only at the end byte of a predicted-taken branch instruction, fewer branch selectors may be stored than the number of bytes within the group.

    Apparatus for efficiently providing memory operands for instructions
    36.
    Type: Granted invention patent
    Status: In force

    Publication No.: US5960467A

    Publication date: 1999-09-28

    Application No.: US133340

    Filing date: 1998-08-13

    CPC classification number: G06F12/0875 G06F9/3826 G06F9/3832

    Abstract: An apparatus including address generation units, corresponding reservation stations, and a speculative register file is provided. Decode units provide memory operation information to the corresponding reservation stations while the associated instructions are being decoded. The speculative register file stores speculative register values corresponding to previously decoded instructions. The speculative register values are generated prior to execution of the previously decoded instructions. If the register operands included in the address operands of an instruction are stored in the speculative register file, then the memory operation may be passed through the corresponding reservation station to an address generation unit. The address generation unit generates the data address from the address operands and accesses a data cache while register operands corresponding to the instruction are requested from a register file and reorder buffer.

    Method and apparatus for predecoding variable byte length instructions for scanning of a number of RISC operations
    37.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US5940602A

    Publication date: 1999-08-17

    Application No.: US873114

    Filing date: 1997-06-11

    CPC classification number: G06F9/382 G06F9/30152 G06F9/3017 G06F9/3816

    Abstract: A superscalar microprocessor is provided that includes a predecode unit configured to predecode variable byte-length instructions prior to their storage within an instruction cache. The predecode unit is configured to generate a plurality of predecode bits for each instruction byte. The plurality of predecode bits associated with each instruction byte include an end bit and an ROP bit that indicates a number of microinstructions required to implement the instruction. The plurality of predecode bits are collectively referred to as a predecode tag. An instruction alignment unit then uses the predecode tags to identify microinstructions. The instruction alignment unit dispatches the microinstructions simultaneously to a plurality of decode units which form fixed issue positions within the superscalar microprocessor. Because the instruction alignment unit identifies microinstructions, the multiplexing of instructions from the instruction alignment unit to the decoders is simplified. Accordingly, relatively fast multiplexing may be attained, and high performance may be accommodated.

    Microprocessor configured to detect memory operations having data addresses indicative of a boundary between instructions sets
    38.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US5930489A

    Publication date: 1999-07-27

    Application No.: US599617

    Filing date: 1996-02-09

    CPC classification number: G06F9/30181 G06F9/30043 G06F9/30076 G06F9/3879

    Abstract: A microprocessor configured to detect a memory operation having a predefined data address is provided. The predefined data address indicates that subsequent instructions belong to an alternate instruction set. In one embodiment, a second memory operation having the predefined data address indicates that instructions subsequent to the second memory operation belong to the original instruction set. The memory operations effectively provide a boundary between the instructions from dissimilar instruction sets. Instructions are routed to an execution unit configured to execute the instruction set indicated by the most recently detected memory operation having the predefined address. Each instruction sequence within the program may be coded using the instruction set which most efficiently executes the function corresponding to the instruction sequence. The program may be executed more quickly than an equivalent program coded entirely in either instruction set. In one embodiment, the microprocessor executes the x86 instruction set and the ADSP 2171 instruction set.
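    The boundary mechanism amounts to a toggle driven by memory operations that hit one predefined data address. The sketch below is illustrative only: `BOUNDARY_ADDRESS`, `Instruction`, and `route` are hypothetical, the address value is made up, and the model assumes the marker operation itself still executes in the instruction set that was current when it was fetched.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

BOUNDARY_ADDRESS = 0xFFFF_FFF0  # hypothetical predefined data address


@dataclass
class Instruction:
    text: str
    data_address: Optional[int] = None  # set only for memory operations


def route(stream: Iterable[Instruction]) -> List[Tuple[str, str]]:
    """Tag each instruction with the execution unit it would be routed to."""
    instruction_sets = ("x86", "ADSP 2171")
    current = 0  # start in the original instruction set
    routed = []
    for insn in stream:
        routed.append((instruction_sets[current], insn.text))
        # A memory operation to the predefined address marks a boundary:
        # instructions AFTER it belong to the other instruction set.
        if insn.data_address == BOUNDARY_ADDRESS:
            current ^= 1
    return routed
```

    A first marker switches routing to the alternate instruction set, and a second marker switches it back to the original one.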

    Microprocessor employing local caches for functional units to store memory operands used by the functional units
    39.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US5898849A

    Publication date: 1999-04-27

    Application No.: US835066

    Filing date: 1997-04-04

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A microprocessor employs a local cache for each functional unit, located physically close to that functional unit. The local caches are relatively small as compared to a central cache optionally included in the microprocessor as well. Because the local caches are small, internal interconnection delays within the local caches may be less than those experienced by the central cache. Additionally, the physical proximity of the local cache to the functional unit which accesses the local cache reduces the interconnect delay between the local cache and the functional unit. If the memory operand hits in a remote cache (either a different local cache or the central cache), the cache line containing the memory operand is transferred to the local cache experiencing the miss. According to one embodiment including multiple symmetrical functional units, the local caches coupled to the symmetrical functional units are restricted to storing different cache lines from each other. For example, a number of bits of the tag address may be used to select which of the local caches is to store the corresponding cache line. A data prediction scheme for predicting the functional unit to which a given instruction should be dispatched may be implemented, wherein the prediction is formed based upon the cache line storing the memory operand during a previous execution of the given instruction.
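    The symmetrical-unit embodiment and the dispatch prediction can be sketched together. Everything below is an assumption made for illustration: the line size, the number of functional units, the use of the low line-address bits (the abstract only says "a number of bits of the tag address"), and the names `home_unit` and `DispatchPredictor`.

```python
from typing import Dict

LINE_BYTES = 32  # assumed cache line size
NUM_UNITS = 4    # assumed number of symmetrical functional units


def home_unit(address: int) -> int:
    """Map an operand address to the one local cache allowed to hold its line.

    Using address bits for the mapping guarantees that different local
    caches never store the same cache line.
    """
    return (address // LINE_BYTES) % NUM_UNITS


class DispatchPredictor:
    """Predict the functional unit for an instruction from its last execution."""

    def __init__(self) -> None:
        self.last_unit: Dict[int, int] = {}  # instruction address -> unit

    def predict(self, pc: int) -> int:
        # Unit 0 is an arbitrary default for instructions not seen before.
        return self.last_unit.get(pc, 0)

    def update(self, pc: int, operand_address: int) -> None:
        # Remember which local cache held the operand's line this time.
        self.last_unit[pc] = home_unit(operand_address)
```

    If the same instruction later touches the same cache line, dispatching it to the predicted unit lets the operand hit in that unit's local cache.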

    Speculative register file for storing speculative register states and removing dependencies between instructions utilizing the register
    40.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US5892936A

    Publication date: 1999-04-06

    Application No.: US879520

    Filing date: 1997-06-20

    Abstract: A superscalar microprocessor configured to speculatively generate register values associated with a particular register is provided. Multiple register values are generated in parallel, wherein each speculatively generated register value accounts for modifications of the register value by each of the instructions prior to the instruction for which the register value is generated. Instructions which are dependent upon each other for the register values thus generated may be executed concurrently. In one specific embodiment, the present microprocessor generates register values for the ESP register. The speculatively generated register value resulting from the modifications performed by the instructions decoded during a clock cycle is stored in a speculative register file along with constants used to generate the register value associated with each individual instruction. When a mispredicted branch instruction is detected, the register value generated during the decode of the mispredicted branch instruction may be adjusted using the stored constants. The adjustment performed reflects the value of the register at the execution of the mispredicted branch instruction.
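    The parallel generation of ESP values is, in software terms, a prefix sum over the per-instruction stack adjustments. The sketch below is a hedged illustration: `decode_cycle` and `recover_at` are hypothetical names, each instruction is assumed to change ESP by a constant known at decode (for example -4 for a push, +4 for a pop), and the stored "constants" are assumed to be cumulative offsets.

```python
from itertools import accumulate
from typing import List, Tuple


def decode_cycle(esp_in: int, deltas: List[int]) -> Tuple[List[int], int, List[int]]:
    """Generate the ESP value seen by each instruction decoded this cycle.

    constants[i] is the sum of the adjustments made by instructions before
    instruction i; constants[-1] is the total adjustment for the cycle.
    """
    constants = list(accumulate(deltas, initial=0))
    per_instruction = [esp_in + c for c in constants[:-1]]
    esp_out = esp_in + constants[-1]
    # esp_out and constants are what this model keeps in the speculative
    # register file for later misprediction recovery.
    return per_instruction, esp_out, constants


def recover_at(esp_out: int, constants: List[int], index: int) -> int:
    """Recover the ESP value seen by instruction `index` (e.g. a mispredicted branch)."""
    return esp_out - constants[-1] + constants[index]
```

    Because every instruction receives its own ESP value at decode, instructions that would otherwise wait on each other's stack-pointer updates can execute concurrently.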
