Apparatus and method for modifying status bits in a reorder buffer with
a large speculative state
    131.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5920710A

    Publication date: 1999-07-06

    Application number: US751649

    Filing date: 1996-11-18

    CPC classification number: G06F9/3842 G06F9/3861

    Abstract: A superscalar microprocessor implements a reorder buffer to support out-of-order execution of instructions. The instruction tag, which identifies the instruction being executed, is available during execution of the instruction, and this availability is used to reduce the time delay for identifying mispredicted instructions, prioritizing mispredicted instructions, canceling instructions subsequent to the mispredicted instruction, and reading status information from the reorder buffer. The reorder buffer receives the tag of the instruction issued to the functional unit. In parallel with the execution of the instruction, the reorder buffer generates hit masks identifying instructions to be canceled in the event of a mispredicted branch. In parallel, status information from the instruction (or instructions) being executed is selected from the reorder buffer and prioritization masks are generated. Therefore, if a mispredicted branch is detected, the instructions that need to be canceled can be readily identified and the instruction status information is readily available.
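
The hit-mask idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the reorder buffer is modeled as a circular buffer with hypothetical `head` (oldest entry) and `tail` (next free slot) indices, and the names `cancel_mask` and `resolve_branch` are invented for illustration rather than taken from the patent.

```python
def cancel_mask(size, head, tail, branch_tag):
    """Mark every reorder buffer entry that is younger than the branch.

    Assumed model: valid entries occupy head, head+1, ..., tail-1 (modulo
    size), oldest first, and branch_tag is the index of a valid entry.
    """
    mask = [False] * size
    idx = (branch_tag + 1) % size
    while idx != tail:
        mask[idx] = True
        idx = (idx + 1) % size
    return mask


def resolve_branch(valid, mask, mispredicted):
    """Apply a precomputed mask only if the branch turned out mispredicted."""
    if not mispredicted:
        return list(valid)
    return [v and not m for v, m in zip(valid, mask)]
```

The mask is built as soon as the branch tag is known, so in this model the only work left when the misprediction is detected is a bitwise AND with the valid bits.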

    Search mechanism for a rotating pointer buffer
    132.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5919251A

    Publication date: 1999-07-06

    Application number: US962810

    Filing date: 1997-11-03

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F5/10

    Abstract: A first hit scanning circuit scans hit signals identifying entries within a rotating buffer which are storing a search value for a first of the hit signals which is nearest a start pointer which identifies one of the entries. The first hit scanning circuit divides the hit signals into multiple subsets, and independently scans each subset for a first hit within the subset. In parallel, the first hit scanning circuit generates a set of lookahead signals by scanning each subset for at least one hit. The lookahead signals are then scanned for a first lookahead signal, and the scanned subset signals are qualified with the scanned lookahead signals.
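
A software analogue of the two-level scan may make the abstract easier to follow. The sketch below assumes the hit signals are first rotated so that the start pointer sits at position 0, which is a simplification and not necessarily how the circuit handles the rotation; the function name `first_hit` and the `group` parameter are hypothetical.

```python
def first_hit(hits, start, group=4):
    """Return the index of the first hit at or after `start`, wrapping around.

    hits  : list of booleans, one per buffer entry
    start : index of the entry named by the start pointer
    group : subset size used for the two-level scan (illustrative value)
    """
    n = len(hits)
    # Rotate so that position 0 corresponds to the start pointer.
    rotated = [hits[(start + i) % n] for i in range(n)]
    groups = [rotated[i:i + group] for i in range(0, n, group)]

    # Level 1: scan each subset independently for its own first hit.
    local = []
    for g in groups:
        seen = False
        onehot = []
        for h in g:
            onehot.append(h and not seen)
            seen = seen or h
        local.append(onehot)

    # Lookahead: does each subset contain at least one hit?
    lookahead = [any(g) for g in groups]

    # Level 2: scan the lookahead signals for the first subset with a hit.
    seen = False
    first_group = []
    for la in lookahead:
        first_group.append(la and not seen)
        seen = seen or la

    # Qualify the per-subset results with the scanned lookahead signals.
    for gi, onehot in enumerate(local):
        for j, bit in enumerate(onehot):
            if bit and first_group[gi]:
                return (start + gi * group + j) % n
    return None
```

In hardware the per-subset scans and the lookahead generation would run side by side; the loops here only show which signals are combined.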

    Multi-chip superscalar microprocessor module
    133.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5909587A

    Publication date: 1999-06-01

    Application number: US957085

    Filing date: 1997-10-24

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: The pipeline of a microprocessor is partitioned near its mid point such that a first portion of the functionality of the microprocessor is implemented on a first integrated circuit chip and a second portion of the microprocessor functionality is implemented on a second integrated circuit chip. In one implementation, the first integrated circuit chip includes an instruction cache, an instruction alignment unit, and a plurality of decode units for implementing fetch, alignment and decode stages, respectively, of the processor pipeline. Instructions are selected from the instruction cache by the instruction alignment unit and are provided to a respective decode unit. A compression unit may compress the information output by the decode units to prepare conveyance of the information from the first integrated circuit chip to the second integrated circuit chip. The second integrated circuit chip contains circuitry to implement execute and write-back stages of the processor pipeline. This circuitry may include a plurality of execution units coupled to receive output signals from the decoders of the first integrated circuit chip, corresponding reservation stations, a load/store unit and a data cache. A decompression unit may be coupled to receive the compressed information from the compression unit of the first integrated circuit chip to decompress the information prior to providing it to the reservation stations and/or execution units.

    Method of allocating a fixed reorder buffer storage line for execution
results regardless of a number of concurrently dispatched instructions
    134.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5903741A

    Publication date: 1999-05-11

    Application number: US690383

    Filing date: 1996-07-26

    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction. Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register.
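
The future file described here can be modeled in a few lines. This is a minimal sketch, not the patented design: the class and method names are hypothetical, registers are assumed to start out holding the value 0, and the rule that a completing instruction updates the entry only if it is still the last writer is an assumption inferred from the phrase "last instruction in program order".

```python
class FutureFile:
    """One entry per register: either ("tag", rob_tag) or ("value", result)."""

    def __init__(self, num_regs):
        # Assumption: every register starts out holding the value 0.
        self.entries = [("value", 0) for _ in range(num_regs)]

    def dispatch(self, dest_reg, rob_tag):
        # The newest writer in program order replaces whatever was there.
        self.entries[dest_reg] = ("tag", rob_tag)

    def read(self, reg):
        # A source operand gets either the pending tag or the finished value.
        return self.entries[reg]

    def complete(self, dest_reg, rob_tag, result):
        # Assumed rule: only the last writer may replace its tag with a value.
        if self.entries[dest_reg] == ("tag", rob_tag):
            self.entries[dest_reg] = ("value", result)
```

A dependent instruction therefore needs a single lookup per source register instead of a search over all older reorder buffer entries.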

    Storage device having varying access times and a superscalar
microprocessor employing the same
    135.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5900012A

    Publication date: 1999-05-04

    Application number: US933270

    Filing date: 1997-09-18

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    CPC classification number: G06F12/0844 G06F12/0864 G06F12/128

    Abstract: A storage device having varying access times is provided. The storage device incorporates a direct-mapped cache and a set-associative cache, which are accessed in parallel. If a hit occurs in the direct-mapped cache, then the data is forwarded in the same clock cycle as the requested address is conveyed to the storage device. If a hit occurs in the set-associative cache, then the data is forwarded in a subsequent clock cycle and the associated cache line is moved into the direct-mapped cache. The cache line stored in the direct-mapped cache in the storage location that is to be used for the cache line being moved is stored into the set-associative cache in the location vacated by the moved line. In this manner, the most recently accessed cache line is stored in the direct-mapped cache and other recently accessed cache lines are stored in the set-associative cache.
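
The swap between the two caches can be sketched as a small model. Everything below is illustrative: the class name, the sizes, and the miss policy (fill the direct-mapped slot and push the displaced line into the set-associative cache) are assumptions, since the abstract only describes the hit cases. The model also assumes the number of direct-mapped lines is a multiple of the number of sets, so that lines competing for one direct-mapped slot always share a set.

```python
class TwoLevelStore:
    """Direct-mapped cache backed by a set-associative cache (line addresses only)."""

    def __init__(self, dm_lines=8, sa_sets=4, sa_ways=2):
        # Assumption: guarantees a displaced line maps to the vacated set.
        assert dm_lines % sa_sets == 0
        self.dm_lines = dm_lines
        self.sa_sets = sa_sets
        self.dm = [None] * dm_lines
        self.sa = [[None] * sa_ways for _ in range(sa_sets)]

    def access(self, line):
        di = line % self.dm_lines
        ways = self.sa[line % self.sa_sets]
        if self.dm[di] == line:
            return "direct"          # fast path: data forwarded immediately
        if line in ways:
            w = ways.index(line)
            ways[w] = self.dm[di]    # displaced line fills the vacated slot
            self.dm[di] = line       # requested line moves to the fast cache
            return "assoc"           # slow path: data forwarded a cycle later
        # Miss handling is hypothetical and not taken from the abstract.
        displaced = self.dm[di]
        self.dm[di] = line
        if displaced is not None:
            slot = ways.index(None) if None in ways else 0
            ways[slot] = displaced
        return "miss"
```

Alternating accesses to two lines that share a direct-mapped slot ping-pong between the two caches, which matches the abstract's statement that the most recently accessed line always ends up in the direct-mapped cache.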

    Reorder buffer having a future file for storing speculative instruction
execution results
    136.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5872951A

    Publication date: 1999-02-16

    Application number: US690370

    Filing date: 1996-07-26

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file.

    Microprocessor configured to selectively invoke a microcode DSP function
or a program subroutine in response to a target address value of branch
instruction
    137.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5864689A

    Publication date: 1999-01-26

    Application number: US567666

    Filing date: 1995-12-05

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A microprocessor having a microcode unit is provided. Routines comprising DSP functions and instruction emulation routines are stored within a read-only memory within the microcode unit. The routines may be fetched by the microprocessor upon occurrence of a corresponding instruction. For example, DSP functions may be fetched upon occurrence of an instruction defined by the microprocessor to be indicative of a DSP function. The microcode unit provides a library of useful functions. Effectively, the instruction set executed by the microprocessor is increased. A number of methods for defining instructions indicative of a DSP function are contemplated. For example, a subroutine call instruction having a target address within a predefined range of addresses may be defined as indicative of a DSP function. Alternatively, a special subroutine call instruction may be added to the instruction set. Detection of the special subroutine call instruction encoding causes the microprocessor to fetch instructions from the microcode unit. A third alternative is to detect data patterns in data movement instructions and cause instructions to be fetched from the microcode unit upon occurrence of particular data patterns.
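
The first of the three detection methods, a call whose target address falls in a predefined range, amounts to a range check on the branch target. The sketch below is illustrative only: the address constants and the name `route_call` are hypothetical and do not come from the patent.

```python
# Hypothetical address window reserved for DSP functions.
DSP_BASE = 0xFFFF0000
DSP_LIMIT = 0xFFFF1000


def route_call(target):
    """Decide where a subroutine call should fetch its instructions from."""
    if DSP_BASE <= target < DSP_LIMIT:
        # Inside the reserved window: fetch the routine from the microcode ROM.
        return ("microcode", target - DSP_BASE)
    # Anywhere else: an ordinary program subroutine fetched from memory.
    return ("subroutine", target)
```

With this scheme no new opcode is needed; ordinary call instructions select a DSP routine purely through the address they target.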

    Superscalar microprocessor including a reorder buffer which detects
dependencies between accesses to a pair of caches
    138.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5848287A

    Publication date: 1998-12-08

    Application number: US603804

    Filing date: 1996-02-20

    Abstract: A superscalar microprocessor is provided which maintains coherency between a pair of caches accessed from different stages of an instruction processing pipeline. A dependency checking structure is provided within the microprocessor. The dependency checking structure compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (addresses for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes. If a read memory access performed by the decode stage is dependent upon a write memory access performed by the execution stage, then the instruction associated with the read memory access and subsequent instructions are flushed. Data coherency is maintained between the pair of caches while allowing stack-relative accesses to be performed from the decode stage. The comparator circuits used to perform the comparison are configured to compare a field of address bits instead of the entire address, reducing the size while still maintaining accurate dependency checking by qualifying the resulting comparison signals with an indication that both addresses hit in the same storage location within the stack cache.
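
The reduced-width comparison at the end of this abstract can be illustrated as follows. This is a sketch under assumptions: the compared field is taken to be the low-order address bits, each access is represented as a hypothetical `(address, slot)` pair where `slot` is the stack cache storage location it hit (or `None` on a miss), and the function name is invented.

```python
def depends(access_a, access_b, field_bits=5):
    """Report a dependency using a narrow compare plus a same-slot qualifier.

    access_a, access_b : (address, slot) pairs; slot is None on a stack cache miss
    field_bits         : width of the compared address field (illustrative value)
    """
    addr_a, slot_a = access_a
    addr_b, slot_b = access_b
    mask = (1 << field_bits) - 1
    # Narrow comparator: only the selected address field is compared.
    field_match = (addr_a & mask) == (addr_b & mask)
    # Qualifier: both accesses hit the same stack cache storage location.
    same_slot = slot_a is not None and slot_a == slot_b
    return field_match and same_slot
```

The qualifier is what keeps the narrow comparison accurate: two addresses that agree only in the compared field are not reported as dependent unless they also hit the same storage location.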

    Prefetch buffer for storing instructions prior to placing the
instructions in an instruction cache
    139.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5845101A

    Publication date: 1998-12-01

    Application number: US855099

    Filing date: 1997-05-13

    CPC classification number: G06F9/382 G06F9/30152 G06F9/3816

    Abstract: A microprocessor is configured to speculatively fetch cache lines of instruction bytes prior to actually detecting a cache miss for the cache lines of instruction bytes. The bytes transferred from an external main memory subsystem are stored into one of several prefetch buffers. Subsequently, instruction fetches may be detected which hit the prefetch buffers. Furthermore, predecode data may be generated for the instruction bytes stored in the prefetch buffers. When a fetch hit in the prefetch buffers is detected, predecode data may be available for the instructions being fetched. The prefetch buffers may each comprise an address prefetch buffer included within an external interface unit and an instruction data prefetch buffer included within a prefetch/predecode unit. The external interface unit maintains the addresses of cache lines assigned to the prefetch buffers in the address prefetch buffers. Both the linear address and the physical address of each cache line are maintained. The prefetch/predecode unit receives instruction bytes directly from the external interface and stores the instruction bytes in the corresponding instruction data prefetch buffer.

    Superscalar microprocessor load/store unit employing a unified buffer
and separate pointers for load and store operations
    140.
    Type: Granted invention patent
    Status: Expired

    Publication number: US5832297A

    Publication date: 1998-11-03

    Application number: US968308

    Filing date: 1997-11-12

    Abstract: A load/store buffer is provided which allows both load memory operations and store memory operations to be stored within it. Because each storage location may contain either a load or a store memory operation, the number of available storage locations for load memory operations is maximally the number of storage locations in the entire buffer. Similarly, the number of available storage locations for store memory operations is maximally the number of storage locations in the entire buffer. This invention improves use of silicon area for load and store buffers by implementing, in a smaller area, a performance-equivalent alternative to the separate load and store buffer approach previously used in many superscalar microprocessors.
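
One way to picture a unified buffer with separate pointers is a single array filled from both ends. This layout is an assumption made for illustration (the abstract does not say how the pointers move), and the class and method names are hypothetical.

```python
class UnifiedLoadStoreBuffer:
    """Single array shared by loads and stores, each with its own pointer."""

    def __init__(self, size):
        self.slots = [None] * size
        self.load_ptr = 0            # next free slot for loads, grows upward
        self.store_ptr = size - 1    # next free slot for stores, grows downward

    def free_slots(self):
        return self.store_ptr - self.load_ptr + 1

    def add_load(self, op):
        if self.free_slots() == 0:
            return False
        self.slots[self.load_ptr] = ("load", op)
        self.load_ptr += 1
        return True

    def add_store(self, op):
        if self.free_slots() == 0:
            return False
        self.slots[self.store_ptr] = ("store", op)
        self.store_ptr -= 1
        return True
```

Because either kind of operation can claim any free slot, a burst of loads (or of stores) can use the whole buffer, which is the capacity property the abstract emphasizes.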
