121. System and method using selection logic units to define stack orders
    Invention grant (in force)

    Publication No.: US06205541B1

    Publication date: 2001-03-20

    Application No.: US09235883

    Filing date: 1999-01-22

    Abstract: A floating point unit capable of executing multiple instructions in a single clock cycle using a central window and a register map is disclosed. The floating point unit comprises: a plurality of translation units, a future file, a central window, a plurality of functional units, a result queue, and a plurality of physical registers. The floating point unit receives speculative instructions, decodes them, and then stores them in the central window. Speculative top-of-stack values are generated for each instruction during decoding. Top-of-stack-relative operands are mapped to physical registers using a register map. Register stack exchange operations are performed during decoding. The central window then selects the oldest stored instructions for each functional pipeline and issues them. Conversion units convert each instruction's operands to an internal format, and normalization units detect and normalize any denormal operands. Finally, the functional pipelines execute the instructions.
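
    A minimal C sketch of the top-of-stack-relative operand mapping described in this abstract. All names and sizes (eight stack slots, the physical register numbers) are assumptions for illustration, not taken from the patent; stack exchanges are modeled as swaps of map entries at decode time.

```c
#include <stdio.h>

#define STACK_REGS 8

/* Hypothetical register map: stack slot -> physical register, plus a
   speculative top-of-stack value maintained during decode. */
typedef struct {
    int phys[STACK_REGS];
    int top;
} reg_map_t;

/* Resolve an operand given relative to the top of stack, e.g. ST(i). */
static int map_operand(const reg_map_t *m, int st_offset)
{
    int slot = (m->top + st_offset) & (STACK_REGS - 1);
    return m->phys[slot];
}

/* A register-stack exchange is performed during decoding by swapping map
   entries instead of moving data between registers. */
static void exchange(reg_map_t *m, int st_offset)
{
    int a = m->top & (STACK_REGS - 1);
    int b = (m->top + st_offset) & (STACK_REGS - 1);
    int tmp = m->phys[a];
    m->phys[a] = m->phys[b];
    m->phys[b] = tmp;
}

int main(void)
{
    reg_map_t m = { .phys = {10, 11, 12, 13, 14, 15, 16, 17}, .top = 0 };
    printf("ST(0) -> p%d, ST(1) -> p%d\n", map_operand(&m, 0), map_operand(&m, 1));
    exchange(&m, 1);                       /* exchange ST(0) and ST(1) */
    printf("after exchange: ST(0) -> p%d\n", map_operand(&m, 0));
    return 0;
}
```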


122. Microcode scan unit for scanning microcode instructions using predecode data
    Invention grant (in force)

    Publication No.: US06202142B1

    Publication date: 2001-03-13

    Application No.: US09323301

    Filing date: 1999-06-01

    Abstract: An instruction scanning unit for a superscalar microprocessor is disclosed. The instruction scanning unit processes start, end, and functional byte information (or predecode data) associated with a plurality of contiguous instruction bytes. The processing of start byte information and end byte information is performed independently and in parallel, and the instruction scanning unit produces a plurality of scan values which identify valid instructions within the plurality of contiguous instruction bytes. Additionally, the instruction scanning unit is scalable. Multiple instruction scanning units may be operated in parallel to process a larger plurality of contiguous instruction bytes. Furthermore, the instruction scanning unit detects error conditions in the predecode data in parallel with scanning to locate instructions. Moreover, in parallel with the error checking and scanning to locate instructions, MROM instructions are located for dispatch to an MROM unit.
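
    A minimal C sketch of scanning start and end predecode bits to locate instruction boundaries within a block of contiguous instruction bytes. The block size and bit layout are assumptions for illustration; the hardware examines start bits and end bits with independent parallel logic, which this software model simply walks sequentially. An error check (for example, an end bit with no preceding start bit) could be layered on the same masks.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BYTES 16   /* assumed size of the scanned block */

/* Scan start/end predecode bits (one bit per instruction byte) and emit
   (start, end) byte positions for each valid instruction found. */
static int scan_block(uint16_t start_bits, uint16_t end_bits,
                      int starts[], int ends[], int max_insns)
{
    int count = 0;
    int cur_start = -1;
    for (int i = 0; i < BLOCK_BYTES; i++) {
        if (start_bits & (1u << i))
            cur_start = i;
        if ((end_bits & (1u << i)) && cur_start >= 0 && count < max_insns) {
            starts[count] = cur_start;
            ends[count] = i;
            count++;
            cur_start = -1;
        }
    }
    return count;   /* number of valid instructions located */
}

int main(void)
{
    /* Example: instructions occupy bytes [0..2], [3..3], [4..9]. */
    uint16_t start_bits = 0x0019;  /* start bits at bytes 0, 3, 4 */
    uint16_t end_bits   = 0x020C;  /* end bits at bytes 2, 3, 9   */
    int s[8], e[8];
    int n = scan_block(start_bits, end_bits, s, e, 8);
    for (int i = 0; i < n; i++)
        printf("insn %d: bytes %d..%d\n", i, s[i], e[i]);
    return 0;
}
```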


123. Superscalar microprocessor including a load/store unit, decode units and a reorder buffer to detect dependencies between access to a stack cache and a data cache
    Invention grant (expired)

    Publication No.: US06192462B1

    Publication date: 2001-02-20

    Application No.: US09162419

    Filing date: 1998-09-28

    Abstract: A superscalar microprocessor is provided which maintains coherency between a pair of caches accessed from different stages of an instruction processing pipeline. A dependency checking structure is provided within the microprocessor. The dependency checking structure compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (the addresses for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes. If a read memory access performed by the decode stage is dependent upon a write memory access performed by the execution stage, then the instruction associated with the read memory access and subsequent instructions are flushed. Data coherency is maintained between the pair of caches while allowing stack-relative accesses to be performed from the decode stage. The comparator circuits used to perform the comparison are configured to compare a field of address bits instead of the entire address, reducing the size while still maintaining accurate dependency checking by qualifying the resulting comparison signals with an indication that both addresses hit in the same storage location within the stack cache.
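
    A minimal C sketch of the partial-address dependency check described in the last sentence of this abstract. The field of compared address bits and the structure fields are assumptions for illustration: a match on the address field counts as a dependency only when both accesses hit the same storage location within the stack cache.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical field: compare only address bits [11:2] rather than the
   whole address. */
#define FIELD_MASK 0x00000FFCu

typedef struct {
    uint32_t address;
    int      stack_cache_index;   /* storage location hit in the stack cache */
    bool     stack_cache_hit;
} mem_access_t;

/* Returns true when a dependency must be assumed between a read from the
   execution stage and a write from the decode stage. */
static bool depends(const mem_access_t *exec_read, const mem_access_t *decode_write)
{
    bool field_match = ((exec_read->address ^ decode_write->address) & FIELD_MASK) == 0;
    bool same_line   = exec_read->stack_cache_hit && decode_write->stack_cache_hit &&
                       exec_read->stack_cache_index == decode_write->stack_cache_index;
    return field_match && same_line;   /* partial match qualified by same location */
}

int main(void)
{
    mem_access_t wr = { 0x00001234, 5, true };   /* write from the decode stage   */
    mem_access_t rd = { 0x00001234, 5, true };   /* later read from the execution stage */
    printf("stall read: %s\n", depends(&rd, &wr) ? "yes" : "no");
    return 0;
}
```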


124. Dependency table for reducing dependency checking hardware
    Invention grant (expired)

    Publication No.: US6108769A

    Publication date: 2000-08-22

    Application No.: US649247

    Filing date: 1996-05-17

    Abstract: A dependency table stores a reorder buffer tag for each register. The stored reorder buffer tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register. Otherwise, the dependency table indicates that the value stored in the register is valid. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. Information from the comparators and the information stored in the dependency table is sufficient to select which value is forwarded. Additionally, the dependency table stores the width of the register being updated. Prior to forwarding the reorder buffer tag stored within the dependency table, the width stored therein is compared to the width of the source operand being requested. If a narrow-to-wide dependency is detected, the instruction is stalled until the instruction indicated in the dependency table retires. Still further, the dependency table recovers from branch mispredictions and exceptions by redispatching the instructions into the dependency table.
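
    A minimal C sketch of the per-register dependency table lookup described above, covering only the table itself (not the comparators among concurrently decoded instructions). Entry fields, widths, and names are assumptions for illustration: a source read either returns the register value, forwards the stored reorder buffer tag, or stalls on a narrow-to-wide dependency.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_REGS 8

/* Hypothetical per-register dependency table entry. */
typedef struct {
    bool valid_in_regfile;  /* register value is not pending in the ROB */
    int  rob_tag;           /* tag of the last instruction to update it */
    int  width;             /* width (in bits) of the pending update    */
} dep_entry_t;

typedef enum { FWD_REGISTER, FWD_ROB_TAG, STALL_NARROW_TO_WIDE } fwd_t;

/* Operand fetch: decide what to forward for a source register, given the
   width the consuming instruction wants to read. */
static fwd_t lookup(const dep_entry_t table[], int reg, int want_width, int *tag)
{
    const dep_entry_t *e = &table[reg];
    if (e->valid_in_regfile)
        return FWD_REGISTER;            /* read the architectural register */
    if (e->width < want_width)
        return STALL_NARROW_TO_WIDE;    /* pending update is narrower than
                                           the requested operand: stall    */
    *tag = e->rob_tag;
    return FWD_ROB_TAG;                 /* forward the reorder buffer tag  */
}

int main(void)
{
    dep_entry_t table[NUM_REGS] = {0};
    table[3] = (dep_entry_t){ .valid_in_regfile = false, .rob_tag = 17, .width = 16 };

    int tag = -1;
    fwd_t r = lookup(table, 3, 32, &tag);   /* 32-bit read of a 16-bit update */
    printf("result: %d (0=reg, 1=tag, 2=stall), tag=%d\n", (int)r, tag);
    return 0;
}
```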


125. Apparatus and method for predicting a first microcode instruction of a cache line and using predecode instruction data to identify instruction boundaries and types
    Invention grant (expired)

    Publication No.: US6061775A

    Publication date: 2000-05-09

    Application No.: US989793

    Filing date: 1997-12-12

    Abstract: A superscalar microprocessor predecodes instruction data to identify the boundaries of instructions and the type of instruction. To expedite the dispatch of instructions, when a cache line is scanned, the first scanned instruction is predicted to be a microcode instruction and is dispatched to the MROM unit. A microcode scan circuit uses the microcode pointer and the functional bits of the predecode data to multiplex instruction-specific bytes of the first microcode instruction to the MROM unit. If the predicted first microcode instruction is not the actual first microcode instruction, then in a subsequent clock cycle, the actual microcode instruction is dispatched to the MROM unit and the incorrectly predicted microcode instruction is canceled.
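
    A minimal C sketch of the predict-and-verify flow in this abstract. The state and names are assumptions for illustration: the predicted first microcode instruction is dispatched speculatively, and if scanning later identifies a different instruction, the speculative dispatch is canceled and the actual one is sent in a subsequent cycle.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record of a speculative dispatch to the MROM unit. */
typedef struct {
    int  predicted_offset;   /* byte offset speculatively sent to the MROM unit */
    bool dispatched;
} mrom_prediction_t;

/* Returns the offset to dispatch in a later cycle, or -1 if the speculative
   dispatch turned out to be correct and nothing more needs to be sent. */
static int verify(mrom_prediction_t *p, int actual_offset)
{
    if (p->dispatched && p->predicted_offset == actual_offset)
        return -1;                      /* prediction correct, keep it */
    p->dispatched = false;              /* cancel the mispredicted dispatch */
    return actual_offset;               /* dispatch the actual instruction  */
}

int main(void)
{
    mrom_prediction_t p = { .predicted_offset = 4, .dispatched = true };
    printf("redispatch offset: %d\n", verify(&p, 9));   /* misprediction */
    return 0;
}
```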


126. Predecoding technique for indicating locations of opcode bytes in variable byte-length instructions within a superscalar microprocessor
    Invention grant (expired)

    Publication No.: US6049863A

    Publication date: 2000-04-11

    Application No.: US873344

    Filing date: 1997-06-11

    Abstract: A predecode unit is configured to predecode variable byte-length instructions prior to their storage within an instruction cache of a superscalar microprocessor. The predecode unit generates three predecode bits associated with each byte of instruction code: a "start" bit, an "end" bit, and a "functional" bit. The start bit is set if the associated byte is the first byte of the instruction. Similarly, the end bit is set if the byte is the last byte of the instruction. The functional bits convey information regarding the location of an opcode byte for a particular instruction as well as an indication of whether the instruction can be decoded directly by the decode logic of the processor or whether the instruction is executed by invoking a microcode procedure controlled by an MROM unit. For fast path instructions, the functional bit is set for each prefix byte included in the instruction, and cleared for other bytes. For MROM instructions, the functional bit is cleared for each prefix byte and is set for other bytes. The type of instruction (either fast path or MROM) may thus be determined by examining the functional bit corresponding to the end byte of the instruction. If that functional bit is clear, the instruction is a fast path instruction. Conversely, if that functional bit is set, the instruction is an MROM instruction. After an MROM instruction is identified, the functional bits for the instruction may be inverted. Subsequently, the opcode for both fast path and MROM instructions may readily be located (by the alignment logic) by determining the first byte within the instruction that has a cleared functional bit.
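
    A minimal C sketch of the functional-bit convention described above, applied to one instruction whose bytes occupy positions 0..len-1. The bit-vector representation is a software stand-in for the per-byte predecode storage: the type is read from the functional bit of the end byte, and the opcode byte is the first byte with a cleared functional bit (after inverting the bits for an MROM instruction).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* The type of instruction is read from the functional bit of the end byte:
   clear => fast path, set => MROM. */
static bool is_mrom(uint32_t func_bits, int end_byte)
{
    return (func_bits >> end_byte) & 1u;
}

/* Locate the opcode byte: for MROM instructions the functional bits are
   inverted first, then the opcode is the first byte whose functional bit
   is clear. */
static int opcode_byte(uint32_t func_bits, int len)
{
    if (is_mrom(func_bits, len - 1))
        func_bits = ~func_bits;
    for (int i = 0; i < len; i++)
        if (((func_bits >> i) & 1u) == 0)
            return i;
    return -1;   /* malformed predecode data */
}

int main(void)
{
    /* Fast path instruction, 4 bytes, one prefix byte: functional bits 0001b
       (set for the prefix, clear for other bytes) -> opcode at byte 1. */
    uint32_t fast = 0x1;
    /* MROM instruction, 4 bytes, one prefix byte: functional bits 1110b
       (clear for the prefix, set for other bytes) -> opcode at byte 1. */
    uint32_t mrom = 0xE;

    printf("fast path: mrom=%d opcode byte=%d\n", is_mrom(fast, 3), opcode_byte(fast, 4));
    printf("mrom:      mrom=%d opcode byte=%d\n", is_mrom(mrom, 3), opcode_byte(mrom, 4));
    return 0;
}
```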


127. Recorder buffer and a method for allocating a fixed amount of storage for instruction results independent of a number of concurrently dispatched instructions
    Invention grant (in force)

    Publication No.: US6026482A

    Publication date: 2000-02-15

    Application No.: US250981

    Filing date: 1999-02-16

    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction. Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register.
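
    A minimal C sketch of the line-oriented allocation described above. The line width and line count are assumptions for illustration: one whole line (with storage for the maximum number of concurrently dispatchable instructions) is allocated whenever one or more instructions dispatch, and unused slots in the line simply remain empty.

```c
#include <stdio.h>

#define LINE_WIDTH 4    /* assumed maximum number of concurrently dispatched instructions */
#define ROB_LINES  8    /* assumed number of lines in the reorder buffer */

/* One reorder buffer line: storage for up to LINE_WIDTH instruction results. */
typedef struct {
    int valid;                   /* how many entries of the line are used */
    int results[LINE_WIDTH];     /* placeholder for instruction results   */
} rob_line_t;

typedef struct {
    rob_line_t lines[ROB_LINES];
    int head, tail, count;       /* lines allocate and retire in order    */
} rob_t;

/* Allocate one whole line per dispatch group; an instruction's tag can then
   be expressed as (line number, offset within the line). */
static int allocate_line(rob_t *rob, int num_dispatched)
{
    if (rob->count == ROB_LINES || num_dispatched < 1 || num_dispatched > LINE_WIDTH)
        return -1;                               /* reorder buffer full or bad group */
    int line = rob->tail;
    rob->lines[line].valid = num_dispatched;     /* unused slots stay empty */
    rob->tail = (rob->tail + 1) % ROB_LINES;
    rob->count++;
    return line;
}

int main(void)
{
    rob_t rob = {0};
    printf("line for a 2-wide group: %d\n", allocate_line(&rob, 2));
    printf("line for a 4-wide group: %d\n", allocate_line(&rob, 4));
    return 0;
}
```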


128. Reduced size storage apparatus for storing cache-line-related data in a high frequency microprocessor
    Invention grant (expired)

    Publication No.: US6016545A

    Publication date: 2000-01-18

    Application No.: US991694

    Filing date: 1997-12-16

    CPC classification number: G06F9/3802 G06F9/3844

    Abstract: A microprocessor stores cache-line-related data (e.g. branch predictions or predecode data, in the illustrated embodiments) in a storage which includes fewer storage locations than the number of cache lines in the instruction cache. Each storage location in the storage is mappable to multiple cache lines, any one of which can be associated with the data stored in the storage location. The storage may thereby be smaller than a storage which provides an equal number of storage locations as the number of cache lines in the instruction cache. Access time to the storage may be reduced, therefore providing for a higher frequency implementation. Still further, semiconductor substrate area occupied by the storage may be decreased. In one embodiment, the storage is indexed by a subset of the index bits used to index the instruction cache. The subset comprises the least significant bits of the cache index. In other words, the cache lines which share a particular storage location within the storage differ in the most significant cache index bits. Therefore, code which exhibits spatial locality may experience little conflict for the storage locations.
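
    A minimal C sketch of the indexing scheme in the embodiment described above. The cache and storage sizes are assumptions for illustration: the reduced-size storage is indexed by the least significant bits of the instruction cache index, so cache lines that share a storage location differ only in the most significant index bits.

```c
#include <stdint.h>
#include <stdio.h>

#define ICACHE_INDEX_BITS  10   /* assumed: 1024 instruction cache lines */
#define STORAGE_INDEX_BITS  6   /* assumed: 64 shared storage locations  */

/* Map an instruction address to a location in the reduced-size storage by
   taking the least significant bits of the instruction cache index. */
static unsigned storage_index(uint32_t address, unsigned line_size_log2)
{
    uint32_t cache_index = (address >> line_size_log2) &
                           ((1u << ICACHE_INDEX_BITS) - 1);
    return cache_index & ((1u << STORAGE_INDEX_BITS) - 1);
}

int main(void)
{
    /* With 32-byte lines: two cache lines whose indices differ only in the
       upper bits share a storage location; a neighbouring line does not. */
    printf("%u %u %u\n",
           storage_index(0x00001000, 5),
           storage_index(0x00001800, 5),
           storage_index(0x00001020, 5));
    return 0;
}
```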


129. Delayed update register for an array
    Invention grant (in force)

    Publication No.: US5978907A

    Publication date: 1999-11-02

    Application No.: US167965

    Filing date: 1998-10-06

    CPC classification number: G06F9/3844 G06F9/3848

    Abstract: An update unit for an array in an integrated circuit is provided. The update unit delays the update of the array until a clock cycle in which the functional input to the array is idle. The input port normally used by the functional input is then used to perform the update. During clock cycles between receiving the update and storing the update into the array, the update unit compares the current functional input address to the update address. If the current functional input address matches the update address, then the update value is provided as the output of the array. Otherwise, the information stored in the indexed storage location is provided. In this manner, the update appears to have been performed in the clock cycle that the update value was received, as in a dual-ported array. A particular embodiment of the update unit is a branch prediction array update unit. This unit is described in detail.
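
    A minimal C sketch of the delayed-update behavior described above. The array size and names are assumptions for illustration: a pending update is held in a register, reads compare their address against it and bypass the newer value, and the array itself is written later, in a cycle when the functional input is idle.

```c
#include <stdbool.h>
#include <stdio.h>

#define ARRAY_SIZE 16

/* Single-ported array with a delayed-update register. */
typedef struct {
    int  data[ARRAY_SIZE];
    bool pending;
    int  pending_addr;
    int  pending_value;
} delayed_array_t;

static void request_update(delayed_array_t *a, int addr, int value)
{
    a->pending = true;
    a->pending_addr = addr;
    a->pending_value = value;
}

/* Functional read: if the read address matches the delayed update, return
   the update value instead of the stale array contents, so the array appears
   to have been updated in the cycle the update was received. */
static int read_port(const delayed_array_t *a, int addr)
{
    if (a->pending && a->pending_addr == addr)
        return a->pending_value;
    return a->data[addr];
}

/* Called in a cycle where the functional input is idle: commit the delayed
   update through the normal input port. */
static void idle_cycle(delayed_array_t *a)
{
    if (a->pending) {
        a->data[a->pending_addr] = a->pending_value;
        a->pending = false;
    }
}

int main(void)
{
    delayed_array_t a = {0};
    request_update(&a, 3, 42);
    printf("read before commit: %d\n", read_port(&a, 3));  /* bypassed: 42   */
    idle_cycle(&a);
    printf("read after commit:  %d\n", read_port(&a, 3));  /* from array: 42 */
    return 0;
}
```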


130. Reorder buffer having a future file for storing speculative instruction execution results
    Invention grant (expired)

    Publication No.: US5961634A

    Publication date: 1999-10-05

    Application No.: US114554

    Filing date: 1998-07-13

    Applicant: Thang M. Tran

    Inventor: Thang M. Tran

    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor. The reorder buffer tag (or instruction result, if the instruction has executed) of the last instruction in program order to update the register is stored in the future file. The reorder buffer provides the value (either reorder buffer tag or instruction result) stored in the storage location corresponding to a register when the register is used as a source operand for another instruction. Another advantage of the future file for microprocessors which allow access and update to portions of registers is that narrow-to-wide dependencies are resolved upon completion of the instruction which updates the narrower register.
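
    A minimal C sketch of the future-file lookup described above (the line allocation side is sketched under entry 127). Entry fields and names are assumptions for illustration: each architectural register has one future-file entry holding either the result of its last in-program-order updater or that instruction's reorder buffer tag.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_REGS 8

/* Hypothetical future-file entry: one per architectural register. */
typedef struct {
    bool pending;       /* an in-flight instruction will update this register */
    bool has_result;    /* the in-flight instruction has already executed     */
    int  rob_tag;
    int  result;
} future_entry_t;

/* Source-operand read: return either a value or a reorder buffer tag to
   wait on, depending on the state of the future-file entry. */
static void read_source(const future_entry_t ff[], int reg, const int regfile[],
                        bool *value_ready, int *value_or_tag)
{
    const future_entry_t *e = &ff[reg];
    if (!e->pending)        { *value_ready = true;  *value_or_tag = regfile[reg]; }
    else if (e->has_result) { *value_ready = true;  *value_or_tag = e->result;    }
    else                    { *value_ready = false; *value_or_tag = e->rob_tag;   }
}

int main(void)
{
    int regfile[NUM_REGS] = { [2] = 7 };
    future_entry_t ff[NUM_REGS] = {0};
    ff[2] = (future_entry_t){ .pending = true, .has_result = false, .rob_tag = 11 };

    bool ready; int v;
    read_source(ff, 2, regfile, &ready, &v);
    printf("ready=%d value_or_tag=%d\n", ready, v);   /* not ready, tag 11 */
    return 0;
}
```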

