    1.
    Invention grant
    Cache holding register for receiving instruction packets and for providing the instruction packets to a predecode unit and instruction cache (expired)

    Publication No.: US5983321A

    Publication Date: 1999-11-09

    Application No.: US815567

    Filing Date: 1997-03-12

    Abstract: An instruction cache employing a cache holding register is provided. When a cache line of instruction bytes is fetched from main memory, the instruction bytes are temporarily stored into the cache holding register as they are received from main memory. The instruction bytes are predecoded as they are received from the main memory. If a predicted-taken branch instruction is encountered, the instruction fetch mechanism within the instruction cache begins fetching instructions from the target instruction path. This fetching may be initiated prior to receiving the complete cache line containing the predicted-taken branch instruction. As long as instruction fetches from the target instruction path continue to hit in the instruction cache, these instructions may be fetched and dispatched into a microprocessor employing the instruction cache. The remaining portion of the cache line of instruction bytes containing the predicted-taken branch instruction is received by the cache holding register. In order to reduce the number of ports employed upon the instruction bytes storage used to store cache lines of instructions, the cache holding register retains the cache line until an idle cycle occurs in the instruction bytes storage. The same port ordinarily used for fetching instructions is then used to store the cache line into the instruction bytes storage. In one embodiment, the instruction cache prefetches the cache line succeeding the cache line which misses. A second cache holding register is employed for storing the prefetched cache line.
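
    A minimal C sketch of the fill path described above: instruction packets stream from memory into the cache holding register, are predecoded as they arrive, and a predicted-taken branch seen mid-line lets fetch be redirected to the target path before the rest of the line is received. The packet and line sizes, the structure layout, and the is_predicted_taken_branch() stub are illustrative assumptions, not details taken from the patent.

```c
/* Illustrative model (not the patent's RTL): instruction packets stream
 * from memory into a cache holding register and are predecoded as they
 * arrive; a predicted-taken branch lets fetch be redirected to the
 * target path before the whole line has been received. */
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES   32
#define PACKET_BYTES 8                 /* bytes delivered per bus beat (assumed) */

typedef struct {
    uint8_t  bytes[LINE_BYTES];
    uint8_t  predecode[LINE_BYTES];    /* per-byte predecode data (placeholder)  */
    unsigned filled;                   /* bytes received so far                  */
} cache_holding_reg_t;

/* Crude stand-in for "the branch predictor marks this byte as starting a
 * predicted-taken branch" (0xEB is a short unconditional JMP). */
static bool is_predicted_taken_branch(uint8_t byte) { return byte == 0xEB; }

/* Accept one packet from memory; returns true if fetch should be
 * redirected to the branch target path while the rest of the line keeps
 * arriving into the holding register. */
bool receive_packet(cache_holding_reg_t *chr, const uint8_t *packet)
{
    bool redirect = false;
    for (unsigned i = 0; i < PACKET_BYTES && chr->filled < LINE_BYTES; i++) {
        unsigned pos = chr->filled++;
        chr->bytes[pos] = packet[i];
        chr->predecode[pos] = packet[i] & 0x0F;   /* placeholder predecode */
        if (is_predicted_taken_branch(packet[i]))
            redirect = true;
    }
    return redirect;
}
```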

    2.
    Invention grant
    Cache holding register for delayed update of a cache line into an instruction cache (expired)

    Publication No.: US6076146A

    Publication Date: 2000-06-13

    Application No.: US310356

    Filing Date: 1999-05-12

    Abstract: An instruction cache employing a cache holding register is provided. When a cache line of instruction bytes is fetched from main memory, the instruction bytes are temporarily stored into the cache holding register as they are received from main memory. The instruction bytes are predecoded as they are received from the main memory. If a predicted-taken branch instruction is encountered, the instruction fetch mechanism within the instruction cache begins fetching instructions from the target instruction path. This fetching may be initiated prior to receiving the complete cache line containing the predicted-taken branch instruction. As long as instruction fetches from the target instruction path continue to hit in the instruction cache, these instructions may be fetched and dispatched into a microprocessor employing the instruction cache. The remaining portion of the cache line of instruction bytes containing the predicted-taken branch instruction is received by the cache holding register. In order to reduce the number of ports employed upon the instruction bytes storage used to store cache lines of instructions, the cache holding register retains the cache line until an idle cycle occurs in the instruction bytes storage. The same port ordinarily used for fetching instructions is then used to store the cache line into the instruction bytes storage. In one embodiment, the instruction cache prefetches the cache line succeeding the cache line which misses. A second cache holding register is employed for storing the prefetched cache line.
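
    This continuation centers on the delayed update itself: the completed line sits in the holding register until the single-ported instruction byte storage has an idle cycle, and is then written through the same port used for fetching. A rough C model of that port arbitration follows; the storage geometry, the bypass path for fetches that hit the pending line, and all identifiers are assumptions made for illustration.

```c
/* Rough model of the delayed cache-line update: a filled line waits in
 * the cache holding register until the single instruction-byte-storage
 * port is idle (no fetch this cycle), then is written through that same
 * port.  Sizes and names are illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32
#define NUM_LINES  64

typedef struct {
    bool    valid;               /* a completed line is waiting to be written */
    int     index;               /* destination line within the byte storage  */
    uint8_t bytes[LINE_BYTES];   /* line received from main memory            */
} holding_reg_t;

static uint8_t       storage[NUM_LINES][LINE_BYTES];  /* single-ported array */
static holding_reg_t hold;

/* One clock of port arbitration: an instruction fetch always gets the
 * port; the holding register drains only on an otherwise idle cycle. */
void storage_cycle(bool fetch_valid, int fetch_index, uint8_t *fetch_out)
{
    if (fetch_valid) {
        /* Bypass: a fetch that targets the still-pending line is served
         * directly from the holding register. */
        const uint8_t *src = (hold.valid && hold.index == fetch_index)
                                 ? hold.bytes
                                 : storage[fetch_index];
        memcpy(fetch_out, src, LINE_BYTES);
    } else if (hold.valid) {
        memcpy(storage[hold.index], hold.bytes, LINE_BYTES);  /* idle cycle */
        hold.valid = false;
    }
}
```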

    3.
    Invention grant
    Predecoding technique for indicating locations of opcode bytes in variable byte-length instructions within a superscalar microprocessor (expired)

    Publication No.: US6049863A

    Publication Date: 2000-04-11

    Application No.: US873344

    Filing Date: 1997-06-11

    IPC Class: G06F9/30 G06F9/318 G06F9/38

    Abstract: A predecode unit is configured to predecode variable byte-length instructions prior to their storage within an instruction cache of a superscalar microprocessor. The predecode unit generates three predecode bits associated with each byte of instruction code: a "start" bit, an "end" bit, and a "functional" bit. The start bit is set if the associated byte is the first byte of the instruction. Similarly, the end bit is set if the byte is the last byte of the instruction. The functional bits convey information regarding the location of an opcode byte for a particular instruction as well as an indication of whether the instruction can be decoded directly by the decode logic of the processor or whether the instruction is executed by invoking a microcode procedure controlled by an MROM unit. For fast path instructions, the functional bit is set for each prefix byte included in the instruction, and cleared for other bytes. For MROM instructions, the functional bit is cleared for each prefix byte and is set for other bytes. The type of instruction (either fast path or MROM) may thus be determined by examining the functional bit corresponding to the end byte of the instruction. If that functional bit is clear, the instruction is a fast path instruction. Conversely, if that functional bit is set, the instruction is an MROM instruction. After an MROM instruction is identified, the functional bits for the instruction may be inverted. Subsequently, the opcode for both fast path and MROM instructions may readily be located (by the alignment logic) by determining the first byte within the instruction that has a cleared functional bit.
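
    The sketch below walks through the start/end/functional-bit convention from the abstract: the end byte's functional bit distinguishes fast path from MROM, an identified MROM instruction has its functional bits inverted, and the opcode is then the first byte of the instruction with a cleared functional bit. The data layout and function names are assumptions for illustration; the real predecode logic operates on raw x86 byte streams in hardware.

```c
/* Sketch of the start/end/functional predecode scheme described above.
 * The struct layout and helper names are assumptions for illustration. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool start;       /* byte is the first byte of an instruction  */
    bool end;         /* byte is the last byte of an instruction   */
    bool functional;  /* meaning depends on fast-path vs. MROM     */
} predecode_t;

/* Classify one instruction spanning bytes [first, last] and locate its
 * opcode byte.  Returns the opcode index, or -1 if none is found. */
int locate_opcode(predecode_t pd[], size_t first, size_t last, bool *is_mrom)
{
    /* Fast path: functional set on prefix bytes, clear elsewhere.
     * MROM:      functional clear on prefix bytes, set elsewhere.
     * So the end byte's functional bit identifies the type. */
    *is_mrom = pd[last].functional;

    if (*is_mrom) {
        /* After an MROM instruction is identified, its functional bits
         * are inverted so both types can share the same opcode scan. */
        for (size_t i = first; i <= last; i++)
            pd[i].functional = !pd[i].functional;
    }

    /* Opcode byte = first byte of the instruction whose functional bit
     * is clear (i.e., the first non-prefix byte). */
    for (size_t i = first; i <= last; i++)
        if (!pd[i].functional)
            return (int)i;
    return -1;
}
```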

    4.
    Invention grant
    Three state branch history using one bit in a branch prediction mechanism (in force)

    Publication No.: US06253316B1

    Publication Date: 2001-06-26

    Application No.: US09438963

    Filing Date: 1999-11-12

    IPC Class: G06F944

    Abstract: A branch prediction unit stores a set of branch prediction history bits and branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. While only one bit is used to represent branch prediction history, three distinct states are represented in conjunction with the absence of a branch prediction. This provides for the storage of fewer bits, while maintaining a high degree of branch prediction accuracy. Each branch selector identifies the branch prediction to be selected if a fetch address corresponding to that branch selector is presented. In order to minimize the number of branch selectors stored for a group of contiguous instruction bytes, the group is divided into multiple byte ranges. The largest byte range may include a number of bytes comprising the shortest branch instruction in the instruction set (exclusive of the return instruction). For example, the shortest branch instruction may be two bytes in one embodiment. Therefore, the largest byte range is two bytes in the example. Since the branch selectors as a group change value (i.e. indicate a different branch instruction) only at the end byte of a predicted-taken branch instruction, fewer branch selectors may be stored than the number of bytes within the group.
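
    The abstract does not spell out the state transitions, so the following C sketch shows one plausible encoding only: the absence of a stored prediction serves as the not-taken state, and the single history bit splits the stored-prediction case into weakly and strongly taken. Treat the update rule as an assumption chosen to illustrate how one bit plus "no entry" yields three states.

```c
/* One plausible reading of the three-state scheme: "no entry" acts as
 * the not-taken state, and the single history bit distinguishes weakly
 * from strongly taken.  The transitions below are NOT given in the
 * abstract; they are an assumed, counter-like update rule. */
#include <stdbool.h>

typedef struct {
    bool valid;    /* a branch prediction exists for this branch      */
    bool history;  /* false = weakly taken, true = strongly taken     */
} branch_entry_t;

bool predict_taken(const branch_entry_t *e)
{
    return e->valid;              /* no entry => predict not taken */
}

void update(branch_entry_t *e, bool actually_taken)
{
    if (actually_taken) {
        if (!e->valid) { e->valid = true; e->history = false; }  /* allocate   */
        else           { e->history = true; }                    /* strengthen */
    } else {
        if (e->valid && e->history) e->history = false;  /* strong -> weak     */
        else                        e->valid = false;    /* weak -> no entry   */
    }
}
```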

    5.
    Invention grant
    Branch selector prediction (expired)

    Publication No.: US5954816A

    Publication Date: 1999-09-21

    Application No.: US972988

    Filing Date: 1997-11-19

    IPC Class: G06F9/38

    Abstract: A branch prediction unit includes a branch prediction entry corresponding to a group of contiguous instruction bytes. The branch prediction entry stores branch predictions corresponding to branch instructions within the group of contiguous instruction bytes. Additionally, the branch prediction entry stores a set of branch selectors corresponding to the group of contiguous instruction bytes. The branch selectors identify which branch prediction is to be selected if the corresponding byte (or bytes) is selected by the offset portion of the fetch address. Still further, a predicted branch selector is stored. The predicted branch selector is used to select a branch prediction for forming the fetch address. In parallel, a selected branch selector is selected from the set of branch selectors. The predicted branch selector is verified using the selected branch selector. If the selected branch selector and the predicted branch selector mismatch, the correct branch prediction is generated and the predicted branch selector is updated to indicate the selected branch selector.
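
    A small C model of the predicted-branch-selector flow: the stored predicted selector forms the next fetch address immediately, while the selector indexed by the fetch offset is read in parallel and used to verify the fast prediction, repairing it and retraining the predicted selector on a mismatch. Entry layout, sizes, and field names are assumptions for illustration.

```c
/* Sketch of branch selector prediction: a single stored selector forms
 * the fetch address right away; the offset-indexed selector verifies it
 * and repairs it on mismatch.  All sizes and names are assumed. */
#include <stdbool.h>
#include <stdint.h>

#define GROUP_BYTES 16   /* contiguous instruction bytes per entry (assumed) */
#define NUM_PREDS    2   /* branch predictions stored per entry (assumed)    */

typedef struct {
    uint8_t  selectors[GROUP_BYTES];  /* per-byte selector: 0 = sequential,
                                         1..NUM_PREDS = branch prediction    */
    uint32_t targets[NUM_PREDS];      /* predicted branch targets            */
    uint8_t  predicted_selector;      /* fast-path selector used immediately */
    uint32_t sequential;              /* address of the next sequential group */
} bp_entry_t;

static uint32_t target_of(const bp_entry_t *e, uint8_t sel)
{
    return sel == 0 ? e->sequential : e->targets[sel - 1];
}

/* Returns the fetch address formed from the predicted selector; *mismatch
 * reports whether the offset-indexed selector disagreed, in which case
 * *corrected holds the repaired address and the predicted selector is
 * retrained for next time. */
uint32_t predict_fetch(bp_entry_t *e, unsigned fetch_offset,
                       bool *mismatch, uint32_t *corrected)
{
    uint32_t fast = target_of(e, e->predicted_selector);   /* used immediately */

    uint8_t selected = e->selectors[fetch_offset];          /* parallel lookup */
    *mismatch = (selected != e->predicted_selector);
    if (*mismatch) {
        *corrected = target_of(e, selected);   /* correct branch prediction */
        e->predicted_selector = selected;      /* train the fast selector   */
    }
    return fast;
}
```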

    6.
    Invention grant
    Dual comparator scheme for detecting a wrap-around condition and generating a cancel signal for removing wrap-around buffer entries (expired)

    Publication No.: US5900013A

    Publication Date: 1999-05-04

    Application No.: US690381

    Filing Date: 1996-07-26

    IPC Class: G06F5/10 G06F9/38 G06F12/02

    Abstract: A device and method for comparing cancel tags, and for canceling data from a finite wrap-around data buffer. The data buffer stores tag values that are continuous, or sequential. A cancel tag is used to cancel all tags with a value "greater-than" the cancel tag. In comparing cancel tags of a wrap-around buffer, however, the comparator must take into account wrap-around conditions. When a wrap-around condition occurs, tags that have a lower value may be "greater-than" the cancel tag. The present invention advantageously adds an additional bit to the tags stored in the data buffer and the cancel tag. The additional bit is toggled whenever a wrap-around condition occurs. By comparing the additional bit of the tag to the additional bit of the cancel tag, a wrap-around condition can be detected without extensive additional circuitry. The comparison of the additional bit indicates whether the comparator should cancel tags that are greater-than or less-than the cancel tag. The cancel tag causes the buffer pointer to change state and point to the storage element associated with the cancel tag, and causes the tag generator to change state.
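
    The comparator trick is easy to show in software: each tag carries an extra bit that toggles on every wrap-around, and comparing that bit against the cancel tag's bit tells the comparator whether "allocated after the cancel tag" means a larger or a smaller tag value. The widths and names in this C sketch are illustrative assumptions.

```c
/* Sketch of the dual-comparator cancel scheme: tags carry an extra wrap
 * bit that toggles each time the tag counter wraps.  Comparing wrap bits
 * decides whether "newer than the cancel tag" means a larger or a
 * smaller tag value. */
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS 4
#define TAG_MAX  ((1u << TAG_BITS) - 1u)

typedef struct {
    uint8_t value;  /* TAG_BITS-wide sequential tag value     */
    uint8_t wrap;   /* extra bit, toggled on each wrap-around */
} tag_t;

/* Tag generator: increments the value and toggles the wrap bit when the
 * value wraps back to zero. */
tag_t next_tag(tag_t t)
{
    if (t.value == TAG_MAX) { t.value = 0; t.wrap ^= 1u; }
    else                    { t.value++; }
    return t;
}

/* True if tag t was allocated after the cancel tag c and must be
 * cancelled.  Equal wrap bits: a larger value means newer.  Different
 * wrap bits: the newer tag has wrapped, so a *smaller* value means
 * newer -- the case the second comparator handles. */
bool should_cancel(tag_t t, tag_t c)
{
    return (t.wrap == c.wrap) ? (t.value > c.value)
                              : (t.value < c.value);
}
```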

    7.
    Invention grant
    Method for optimizing loop control of microcoded instructions (in force)

    Publication No.: US07366885B1

    Publication Date: 2008-04-29

    Application No.: US10858791

    Filing Date: 2004-06-02

    IPC Class: G06F9/00

    Abstract: A method for optimizing loop control of microcoded instructions includes identifying an instruction as a repetitive microcode instruction, such as a move string instruction having a repeat prefix. The repetitive microcode instruction may include a loop of microcode instructions forming a microcode sequence. The microcode sequence is stored within a storage of a microcode unit. The method also includes storing a loop count value associated with the repetitive microcode instruction to a sequence control unit of the microcode unit. The method further includes determining a number of iterations to issue the microcode sequence for execution by an instruction pipeline based upon the loop count value. In response to receiving the repetitive microcode instruction, the method includes continuously issuing the microcode sequence for the number of iterations.
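
    A compact C sketch of the idea: the loop count for a REP-prefixed string instruction is handed to the sequence control logic once, which then issues the stored microcode loop that many times instead of testing an exit condition on every iteration. The micro-op encoding and the move-string sequence shown are assumptions for illustration, not the actual microcode.

```c
/* Sketch of a loop-count-driven microcode sequencer: the loop count for
 * a repetitive instruction is supplied once, and the stored microcode
 * sequence is then issued that many times back to back. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *mnemonic;   /* stand-in for a real micro-op encoding */
} uop_t;

/* Hypothetical microcode loop body for a move-string instruction. */
static const uop_t move_string_seq[] = {
    { "load  temp, [esi]" },
    { "store [edi], temp" },
    { "add   esi, size"   },
    { "add   edi, size"   },
};

/* Sequence control: issue the whole stored loop `loop_count` times. */
void issue_repetitive(const uop_t *seq, size_t seq_len, uint64_t loop_count)
{
    for (uint64_t i = 0; i < loop_count; i++)          /* iterations known up front */
        for (size_t j = 0; j < seq_len; j++)
            printf("issue: %s\n", seq[j].mnemonic);    /* hand uop to the pipeline  */
}

int main(void)
{
    /* e.g., a REP MOVS with a count of 3 taken from the count register */
    issue_repetitive(move_string_seq,
                     sizeof move_string_seq / sizeof move_string_seq[0], 3);
    return 0;
}
```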

    8.
    Invention grant
    Branch prediction with added selector bits to increase branch prediction capacity and flexibility with minimal added bits (expired)

    Publication No.: US6108774A

    Publication Date: 2000-08-22

    Application No.: US994869

    Filing Date: 1997-12-19

    IPC Class: G06F9/38

    Abstract: A branch prediction unit stores a set of branch selectors corresponding to each of a group of contiguous instruction bytes stored in an instruction cache. Each branch selector identifies a branch prediction to be selected if a fetch address corresponding to that branch selector is presented. The branch prediction unit additionally stores a set of return selectors corresponding to one or more branch predictions. The return selectors identify the type of branch selection. For example, the branch predictions may include a sequential branch prediction and a branch instruction branch prediction. The return selectors may identify whether the branch instruction branch prediction is associated with the return instruction or a non-return branch instruction.
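
    The following C sketch adds the return selectors on top of ordinary branch selectors: the per-byte-range selector picks the sequential prediction or one of the stored branch predictions, and a return-selector bit per branch prediction says whether the target should come from a return stack (for a return instruction) or from the stored target field. Sizes, field names, and the return-stack interface are assumptions for illustration.

```c
/* Sketch of branch selectors augmented with return selectors.  Sizes,
 * field names, and the return-stack interface are assumed. */
#include <stdbool.h>
#include <stdint.h>

#define BYTE_RANGES 8   /* byte ranges per group of instruction bytes (assumed) */
#define NUM_PREDS   2   /* branch predictions per group (assumed)               */

typedef struct {
    uint8_t  selectors[BYTE_RANGES];   /* 0 = sequential, 1..NUM_PREDS = branch  */
    bool     is_return[NUM_PREDS];     /* return selector per branch prediction  */
    uint32_t targets[NUM_PREDS];       /* stored targets for non-return branches */
    uint32_t sequential;               /* next sequential fetch address          */
} bsel_entry_t;

/* Form the next fetch address for the given byte range: sequential, a
 * stored branch target, or the return-stack top when the return
 * selector marks the chosen prediction as a return instruction. */
uint32_t next_fetch(const bsel_entry_t *e, unsigned byte_range,
                    uint32_t return_stack_top)
{
    uint8_t sel = e->selectors[byte_range];
    if (sel == 0)
        return e->sequential;                 /* no predicted-taken branch      */
    if (e->is_return[sel - 1])
        return return_stack_top;              /* return: use the return stack   */
    return e->targets[sel - 1];               /* ordinary branch: stored target */
}
```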

    9.
    Invention grant
    Method and circuit for fast generation of zero flag condition code in a microprocessor-based computer (expired)

    Publication No.: US5862065A

    Publication Date: 1999-01-19

    Application No.: US799452

    Filing Date: 1997-02-13

    Abstract: An adder circuit in parallel with a zero flag generation circuit is provided. In a preferred embodiment, an arithmetic logic unit (ALU) circuit in a microprocessor based computer system includes an adder circuit preferably adapted to receive first and second operands. The preferred adder circuit is further adapted to produce a result equal to the sum of the first and second operands. The ALU circuit further includes a zero flag generation circuit. The zero flag generation circuit is adapted to receive the first and second operands in parallel with the adder circuit and to produce a zero flag signal in response to the operands. The zero flag signal is indicative of whether the sum of the operands is equal to zero. In one embodiment, the zero flag generation circuit includes N half adders in parallel wherein each adder receives a bit from the first operand and a corresponding bit from the second operand. Each half adder produces a sum bit and a carry bit in response to the inputs. Preferably, the zero flag generation circuit further includes N-1 Exclusive OR (EXOR) gates. Each of the N-1 EXOR gates receives one bit of the N sum bits and a corresponding bit of the N carry bits as inputs. The N-1 outputs from the EXOR gates, together with an inverted least significant sum bit, are routed to a logic circuit. The logic circuit functions as an AND gate, producing an output signal indicative of whether each input signal is equal to 1.
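
    The point of the circuit is that zero detection does not have to wait for the adder's carry chain; it can be computed from the per-bit half-adder outputs. The C model below uses a standard carry-free identity, (a + b) mod 2^N == 0 exactly when (a ^ b) == ((a | b) << 1) within N bits, and checks it exhaustively against the adder result. It illustrates the parallel zero-detect idea rather than reproducing the patent's exact gate-level pairing of sum and carry bits.

```c
/* Behavioral model of generating the zero flag in parallel with the add:
 * the half-adder outputs (sum = a ^ b, carry = a & b) are checked
 * bit-by-bit instead of waiting for the carry chain.  Identity used:
 *     (a + b) mod 2^N == 0   iff   (a ^ b) == ((a | b) << 1)  (mod 2^N),
 * i.e. the LSB of the half-adder sum is 0 and every other sum bit equals
 * the OR of the sum and carry bits one position below.  This illustrates
 * the parallel zero-detect idea, not the patent's exact netlist. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define N    8                                /* operand width for the demo */
#define MASK ((1u << N) - 1u)

bool zero_flag_parallel(uint32_t a, uint32_t b)
{
    uint32_t sum   = (a ^ b) & MASK;          /* per-bit half-adder sums    */
    uint32_t carry = (a & b) & MASK;          /* per-bit half-adder carries */
    /* Zero iff sum has a clear LSB and matches (sum | carry) shifted up. */
    return sum == (((sum | carry) << 1) & MASK);
}

int main(void)
{
    /* Exhaustively compare against the adder-based zero flag. */
    for (uint32_t a = 0; a <= MASK; a++)
        for (uint32_t b = 0; b <= MASK; b++)
            assert(zero_flag_parallel(a, b) == (((a + b) & MASK) == 0));
    puts("parallel zero flag matches adder result for all 8-bit operands");
    return 0;
}
```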

    10.
    Invention grant
    Data processor having a cache with efficient storage of predecode information, cache, and method (in force)

    Publication No.: US07827355B1

    Publication Date: 2010-11-02

    Application No.: US10887069

    Filing Date: 2004-07-08

    Abstract: A data processor (200) includes an instruction cache (220) and a secondary cache (250). The instruction cache (220) has a plurality of cache lines. Each of the plurality of cache lines stores a first plurality of bits (222) corresponding to at least one instruction and a second plurality of bits (224, 226) associated with the execution of the at least one instruction. The secondary cache (250) is coupled to the instruction cache (220) and stores cache lines from the instruction cache (220) by storing the first plurality of bits (222) and a third plurality of bits (255, 257) corresponding to the second plurality of bits (224, 226). The third plurality of bits (255, 257) is fewer in number than the second plurality of bits (224, 226).
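
    The abstract states only that the secondary cache keeps fewer predecode bits per line than the instruction cache. The C sketch below assumes one hypothetical split, three predecode bits per byte in L1 and a single retained bit per byte in L2, with the dropped bits regenerated by a predecode step on refill; the bit assignment and the regenerate_predecode() stub are assumptions, not the patent's encoding.

```c
/* Hypothetical model of storing reduced predecode information in the
 * secondary cache and rebuilding the full set on refill.  The specific
 * bit split and the regenerate_predecode() stub are assumptions only. */
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64

typedef struct {
    uint8_t bytes[LINE_BYTES];
    uint8_t predecode[LINE_BYTES];   /* 3 bits/byte used: start, end, functional */
} l1_line_t;

typedef struct {
    uint8_t bytes[LINE_BYTES];
    uint8_t retained[LINE_BYTES];    /* 1 bit/byte kept (assumed: the start bit) */
} l2_line_t;

/* Stand-in for the predecode logic that recomputes the dropped bits
 * from the instruction bytes and the retained bit (placeholder math). */
static uint8_t regenerate_predecode(uint8_t byte, uint8_t start_bit)
{
    return (uint8_t)(start_bit | ((byte & 1u) << 1));   /* placeholder only */
}

void evict_to_l2(const l1_line_t *l1, l2_line_t *l2)
{
    memcpy(l2->bytes, l1->bytes, LINE_BYTES);
    for (int i = 0; i < LINE_BYTES; i++)
        l2->retained[i] = l1->predecode[i] & 0x1;       /* keep 1 of 3 bits */
}

void refill_to_l1(const l2_line_t *l2, l1_line_t *l1)
{
    memcpy(l1->bytes, l2->bytes, LINE_BYTES);
    for (int i = 0; i < LINE_BYTES; i++)                /* rebuild full predecode */
        l1->predecode[i] = regenerate_predecode(l2->bytes[i], l2->retained[i]);
}
```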
