Apparatus and method for reformatting instructions before reaching a dispatch point in a superscalar processor
    42.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060155961A1

    Publication date: 2006-07-13

    Application No.: US11030339

    Filing date: 2005-01-06

    IPC classification: G06F9/30

    CPC classification: G06F9/3802 G06F9/382

    Abstract: Method and apparatus for reformatting instructions in a pipelined processor. An instruction register holds a plurality of instructions received from a cache memory external to the processor. A predecoder predecodes each of the instructions and determines from an instruction operation field where the instruction fields should be placed. A multiplexer reformats architecturally aligned instructions into hardware implementation aligned instructions before they are stored in the L1 cache, so that the instructions are ready for dispatch to the pipeline execution units.
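The predecode-then-multiplex step in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the 32-bit encodings, the opcode values `0x20` and `0x24`, the slot names, and the helpers `bits`, `reformat`, and `fill_l1` are all hypothetical and do not come from the patent.

```python
# Hypothetical 32-bit architectural encodings (illustrative, not from the patent).
# The opcode always occupies bits 31..26; the meaning of the other fields
# depends on the opcode, which is what the predecoder has to work out.
ARCH_LAYOUT = {
    0x20: {"dest": (25, 21), "src1": (20, 16)},  # load:  bits 25..21 name the written register
    0x24: {"src2": (25, 21), "src1": (20, 16)},  # store: bits 25..21 name a register that is read
}

HW_SLOTS = ("dest", "src1", "src2")  # fixed slots the dispatch logic would read


def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def reformat(word):
    """Predecode the opcode, then steer each field into its fixed hardware slot."""
    opcode = bits(word, 31, 26)
    layout = ARCH_LAYOUT[opcode]  # predecode step: the opcode selects the field layout
    entry = {"opcode": opcode}
    for slot in HW_SLOTS:  # multiplexer step: one selection per hardware slot
        entry[slot] = bits(word, *layout[slot]) if slot in layout else None
    return entry


def fill_l1(fetched_words):
    """Reformat a group of fetched instructions before storing them in the L1 model."""
    return [reformat(w) for w in fetched_words]


load = (0x20 << 26) | (5 << 21) | (2 << 16)
store = (0x24 << 26) | (7 << 21) | (3 << 16)
l1_line = fill_l1([load, store])
```

In this sketch the same architectural bit range (25..21) ends up in the `dest` slot for the load and in the `src2` slot for the store, so logic reading the L1 entry never has to inspect the opcode to find a register field.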


    Method and apparatus for avoiding data dependency hazards in a microprocessor pipeline architecture
    43.
    Invention application
    Status: Lapsed

    Publication No.: US20060037023A1

    Publication date: 2006-02-16

    Application No.: US10916188

    Filing date: 2004-08-11

    IPC classification: G06F9/46

    Abstract: A method and system for avoiding various hazards for instructions that are propagating through a microprocessor pipeline. When a plurality of instructions within the pipeline read and write the same value, a vector is established to distinguish the older from the newer instructions. Further, before instructions are dispatched for execution, pointers are generated that identify the particular instruction that has the operand or parameter value needed. Accordingly, by monitoring both the recent vector and the pointers, data dependency hazards can be avoided.
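One way to picture the vector and the pointers described above is the following sketch. The abstract does not give the layout of either structure, so the per-instruction "newest writer" flag, the `build_pointers` function, and the register names are assumptions made purely for illustration.

```python
def build_pointers(instructions):
    """Walk in-flight instructions in program order.

    instructions: list of (dest_register_or_None, [source_registers]).
    Returns (pointers, recent):
      pointers[i] maps each source of instruction i to the index of the
                  in-flight instruction that produces it (None if no producer is in flight)
      recent[i]   is True when instruction i is the newest in-flight writer of its
                  destination (a stand-in for the vector in the abstract)
    """
    newest_writer = {}  # register -> index of the newest in-flight writer
    pointers = []
    for idx, (dest, srcs) in enumerate(instructions):
        # Pointers are generated before the instruction's own write is recorded,
        # so an instruction never points at itself.
        pointers.append({s: newest_writer.get(s) for s in srcs})
        if dest is not None:
            newest_writer[dest] = idx
    recent = [
        dest is not None and newest_writer[dest] == idx
        for idx, (dest, _) in enumerate(instructions)
    ]
    return pointers, recent


in_flight = [
    ("r1", ["r2"]),  # 0: writes r1
    ("r1", ["r1"]),  # 1: reads r1 from 0, writes r1 again
    ("r3", ["r1"]),  # 2: must read r1 from 1, not from 0
]
pointers, recent = build_pointers(in_flight)
```

Instruction 2 points at instruction 1, the newer writer of `r1`, and the flag for instruction 0 is cleared, which is the distinction between older and newer writers that the abstract relies on.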


    Representing loop branches in a branch history register with multiple bits
    44.
    Invention application
    Status: In force

    Publication No.: US20070220239A1

    Publication date: 2007-09-20

    Application No.: US11378712

    Filing date: 2006-03-17

    IPC classification: G06F15/00

    CPC classification: G06F9/3848

    Abstract: In response to a property of a conditional branch instruction associated with a loop, such as a property indicating that the branch is a loop-ending branch, a count of the number of iterations of the loop is maintained, and a multi-bit value indicative of the loop iteration count is stored in a Branch History Register (BHR). In one embodiment, the multi-bit value may comprise the actual loop count, in which case the number of bits is variable. In another embodiment, the number of bits is fixed (e.g., two) and loop iteration counts are mapped to one of a fixed number of multi-bit values (e.g., four) by comparison to thresholds. Separate iteration counts may be maintained for nested loops, and a multi-bit value stored in the BHR may indicate a loop iteration count of only an inner loop, only the outer loop, or both.
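The fixed-width embodiment above (two bits, four values, chosen by thresholds) can be sketched as follows. The threshold values, the 8-bit BHR width, and the function names are assumptions chosen for the example; the abstract does not specify them.

```python
BHR_BITS = 8             # assumed register width
THRESHOLDS = (2, 8, 32)  # assumed cut-offs separating the four 2-bit codes


def encode_loop_count(iterations):
    """Map a loop iteration count to one of four 2-bit codes by threshold comparison."""
    code = 0
    for threshold in THRESHOLDS:
        if iterations >= threshold:
            code += 1
    return code  # 0..3


def shift_into_bhr(bhr, value, width):
    """Shift a width-bit value into the low end of the BHR, dropping the oldest bits."""
    return ((bhr << width) | value) & ((1 << BHR_BITS) - 1)


bhr = 0b10110101
bhr = shift_into_bhr(bhr, encode_loop_count(10), 2)  # a loop that ran 10 times
```

With these assumed thresholds, a 10-iteration loop is recorded as the code `0b10`, so the whole loop consumes two BHR bits rather than one bit per iteration.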


    Retry cancellation mechanism to enhance system performance
    46.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060253662A1

    Publication date: 2006-11-09

    Application No.: US11121121

    Filing date: 2005-05-03

    IPC classification: G06F13/00 G06F12/00

    CPC classification: G06F12/0831 G06F12/0813

    Abstract: A method, an apparatus, and a computer program are provided for a retry cancellation mechanism that enhances system performance on a cache miss or during direct memory access in a multi-processor system. In a multi-processor system with a number of independent nodes, the nodes must be able to request data that resides in memory locations on other nodes. The nodes search their memory caches for the requested data and provide a reply. A dedicated node arbitrates these replies and informs the nodes how to proceed. The invention enhances system performance by enabling the transfer of the requested data if the dedicated node receives an intervention reply, while ignoring any retry replies. An intervention reply signifies that the modified data is within the replying node's memory cache, so any retries by other nodes can be ignored.
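The arbitration rule in the abstract above reduces to a priority check, sketched below. Only the "intervention" and "retry" reply types come from the abstract; the "null" reply, the returned action names, and the fallback to memory are assumptions added so that the function is complete.

```python
def arbitrate(replies):
    """Combine per-node snoop replies into one action, as the dedicated node would.

    replies: list of strings, one per node: "intervention", "retry", or "null"
    ("null" is an assumed name for a node that has nothing to report).
    """
    if "intervention" in replies:
        # A node holds the modified data, so the transfer can proceed
        # and any retry replies from other nodes are cancelled.
        return "transfer_from_intervening_node"
    if "retry" in replies:
        return "retry_request"
    return "fetch_from_memory"  # assumed behavior when no node intervenes or retries


action = arbitrate(["retry", "intervention", "null"])
```

Without the cancellation rule, the single retry reply in this example would force the whole request to be reissued even though the data is already available from the intervening node.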


    Single request data transfer regardless of size and alignment

    Publication No.: US20060031705A1

    Publication date: 2006-02-09

    Application No.: US11246427

    Filing date: 2005-10-07

    IPC classification: G06F5/06

    Abstract: A method, computer system and set of signals are disclosed allowing for communication of a data transfer, via a bus, between a master and a slave using a single transfer request regardless of transfer size and alignment. The invention provides three transfer qualifier signals including: a first signal including a starting byte address of the data transfer; a second signal including a size of the data transfer in data beats; and a third signal including a byte enable for each byte required during a last data beat of the data transfer. The invention is usable with single or multiple beat, aligned or unaligned data transfers. The three transfer qualifier signals tell the slave, at the start of the transfer, how many data beats it will transfer and the alignment of both the starting and ending data beats. As a result, the slave need not calculate the number of bytes it will transfer. For multiple beat transfers, the number of data transfer requests is reduced, which reduces the amount of switching, bus arbitration and power consumption required. In addition, the invention allows byte enable signals to be used for subsequent data transfer requests prior to the completion of the initial data transfer, which reduces power consumption and allows for pipelining of data transfer requests.
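The three qualifier signals above can be derived from a starting address and a byte count with simple arithmetic. The sketch below assumes an 8-byte-wide bus and byte lane 0 at the lowest address; both choices, and the function name `transfer_qualifiers`, are illustrative and not taken from the patent.

```python
def transfer_qualifiers(start_addr, length, bus_bytes=8):
    """Return (start_addr, beats, last_beat_enables) for one transfer request.

    start_addr: starting byte address (first qualifier)
    beats:      number of bus-wide data beats the transfer occupies (second qualifier)
    last_beat_enables: one boolean per byte lane of the final beat (third qualifier)
    """
    if length <= 0:
        raise ValueError("length must be positive")
    offset = start_addr % bus_bytes  # misalignment of the first beat
    beats = (offset + length + bus_bytes - 1) // bus_bytes
    end = start_addr + length        # one past the last byte transferred
    last_base = (end - 1) // bus_bytes * bus_bytes  # aligned address of the last beat
    enables = [start_addr <= last_base + lane < end for lane in range(bus_bytes)]
    return start_addr, beats, enables


# Unaligned 10-byte transfer starting at 0x1003 on an 8-byte bus.
addr, beats, enables = transfer_qualifiers(0x1003, 10)
```

Here the transfer spans two beats and only the first five lanes of the last beat are enabled. For a single-beat transfer the same enable vector also marks the starting lane, since the first and last beats coincide.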

    Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor
    48.
    Invention application
    Status: Lapsed

    Publication No.: US20050216703A1

    Publication date: 2005-09-29

    Application No.: US10810235

    Filing date: 2004-03-26

    IPC classification: G06F9/38 G06F9/40

    Abstract: A method and apparatus for executing instructions in a pipeline processor. The method decreases the latency between an instruction cache and a pipeline processor when bubbles occur in the processing stream due to an execution of a branch correction, or when an interrupt changes the sequence of an instruction stream. The latency is reduced when a decode stage for detecting branch prediction and a related instruction queue location have invalid data representing a bubble in the processing stream. Instructions for execution are inserted in parallel into the decode stage and instruction queue, thereby reducing by one cycle time the length of the pipeline stage.


    Systems and methods for selectively inclusive cache
    49.
    Invention application
    Status: Under examination (published)

    Publication No.: US20070038814A1

    Publication date: 2007-02-15

    Application No.: US11201221

    Filing date: 2005-08-10

    IPC classification: G06F13/28

    CPC classification: G06F12/0831 G06F12/0897

    Abstract: Embodiments include systems and methods for selectively inclusive multi-level cache. When data for which memory coherency is designated is received from a process and stored into a lower level cache, the data is copied into a higher level of cache. When the data is snooped, it is snooped from the higher level cache and not the lower level of cache. When data is invalidated in the higher level cache, the data is invalidated in the lower level cache also. Lines of higher level cache are inclusive of lower level cache lines for data for which memory coherency is designated, but need not be inclusive of data for which coherency is not designated.
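The store, snoop, and invalidate rules above can be modeled with two dictionaries. This is a simplified sketch: the class and method names are hypothetical, and capacity limits, eviction, and line granularity are left out entirely.

```python
class SelectivelyInclusiveCache:
    """Toy two-level cache in which only coherent data is kept inclusive."""

    def __init__(self):
        self.lower = {}   # level closer to the processor: address -> data
        self.higher = {}  # level that answers snoops:     address -> data

    def store(self, addr, data, coherent):
        """Store into the lower level; copy upward only if coherency is designated."""
        self.lower[addr] = data
        if coherent:
            self.higher[addr] = data

    def snoop(self, addr):
        """Snoops consult the higher level only, never the lower level."""
        return self.higher.get(addr)

    def invalidate(self, addr):
        """Invalidating a line in the higher level also invalidates it below."""
        self.higher.pop(addr, None)
        self.lower.pop(addr, None)


cache = SelectivelyInclusiveCache()
cache.store(0x100, "shared", coherent=True)
cache.store(0x200, "private", coherent=False)
snooped_shared = cache.snoop(0x100)
snooped_private = cache.snoop(0x200)
cache.invalidate(0x100)
```

Because every coherent line is guaranteed to be present in the higher level, a snoop that misses there never needs to disturb the lower level, while non-coherent data avoids taking up space in both levels.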


    Performance profiling of microprocessor systems using debug hardware and performance monitor
    50.
    Invention application
    Status: Under examination (published)

    Publication No.: US20060048011A1

    Publication date: 2006-03-02

    Application No.: US10926566

    Filing date: 2004-08-26

    IPC classification: G06F11/00

    Abstract: A method and system for monitoring the real-time performance of software running on a microprocessor system. Debug hardware is used to select a range of instructions or events to be monitored by a performance monitor internal to the microprocessor system. Each event is compared against start and stop events identified in the debug hardware. The debug hardware enables the performance monitor when events occur within the range it defines. Using the debug hardware to enable performance monitoring avoids the overhead of generating interrupts or adding code to the application program.
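The gating of the performance monitor by start and stop events can be sketched as a small trace walk. The sketch assumes that events are instruction addresses and that the instruction matching the stop event is itself counted; neither detail is given in the abstract, and the function name `profile` is hypothetical.

```python
def profile(trace, start_addr, stop_addr):
    """Count instructions executed while the monitor is enabled.

    trace: iterable of executed instruction addresses, in order.
    The monitor is enabled when start_addr is seen and disabled after
    stop_addr is seen, mimicking comparators in debug hardware.
    """
    enabled = False
    count = 0
    for addr in trace:
        if addr == start_addr:
            enabled = True   # start comparator matches
        if enabled:
            count += 1       # performance monitor increments only while enabled
        if addr == stop_addr:
            enabled = False  # stop comparator matches
    return count


trace = [0x10, 0x14, 0x18, 0x1C, 0x20, 0x14, 0x18]
counted = profile(trace, start_addr=0x14, stop_addr=0x1C)
```

The traced program contains no profiling calls of its own: the enable decision is made entirely by the address comparison, which is the overhead saving the abstract describes.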
