Method to reduce memory latencies by performing two levels of speculation
    1.
    Granted patent
    Method to reduce memory latencies by performing two levels of speculation (in force)

    Publication number: US06496917B1

    Publication date: 2002-12-17

    Application number: US09499264

    Filing date: 2000-02-07

    IPC classification: G06F 12/00

    Abstract: A multiprocessor system includes a plurality of central processing units (CPUs) connected to one another by a system bus. Each CPU includes a cache controller to communicate with its cache, and a primary memory controller to communicate with its primary memory. When there is a cache miss in a CPU, the cache controller routes an address request for primary memory directly to the primary memory via the CPU as a speculative request without accessing the system bus, and also issues the address request to the system bus to facilitate data coherency. The speculative request is queued in the primary memory controller, which in turn retrieves speculative data from a specified primary memory address. The CPU monitors the system bus for a subsequent transaction that requests the specified data in the primary memory. If the subsequent transaction requesting the specified data is a read transaction that corresponds to the speculative address request, the speculative request is validated and becomes non-speculative. If, on the other hand, the subsequent transaction requesting the specified data is a write transaction, the speculative request is canceled.

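    The validate-or-cancel protocol in this abstract can be sketched in a few lines. This is an illustrative model only; the class and method names (`SpeculativeMemoryController`, `on_cache_miss`, `observe_bus`) are assumptions, not terms from the patent.

```python
# Minimal sketch of the two-level speculation protocol: a cache miss fires a
# speculative fetch straight to primary memory, and a later system-bus
# transaction on the same address either validates or cancels it.

class SpeculativeMemoryController:
    """Queues speculative primary-memory requests on a cache miss and
    resolves them by snooping subsequent system-bus transactions."""

    def __init__(self, primary_memory):
        self.memory = primary_memory          # address -> data
        self.pending = {}                     # address -> prefetched data

    def on_cache_miss(self, address):
        # Route the request directly to primary memory (speculative),
        # bypassing the system bus; data is fetched but not yet consumed.
        self.pending[address] = self.memory.get(address)

    def observe_bus(self, transaction, address):
        # Monitor the system bus for a transaction on the speculated address.
        if address not in self.pending:
            return None
        if transaction == "read":
            # A matching read validates the speculation: return the
            # already-fetched data, hiding the memory latency.
            return self.pending.pop(address)
        if transaction == "write":
            # A write means another CPU modified the line: cancel the
            # speculative request to preserve coherency.
            self.pending.pop(address)
            return None
```

    In the read case the data is ready the moment the bus transaction is seen, which is where the latency reduction comes from.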

    Simplified writeback handling
    2.
    Granted patent
    Simplified writeback handling (in force)

    Publication number: US06477622B1

    Publication date: 2002-11-05

    Application number: US09670856

    Filing date: 2000-09-26

    IPC classification: G06F 12/00

    CPC classification: G06F 12/0804

    Abstract: The main cache of a processor in a multiprocessor computing system is coupled to receive writeback data during writeback operations. In one embodiment, during writeback operations, e.g., for a cache miss, dirty data in the main cache is merged with modified data from an associated write cache, and the resultant writeback data line is loaded into a writeback buffer. The writeback data is also written back into the main cache, and is maintained in the main cache until replaced by new data. Subsequent requests (i.e., snoops) for the data are then serviced from the main cache, rather than from the writeback buffer. In some embodiments, further modifications of the writeback data in the main cache are prevented. The writeback data line in the main cache remains valid until read data for the cache miss is returned, thereby ensuring that the read address reaches the system interface for proper bus ordering before the writeback line is lost. In one embodiment, the writeback operation is paired with the read operation for the cache miss to ensure that upon completion of the read operation, the writeback address has reached the system interface for bus ordering, thereby maintaining cache coherency while allowing requests to be serviced from the main cache.

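    The merge-then-retain behavior above can be sketched as follows. This is a hedged model under assumed names (`WritebackPath`, per-byte `None` markers for unmodified bytes); the patent does not specify these details.

```python
# Sketch of the described writeback path: on a miss, the dirty main-cache line
# is merged with write-cache data, copied to a writeback buffer, and the merged
# line is retained (held read-only) in the main cache so snoops are serviced
# there rather than from the buffer.

def merge_writeback(main_line, write_cache_line):
    """Merge modified bytes from the write cache over the dirty main-cache
    line; None marks bytes the write cache did not modify (an assumption)."""
    return [w if w is not None else m
            for m, w in zip(main_line, write_cache_line)]

class WritebackPath:
    def __init__(self):
        self.main_cache = {}        # address -> line (list of bytes)
        self.locked = set()         # lines held unmodifiable until read returns
        self.writeback_buffer = {}  # address -> line awaiting the bus

    def writeback(self, address, write_cache_line):
        merged = merge_writeback(self.main_cache[address], write_cache_line)
        self.writeback_buffer[address] = merged
        self.main_cache[address] = merged   # keep a copy for snoops
        self.locked.add(address)            # prevent further modification

    def snoop(self, address):
        # Snoops are serviced from the main cache, not the writeback buffer.
        return self.main_cache.get(address)
```

    Keeping the merged line in the main cache is what lets snoop logic stay simple: it never needs a second lookup path into the writeback buffer.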

    Yield improvement through probe-based cache size reduction
    3.
    Granted patent
    Yield improvement through probe-based cache size reduction (in force)

    Publication number: US06918071B2

    Publication date: 2005-07-12

    Application number: US09839057

    Filing date: 2001-04-20

    IPC classification: G06F 12/08, G11C 29/00

    Abstract: A multiple-way cache memory having a plurality of cache blocks and associated tag arrays includes a select circuit that stores way select values for each cache block. The way select values selectively disable one or more cache blocks from participating in cache operations by forcing tag comparisons associated with the disabled cache blocks to a mismatch condition so that the disabled cache blocks will not be selected to provide output data. The remaining enabled cache blocks may be operated as a less-associative cache memory without requiring cache addressing modifications.

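    The forced-mismatch mechanism is simple enough to show directly. A minimal sketch, assuming a 4-way structure and illustrative names (`MultiWayCache`, `way_enabled`); only the force-to-mismatch idea comes from the abstract.

```python
# Sketch of way disabling by forced tag mismatch: a per-way select value makes
# the tag comparator report "miss" for disabled ways, so a defective way can
# never supply output data and the cache degrades to lower associativity
# without any change to cache addressing.

class MultiWayCache:
    def __init__(self, num_ways, way_enabled):
        self.tags = [None] * num_ways       # tag stored in each way
        self.data = [None] * num_ways
        self.way_enabled = way_enabled      # way select values from the probe

    def lookup(self, tag):
        for way in range(len(self.tags)):
            # A disabled way's comparison is forced to mismatch, so it is
            # never selected to provide output data, even if its (possibly
            # defective) tag array happens to match.
            if self.way_enabled[way] and self.tags[way] == tag:
                return self.data[way]
        return None                          # miss
```

    Because only the comparison result is gated, index and tag decoding are untouched, matching the abstract's claim that no addressing modifications are needed.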

    DMA transfer method for a system including a single-chip processor with a processing core and a device interface in different clock domains
    4.
    Granted patent
    DMA transfer method for a system including a single-chip processor with a processing core and a device interface in different clock domains (in force)

    Publication number: US06553435B1

    Publication date: 2003-04-22

    Application number: US09229013

    Filing date: 1999-01-12

    IPC classification: G06F 13/28

    Abstract: A single-chip central processing unit (CPU) includes a processing core and a complete cache-coherent I/O system that operates asynchronously with the processing core. An internal communications protocol uses synchronizers and data buffers to transfer information between a clock domain of the processing core and a clock domain of the I/O system. The synchronizers transfer control and handshake signals between clock domains, but the data buffer transfers data without input or output synchronization circuitry for the data bits. Throughput for the system is high because the processing unit has direct access to the I/O system, so no delays are incurred for the complex mechanisms commonly employed between a CPU and an external I/O chip-set. Throughput is further increased by holding data from one DMA transfer in the data buffer for use in a subsequent DMA transfer. In one embodiment, the integrated I/O system contains a dedicated memory management unit including a translation lookaside buffer which converts I/O addresses to physical addresses for the processing core.

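    The key idea, synchronize only the control handshake and leave the data path unsynchronized, can be modeled in software. This is a behavioral sketch under assumed names (`TwoFlopSynchronizer`, `ClockDomainBridge`); real hardware properties such as metastability are only approximated by the two-cycle delay.

```python
# Sketch of the described clock-domain crossing: control signals pass through
# a two-stage synchronizer while the data buffer has no synchronization
# circuitry; the synchronized "valid" handshake guarantees the buffer is
# stable before the consumer domain samples it.

class TwoFlopSynchronizer:
    """Models a two-flop synchronizer: an input takes two destination-clock
    cycles to become visible on the output."""
    def __init__(self):
        self.stage1 = False
        self.stage2 = False

    def clock(self, signal_in):
        self.stage2 = self.stage1
        self.stage1 = signal_in
        return self.stage2

class ClockDomainBridge:
    def __init__(self):
        self.valid_sync = TwoFlopSynchronizer()
        self.data_buffer = None     # crosses domains with no synchronizer
        self.valid = False

    def producer_write(self, data):
        self.data_buffer = data     # write the data first...
        self.valid = True           # ...then raise the synchronized flag

    def consumer_clock(self):
        # Data is sampled only after 'valid' has crossed the synchronizer,
        # by which time the buffer contents are guaranteed stable.
        if self.valid_sync.clock(self.valid):
            return self.data_buffer
        return None
```

    Synchronizing one flag instead of every data bit is what removes per-bit synchronization circuitry from the wide data path.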

    Low-latency, high-throughput, integrated cache coherent I/O system for a single-chip processor
    5.
    Granted patent
    Low-latency, high-throughput, integrated cache coherent I/O system for a single-chip processor (expired)

    Publication number: US5884100A

    Publication date: 1999-03-16

    Application number: US660026

    Filing date: 1996-06-06

    Abstract: A single-chip central processing unit (CPU) includes a processing core and a complete cache-coherent I/O system that operates asynchronously with the processing core. An internal communications protocol uses synchronizers and data buffers to transfer information between a clock domain of the processing core and a clock domain of the I/O system. The synchronizers transfer control and handshake signals between clock domains, but the data buffer transfers data without input or output synchronization circuitry for the data bits. Throughput for the system is high because the processing unit has direct access to the I/O system, so no delays are incurred for the complex mechanisms commonly employed between a CPU and an external I/O chip-set. Throughput is further increased by holding data from one DMA transfer in the data buffer for use in a subsequent DMA transfer. In one embodiment, the integrated I/O system contains a dedicated memory management unit including a translation lookaside buffer which converts I/O addresses to physical addresses for the processing core.


    Inclusion vector architecture for a level two cache
    7.
    Granted patent
    Inclusion vector architecture for a level two cache (expired)

    Publication number: US5996048A

    Publication date: 1999-11-30

    Application number: US879530

    Filing date: 1997-06-20

    IPC classification: G06F 12/08

    CPC classification: G06F 12/0811

    Abstract: A cache architecture with a first level cache and a second level cache, in which each second level cache line includes an inclusion vector indicating which portions of that line are stored in the first level cache. In addition, an instruction/data bit in the inclusion vector indicates whether any portion of that line is in the instruction cache at all. Thus, when the level two cache is snooped, additional snoops to the level one cache need only be performed for those lines the inclusion vector indicates as present.

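    The snoop-filtering role of the inclusion vector can be sketched as below. The 4-subline granularity and the names (`L2Line`, `L2Cache`, `snoop`) are illustrative assumptions; only the vector-plus-I/D-bit filtering comes from the abstract.

```python
# Sketch of inclusion-vector snoop filtering: each L2 line records which of
# its sublines are present in the L1 data cache, plus an instruction/data bit,
# so a snoop of L2 forwards to L1 only when a copy can actually exist there.

class L2Line:
    def __init__(self, num_sublines=4):
        self.inclusion = [False] * num_sublines  # subline present in L1?
        self.in_icache = False                   # instruction/data bit

class L2Cache:
    def __init__(self):
        self.lines = {}                          # address -> L2Line
        self.l1_snoops = 0                       # snoops forwarded to L1

    def snoop(self, address, subline):
        line = self.lines.get(address)
        if line is None:
            return
        # Forward to L1 only if the inclusion vector marks this subline,
        # or the line may be held in the instruction cache.
        if line.inclusion[subline] or line.in_icache:
            self.l1_snoops += 1
```

    The payoff is fewer L1 tag-port conflicts: snoops that the vector proves irrelevant never reach the level one cache.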

    Apparatus and method to speculatively initiate primary memory accesses
    8.
    Granted patent
    Apparatus and method to speculatively initiate primary memory accesses (expired)

    Publication number: US5761708A

    Publication date: 1998-06-02

    Application number: US658874

    Filing date: 1996-05-31

    IPC classification: G06F 12/08, G06F 13/16, G06F 13/18

    CPC classification: G06F 13/161, G06F 12/0884

    Abstract: A central processing unit with an external cache controller and a primary memory controller is used to speculatively initiate primary memory accesses in order to improve average primary memory access times. The external cache controller processes an address request during an external cache latency period and selectively generates an external cache miss signal or an external cache hit signal. If no other primary memory access demands exist at the beginning of the external cache latency period, the primary memory controller is used to speculatively initiate a primary memory access corresponding to the address request. The speculative primary memory access is completed in response to an external cache miss signal. The speculative primary memory access is aborted if an external cache hit signal is generated or a non-speculative primary memory access demand is generated during the external cache latency period.

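    The speculation window described above has three outcomes, which a small state sketch makes explicit. Names (`PrimaryMemoryController`, `start_speculative`, `resolve`) are illustrative assumptions.

```python
# Sketch of the described speculation window: a primary-memory access starts
# in parallel with the external-cache lookup and is (a) completed on a cache
# miss, (b) aborted on a cache hit, or (c) aborted when a non-speculative
# demand arrives during the latency period.

class PrimaryMemoryController:
    def __init__(self):
        self.speculative_addr = None
        self.completed = []                 # accesses actually performed

    def start_speculative(self, address, other_demands_pending):
        # Speculate only if primary memory is otherwise idle at the start
        # of the external cache latency period.
        if not other_demands_pending:
            self.speculative_addr = address

    def resolve(self, cache_hit, demand_addr=None):
        if cache_hit or demand_addr is not None:
            # Hit (data comes from the cache) or a real demand arrived:
            # abort the speculative access.
            self.speculative_addr = None
            if demand_addr is not None:
                self.completed.append(demand_addr)
        elif self.speculative_addr is not None:
            # Cache miss: the speculative head start becomes a real access.
            self.completed.append(self.speculative_addr)
            self.speculative_addr = None
```

    On a miss the memory access has already been in flight for the whole cache latency period, which is the source of the improved average access time.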

    Dynamically allocated cache memory for a multi-processor unit
    9.
    Granted patent
    Dynamically allocated cache memory for a multi-processor unit (in force)

    Publication number: US06725336B2

    Publication date: 2004-04-20

    Application number: US09838921

    Filing date: 2001-04-20

    IPC classification: G06F 12/08

    CPC classification: G06F 12/084

    Abstract: The resources of a partitioned cache memory are dynamically allocated between two or more processors on a multi-processor unit (MPU). In one embodiment, the MPU includes first and second processors, and the cache memory includes first and second partitions. A cache access circuit selectively transfers data between the cache memory partitions to maximize cache resources. In one mode, both processors are active and may simultaneously execute separate instruction threads. In this mode, the cache access circuit allocates the first cache memory partition as dedicated cache memory for the first processor, and allocates the second cache memory partition as dedicated cache memory for the second processor. In another mode, one processor is active, and the other processor is inactive. In this mode, the cache access circuit allocates both the first and second cache memory partitions as cache memory for the active processor.

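    The two modes above reduce to a small allocation table. A minimal sketch of the allocation policy only; the function name and partition-to-processor mapping representation are assumptions.

```python
# Sketch of the mode-dependent allocation performed by the cache access
# circuit: with both processors active, each gets a dedicated partition;
# with one inactive, the active processor receives both partitions.

def allocate_partitions(p0_active, p1_active):
    """Return a mapping from partition index to owning processor index,
    mirroring the two modes described in the abstract."""
    if p0_active and p1_active:
        return {0: 0, 1: 1}     # dedicated partition per processor
    if p0_active:
        return {0: 0, 1: 0}     # both partitions serve processor 0
    if p1_active:
        return {0: 1, 1: 1}     # both partitions serve processor 1
    return {}                   # no active processor, nothing allocated
```

    The dynamic part of the patent is re-evaluating this mapping when a processor's active state changes, so a single active thread effectively sees a cache twice as large.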

    Method and apparatus for resolving multiple branches
    10.
    Granted patent
    Method and apparatus for resolving multiple branches (expired)

    Publication number: US06256729B1

    Publication date: 2001-07-03

    Application number: US09004971

    Filing date: 1998-01-09

    IPC classification: G06F 15/00

    CPC classification: G06F 9/3861, G06F 9/3806

    Abstract: A method for repairing a pipeline in response to a branch instruction includes the steps of providing a branch repair table having a plurality of entries; allocating an entry in the branch repair table for the branch instruction; storing a target address, a fall-through address, and repair information in the entry; processing the branch instruction to determine whether the branch was taken; and, when the branch was not taken, repairing the pipeline using the repair information and the fall-through address in the entry.

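    The table structure described above can be sketched as follows. This is an illustrative model: the class and field names, the table capacity, and the handling of the taken-path mispredict (resolving to the stored target address) are assumptions that go beyond the not-taken case the abstract spells out.

```python
# Sketch of a branch repair table: each in-flight branch allocates an entry
# holding its target address, fall-through address, and repair information;
# when the branch resolves against the prediction, the pipeline is redirected
# using the addresses saved in the entry.

class BranchRepairTable:
    def __init__(self, size=8):
        self.size = size
        self.entries = {}            # branch id -> entry

    def allocate(self, branch_id, target, fall_through, repair_info):
        if len(self.entries) >= self.size:
            raise RuntimeError("branch repair table full")
        self.entries[branch_id] = {
            "target": target,
            "fall_through": fall_through,
            "repair": repair_info,   # pipeline state needed for repair
        }

    def resolve(self, branch_id, taken, predicted_taken):
        """Return the fetch address to repair to, or None when the
        prediction was correct and no repair is needed."""
        entry = self.entries.pop(branch_id)
        if taken == predicted_taken:
            return None
        # Mispredict: redirect fetch to the path the branch actually took.
        return entry["target"] if taken else entry["fall_through"]
```

    Holding one entry per unresolved branch is what lets multiple branches be in flight and resolved out of a single structure, matching the patent's title.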