Multi-level multiprocessor speculation mechanism
    61.
    Granted patent, currently in force

    Publication No.: US06748518B1

    Publication date: 2004-06-08

    Application No.: US09588483

    Filing date: 2000-06-06

    IPC class: G06F 9/30

    Abstract: Disclosed is a processor that reduces the issuing of unnecessary barrier operations during instruction processing. The processor comprises an instruction sequencing unit and a load store unit (LSU) that issues a group of memory access requests preceding a barrier instruction in an instruction sequence. The processor also includes a controller which, in response to a determination that all of the memory access requests hit in a cache affiliated with the processor, withholds issuing on the interconnect a barrier operation associated with the barrier instruction. The controller further directs the load store unit to ignore the barrier instruction and to complete processing of the next group of memory access requests following the barrier instruction in the instruction sequence without receiving an acknowledgment.

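    A minimal C sketch of the decision the abstract describes: withhold the barrier (sync) operation when every memory access request ahead of the barrier hit in the local cache. This is illustrative only and not code from the patent; names such as barrier_needed and hit_in_local_cache are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

/* One memory access request in the group preceding the barrier. */
struct access { bool hit_in_local_cache; };

/* Returns true when the barrier operation must be issued on the
 * interconnect; false when it can be withheld because every request
 * in the group was satisfied by the processor's own cache. */
static bool barrier_needed(const struct access *group, int n)
{
    for (int i = 0; i < n; i++)
        if (!group[i].hit_in_local_cache)
            return true;          /* at least one miss: order globally */
    return false;                 /* all hits: skip barrier, no ack wait */
}

int main(void)
{
    struct access group[] = { { true }, { true }, { true } };
    if (barrier_needed(group, 3))
        printf("issue barrier operation on interconnect, wait for ack\n");
    else
        printf("withhold barrier operation; start next group immediately\n");
    return 0;
}
```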

    High performance data processing system via cache victimization protocols
    62.
    Granted patent, expired

    Publication No.: US06721853B2

    Publication date: 2004-04-13

    Application No.: US09895232

    Filing date: 2001-06-29

    IPC class: G06F 12/08

    CPC class: G06F 12/0813

    Abstract: A cache controller for a processor in a remote node of a system bus in a multiway multiprocessor link sends out a cache deallocate address transaction (CDAT) for a given cache line when that cache line is flushed and information from memory in the home node is no longer deemed valid for that cache line of the remote node processor. A local snoop of the CDAT transaction is then performed as a background function by the other processors in the same remote node. If the snoop results indicate that the same information is valid in another cache, and that cache decides it is better to keep it valid in that remote node, the information remains there. If the snoop results indicate that the information is not valid among the caches in that remote node, or will be flushed due to the CDAT, the system memory directory in the home node of the multiprocessor link is notified and changes state accordingly. The system achieves higher performance because these cache line maintenance functions are performed in the background rather than on mainstream demand.

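    A rough C sketch, not taken from the patent, of how the background snoop of a CDAT might be resolved in the remote node; handle_cdat and notify_home_directory are illustrative names.

```c
#include <stdbool.h>
#include <stdio.h>

static void notify_home_directory(void)
{
    printf("home node directory changes state for this line\n");
}

/* Background handling of a cache deallocate address transaction (CDAT)
 * snooped by the other processors of the same remote node. */
static void handle_cdat(bool valid_in_peer_cache, bool peer_keeps_line)
{
    if (valid_in_peer_cache && peer_keeps_line) {
        /* A peer cache holds the line valid and elects to keep it,
         * so the information simply remains in the remote node. */
        return;
    }
    /* No valid copy will remain in the remote node after the flush:
     * tell the home node's memory directory so it can change state. */
    notify_home_directory();
}

int main(void)
{
    handle_cdat(true, true);    /* line stays cached in the remote node */
    handle_cdat(false, false);  /* line gone: home directory is notified */
    return 0;
}
```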

    High speed lock acquisition mechanism with time parameterized cache coherency states
    63.
    Granted patent, currently in force

    Publication No.: US06629212B1

    Publication date: 2003-09-30

    Application No.: US09437187

    Filing date: 1999-11-09

    IPC class: G06F 12/00

    CPC class: G06F 12/0815

    Abstract: A multiprocessor data processing system requires careful management to maintain cache coherency. In conventional systems using a MESI approach, two or more processors will often compete for ownership of a common cache line. As a result, ownership of the cache line frequently “bounces” between multiple processors, which causes a significant reduction in cache efficiency. The preferred embodiment provides a modified MESI state that holds the status of the cache line static for a fixed period of time, eliminating the bounce effect caused by contention between multiple processors.

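    An illustrative C sketch of the idea of a time-parameterized coherency state: while a hold window is open, snooped ownership requests are retried rather than bouncing the line away. The state encoding and the field hold_until are assumptions, not the patent's design.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Classic MESI states; the hold window below pins a Modified line for a
 * fixed period so contending processors cannot immediately steal it. */
enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

struct cache_line {
    enum mesi state;
    uint64_t  hold_until;   /* cycle count until which the state is pinned */
};

/* Returns true if a snooped ownership request must be retried because the
 * line's state is being held static for a fixed window of time. */
static bool snoop_ownership_request(struct cache_line *l, uint64_t now)
{
    if (l->state == MODIFIED && now < l->hold_until)
        return true;            /* retry: owner keeps the line for now */
    l->state = INVALID;         /* normal MESI behaviour: give the line up */
    return false;
}

int main(void)
{
    struct cache_line l = { MODIFIED, 1000 /* hold_until */ };
    printf("at cycle 500:  %s\n",
           snoop_ownership_request(&l, 500) ? "retried" : "granted");
    printf("at cycle 1500: %s\n",
           snoop_ownership_request(&l, 1500) ? "retried" : "granted");
    return 0;
}
```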

    Multiprocessor speculation mechanism with imprecise recycling of storage operations
    64.
    Granted patent, currently in force

    Publication No.: US06606702B1

    Publication date: 2003-08-12

    Application No.: US09588606

    Filing date: 2000-06-06

    IPC class: G06F 9/312

    Abstract: Disclosed is a method of operating a processor by which a speculatively issued load request that fetches incorrect data is recycled. An instruction sequence, which includes a barrier instruction and a load instruction that follows the barrier instruction in program order, is received for execution. In response to the barrier instruction, a barrier operation is issued on an interconnect. Then, in response to the load instruction and while the barrier operation is pending, a load request is issued to memory. When a pre-determined type of invalidate affiliated with the load request is received before the acknowledgment for the barrier operation, the data returned by memory in response to the load request is discarded and the load request is re-issued. The pre-determined type of invalidate includes, for example, a snoop invalidate.

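    A small C sketch, assuming illustrative names such as must_recycle, of the recycling condition in the abstract: data returned for a speculative load is discarded and the load re-issued when a snoop invalidate affiliated with that load is seen before the barrier acknowledgment arrives.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative state for one load issued speculatively while a barrier
 * operation is still pending on the interconnect. */
struct spec_load {
    bool barrier_ack_received;   /* ack for the earlier barrier operation  */
    bool invalidate_snooped;     /* snoop invalidate hit this load's line  */
};

/* Decide whether data returned for the speculative load can be kept.
 * If an invalidate of the targeted line was snooped before the barrier
 * acknowledgment arrived, the data may be stale: discard and re-issue. */
static bool must_recycle(const struct spec_load *ld)
{
    return ld->invalidate_snooped && !ld->barrier_ack_received;
}

int main(void)
{
    struct spec_load ld = { .barrier_ack_received = false,
                            .invalidate_snooped   = true };
    if (must_recycle(&ld))
        printf("discard returned data, re-issue load request\n");
    else
        printf("keep returned data\n");
    return 0;
}
```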

    Extended cache coherency protocol with a “lock released” state
    65.
    Granted patent, expired

    Publication No.: US06549989B1

    Publication date: 2003-04-15

    Application No.: US09437184

    Filing date: 1999-11-09

    IPC class: G06F 12/00

    CPC class: G06F 12/0831

    Abstract: A multiprocessor data processing system requires careful management to maintain cache coherency. Conventional systems using a MESI approach sacrifice some performance through inefficient lock-acquisition and lock-retention techniques. The disclosed system provides additional cache states, indicator bits, and lock-acquisition routines to improve cache performance. In particular, as multiple processors compete for the same cache line, a significant amount of processor time is lost determining whether another processor's cache line lock has been released and attempting to reserve that cache line while it is still owned by the other processor. The preferred embodiment provides an additional cache state which specifically indicates that a processor has released its lock on a cache line after it has performed any necessary modifications.

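    A minimal C sketch of MESI extended with a “lock released” style state, so a waiting processor only attempts acquisition once the holder has marked the line released. The enum values and function names are illustrative assumptions, not the patent's terms.

```c
#include <stdbool.h>
#include <stdio.h>

/* MESI extended with an illustrative "lock released" state: the owning
 * processor marks the line once it has finished its critical section,
 * so waiters can tell at a glance that the lock is free to acquire. */
enum coh_state { INVALID, SHARED, EXCLUSIVE, MODIFIED, LOCK_RELEASED };

/* Called by the lock holder after it stores the "unlocked" value. */
static void release_lock_line(enum coh_state *line)
{
    *line = LOCK_RELEASED;
}

/* Called by a waiting processor that snoops the line's state: only when
 * the state says the lock was released is a reservation attempt worth it. */
static bool worth_attempting_acquisition(enum coh_state line)
{
    return line == LOCK_RELEASED;
}

int main(void)
{
    enum coh_state line = MODIFIED;   /* holder still owns the lock        */
    printf("attempt acquisition? %d\n", worth_attempting_acquisition(line));
    release_lock_line(&line);         /* holder leaves its critical section */
    printf("attempt acquisition? %d\n", worth_attempting_acquisition(line));
    return 0;
}
```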

    Mechanism for high performance transfer of speculative request data between levels of cache hierarchy
    66.
    Granted patent, expired

    Publication No.: US06532521B1

    Publication date: 2003-03-11

    Application No.: US09345715

    Filing date: 1999-06-30

    IPC class: G06F 12/00

    Abstract: A method of operating a processing unit of a computer system by issuing an instruction having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used: the first prefetch unit is hardware independent and dynamically monitors one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit is aware of the lower level storage subsystem and sends with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and the cache has already met a prefetch limit of cache usage, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the new prefetch value. The prefetch limit of cache usage may be established as a maximum number of sets in a congruence class usable by the requesting processing unit. A flag in a directory of the cache may be set to indicate that the prefetch value was retrieved as the result of a prefetch operation. In an implementation wherein the cache is a multi-level cache, a second flag in the cache directory may be set to indicate that the prefetch value has been sourced to an upstream cache. A cache line containing prefetch data can be automatically invalidated after a preset amount of time has passed since the prefetch value was requested.

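    A C sketch, with made-up field and function names, of the kind of explicit prefetch request and the prefetch-limit check the abstract describes; none of these identifiers come from the patent.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative shape of an explicit prefetch request sent straight from
 * the instruction sequencing unit to a prefetch unit, carrying the ids
 * the abstract mentions. */
struct prefetch_request {
    uint64_t address;
    int      stream_id;      /* which active processor stream it belongs to */
    int      processor_id;   /* useful when the target cache is shared      */
    bool     into_lower_level_cache;  /* hint from the second prefetch unit */
};

/* Enforce a prefetch limit on cache usage: a congruence class may hold at
 * most max_prefetch_ways prefetched lines, so a new prefetch past the limit
 * must victimize one of the earlier prefetched lines, not a demand line. */
static bool must_evict_earlier_prefetch(int prefetched_ways_in_class,
                                        int max_prefetch_ways)
{
    return prefetched_ways_in_class >= max_prefetch_ways;
}

int main(void)
{
    struct prefetch_request rq = { 0x1000, 3, 0, true };
    printf("prefetch 0x%llx for stream %d\n",
           (unsigned long long)rq.address, rq.stream_id);
    printf("evict an earlier prefetch? %d\n", must_evict_earlier_prefetch(2, 2));
    return 0;
}
```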

    Layered local cache with lower level cache updating upper and lower level cache directories
    67.
    Granted patent, expired

    Publication No.: US06463507B1

    Publication date: 2002-10-08

    Application No.: US09340082

    Filing date: 1999-06-25

    IPC class: G06F 12/00

    Abstract: A method of improving memory access for a computer system by sending load requests to a lower level storage subsystem along with associated information pertaining to the intended use of the requested information by the requesting processor, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information), or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data, and translation caches.

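    An illustrative C sketch (names such as intended_use and l2_pick_l1_victim are assumptions) of a load request that carries intended-use information down to the lower level cache, and of the L2 using its extra view of L1 activity to pick an L1 victim.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative load request sent to the lower level (L2) subsystem together
 * with information about how the requesting processor intends to use the
 * value, so no high level load queue or reload buffer is needed. */
enum intended_use { USE_OPERAND, USE_INSTRUCTION, USE_TRANSLATION };

struct load_request {
    uint64_t          address;
    int               dest_register;   /* returned on the narrow register bus */
    enum intended_use use;
};

/* The L2 keeps a copy of L1 state, so it can pick an L1 victim way using the
 * extra activity information only it sees (sketch: least recently touched). */
static int l2_pick_l1_victim(const uint32_t l1_last_touch[], int ways)
{
    int victim = 0;
    for (int w = 1; w < ways; w++)
        if (l1_last_touch[w] < l1_last_touch[victim])
            victim = w;
    return victim;
}

int main(void)
{
    struct load_request rq = { 0x2000, 5, USE_OPERAND };
    uint32_t l1_last_touch[4] = { 90, 12, 55, 73 };
    printf("load 0x%llx into r%d, L1 victim way %d\n",
           (unsigned long long)rq.address, rq.dest_register,
           l2_pick_l1_victim(l1_last_touch, 4));
    return 0;
}
```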

    Method and system for managing speculative requests in a multi-level memory hierarchy
    68.
    Granted patent, expired

    Publication No.: US06418516B1

    Publication date: 2002-07-09

    Application No.: US09364409

    Filing date: 1999-07-30

    IPC class: G06F 12/08

    Abstract: A method of operating a multi-level memory hierarchy of a computer system, and an apparatus embodying the method, wherein an instruction having an explicit prefetch request is issued directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions and treats instructions in a different manner when they are loaded speculatively. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions but a branch prediction or stream association predicts that they might be needed. The load requests are sent to the lower level cache when the upper level cache does not contain the value required by the load. If a speculative request is for an instruction which is likewise not present in the lower level cache, that request is ignored, keeping both the lower level and upper level caches free of speculative values that are infrequently used. If the value is present in the lower level cache, it is loaded into the upper level cache. If a speculative request is for operand data, the value is loaded only into the lower level cache if it is not already present, keeping the upper level cache free of speculative operand data.

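    A compact C sketch of the placement policy described above for speculative versus demand loads; the enum names and the function place are illustrative, not the patent's terms.

```c
#include <stdbool.h>
#include <stdio.h>

/* Where a load that missed in the upper level (L1) cache gets placed,
 * depending on whether it is a demand or speculative request and whether
 * it fetches an instruction or operand data. */
enum kind   { DEMAND, SPECULATIVE };
enum value  { INSTRUCTION, OPERAND_DATA };
enum action { FILL_UPPER_AND_LOWER, FILL_LOWER_ONLY, NO_FILL };

static enum action place(enum kind k, enum value v, bool present_in_lower)
{
    if (k == DEMAND)
        return FILL_UPPER_AND_LOWER;           /* processor needs it now     */
    if (v == INSTRUCTION)
        return present_in_lower ? FILL_UPPER_AND_LOWER  /* promote from L2   */
                                : NO_FILL;     /* ignore: keep caches clean  */
    /* speculative operand data stays out of the upper level cache */
    return present_in_lower ? NO_FILL : FILL_LOWER_ONLY;
}

int main(void)
{
    printf("spec. instruction, L2 miss : %d\n", place(SPECULATIVE, INSTRUCTION,  false));
    printf("spec. operand data, L2 miss: %d\n", place(SPECULATIVE, OPERAND_DATA, false));
    printf("demand load                : %d\n", place(DEMAND,      OPERAND_DATA, false));
    return 0;
}
```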

    Protocol for transferring modified-unsolicited state during data intervention
    69.
    Granted patent, currently in force

    Publication No.: US06349369B1

    Publication date: 2002-02-19

    Application No.: US09437180

    Filing date: 1999-11-09

    IPC class: G06F 12/00

    CPC class: G06F 12/0831

    Abstract: A novel cache coherency protocol provides a modified-unsolicited (MU) cache state to indicate that a value held in a cache line has been modified (i.e., is not currently consistent with system memory), but was modified by another processing unit, not by the processing unit associated with the cache that currently contains the value in the MU state, and that the value is held exclusive of any other horizontally adjacent caches. Because the value is exclusively held, it may be modified in that cache without the necessity of issuing a bus transaction to other horizontal caches in the memory hierarchy. The MU state may be applied as a result of a snoop response to a read request. The read request can include a flag to indicate that the requesting cache is capable of utilizing the MU state. Alternatively, a flag may be provided with the intervention data to indicate that the requesting cache should utilize the modified-unsolicited state.

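    A minimal C sketch of MESI extended with a modified-unsolicited style state; the state names and the flag handling are illustrative assumptions, not the patent's encoding.

```c
#include <stdbool.h>
#include <stdio.h>

/* MESI plus an illustrative modified-unsolicited (MU) state: the value is
 * dirty with respect to memory, was modified by *another* processing unit,
 * and is held exclusive of all horizontally adjacent caches, so it can be
 * written again without a bus transaction. */
enum coh_state { INVALID, SHARED, EXCLUSIVE, MODIFIED, MODIFIED_UNSOLICITED };

/* State chosen by the requesting cache when intervention data arrives.
 * The flag may travel with the read request or with the intervention data. */
static enum coh_state state_after_intervention(bool mu_flag)
{
    return mu_flag ? MODIFIED_UNSOLICITED : SHARED;
}

/* A store to a line held exclusively (E, M, or MU) needs no bus
 * transaction; a shared or invalid line would need one first. */
static bool store_needs_bus_transaction(enum coh_state s)
{
    return s == SHARED || s == INVALID;
}

int main(void)
{
    enum coh_state s = state_after_intervention(true);
    printf("bus transaction needed for a later store? %d\n",
           store_needs_bus_transaction(s));   /* 0: can modify locally */
    return 0;
}
```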

Multiprocessor system bus transaction for transferring exclusive-deallocate cache state to lower level cache
    70.
    Granted patent, expired

    Publication No.: US06314498B1

    Publication date: 2001-11-06

    Application No.: US09437197

    Filing date: 1999-11-09

    IPC class: G06F 12/08

    CPC class: G06F 12/0831, G06F 12/0811

    Abstract: A cache coherency protocol uses an “Exclusive-Deallocate” (ED) coherency state to indicate that a particular value is currently held in an upper level cache in an exclusive, unmodified form (not shared with any other caches of the computer system, including caches associated with the same processing unit), so that the value can conveniently be modified without any lower level bus transactions, since no lower level cache has allocated a line for the value. If the value is subsequently modified in the upper level cache, its coherency state is simply switched to “modified” without the need for any bus transaction. Conversely, if the value is evicted from the upper level cache without ever having been modified, it can be loaded into the lower level cache with a coherency state indicating that the lower level cache contains the unmodified value exclusive of all other caches in other processing units of the computer system. If the value is initially loaded into the upper level cache from a cache of another processing unit, or from a lower level cache of the same processing unit, then the upper level cache may be selectively programmed to mark the cache line with the ED state.

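    A short C sketch, under assumed names such as L1_EXCLUSIVE_DEALLOCATE, of the two transitions the abstract highlights: a store to an ED line becomes modified with no lower level bus transaction, and an unmodified ED line evicted from L1 is installed in the lower level cache as exclusive.

```c
#include <stdio.h>

/* Illustrative coherency states for an upper level (L1) cache line,
 * including the exclusive-deallocate (ED) state from the abstract:
 * exclusive, unmodified, and with no line allocated for the value in
 * the lower level cache. */
enum l1_state { L1_INVALID, L1_SHARED, L1_EXCLUSIVE_DEALLOCATE, L1_MODIFIED };

/* A store to an ED line simply becomes Modified: no lower level cache
 * holds the line, so no lower level bus transaction is required. */
static enum l1_state store_to_ed_line(void)
{
    return L1_MODIFIED;
}

/* Evicting a never-modified ED line pushes it down so the lower level
 * cache holds it exclusive and unmodified (a stand-in for the system
 * bus transaction named in the title). */
static void evict_unmodified_ed_line(void)
{
    printf("install in L2 as exclusive/unmodified; no write-back to memory\n");
}

int main(void)
{
    printf("state after store: %d (L1_MODIFIED)\n", store_to_ed_line());
    evict_unmodified_ed_line();
    return 0;
}
```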