High performance data processing system via cache victimization protocols
    21.
    Invention grant (Expired)

    Publication No.: US06721853B2

    Publication date: 2004-04-13

    Application No.: US09895232

    Filing date: 2001-06-29

    IPC classification: G06F 12/08

    CPC classification: G06F12/0813

    Abstract: A cache controller for a processor in a remote node of a system bus in a multiway multiprocessor link sends out a cache deallocate address transaction (CDAT) for a given cache line when that cache line is flushed and information from memory in a home node is no longer deemed valid for that cache line of that remote node processor. A local snoop of that CDAT transaction is then performed as a background function by other processors in the same remote node. If the snoop results indicate that same information is valid in another cache, and that cache decides it is better to keep it valid in that remote node, then the information remains there. If the snoop results indicate that the information is not valid among caches in that remote node, or will be flushed due to the CDAT, the system memory directory in the home node of the multiprocessor link is notified and changes state in response to this. The system has higher performance due to the cache line maintenance functions being performed in the background rather than based on mainstream demand.

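    The CDAT flow above can be pictured with a small sketch. The Python below is not from the patent; the names (HomeDirectory, PeerCache, flush_with_cdat) and the keep policy are hypothetical, and it only models the decision path described in the abstract: peer caches in the same remote node snoop the CDAT in the background, and the home-node directory is notified only when none of them keeps the line valid.

        # Illustrative sketch of the CDAT flow described above (names are hypothetical).

        class HomeDirectory:
            """Home-node memory directory tracking which remote nodes hold a line."""
            def __init__(self):
                self.sharers = {}  # line address -> set of node ids believed to hold it

            def note_node_dropped_line(self, node_id, addr):
                # State change triggered by a CDAT when no cache in the node kept the line.
                self.sharers.get(addr, set()).discard(node_id)


        class PeerCache:
            """A cache belonging to another processor in the same remote node."""
            def __init__(self, keep_policy=lambda addr: False):
                self.valid_lines = set()
                self.keep_policy = keep_policy  # decides whether to keep a valid line

            def snoop_cdat(self, addr):
                """Background snoop of a CDAT: return True if this cache keeps the line valid."""
                if addr in self.valid_lines and self.keep_policy(addr):
                    return True
                self.valid_lines.discard(addr)  # otherwise the line is (or will be) flushed
                return False


        def flush_with_cdat(node_id, addr, peer_caches, home_directory):
            """Flush a line in one cache and broadcast a CDAT to peers in the same node."""
            kept_somewhere = any(cache.snoop_cdat(addr) for cache in peer_caches)
            if not kept_somewhere:
                # No cache in this remote node still holds the line: tell the home node.
                home_directory.note_node_dropped_line(node_id, addr)
            return kept_somewhere


        if __name__ == "__main__":
            directory = HomeDirectory()
            directory.sharers[0x1000] = {1}
            peers = [PeerCache(), PeerCache()]   # neither peer holds the line
            flush_with_cdat(node_id=1, addr=0x1000, peer_caches=peers, home_directory=directory)
            print(directory.sharers[0x1000])     # set() -> home node was notified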

    Multi-node data processing system and method of queue management in which a queued operation is speculatively cancelled in response to a partial combined response
    22.
    Invention grant (Expired)

    Publication No.: US06591307B1

    Publication date: 2003-07-08

    Application No.: US09436897

    Filing date: 1999-11-09

    IPC classification: G06F 1/12

    Abstract: A data processing system includes an interconnect, a plurality of nodes coupled to the interconnect that each include at least one agent, response logic within each node, and a queue. In response to snooping a transaction on the interconnect, each agent outputs a snoop response. In addition, the queue, which has an associated agent, allocates an entry to service the transaction. The response logic within each node accumulates a partial combined response of its node and any preceding node until a complete combined response for all of the plurality of nodes is obtained. However, prior to the associated agent receiving the complete combined response, the queue speculatively deallocates the entry if the partial combined response indicates that an agent other than the associated agent will service the transaction.

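    A minimal sketch of the speculative deallocation, assuming a per-agent queue keyed by transaction id; the class and method names are hypothetical, and the partial combined response is reduced to a single "which agent will service this" field.

        # Illustrative sketch of speculative queue deallocation (names are hypothetical).

        from dataclasses import dataclass, field

        @dataclass
        class SnoopQueue:
            """Queue of an agent that allocated an entry to service a snooped transaction."""
            agent_id: int
            entries: dict = field(default_factory=dict)  # transaction id -> entry data

            def allocate(self, txn_id):
                self.entries[txn_id] = {"speculatively_freed": False}

            def on_partial_combined_response(self, txn_id, servicing_agent):
                """Called as each node's partial combined response is accumulated.

                If the partial response already shows that some *other* agent will
                service the transaction, the entry is deallocated before the complete
                combined response arrives.
                """
                if servicing_agent is not None and servicing_agent != self.agent_id:
                    self.entries.pop(txn_id, None)

            def on_complete_combined_response(self, txn_id, servicing_agent):
                # If this agent really is the servicer, the entry must still be present.
                if servicing_agent == self.agent_id:
                    assert txn_id in self.entries, "entry was freed but is needed"


        if __name__ == "__main__":
            q = SnoopQueue(agent_id=3)
            q.allocate(txn_id=42)
            # Partial combined response from an earlier node: agent 7 will service it.
            q.on_partial_combined_response(txn_id=42, servicing_agent=7)
            print(42 in q.entries)   # False -> entry was speculatively deallocated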

    Multiprocessor system snoop scheduling mechanism for limited bandwidth snoopers performing directory update

    Publication No.: US06546468B2

    Publication date: 2003-04-08

    Application No.: US09749054

    Filing date: 2000-12-27

    IPC classification: G06F 12/00

    CPC classification: G06F12/0831

    Abstract: A multiprocessor computer system in which snoop operations of the caches are synchronized to allow the issuance of a cache operation during a cycle which is selected based on the particular manner in which the caches have been synchronized. Each cache controller is aware of when these synchronized snoop tenures occur, and can target these cycles for certain types of requests that are sensitive to snooper retries, such as kill-type operations. The synchronization may set up a priority scheme for systems with multiple interconnect buses, or may synchronize the refresh cycles of the DRAM memory of the snooper's directory. In another aspect of the invention, windows are created during which a directory will not receive write operations (i.e., the directory is reserved for only read-type operations). The invention may be implemented in a cache hierarchy which provides memory arranged in banks, the banks being similarly synchronized. The invention is not limited to any particular type of instruction, and the synchronization functionality may be hardware or software programmable.
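
    A rough sketch of targeting synchronized snoop tenures, assuming a fixed period known to the cache controller; the scheduling policy, period, and names below are hypothetical, not taken from the patent.

        # Illustrative sketch of scheduling retry-sensitive requests onto
        # synchronized snoop tenures (names and parameters are hypothetical).

        def is_synchronized_tenure(cycle, period, offset=0):
            """The controller knows which cycles the limited-bandwidth snoopers
            (e.g. a directory during DRAM refresh) are guaranteed to accept snoops."""
            return cycle % period == offset


        def schedule(requests, period=8, offset=0, horizon=64):
            """Issue retry-sensitive requests (such as kill-type operations) only on
            synchronized tenures; other requests go out on any cycle."""
            issued = []
            pending = list(requests)
            for cycle in range(horizon):
                if not pending:
                    break
                kind, name = pending[0]
                if kind != "kill" or is_synchronized_tenure(cycle, period, offset):
                    pending.pop(0)
                    issued.append((cycle, name))
            return issued


        if __name__ == "__main__":
            reqs = [("read", "ld A"), ("kill", "dclaim B"), ("read", "ld C")]
            for cycle, name in schedule(reqs):
                print(cycle, name)   # the kill-type op lands on a synchronized cycle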

    Method and system for clearing dependent speculations from a request queue
    25.
    Invention grant (Expired)

    Publication No.: US06487637B1

    Publication date: 2002-11-26

    Application No.: US09364408

    Filing date: 1999-07-30

    IPC classification: G06F 13/00

    Abstract: A method of operating a multi-level memory hierarchy of a computer system, and apparatus embodying the method, wherein instructions having an explicit prefetch request issue directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. Further branch predictions or stream associations that were made based on an earlier speculative choice are linked by using a tag pool which assigns bit fields in the tag pool entries to the level of speculation depth. Each entry shares in common the bit field values associated with earlier branches or stream associations. When a branch or stream predicted entry is no longer needed, that entry can be cancelled, and all entries that were to be loaded dependent on that entry can likewise be cancelled by walking through all entries sharing the bit fields corresponding to the speculation depth of the cancelled entry and tagging those entries as invalid.

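    The tag-pool bookkeeping can be sketched as follows. The real mechanism uses bit fields in the tag pool entries; this Python approximates them with speculation-path tuples, and all names are hypothetical.

        # Illustrative sketch of clearing dependent speculations from a prefetch
        # request queue (names are hypothetical; real hardware would use bit fields
        # in the tag pool entries rather than Python tuples).

        class PrefetchTagPool:
            def __init__(self):
                self.entries = {}   # tag -> {"path": chain of speculative choices, "valid": bool}
                self.next_tag = 0

            def allocate(self, spec_path=()):
                """spec_path records the chain of branch/stream predictions this
                request depends on, one element per level of speculation depth."""
                tag = self.next_tag
                self.next_tag += 1
                self.entries[tag] = {"path": tuple(spec_path), "valid": True}
                return tag

            def cancel(self, bad_choice_path):
                """Cancel the mispredicted choice and walk the pool, invalidating every
                entry whose speculation path starts with that choice."""
                depth = len(bad_choice_path)
                for entry in self.entries.values():
                    if entry["path"][:depth] == tuple(bad_choice_path):
                        entry["valid"] = False


        if __name__ == "__main__":
            pool = PrefetchTagPool()
            t0 = pool.allocate(())                           # demand load, no speculation
            t1 = pool.allocate(("br1-taken",))               # depends on one prediction
            t2 = pool.allocate(("br1-taken", "br2-taken"))   # deeper speculation
            pool.cancel(("br1-taken",))                      # br1 resolved the other way
            print([pool.entries[t]["valid"] for t in (t0, t1, t2)])  # [True, False, False]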

    Asymmetrical cache properties within a hashed storage subsystem
    26.
    Invention grant (In force)

    Publication No.: US06449691B1

    Publication date: 2002-09-10

    Application No.: US09364285

    Filing date: 1999-07-30

    IPC classification: G06F 13/00

    Abstract: A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, have diverse cache hardware, and each preferably stores only data having associated addresses within a respective one of a plurality of subsets of an address space. The diverse cache hardware can include, for example, differing cache sizes, differing associativities, differing sectoring, and differing inclusivities.

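    A toy model of same-level caches with asymmetrical properties selected by an address hash; the hash (a single address bit) and the particular size asymmetry are assumptions for illustration only.

        # Illustrative sketch of routing addresses to same-level caches with
        # asymmetrical properties (names and the hash choice are hypothetical).

        class SimpleCache:
            """A direct-mapped toy cache; only size differs between the two instances."""
            def __init__(self, num_lines, line_bytes=64):
                self.num_lines = num_lines
                self.line_bytes = line_bytes
                self.lines = [None] * num_lines      # tag stored per line, or None

            def access(self, addr):
                line_addr = addr // self.line_bytes
                index = line_addr % self.num_lines
                tag = line_addr // self.num_lines
                hit = self.lines[index] == tag
                self.lines[index] = tag              # fill on miss
                return hit


        class HashedLevel:
            """Two peer caches at the same level; the hash of the address selects which
            one may hold the data, so each cache only ever sees its own address subset."""
            def __init__(self):
                self.slices = [SimpleCache(num_lines=256),   # bigger slice
                               SimpleCache(num_lines=64)]    # smaller slice: asymmetry

            def access(self, addr):
                which = (addr >> 6) & 1              # hypothetical hash: one address bit
                return self.slices[which].access(addr)


        if __name__ == "__main__":
            level = HashedLevel()
            print(level.access(0x1000))   # False (cold miss)
            print(level.access(0x1000))   # True  (hit in whichever slice the hash selected)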

    Method and system for cancelling speculative cache prefetch requests
    27.
    Invention grant (Expired)

    Publication No.: US06438656B1

    Publication date: 2002-08-20

    Application No.: US09364574

    Filing date: 1999-07-30

    IPC classification: G06F 12/00

    Abstract: A method of operating a multi-level memory hierarchy of a computer system, and apparatus embodying the method, wherein instructions having an explicit prefetch request issue directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. After a predetermined number of cycles has elapsed, the speculative load request is cancelled if the request has not already been completed.

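    The timeout-based cancellation can be sketched as below, assuming a simple cycle counter per in-flight request; the cycle budget and names are hypothetical.

        # Illustrative sketch of cancelling speculative prefetches after a
        # predetermined number of cycles (names and parameters are hypothetical).

        class PrefetchUnit:
            def __init__(self, cancel_after_cycles=32):
                self.cancel_after_cycles = cancel_after_cycles
                self.inflight = {}   # addr -> {"speculative": bool, "age": cycles outstanding}

            def issue(self, addr, speculative):
                self.inflight[addr] = {"speculative": speculative, "age": 0}

            def complete(self, addr):
                self.inflight.pop(addr, None)

            def tick(self):
                """Advance one cycle; speculative requests that have been outstanding
                for the predetermined number of cycles are cancelled. Demand loads
                are never cancelled this way."""
                for addr in list(self.inflight):
                    req = self.inflight[addr]
                    req["age"] += 1
                    if req["speculative"] and req["age"] >= self.cancel_after_cycles:
                        del self.inflight[addr]


        if __name__ == "__main__":
            pf = PrefetchUnit(cancel_after_cycles=4)
            pf.issue(0x2000, speculative=True)
            pf.issue(0x3000, speculative=False)
            for _ in range(4):
                pf.tick()
            print([hex(a) for a in pf.inflight])   # ['0x3000'] -> speculative request cancelled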

    Data processing system with HSA (hashed storage architecture)
    28.
    Invention grant (Expired)

    Publication No.: US06598118B1

    Publication date: 2003-07-22

    Application No.: US09364284

    Filing date: 1999-07-30

    IPC classification: G06F 12/00

    CPC classification: G06F12/0864

    Abstract: A processor having a hashed and partitioned storage subsystem includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a cache subsystem including a plurality of caches that store data utilized by the execution unit. Each cache among the plurality of caches stores only data having associated addresses within a respective one of a plurality of subsets of an address space. In one preferred embodiment, the execution units of the processor include a number of load-store units (LSUs) that each process only instructions that access data having associated addresses within a respective one of the plurality of address subsets. The processor may further be incorporated within a data processing system having a number of interconnects and a number of sets of system memory hardware that each have affinity to a respective one of the plurality of address subsets.

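    A minimal sketch of the address-subset affinity, assuming two subsets selected by a trivial hash; the hash function and structure names are illustrative assumptions, not the patented design.

        # Illustrative sketch of a hashed storage architecture: one address hash
        # selects the load-store unit and the cache slice with affinity for that
        # address subset (names are hypothetical).

        NUM_SUBSETS = 2   # assumed number of address subsets for the sketch

        def address_subset(addr, line_bytes=64):
            """Hypothetical hash: one bit above the line offset picks the subset."""
            return (addr // line_bytes) % NUM_SUBSETS


        class HashedStorageSubsystem:
            def __init__(self):
                # One LSU queue and one cache slice per address subset.
                self.lsu_queues = [[] for _ in range(NUM_SUBSETS)]
                self.cache_slices = [dict() for _ in range(NUM_SUBSETS)]

            def dispatch_load(self, addr):
                """A load is dispatched only to the LSU with affinity for its subset,
                and looked up only in that subset's cache slice."""
                subset = address_subset(addr)
                self.lsu_queues[subset].append(addr)
                return subset, addr in self.cache_slices[subset]


        if __name__ == "__main__":
            hsa = HashedStorageSubsystem()
            print(hsa.dispatch_load(0x1000))   # (0, False): subset 0, cold miss
            print(hsa.dispatch_load(0x1040))   # (1, False): next line maps to subset 1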

    Method and apparatus for efficiently managing caches with non-power-of-two congruence classes
    29.
    Invention grant (Expired)

    Publication No.: US06434670B1

    Publication date: 2002-08-13

    Application No.: US09435948

    Filing date: 1999-11-09

    IPC classification: G06F 12/00

    CPC classification: G06F12/0864 G06F12/123

    Abstract: A method and apparatus for efficiently managing caches with non-power-of-two congruence classes allows for increasing the number of congruence classes in a cache when not enough area is available to double the cache size. One or more congruence classes within the cache have their associative sets split so that a number of congruence classes are created with reduced associativity. The management method and apparatus allow access to the congruence classes without introducing any additional cycles of delay or complex logic.

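    One way to picture a non-power-of-two congruence-class count is to split a few classes of a power-of-two cache using one extra index bit, as in the sketch below; the class counts and the choice of which classes to split are assumptions for illustration, not the patented indexing scheme.

        # Illustrative sketch of indexing a cache whose number of congruence classes
        # is not a power of two (names and parameters are hypothetical).

        BASE_CLASSES = 8      # 2**3 classes fit the available index bits
        SPLIT_CLASSES = 3     # the first 3 classes have their 8-way sets split in two
        ASSOC_FULL = 8
        ASSOC_SPLIT = 4       # split classes keep half the associativity

        def congruence_class(line_index):
            """Map a line index to one of BASE_CLASSES + SPLIT_CLASSES = 11 classes
            using only the base index bits plus one extra bit, so no division by a
            non-power-of-two (and no extra level of logic) is needed."""
            base = line_index % BASE_CLASSES
            if base < SPLIT_CLASSES:
                # One more index bit chooses between the two halves of a split class.
                half = (line_index // BASE_CLASSES) % 2
                cls = base if half == 0 else BASE_CLASSES + base
                return cls, ASSOC_SPLIT
            return base, ASSOC_FULL


        if __name__ == "__main__":
            for line in (0, 1, 3, 8, 9, 11):
                print(line, congruence_class(line))
            # Lines whose base index is 0..2 land in class 0..2 or 8..10 (4-way);
            # all other lines stay in their original 8-way class.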

    Layered local cache with imprecise reload mechanism
    30.
    Invention grant (In force)

    Publication No.: US06434667B1

    Publication date: 2002-08-13

    Application No.: US09340075

    Filing date: 1999-06-25

    IPC classification: G06F 12/00

    Abstract: A method of improving memory access for a computer system by sending load requests to a lower level storage subsystem along with associated information pertaining to the requesting processor's intended use of the requested information, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information), or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data and translation caches.
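
    The L2-assisted victim selection can be sketched as follows; the structures and the "prefer L1 lines still backed by L2" policy are hypothetical simplifications of the monitoring the abstract describes, not the patented mechanism.

        # Illustrative sketch of L2-assisted victim selection in the L1
        # (names and the policy are hypothetical).

        class LayeredCachePair:
            """The L2 controller observes both L1 and L2 activity, so it can pick the
            L1 victim using information the L1 alone does not have (here: whether the
            line is still present in L2, preferring victims that L2 can reload cheaply)."""

            def __init__(self, l1_ways=4):
                self.l1_set = []          # L1 line addresses, oldest first
                self.l1_ways = l1_ways
                self.l2_lines = set()     # lines currently held in the L2

            def choose_l1_victim(self):
                # Prefer the oldest L1 line that is still backed by the L2; fall back
                # to plain oldest-first if no L1 line is present in the L2.
                for line in self.l1_set:
                    if line in self.l2_lines:
                        return line
                return self.l1_set[0]

            def reload_l1(self, addr):
                """Reload the L1; the L1 set is updated without a precise reload buffer
                being modelled here."""
                if len(self.l1_set) >= self.l1_ways:
                    self.l1_set.remove(self.choose_l1_victim())
                self.l1_set.append(addr)
                self.l2_lines.add(addr)


        if __name__ == "__main__":
            caches = LayeredCachePair(l1_ways=2)
            caches.reload_l1(0xA000)
            caches.l2_lines.discard(0xA000)     # pretend the L2 later cast this line out
            caches.reload_l1(0xB000)
            caches.reload_l1(0xC000)            # victim chosen with L2 knowledge
            print([hex(a) for a in caches.l1_set])   # 0xA000 survives; 0xB000 was evicted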

    摘要翻译: 一种改进计算机系统的存储器访问的方法,通过将请求发送到较低级别的存储子系统以及由请求处理器对与请求的信息的预期用途有关的关联信息而不使用高级别的负载队列来进行发送。 将所请求的信息与相关联的使用信息一起返回到处理器允许立即放置信息而不使用重新加载缓冲器。 使用与缓存负载总线分离(并具有较小粒度)的寄存器负载总线返回信息。 然后可能不精确地重新加载上级(L1)高速缓存(高级缓存也可以不精确地用存储指令重新加载)。 低级(L​​2)缓存可以监视L1和L2高速缓存活动,其可用于在L1高速缓存中选择受害者缓存块(基于附加的L2信息),或者选择L2缓存中的受害缓存块( 基于附加的L1信息)。 L1目录的L2控制也允许解决某些侦听请求,而无需等待L1确认。 本发明可以应用于例如指令,操作数数据和翻译高速缓存。