Mechanism for selectively imposing interference order between page-table fetches and corresponding data fetches
    1.
    Granted Patent (Expired)

    Publication No.: US06286090B1

    Publication Date: 2001-09-04

    Application No.: US09084621

    Filing Date: 1998-05-26

    IPC Class: G06F12/00

    CPC Class: G06F12/1054 G06F12/0813

    Abstract: A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.

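    The gist of the mechanism can be illustrated with a short Python sketch (illustrative only, with hypothetical names, not the patented implementation): a PTE fetched by the TB miss flow is parked until the commit-signal for that read arrives, and only then installed in the translation buffer where later data reads can use it.

    # Illustrative sketch: a translation buffer that defers loading a
    # fetched PTE until the commit-signal for the PTE read is observed.
    class TranslationBuffer:
        def __init__(self):
            self.entries = {}      # virtual page -> physical page (committed)
            self.pending = {}      # virtual page -> fetched-but-uncommitted PTE

        def miss_flow_fetch(self, vpage, pte):
            # PTE returned by the read, but its commit-signal not yet seen:
            # hold it aside instead of installing it in the TB.
            self.pending[vpage] = pte

        def commit_signal(self, vpage):
            # Commit-signal arrived: the PTE may now be installed and used
            # by subsequent data reads directed to that page.
            if vpage in self.pending:
                self.entries[vpage] = self.pending.pop(vpage)

        def translate(self, vpage):
            # Data reads translate only through committed entries.
            return self.entries.get(vpage)

    tb = TranslationBuffer()
    tb.miss_flow_fetch(vpage=0x12, pte=0x9A000)
    assert tb.translate(0x12) is None      # not usable before the commit-signal
    tb.commit_signal(0x12)
    assert tb.translate(0x12) == 0x9A000   # usable afterwards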

    Method and apparatus for employing commit-signals and prefetching to maintain inter-reference ordering in a high-performance I/O processor
    2.
    Granted Patent (Expired)

    Publication No.: US6085263A

    Publication Date: 2000-07-04

    Application No.: US956861

    Filing Date: 1997-10-24

    IPC Class: G06F12/08 G06F13/12 G06F13/14

    Abstract: An improved I/O processor (IOP) delivers high I/O performance while maintaining inter-reference ordering among memory reference operations issued by an I/O device as specified by a consistency model in a shared memory multiprocessor system. The IOP comprises a retire controller which imposes inter-reference ordering among the operations based on receipt of a commit signal for each operation, wherein the commit signal for a memory reference operation indicates the apparent completion of the operation rather than actual completion of the operation. In addition, the IOP comprises a prefetch controller coupled to an I/O cache for prefetching data into the cache without any ordering constraints (i.e., out of order). The ordered retirement functions of the IOP are separated from its prefetching operations, which enables the latter operations to be performed in an arbitrary manner so as to improve the overall performance of the system.

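    A minimal Python sketch of the division of labor described above (hypothetical names; the interfaces are assumptions): prefetches fill the I/O cache in any order, while the retire controller retires operations strictly in issue order and only after each operation's commit signal has been received.

    # Illustrative sketch: out-of-order prefetch fills, in-order retirement
    # gated on per-operation commit signals.
    from collections import deque

    class IOProcessor:
        def __init__(self):
            self.cache = {}                  # address -> data (filled out of order)
            self.retire_queue = deque()      # operations in issue order
            self.committed = set()           # ops whose commit signal has arrived

        def issue(self, op_id, address):
            self.retire_queue.append((op_id, address))

        def prefetch_complete(self, address, data):
            self.cache[address] = data       # no ordering constraint here

        def commit_signal(self, op_id):
            self.committed.add(op_id)

        def retire_ready(self):
            # Retire from the head only: preserves inter-reference order.
            retired = []
            while self.retire_queue and self.retire_queue[0][0] in self.committed:
                retired.append(self.retire_queue.popleft())
            return retired

    iop = IOProcessor()
    iop.issue(1, 0x100); iop.issue(2, 0x200)
    iop.prefetch_complete(0x200, b"later data arrives first")  # out-of-order fill
    iop.commit_signal(2)
    assert iop.retire_ready() == []                            # op 1 not committed yet
    iop.commit_signal(1)
    assert [op for op, _ in iop.retire_ready()] == [1, 2]      # retired in order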

    Mechanism for optimizing generation of commit-signals in a distributed shared-memory system
    3.
    Granted Patent (Expired)

    Publication No.: US06209065B1

    Publication Date: 2001-03-27

    Application No.: US08957230

    Filing Date: 1997-10-24

    IPC Class: G06F13/14

    CPC Class: G06F9/542 G06F9/52

    Abstract: A mechanism optimizes the generation of a commit-signal by control logic of the multiprocessor system in response to a memory reference operation issued by a processor to a local node of a multiprocessor system having a hierarchical switch for interconnecting a plurality of nodes. The mechanism generally comprises a structure that indicates whether the memory reference operation affects other processors of other nodes of the multiprocessor system. An ordering point of the local node generates an optimized commit-signal when the structure indicates that the memory reference operation does not affect the other processors.

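    The optimization can be sketched as follows in Python (illustrative only; the sharer-tracking structure and its fields are assumptions): when the structure shows that no processor on a remote node caches the referenced block, the local ordering point generates the commit-signal itself rather than involving the hierarchical switch.

    # Illustrative sketch: the local ordering point generates an early
    # (optimized) commit-signal when no remote node is affected.
    class LocalOrderingPoint:
        def __init__(self, local_node):
            self.local_node = local_node
            self.sharers = {}                # block address -> set of nodes caching it

        def record_sharer(self, address, node):
            self.sharers.setdefault(address, set()).add(node)

        def commit(self, address):
            remote = self.sharers.get(address, set()) - {self.local_node}
            if not remote:
                return "optimized-commit-signal"              # generated locally
            return "commit-signal-via-hierarchical-switch"    # other nodes affected

    op = LocalOrderingPoint(local_node=0)
    op.record_sharer(0x40, node=0)
    print(op.commit(0x40))        # optimized-commit-signal
    op.record_sharer(0x80, node=3)
    print(op.commit(0x80))        # commit-signal-via-hierarchical-switch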

    Mechanism for reducing latency of memory barrier operations on a multiprocessor system
    4.
    Granted Patent (Expired)

    Publication No.: US6088771A

    Publication Date: 2000-07-11

    Application No.: US957501

    Filing Date: 1997-10-24

    IPC Class: G06F9/45 G06F13/00

    Abstract: A technique reduces the latency of a memory barrier (MB) operation used to impose an inter-reference order between sets of memory reference operations issued by a processor to a multiprocessor system having a shared memory. The technique comprises issuing the MB operation immediately after issuing a first set of memory reference operations (i.e., the pre-MB operations) without waiting for responses to those pre-MB operations. Issuance of the MB operation to the system results in serialization of that operation and generation of an MB Acknowledgment (MB-Ack) command. The MB-Ack is loaded into a probe queue of the issuing processor and, according to the invention, functions to pull in all previously ordered invalidate and probe commands in that queue. By ensuring that the probes and invalidates are ordered before the MB-Ack is received at the issuing processor, the inventive technique provides the appearance that all pre-MB references have completed.

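    A small Python sketch of the pull-in behavior (hypothetical queue entries and method names): when the MB-Ack is processed from the probe queue, every invalidate and probe ordered ahead of it is applied first, so the pre-MB references appear complete before the barrier is satisfied.

    # Illustrative sketch: the MB-Ack pulls in all invalidates and probes
    # ordered ahead of it in the issuing processor's probe queue.
    from collections import deque

    class ProbeQueue:
        def __init__(self):
            self.queue = deque()
            self.applied = []

        def enqueue(self, entry):
            self.queue.append(entry)

        def process_mb_ack(self):
            # Apply everything ordered before the MB-Ack, then the MB-Ack itself.
            while self.queue:
                entry = self.queue.popleft()
                self.applied.append(entry)
                if entry == "MB-Ack":
                    break
            return self.applied

    pq = ProbeQueue()
    pq.enqueue("invalidate A")
    pq.enqueue("probe B")
    pq.enqueue("MB-Ack")
    print(pq.process_mb_ack())   # ['invalidate A', 'probe B', 'MB-Ack']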

    Technique for reducing latency of inter-reference ordering using commit signals in a multiprocessor system having shared caches
    5.
    Granted Patent (Expired)

    Publication No.: US6055605A

    Publication Date: 2000-04-25

    Application No.: US957544

    Filing Date: 1997-10-24

    IPC Class: G06F12/08 G06F13/00 G06F12/00

    CPC Class: G06F12/084

    Abstract: A technique reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory that is distributed among a plurality of processors that share a cache. According to the technique, each processor sharing a cache inherits a commit-signal that is generated by control logic of the multiprocessor system in response to a memory reference operation issued by another processor sharing that cache. The commit-signal facilitates serialization among the processors and shared memory entities of the multiprocessor system by indicating the apparent completion of the memory reference operation to those entities of the system.

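    The inheritance idea reduces to delivering one commit-signal to every processor behind the shared cache, as in this illustrative Python sketch (names and structure are assumptions, not the patented logic):

    # Illustrative sketch: a commit-signal generated for one processor's
    # operation is inherited by every processor sharing that cache.
    class SharedCacheGroup:
        def __init__(self, processor_ids):
            self.processor_ids = processor_ids
            # per-processor view of commit-signals observed so far
            self.observed = {p: set() for p in processor_ids}

        def commit_signal(self, op_id):
            # Delivered to (inherited by) all sharers, not just the issuer.
            for p in self.processor_ids:
                self.observed[p].add(op_id)

    group = SharedCacheGroup(processor_ids=[0, 1])
    group.commit_signal(op_id=7)      # operation issued by processor 0
    assert 7 in group.observed[1]     # processor 1 inherits the signal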

    Method and apparatus for disambiguating change-to-dirty commands in a switch based multi-processing system with coarse directories
    7.
    Granted Patent (Expired)

    Publication No.: US6101420A

    Publication Date: 2000-08-08

    Application No.: US957543

    Filing Date: 1997-10-24

    IPC Class: G05B19/18

    Abstract: An architecture and coherency protocol for use in a large SMP computer system includes a hierarchical switch structure which allows a number of multi-processor nodes to be coupled to the switch and to operate at optimum performance. Within each multi-processor node, a simultaneous buffering system is provided that allows all of the processors of the multi-processor node to operate at peak performance. A memory is shared among the nodes, with a portion of the memory resident at each of the multi-processor nodes. Each of the multi-processor nodes includes a number of elements for maintaining memory coherency, including a victim cache, a directory and a transaction tracking table. The victim cache allows for selective updates of victim data destined for memory stored at a remote multi-processing node, thereby improving the overall performance of memory. Memory performance is additionally improved by including, at each memory, a delayed write buffer which is used in conjunction with the directory to identify victims that are to be written to memory. An arb bus coupled to the output of the directory of each node provides a central ordering point for all messages that are transferred through the SMP. The messages comprise a number of transactions, and each transaction is assigned to a number of different virtual channels, depending upon the processing stage of the message. The use of virtual channels thus helps to maintain data coherency by providing a straightforward method for maintaining system order. Using the virtual channels and the directory structure, cache coherency problems that would previously result in deadlock may be avoided.

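    The deadlock-avoidance role of virtual channels mentioned above can be sketched simply (the channel names and stage-to-channel mapping below are hypothetical, not taken from the patent): messages are assigned to independent channels according to the processing stage of their transaction, so a later-stage message is never blocked behind an earlier-stage one.

    # Illustrative sketch: assign each coherence message to a virtual
    # channel based on its transaction's processing stage.
    STAGE_TO_CHANNEL = {
        "request":  "VC0",   # initial requests
        "forward":  "VC1",   # requests forwarded by the directory
        "response": "VC2",   # data and acknowledgment responses
    }

    def assign_channel(stage):
        return STAGE_TO_CHANNEL[stage]

    print(assign_channel("request"))    # VC0
    print(assign_channel("response"))   # VC2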

    System and method for searching an extended database
    8.
    Granted Patent (Active)

    Publication No.: US07174346B1

    Publication Date: 2007-02-06

    Application No.: US10676650

    Filing Date: 2003-09-30

    IPC Class: G06F17/30

    Abstract: Once a search query is received from a user, a standard index is searched based on the search query. The standard index forms part of a set of replicated standard indexes having multiple instances of the standard index. A signal is then determined based on the search of the standard index. When the signal meets predefined criteria, an extended index is searched. The extended index forms part of a set of extended indexes having at least one instance of the extended index. There are fewer instances of the extended index than instances of the standard index. Extended search results are then obtained from the extended index and at least a portion of the extended search results is transmitted towards the user.

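    The search flow reads naturally as a two-tier lookup, sketched below in Python (the result-count signal and threshold are assumptions chosen for illustration; the abstract leaves the signal and criteria unspecified):

    # Illustrative sketch: search the replicated standard index first,
    # derive a signal, and consult the less-replicated extended index
    # only when the signal meets the predefined criterion.
    def search(query, standard_index, extended_index, min_results=10):
        standard_results = [doc for doc in standard_index if query in doc]
        signal = len(standard_results)        # one possible signal: result count
        if signal < min_results:              # predefined criterion met
            extended_results = [doc for doc in extended_index if query in doc]
            return standard_results + extended_results
        return standard_results

    standard_index = ["alpha processor manual", "commit signal overview"]
    extended_index = ["rare appendix mentioning commit signal semantics"]
    print(search("commit signal", standard_index, extended_index))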

    Scalable architecture based on single-chip multiprocessing
    9.
    Granted Patent (Active)

    Publication No.: US06988170B2

    Publication Date: 2006-01-17

    Application No.: US10693388

    Filing Date: 2003-10-24

    IPC Class: G06F12/00

    Abstract: A chip-multiprocessing system with scalable architecture, including on a single chip: a plurality of processor cores; a two-level cache hierarchy; an intra-chip switch; one or more memory controllers; a cache coherence protocol; one or more coherence protocol engines; and an interconnect subsystem. The two-level cache hierarchy includes first level and second level caches. In particular, the first level caches include a pair of instruction and data caches for, and private to, each processor core. The second level cache has a relaxed inclusion property, the second-level cache being logically shared by the plurality of processor cores. Each of the plurality of processor cores is capable of executing an instruction set of the ALPHA™ processing core. The scalable architecture of the chip-multiprocessing system is targeted at parallel commercial workloads. A showcase example of the chip-multiprocessing system, called the PIRAHNA™ system, is a highly integrated processing node with eight simpler ALPHA™ processor cores. A method for scalable chip-multiprocessing is also provided.

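    A rough Python sketch of the on-chip organization named in the abstract (cache sizes, field names, and counts other than the eight cores are assumptions for illustration only):

    # Illustrative sketch: eight cores, a private L1 instruction/data pair
    # per core, and one logically shared L2 with a relaxed inclusion property.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Core:
        core_id: int
        l1_icache_kb: int = 64
        l1_dcache_kb: int = 64

    @dataclass
    class ChipMultiprocessor:
        cores: List[Core] = field(default_factory=lambda: [Core(i) for i in range(8)])
        shared_l2_mb: int = 1          # logically shared among all cores
        memory_controllers: int = 1
        protocol_engines: int = 2      # e.g., home and remote coherence engines

    chip = ChipMultiprocessor()
    print(len(chip.cores), "cores sharing", chip.shared_l2_mb, "MB of L2")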

    Scalable multiprocessor system and cache coherence method
    10.
    Granted Patent (Expired)

    Publication No.: US06751710B2

    Publication Date: 2004-06-15

    Application No.: US09878982

    Filing Date: 2001-06-11

    IPC Class: G06F12/00

    Abstract: The present invention relates generally to multiprocessor computer systems, and particularly to a multiprocessor system designed to be highly scalable, using efficient cache coherence logic and methodologies. More specifically, the present invention is a system and method including a plurality of processor nodes configured to execute a cache coherence protocol that avoids the use of negative acknowledgment messages (NAKs), imposes no ordering requirements on the underlying transaction-message interconnect/network, and services most 3-hop transactions with only a single visit to the home node.

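    The NAK-free, single-home-visit behavior can be sketched as a simple 3-hop read in Python (the directory layout and function names are assumptions): the requester contacts the home node once; if another node owns the block, the home forwards the request and the owner replies directly to the requester.

    # Illustrative sketch: a 3-hop read serviced with a single visit to the
    # home node and no negative acknowledgments.
    def three_hop_read(requester, directory, caches, memory, block):
        owner = directory.get(block)           # hop 1: requester -> home node
        if owner is None:
            return memory[block]               # 2-hop case: home supplies the data
        # hop 2: home forwards the request to the owner;
        # hop 3: the owner sends the data directly to the requester.
        directory[block] = requester           # state updated on the single home visit
        return caches[(owner, block)]

    directory = {"B0": 2}                      # node 2 currently owns block B0
    caches = {(2, "B0"): "dirty copy held at node 2"}
    memory = {"B1": "clean copy in home memory"}
    print(three_hop_read(1, directory, caches, memory, "B0"))   # forwarded, 3 hops
    print(three_hop_read(1, directory, caches, memory, "B1"))   # home memory, 2 hops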