Method and system for managing cache injection in a multiprocessor system
    1.
    Invention grant
    Method and system for managing cache injection in a multiprocessor system (In force)

    Publication number: US08255591B2

    Publication date: 2012-08-28

    Application number: US10948407

    Filing date: 2004-09-23

    IPC classification: G06F13/28

    CPC classification: G06F13/28

    Abstract: A method and apparatus for managing cache injection in a multiprocessor system reduces the processing time associated with direct memory access (DMA) transfers in a symmetrical multiprocessor (SMP) or non-uniform memory access (NUMA) multiprocessor environment. The method and apparatus either detect the target processor for DMA completion or direct DMA completion processing to a particular processor, thereby enabling cache injection into a cache coupled to the processor that executes the DMA completion routine and processes the injected data. The target processor may be identified by determining which processor handles the interrupt that occurs on completion of the DMA transfer. Alternatively, or in conjunction with target processor identification, an interrupt handler may queue a deferred procedure call to the target processor to process the transferred data. In NUMA multiprocessor systems, the completing processor and target memory are chosen so that the target memory is readily accessible to the processor and its associated cache.

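    The abstract above turns on one idea: pick as the cache-injection target the CPU that services the DMA-completion interrupt, and queue the completion work to that same CPU, so the injected data is consumed by the cache it was injected into. Below is a minimal C sketch of that control flow; it is an illustration only, not the patented implementation, and every name in it (dpc_queue, dma_interrupt, dma_completion) is hypothetical.

    #include <stdio.h>

    #define NCPUS 4

    typedef void (*dpc_fn)(int cpu, const char *buf);

    struct dpc { dpc_fn fn; const char *buf; };
    static struct dpc dpc_queue[NCPUS];    /* one pending DPC slot per CPU */

    /* Completion routine: runs on the target CPU and touches the data,
       so it hits the cache coupled to that CPU. */
    static void dma_completion(int cpu, const char *buf)
    {
        printf("CPU%d processes DMA buffer '%s' from its own cache\n", cpu, buf);
    }

    /* Interrupt handler: the CPU that took the DMA-completion interrupt
       becomes the injection target; the deferred procedure call is queued
       to that same CPU. */
    static void dma_interrupt(int irq_cpu, const char *buf)
    {
        dpc_queue[irq_cpu] = (struct dpc){ dma_completion, buf };
    }

    int main(void)
    {
        dma_interrupt(2, "network packet");   /* interrupt lands on CPU2 */
        for (int c = 0; c < NCPUS; c++)       /* each CPU drains its DPC slot */
            if (dpc_queue[c].fn)
                dpc_queue[c].fn(c, dpc_queue[c].buf);
        return 0;
    }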

    Flexible techniques for associating cache memories with processors and main memory
    2.
    Invention grant
    Flexible techniques for associating cache memories with processors and main memory (Expired)

    Publication number: US07203790B2

    Publication date: 2007-04-10

    Application number: US11197899

    Filing date: 2005-08-05

    IPC classification: G06F12/00

    CPC classification: G06F12/0813 G06F2212/601

    Abstract: Caches are associated with processors, such that multiple caches may be associated with multiple processors. This association may differ for different main memory address ranges. The techniques of the invention are flexible: a system designer can choose how the caches are associated with processors and main memory banks, and the association between caches, processors, and main memory banks may be changed while the multiprocessor system is operating. Cache coherence may or may not be maintained. An effective address in an illustrative embodiment comprises an interest group and an associated address. The interest group is an index into a cache vector table; the selected entry of the cache vector table, together with the associated address, is used to select one of the caches. This selection can be pseudo-random. Alternatively, in some applications, the cache vector table may be eliminated, with the interest group directly encoding the subset of caches to use.

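    A minimal C sketch of the effective-address scheme described above, under stated assumptions: the interest group occupies the top bits of the effective address, a cache vector table (cvt below) maps it to the set of eligible caches, and a multiplicative hash of the associated address makes the pseudo-random selection. The bit split, table contents, and hash constant are all illustrative, not taken from the patent.

    #include <stdint.h>
    #include <stdio.h>

    #define NGROUPS   4
    #define MAXCACHES 8

    struct cache_vector { int ncaches; int cache_ids[MAXCACHES]; };

    /* Cache vector table: which caches serve which interest group. */
    static struct cache_vector cvt[NGROUPS] = {
        { 2, {0, 1} },        /* group 0 -> caches 0 and 1 */
        { 4, {0, 1, 2, 3} },  /* group 1 -> caches 0-3     */
        { 1, {4} },           /* group 2 -> cache 4 only   */
        { 3, {5, 6, 7} },     /* group 3 -> caches 5-7     */
    };

    static int select_cache(uint32_t effective_addr)
    {
        uint32_t group = effective_addr >> 28;         /* top bits: interest group */
        uint32_t assoc = effective_addr & 0x0fffffffu; /* rest: associated address */
        const struct cache_vector *v = &cvt[group % NGROUPS];
        uint32_t h = assoc * 2654435761u;              /* pseudo-random spread */
        return v->cache_ids[(h >> 16) % (uint32_t)v->ncaches];
    }

    int main(void)
    {
        printf("addr 0x10001000 -> cache %d\n", select_cache(0x10001000u));
        printf("addr 0x10002000 -> cache %d\n", select_cache(0x10002000u));
        return 0;
    }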

    Central shared queue based time multiplexed packet switch with deadlock avoidance
    3.
    Invention grant
    Central shared queue based time multiplexed packet switch with deadlock avoidance (Expired)

    Publication number: US5805589A

    Publication date: 1998-09-08

    Application number: US608017

    Filing date: 1996-03-04

    Abstract: A central queue based packet switch, illustratively an eight-way router, that advantageously avoids deadlock, and an accompanying method for use therein. Specifically, each packet switch (25₁) contains input port circuits (310) and output port circuits (380) interconnected through two parallel paths: a multi-slot central queue (350) and a low-latency bypass, the latter being a cross-point switching matrix (360). The central queue has one slot dedicated to each output port to store a message portion ("chunk") destined for only that output port, with the remaining slots being shared among all the output ports and dynamically allocated among them as the need arises. Only those chunks which are contending for the same output port are stored in the central queue; otherwise, chunks are routed to the appropriate output ports through the cross-point switching matrix.

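    The admission rule in the abstract can be read as: a chunk bypasses through the cross-point matrix unless its output port is contended, and a contended chunk takes the output port's dedicated central-queue slot first, then a slot from the shared pool. A minimal C sketch of that reading follows; the slot counts, the contention test, and all names are illustrative assumptions, not the patent's circuit.

    #include <stdbool.h>
    #include <stdio.h>

    #define NPORTS       8
    #define SHARED_SLOTS 24

    static bool dedicated_used[NPORTS]; /* one reserved slot per output port */
    static int  shared_used;            /* dynamically allocated shared slots */

    /* Returns true if the chunk was accepted (bypassed or queued). */
    static bool route_chunk(int out_port, bool port_contended)
    {
        if (!port_contended) {
            printf("port %d: bypass via cross-point matrix\n", out_port);
            return true;
        }
        if (!dedicated_used[out_port]) {     /* dedicated slot first: each   */
            dedicated_used[out_port] = true; /* port always has one, which   */
                                             /* is what avoids deadlock      */
            printf("port %d: dedicated central-queue slot\n", out_port);
            return true;
        }
        if (shared_used < SHARED_SLOTS) {    /* then the shared pool */
            shared_used++;
            printf("port %d: shared central-queue slot\n", out_port);
            return true;
        }
        return false;                        /* back-pressure the input port */
    }

    int main(void)
    {
        route_chunk(3, false);  /* uncontended: low-latency bypass */
        route_chunk(3, true);   /* contended: dedicated slot       */
        route_chunk(3, true);   /* contended again: shared pool    */
        return 0;
    }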

    RAM based implementation for scalable, reliable high speed event counters
    4.
    Invention grant
    RAM based implementation for scalable, reliable high speed event counters (Expired)

    Publication number: US08660234B2

    Publication date: 2014-02-25

    Application number: US12183748

    Filing date: 2008-07-31

    IPC classification: H03K21/00

    Abstract: There is broadly contemplated herein an arrangement whereby each event source feeds a small dedicated "pre-counter" while the actual count is kept in a 64-bit-wide RAM. Such an implementation preferably involves a state machine that simply sweeps through the pre-counters in a predetermined fixed order. Preferably, the state machine accesses each pre-counter, adds the value from the pre-counter to a corresponding RAM location, and then clears the pre-counter. Accordingly, the pre-counters merely have to be wide enough that, even at the maximal event rate, a pre-counter cannot wrap (i.e., reach capacity or overflow) before the "sweeper" state machine accesses it.

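    The wrap-avoidance condition above amounts to a sizing rule: for a pre-counter of w bits, a maximal event rate r, and a sweep period T, the design needs 2^w − 1 ≥ r·T. A minimal C sketch of the read-add-clear sweeper follows; the widths, source count, and event pattern are illustrative assumptions, not the patented hardware.

    #include <stdint.h>
    #include <stdio.h>

    #define NSOURCES 16
    #define PRE_BITS 8                       /* pre-counter width w */
    #define PRE_MAX  ((1u << PRE_BITS) - 1)

    static uint8_t  pre[NSOURCES];           /* small dedicated pre-counters */
    static uint64_t ram[NSOURCES];           /* 64-bit counts kept in RAM */

    static void event(int src) { pre[src]++; }  /* must stay below PRE_MAX
                                                   events per sweep period */

    /* One pass of the sweeper state machine: fixed order, read-add-clear. */
    static void sweep(void)
    {
        for (int s = 0; s < NSOURCES; s++) {
            ram[s] += pre[s];
            pre[s] = 0;
        }
    }

    int main(void)
    {
        for (int i = 0; i < 300; i++) {      /* 300 events across two sweeps */
            event(5);
            if (i == 199) sweep();           /* sweep before pre[5] can wrap */
        }
        sweep();
        printf("source 5 count = %llu\n", (unsigned long long)ram[5]);
        return 0;
    }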

    Memory bus address snooper logic for determining memory activity without performing memory accesses
    5.
    Invention grant
    Memory bus address snooper logic for determining memory activity without performing memory accesses (Expired)

    Publication number: US5901326A

    Publication date: 1999-05-04

    Application number: US756447

    Filing date: 1996-11-26

    CPC classification: G06F13/4243

    Abstract: A parallel multiprocessor data processing system having a plurality of nodes for processing data and a switch connected to each of the nodes for switching messages between the nodes, each node having a node processor for defining, under program control, messages to be sent to another node. Each of the nodes has an I/O processor for controlling the sending of messages to another node via the switch, and a shared memory which can be accessed by both the node processor and the I/O processor. Instructions for the messages to be sent by the I/O processor are stored by the node processor in mailboxes in the shared memory. A comparing circuit compares addresses on the bus to the contents of a plurality of address registers and sets the corresponding bit in a results register for each match. The adapter processor reads the contents of the results register so that the adapter processor may, with a single bus access, determine all mailboxes that have been accessed by the node processor.

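    A minimal C sketch of the snooping logic described above: bus addresses are matched against a bank of mailbox address registers, each match sets a bit in a results register, and the adapter (I/O) processor learns every touched mailbox with a single read instead of polling each mailbox in shared memory. The register count, address layout, and read-and-clear behavior are illustrative assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define NMAILBOX 16

    static uint32_t addr_reg[NMAILBOX];  /* one register per mailbox address */
    static uint16_t results;             /* bit i set => mailbox i was written */

    /* Comparator bank: called for every address the snooper sees on the bus. */
    static void snoop(uint32_t bus_addr)
    {
        for (int i = 0; i < NMAILBOX; i++)
            if (bus_addr == addr_reg[i])
                results |= (uint16_t)(1u << i);
    }

    /* The adapter processor learns all touched mailboxes in one access. */
    static uint16_t read_and_clear_results(void)
    {
        uint16_t r = results;
        results = 0;
        return r;
    }

    int main(void)
    {
        for (int i = 0; i < NMAILBOX; i++)
            addr_reg[i] = 0x1000u + 0x40u * (uint32_t)i;  /* mailbox layout */
        snoop(0x1040);                  /* node processor writes mailbox 1 */
        snoop(0x10C0);                  /* ... and mailbox 3 */
        printf("results = 0x%04x\n", read_and_clear_results());
        return 0;
    }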

    RAM BASED IMPLEMENTATION FOR SCALABLE, RELIABLE HIGH SPEED EVENT COUNTERS
    6.
    Invention application
    RAM BASED IMPLEMENTATION FOR SCALABLE, RELIABLE HIGH SPEED EVENT COUNTERS (Expired)

    Publication number: US20100027735A1

    Publication date: 2010-02-04

    Application number: US12183748

    Filing date: 2008-07-31

    IPC classification: H03K21/00

    Abstract: There is broadly contemplated herein an arrangement whereby each event source feeds a small dedicated "pre-counter" while the actual count is kept in a 64-bit-wide RAM. Such an implementation preferably involves a state machine that simply sweeps through the pre-counters in a predetermined fixed order. Preferably, the state machine accesses each pre-counter, adds the value from the pre-counter to a corresponding RAM location, and then clears the pre-counter. Accordingly, the pre-counters merely have to be wide enough that, even at the maximal event rate, a pre-counter cannot wrap (i.e., reach capacity or overflow) before the "sweeper" state machine accesses it. (This application publication shares its abstract with the granted patent US08660234B2 in entry 4; see the sketch there.)


    Flexible techniques for associating cache memories with processors and main memory
    7.
    Invention grant
    Flexible techniques for associating cache memories with processors and main memory (Expired)

    Publication number: US06961804B2

    Publication date: 2005-11-01

    Application number: US10186476

    Filing date: 2002-06-28

    IPC classification: G06F12/08 G06F12/00

    CPC classification: G06F12/0813 G06F2212/601

    Abstract: Caches are associated with processors, such that multiple caches may be associated with multiple processors. This association may differ for different main memory address ranges. The techniques of the invention are flexible: a system designer can choose how the caches are associated with processors and main memory banks, and the association between caches, processors, and main memory banks may be changed while the multiprocessor system is operating. Cache coherence may or may not be maintained. An effective address in an illustrative embodiment comprises an interest group and an associated address. The interest group is an index into a cache vector table; the selected entry of the cache vector table, together with the associated address, is used to select one of the caches. This selection can be pseudo-random. Alternatively, in some applications, the cache vector table may be eliminated, with the interest group directly encoding the subset of caches to use. (This earlier grant shares its abstract with US07203790B2 in entry 2; see the sketch there.)


    Method and parallelizing geometric processing in a graphics rendering pipeline
    8.
    Invention grant
    Method and parallelizing geometric processing in a graphics rendering pipeline (In force)

    Publication number: US06384833B1

    Publication date: 2002-05-07

    Application number: US09371395

    Filing date: 1999-08-10

    IPC classification: G06T15/00

    Abstract: The geometric processing of an ordered sequence of graphics commands is distributed over a set of processors by the following steps. The sequence of graphics commands is partitioned into an ordered set of N subsequences S0 … SN−1, and an ordered set of N state vectors V0 … VN−1 is associated with the ordered set of subsequences S0 … SN−1. A first phase of processing is performed on the set of processors whereby, for each given subsequence Sj in the set of subsequences S0 … SN−2, state vector Vj+1 is updated to represent state as if the graphics commands in subsequence Sj had been executed in sequential order. A second phase of processing is performed whereby the components of each given state vector Vk in the set of state vectors V1 … VN−1 generated in the first phase are merged with the corresponding components in the preceding state vectors V0 … Vk−1, such that the state vector Vk represents state as if the graphics commands in subsequences S0 … Sk−1 had been executed in sequential order. Finally, a third phase of processing is performed on the set of processors whereby, for each subsequence Sm in the set of subsequences S1 … SN−1, geometry operations for subsequence Sm are performed using the state vector Vm generated in the second phase. In addition, in the third phase, geometry operations for subsequence S0 are performed using the state vector V0. Advantageously, the present invention provides a mechanism that allows a large number of processors to work in parallel on the geometry operations of the three-dimensional rendering pipeline. Moreover, this high degree of parallelism is achieved with very little synchronization (one processor waiting for another), which results in increased performance over prior-art graphics processing techniques.

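    The three phases above are an associative scan (parallel prefix) over pipeline state vectors, which is why so little inter-processor synchronization is needed. A minimal C sketch follows, with state reduced to a single "current color" component plus a valid flag; real pipeline state has many such components, and the per-subsequence deltas here are hard-coded stand-ins for phase 1's parallel work.

    #include <stdio.h>

    #define N 4   /* number of subsequences / processors */

    struct state { int color; int color_set; };  /* one component + valid bit */

    /* Merge two partial states: a component set later overrides one set
       earlier. merge() is associative, which makes the prefix step a scan. */
    static struct state merge(struct state earlier, struct state later)
    {
        return later.color_set ? later : earlier;
    }

    int main(void)
    {
        /* Phase 1 (parallel in the real system): v[j+1] holds the state as
           if subsequence S_j alone had run; v[0] is the initial state. */
        struct state delta[N - 1] = { {3, 1}, {0, 0}, {7, 1} };
        struct state v[N];
        v[0] = (struct state){0, 1};             /* initial color is 0 */
        for (int j = 0; j + 1 < N; j++)
            v[j + 1] = delta[j];

        /* Phase 2: prefix-merge so v[k] reflects S_0..S_{k-1}. Shown
           sequentially here, but an associative scan can run in
           O(log N) parallel steps. */
        for (int k = 1; k < N; k++)
            v[k] = merge(v[k - 1], v[k]);

        /* Phase 3 (parallel): geometry for S_m starts from v[m]. */
        for (int m = 0; m < N; m++)
            printf("S%d starts with color %d\n", m, v[m].color);
        return 0;
    }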