Central shared queue based time multiplexed packet switch with deadlock avoidance
    1.
    Granted patent
    Central shared queue based time multiplexed packet switch with deadlock avoidance (Expired)

    Publication number: US5805589A

    Publication date: 1998-09-08

    Application number: US608017

    Filing date: 1996-03-04

    Abstract: A central-queue-based packet switch, illustratively an eight-way router, that advantageously avoids deadlock, together with an accompanying method for use therein. Each packet switch (25₁) contains input port circuits (310) and output port circuits (380) interconnected through two parallel paths: a multi-slot central queue (350) and a low-latency by-pass, the latter being a cross-point switching matrix (360). The central queue has one slot dedicated to each output port to store a message portion ("chunk") destined for only that output port, with the remaining slots shared among all the output ports and dynamically allocated as the need arises. Only those chunks that are contending for the same output port are stored in the central queue; otherwise, chunks are routed to the appropriate output ports through the cross-point switching matrix.

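The slot-allocation policy in the abstract can be sketched as follows. This is a minimal illustrative model, not the patented hardware: class and method names are hypothetical, and the chunk/slot bookkeeping is simplified to show why one dedicated slot per output port prevents deadlock.

```python
class CentralQueue:
    """Sketch of a central queue with one dedicated slot per output port
    plus a pool of shared slots allocated on demand (names hypothetical)."""

    def __init__(self, num_ports, total_slots):
        assert total_slots > num_ports
        self.shared_free = total_slots - num_ports   # dynamically shared slots
        self.dedicated_free = [True] * num_ports     # one reserved slot per output
        self.fifo = {p: [] for p in range(num_ports)}  # queued chunks per output

    def enqueue(self, chunk, out_port):
        """Accept a chunk if a shared slot or the port's dedicated slot is free.
        The dedicated slot guarantees every output port can always accept at
        least one chunk, which is what avoids deadlock."""
        if self.shared_free > 0:
            self.shared_free -= 1
            self.fifo[out_port].append((chunk, "shared"))
            return True
        if self.dedicated_free[out_port]:
            self.dedicated_free[out_port] = False
            self.fifo[out_port].append((chunk, "dedicated"))
            return True
        return False  # back-pressure to the input port

    def dequeue(self, out_port):
        """Drain a chunk toward its output port, freeing the slot it held."""
        if not self.fifo[out_port]:
            return None
        chunk, kind = self.fifo[out_port].pop(0)
        if kind == "shared":
            self.shared_free += 1
        else:
            self.dedicated_free[out_port] = True
        return chunk
```

In this sketch, non-contending chunks would bypass the queue entirely via the cross-point matrix; only `enqueue` failures (both slot types exhausted) push back on the input port.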

    Flexible techniques for associating cache memories with processors and main memory
    2.
    Granted patent
    Flexible techniques for associating cache memories with processors and main memory (Expired)

    Publication number: US07203790B2

    Publication date: 2007-04-10

    Application number: US11197899

    Filing date: 2005-08-05

    IPC class: G06F12/00

    CPC class: G06F12/0813 G06F2212/601

    Abstract: Caches are associated with processors; for example, multiple caches may be associated with multiple processors. This association may differ for different main memory address ranges. The techniques of the invention are flexible: a system designer can choose how the caches are associated with processors and main memory banks, and the association between caches, processors, and main memory banks may be changed while the multiprocessor system is operating. Cache coherence may or may not be maintained. An effective address in an illustrative embodiment comprises an interest group and an associated address. The interest group is an index into a cache vector table; the entry retrieved from that table, together with the associated address, is used to select one of the caches. This selection can be pseudo-random. Alternatively, in some applications, the cache vector table may be eliminated, with the interest group directly encoding the subset of caches to use.

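The cache-selection path in the abstract can be sketched in a few lines. This is an assumption-laden illustration, not the patented circuit: the function names, the table layout (interest group → list of candidate cache ids), and the particular hash are all hypothetical; the abstract only requires that selection among the entry's caches can be pseudo-random.

```python
def hash_address(addr):
    """A simple multiplicative hash (illustrative; the patent does not
    mandate any particular hash function)."""
    return (addr * 2654435761) & 0xFFFFFFFF

def select_cache(cache_vector_table, interest_group, associated_address):
    """The interest group indexes the cache vector table; the retrieved
    entry lists candidate caches, and the associated address picks one
    deterministically, so one address always maps to the same cache."""
    candidates = cache_vector_table[interest_group]
    index = hash_address(associated_address) % len(candidates)
    return candidates[index]
```

A single-entry group degenerates to a fixed processor-to-cache binding, while a group listing every cache spreads that address range across all of them, which is one way the same mechanism covers both ends of the flexibility the abstract claims.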

    Flexible techniques for associating cache memories with processors and main memory
    3.
    Granted patent
    Flexible techniques for associating cache memories with processors and main memory (Expired)

    Publication number: US06961804B2

    Publication date: 2005-11-01

    Application number: US10186476

    Filing date: 2002-06-28

    IPC class: G06F12/08 G06F12/00

    CPC class: G06F12/0813 G06F2212/601

    Abstract: Caches are associated with processors; for example, multiple caches may be associated with multiple processors. This association may differ for different main memory address ranges. The techniques of the invention are flexible: a system designer can choose how the caches are associated with processors and main memory banks, and the association between caches, processors, and main memory banks may be changed while the multiprocessor system is operating. Cache coherence may or may not be maintained. An effective address in an illustrative embodiment comprises an interest group and an associated address. The interest group is an index into a cache vector table; the entry retrieved from that table, together with the associated address, is used to select one of the caches. This selection can be pseudo-random. Alternatively, in some applications, the cache vector table may be eliminated, with the interest group directly encoding the subset of caches to use.


    Method and parallelizing geometric processing in a graphics rendering pipeline
    4.
    Granted patent
    Method and parallelizing geometric processing in a graphics rendering pipeline (In force)

    Publication number: US06384833B1

    Publication date: 2002-05-07

    Application number: US09371395

    Filing date: 1999-08-10

    IPC class: G06T15/00

    Abstract: The geometric processing of an ordered sequence of graphics commands is distributed over a set of processors by the following steps. The sequence of graphics commands is partitioned into an ordered set of N subsequences S0 … SN−1, and an ordered set of N state vectors V0 … VN−1 is associated with said ordered set of subsequences S0 … SN−1. A first phase of processing is performed on the set of processors whereby, for each given subsequence Sj in the set of subsequences S0 … SN−2, state vector Vj+1 is updated to represent state as if the graphics commands in subsequence Sj had been executed in sequential order. A second phase of the processing is performed whereby the components of each given state vector Vk in the set of state vectors V1 … VN−1 generated in the first phase are merged with corresponding components in the preceding state vectors V0 … Vk−1, such that the state vector Vk represents state as if the graphics commands in subsequences S0 … Sk−1 had been executed in sequential order. Finally, a third phase of processing is performed on the set of processors whereby, for each subsequence Sm in the set of subsequences S1 … SN−1, geometry operations for subsequence Sm are performed using the state vector Vm generated in the second phase. In addition, in the third phase, geometry operations for subsequence S0 are performed using the state vector V0. Advantageously, the present invention provides a mechanism that allows a large number of processors to work in parallel on the geometry operations of the three-dimensional rendering pipeline. Moreover, this high degree of parallelism is achieved with very little synchronization (one processor waiting for another), which results in increased performance over prior-art graphics processing techniques.

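The three phases in the abstract can be sketched with a toy command model. This is an illustrative reduction, not the patented pipeline: a "state vector" is modeled as a dict of rendering-state components, commands are simplified to `set` and `draw` tuples, and all names are hypothetical. The key property shown is that after phase 2, each subsequence can run independently (phase 3 is embarrassingly parallel).

```python
def phase1_local_states(subsequences):
    """Phase 1: V0 is the empty initial state; V(j+1) records only the
    components that subsequence Sj itself sets, in order."""
    states = [{}]  # V0
    for sub in subsequences:
        local = {}
        for cmd in sub:
            if cmd[0] == "set":
                local[cmd[1]] = cmd[2]
        states.append(local)
    return states  # V0 .. VN (the last entry is unused by phase 3)

def phase2_merge(states):
    """Phase 2: prefix-merge, so Vk ends up representing state as if
    S0 .. S(k-1) had executed sequentially; a component set by a later
    subsequence overrides one set earlier."""
    merged = [dict(states[0])]
    for k in range(1, len(states)):
        v = dict(merged[-1])
        v.update(states[k])
        merged.append(v)
    return merged

def phase3_geometry(subsequences, merged_states):
    """Phase 3: each Sm runs from its start state Vm, independently of the
    other subsequences, so all of them can execute in parallel."""
    out = []
    for sub, base in zip(subsequences, merged_states):
        state = dict(base)
        for cmd in sub:
            if cmd[0] == "set":
                state[cmd[1]] = cmd[2]
            elif cmd[0] == "draw":
                out.append((cmd[1], state.get("color")))
    return out
```

The merge in phase 2 is a prefix computation over state vectors, which is why the scheme needs only the minimal synchronization the abstract mentions.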

    Programmable network protocol handler architecture
    5.
    Granted patent
    Programmable network protocol handler architecture (Expired)

    Publication number: US07676588B2

    Publication date: 2010-03-09

    Application number: US11387875

    Filing date: 2006-03-24

    IPC class: G06F15/16 G06F3/00

    Abstract: An architecture that achieves high-speed performance in a network protocol handler combines parallelism and pipelining across multiple programmable processors with specialized front-end logic at the network interface that handles time-critical protocol operations. The processors are interconnected via a high-speed interconnect, using a multi-token counter protocol for data transmission between processors and between processors and memory. Each processor's memory is globally accessible by other processors, and memory synchronization operations are used to obviate the need for "spin-locks". Each processor has multiple threads, each capable of fully executing programs. Threads within a processor are assigned the processing of various protocol functions in a parallel/pipelined fashion. Data frame processing is done by one or more of the threads to identify related frames. Related frames are dispatched to the same thread so as to minimize the overhead associated with memory accesses and general protocol processing.

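The dispatch rule at the end of the abstract, related frames go to the same thread, amounts to hashing a flow key to a thread id. The sketch below is a hypothetical software model of that idea: the header fields used as the flow key (`src`, `dst`, `exchange_id`), the frame representation, and the thread count are all illustrative assumptions, not details from the patent.

```python
import collections

def flow_key(frame):
    """Identify 'related' frames by connection identifiers in the header
    (field names are illustrative assumptions)."""
    return (frame["src"], frame["dst"], frame["exchange_id"])

def dispatch(frames, num_threads):
    """Assign each frame to a per-thread queue; every frame of one flow
    lands on the same thread, so that flow's protocol state stays in one
    thread's locally attached memory and cross-thread traffic is minimized."""
    queues = collections.defaultdict(list)
    for frame in frames:
        thread_id = hash(flow_key(frame)) % num_threads
        queues[thread_id].append(frame)
    return queues
```

Because dispatch is a pure function of the flow key, no lock is needed to decide ownership of a flow, which is consistent with the abstract's goal of avoiding spin-locks.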