Abstract:
A central-queue-based packet switch, illustratively an eight-way router, that advantageously avoids deadlock, and an accompanying method for use therein. Specifically, each packet switch (25₁) contains input port circuits (310) and output port circuits (380) interconnected through two parallel paths: a multi-slot central queue (350) and a low-latency bypass, the latter being a cross-point switching matrix (360). The central queue has one slot dedicated to each output port to store a message portion ("chunk") destined only for that output port, with the remaining slots being shared among all the output ports and dynamically allocated among them as the need arises. Only those chunks that are contending for the same output port are stored in the central queue; otherwise, chunks are routed to the appropriate output ports through the cross-point switching matrix.
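As a rough sketch of the queuing decision this abstract describes (not the patented hardware), the following Python fragment models an eight-way router in which a chunk cuts through the cross-point bypass when its output port is free and otherwise takes either that port's dedicated slot or a shared slot in the central queue. All names here (Chunk, CentralQueue, route_chunk, shared_capacity) and the size of the shared pool are illustrative assumptions.

```python
from collections import namedtuple

# A chunk is a message portion headed from one input port to one output port.
Chunk = namedtuple("Chunk", ["payload", "in_port", "out_port"])

class CentralQueue:
    def __init__(self, num_output_ports, shared_capacity):
        self.dedicated = [None] * num_output_ports   # one reserved slot per output port
        self.shared = []                             # dynamically allocated shared slots
        self.shared_capacity = shared_capacity

    def enqueue(self, chunk):
        if self.dedicated[chunk.out_port] is None:
            self.dedicated[chunk.out_port] = chunk   # use the port's dedicated slot
            return True
        if len(self.shared) < self.shared_capacity:
            self.shared.append(chunk)                # borrow a shared slot on demand
            return True
        return False                                 # central queue full; sender must wait

def route_chunk(chunk, output_busy, central_queue):
    """Return 'bypass' if the chunk cuts through the cross-point matrix,
    'queued' if it was stored in the central queue, or 'blocked' otherwise."""
    if not output_busy[chunk.out_port]:
        return "bypass"                              # low-latency cross-point path
    return "queued" if central_queue.enqueue(chunk) else "blocked"

# Example: two chunks destined for output port 3 of an eight-way router.
cq = CentralQueue(num_output_ports=8, shared_capacity=24)
busy = [False] * 8
print(route_chunk(Chunk("a", 0, 3), busy, cq))       # -> bypass
busy[3] = True
print(route_chunk(Chunk("b", 1, 3), busy, cq))       # -> queued (dedicated slot for port 3)
```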
Abstract:
Caches are associated with processors, such that multiple caches may be associated with multiple processors. This association may differ for different main memory address ranges. The techniques of the invention are flexible, as a system designer can choose how the caches are associated with processors and main memory banks, and the association between caches, processors, and main memory banks may be changed while the multiprocessor system is operating. Cache coherence may or may not be maintained. An effective address in an illustrative embodiment comprises an interest group and an associated address. The interest group is an index into a cache vector table; the indexed entry of the cache vector table, together with the associated address, is used to select one of the caches. This selection can be pseudo-random. Alternatively, in some applications, the cache vector table may be eliminated, with the interest group directly encoding the subset of caches to use.
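A minimal software sketch of the address-to-cache mapping described above, assuming a small table of example entries and a simple multiplicative hash for the pseudo-random selection; the names (cache_vector_table, select_cache) and the table contents are assumptions, not the patent's encoding.

```python
# Each table entry lists the caches eligible for one interest group; the
# contents here are arbitrary examples.
cache_vector_table = [
    [0, 1, 2, 3],   # interest group 0: spread across caches 0-3
    [4, 5],         # interest group 1: confined to caches 4 and 5
    [6],            # interest group 2: a single cache
]

def select_cache(interest_group, associated_address):
    """Map an effective address (interest group, associated address) to a cache."""
    eligible = cache_vector_table[interest_group]
    # Pseudo-random but repeatable choice: a multiplicative hash of the associated
    # address picks one of the eligible caches, so the same address always maps to
    # the same cache within its interest group.
    mixed = (associated_address * 2654435761) & 0xFFFFFFFF
    return eligible[mixed % len(eligible)]

# Example: addresses in interest group 0 scatter over caches 0-3,
# while any address in interest group 2 always uses cache 6.
print(select_cache(0, 0x1000), select_cache(0, 0x2040), select_cache(2, 0x1000))
```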
Abstract:
The geometric processing of an ordered sequence of graphics commands is distributed over a set of processors by the following steps. The sequence of graphics commands is partitioned into an ordered set of N subsequences S0 ... SN−1, and an ordered set of N state vectors V0 ... VN−1 is associated with said ordered set of subsequences S0 ... SN−1. A first phase of processing is performed on the set of processors whereby, for each given subsequence Sj in the set of subsequences S0 ... SN−2, state vector Vj+1 is updated to represent state as if the graphics commands in subsequence Sj had been executed in sequential order. A second phase of the processing is performed whereby the components of each given state vector Vk in the set of state vectors V1 ... VN−1 generated in the first phase are merged with corresponding components in the preceding state vectors V0 ... Vk−1, such that the state vector Vk represents state as if the graphics commands in subsequences S0 ... Sk−1 had been executed in sequential order. Finally, a third phase of processing is performed on the set of processors whereby, for each subsequence Sm in the set of subsequences S1 ... SN−1, geometry operations for subsequence Sm are performed using the state vector Vm generated in the second phase. In addition, in the third phase, geometry operations for subsequence S0 are performed using the state vector V0. Advantageously, the present invention provides a mechanism that allows a large number of processors to work in parallel on the geometry operations of the three-dimensional rendering pipeline. Moreover, this high degree of parallelism is achieved with very little synchronization (one processor waiting for another) required, which results in increased performance over prior-art graphics processing techniques.
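To make the three phases concrete, the following Python sketch models each state vector as a dictionary of attributes and stubs out the actual geometry work; the function names, the thread pool, and the state representation are assumptions made for illustration, not the patented implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_state(commands):
    """Phase 1 helper: record only the state changes a subsequence would make."""
    delta = {}
    for cmd in commands:
        if cmd[0] == "set_state":
            _, key, value = cmd
            delta[key] = value
    return delta

def process_subsequences(subsequences, initial_state):
    n = len(subsequences)
    with ThreadPoolExecutor() as pool:
        # Phase 1 (parallel): for each S_j, derive the state it would leave behind.
        deltas = list(pool.map(simulate_state, subsequences))

    # Phase 2 (prefix merge): V_k = initial state overlaid with the deltas of
    # S_0..S_{k-1}, so V_k reflects sequential execution of all preceding subsequences.
    vectors = [dict(initial_state)]
    for j in range(n - 1):
        merged = dict(vectors[-1])
        merged.update(deltas[j])
        vectors.append(merged)

    def run_geometry(args):
        index, (commands, state) = args
        # Phase 3 stub: perform geometry operations for S_m using state vector V_m.
        drawn = [cmd for cmd in commands if cmd[0] == "draw"]
        return (index, len(drawn), state)

    with ThreadPoolExecutor() as pool:
        # Phase 3 (parallel): each subsequence runs independently with its merged state.
        return list(pool.map(run_geometry, enumerate(zip(subsequences, vectors))))

# Example: three subsequences; S_1 and S_2 see the color set in S_0.
subs = [
    [("set_state", "color", "red"), ("draw", "tri0")],
    [("draw", "tri1")],
    [("set_state", "color", "blue"), ("draw", "tri2")],
]
for index, num_drawn, state in process_subsequences(subs, {"color": "white"}):
    print(index, num_drawn, state.get("color"))
```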
Abstract:
An architecture that achieves high-speed performance in a network protocol handler combines parallelism and pipelining in multiple programmable processors, along with specialized front-end logic at the network interface that handles time-critical protocol operations. The multiple processors are interconnected via a high-speed interconnect, using a multi-token counter protocol for data transmission between processors and between processors and memory. Each processor's memory is globally accessible by the other processors, and memory synchronization operations are used to obviate the need for "spin-locks". Each processor has multiple threads, each capable of fully executing programs. Threads within a processor are assigned the processing of various protocol functions in a parallel/pipelined fashion. Data frame processing is done by one or more of the threads to identify related frames. Related frames are dispatched to the same thread so as to minimize the overhead associated with memory accesses and general protocol processing.
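A simplified illustration of the frame-dispatch idea described above, written in Python rather than on hardware threads; the frame field (exchange_id), the per-thread queues, and the hash-based dispatch are assumptions used only to show why related frames end up on the same thread.

```python
import queue
import threading

NUM_THREADS = 4
work_queues = [queue.Queue() for _ in range(NUM_THREADS)]

def dispatch(frame):
    # Related frames share an exchange identifier, so hashing it keeps them on
    # one thread and avoids repeated memory accesses and locking for the shared
    # per-exchange context.
    thread_index = hash(frame["exchange_id"]) % NUM_THREADS
    work_queues[thread_index].put(frame)

def worker(index):
    # Each worker fully executes the protocol functions for the frames it receives.
    while True:
        frame = work_queues[index].get()
        if frame is None:
            break                      # shutdown sentinel
        # ... protocol processing for this frame would happen here ...
        work_queues[index].task_done()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()

# Example: two frames from the same exchange land on the same worker thread.
dispatch({"exchange_id": 0xBEEF, "seq": 1})
dispatch({"exchange_id": 0xBEEF, "seq": 2})
for q in work_queues:
    q.put(None)                        # stop the workers
for t in threads:
    t.join()
```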