Abstract:
A pipelined CPU executing instructions of variable length, and referencing memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queuing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access, fetching 64-bit data blocks on each cycle. A hierarchical cache arrangement is used, increasing the likelihood of a cache hit. A writeback cache is used (instead of writethrough), and writeback is allowed to proceed even though other accesses are suppressed due to queues being full. Separate queues are provided for the return data from memory and cache invalidates, yet the order of bus transactions is maintained by a pointer arrangement. The bus protocol used by the CPU to communicate with the system bus is of the pended type, with transactions on the bus identified by an ID field specifying the originator, and arbitration for bus grant goes on simultaneously with address/data transactions on the bus.
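The ordering mechanism in the next-to-last sentence is the interesting part: two physically separate queues, one logical order. Below is a minimal C sketch of that idea, assuming a shared sequence counter stands in for the patent's pointer arrangement; all names, sizes, and payloads are illustrative, not the patent's design.

```c
#include <stdio.h>

#define QSIZE 8

typedef struct { unsigned seq; unsigned payload; } entry_t;
typedef struct { entry_t slots[QSIZE]; int head, tail; } queue_t;

static unsigned next_seq;                    /* global bus-transaction order */

static void enqueue(queue_t *q, unsigned payload) {
    q->slots[q->tail] = (entry_t){ next_seq++, payload };
    q->tail = (q->tail + 1) % QSIZE;
}

static int peek_seq(const queue_t *q) {      /* -1 if the queue is empty */
    return q->head == q->tail ? -1 : (int)q->slots[q->head].seq;
}

static entry_t dequeue(queue_t *q) {
    entry_t e = q->slots[q->head];
    q->head = (q->head + 1) % QSIZE;
    return e;
}

int main(void) {
    queue_t ret = {0}, inv = {0};
    enqueue(&ret, 0xAAAA);                   /* seq 0: read-return data   */
    enqueue(&inv, 0x1000);                   /* seq 1: invalidate address */
    enqueue(&ret, 0xBBBB);                   /* seq 2: read-return data   */

    /* Drain both queues, always taking the lowest sequence number,
       reproducing the original bus order despite separate queues. */
    while (peek_seq(&ret) >= 0 || peek_seq(&inv) >= 0) {
        int rs = peek_seq(&ret), is = peek_seq(&inv);
        if (is < 0 || (rs >= 0 && rs < is)) {
            entry_t e = dequeue(&ret);
            printf("return     seq=%u payload=0x%X\n", e.seq, e.payload);
        } else {
            entry_t e = dequeue(&inv);
            printf("invalidate seq=%u payload=0x%X\n", e.seq, e.payload);
        }
    }
    return 0;
}
```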
Abstract:
A network transport layer accelerator accelerates processing of packets so that packets can be forwarded at wire speed. To accelerate processing of packets, the accelerator performs pre-processing on a network transport layer header encapsulated in a packet for a connection and performs in-line network transport layer checksum insertion prior to transmitting a packet. A timer unit in the accelerator schedules processing of the received packets. The accelerator also includes a free pool allocator, which manages buffers for storing the received packets, and a packet order unit, which synchronizes processing of received packets for the same connection.
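In-line checksum insertion means the transport checksum is computed and written into the header as the packet goes out, rather than in a separate software pass. A minimal sketch using the standard Internet ones'-complement sum (RFC 1071) follows; the 20-byte TCP header layout and checksum offset 16 are standard, but the pseudo-header a real TCP checksum also covers is omitted here for brevity.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* RFC 1071 ones'-complement sum over a byte buffer. */
static uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    while (len > 1) { sum += (uint32_t)(data[0] << 8 | data[1]); data += 2; len -= 2; }
    if (len) sum += (uint32_t)data[0] << 8;                 /* pad odd byte  */
    while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);   /* fold carries  */
    return (uint16_t)~sum;
}

int main(void) {
    uint8_t tcp[20] = {0};              /* TCP header; checksum at bytes 16-17 */
    tcp[16] = tcp[17] = 0;              /* field must be zero while summing    */
    uint16_t c = inet_checksum(tcp, sizeof tcp);  /* pseudo-header omitted     */
    tcp[16] = (uint8_t)(c >> 8);        /* in-line insertion just before       */
    tcp[17] = (uint8_t)(c & 0xFF);      /* the segment is transmitted          */
    printf("inserted checksum: 0x%04X\n", c);
    return 0;
}
```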
Abstract:
A computer-readable instruction is described for traversing deterministic finite automata (DFA) graphs to perform a pattern search in incoming packet data in real-time. The instruction includes one or more pre-defined fields. One of the fields includes a DFA graph identifier for identifying one of several previously-stored DFA graphs. Another one of the fields includes an input reference for identifying input data to be processed using the identified DFA graph. Yet another one of the fields includes an output reference for storing results generated responsive to the processed input data. The instruction is forwarded to a DFA engine adapted to process the input data using the identified DFA graph and to provide results as instructed by the output reference.
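Reading the three fields as a plain structure makes the instruction format concrete. A minimal sketch follows; the C struct, field widths, and names are purely illustrative, since the abstract does not specify an encoding.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t graph_id;    /* selects one of the previously-stored DFA graphs */
    uint64_t input_ref;   /* reference to the input bytes to be processed    */
    uint64_t output_ref;  /* where the engine stores the generated results   */
} dfa_instr_t;

/* Stand-in for forwarding the instruction to the DFA engine. */
static void dfa_submit(const dfa_instr_t *in) {
    printf("graph %u: scan input @0x%llX, results -> 0x%llX\n",
           in->graph_id,
           (unsigned long long)in->input_ref,
           (unsigned long long)in->output_ref);
}

int main(void) {
    dfa_instr_t in = { 7, 0x10000, 0x20000 };
    dfa_submit(&in);
    return 0;
}
```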
Abstract:
A random number generator comprising an entropy generator and a mixing function. The mixing function reads a seed from the entropy generator, modifies the seed, inserts the modified seed into the mixing function, initializes a set of input variables used in the mixing function to generate a robust random number, and generates subsequent robust random numbers using the mixing function without re-initializing any of the set of input variables.
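The key structural point is that the mixing function's input variables are initialized exactly once from the (modified) seed, after which every call simply advances the mix. A minimal sketch, using splitmix64 as a stand-in for the unspecified mixing function and time() as a stand-in entropy source; both substitutions are assumptions, not the patent's components.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t state;                    /* mixing-function input variable  */

static void rng_init(void) {
    uint64_t seed = (uint64_t)time(NULL); /* read seed from entropy source   */
    state = seed ^ 0x9E3779B97F4A7C15ULL; /* modify seed before insertion    */
}

static uint64_t rng_next(void) {          /* no re-initialization per call   */
    state += 0x9E3779B97F4A7C15ULL;       /* splitmix64 mixing steps         */
    uint64_t z = state;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

int main(void) {
    rng_init();                           /* state initialized exactly once  */
    for (int i = 0; i < 3; i++)
        printf("0x%016llX\n", (unsigned long long)rng_next());
    return 0;
}
```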
Abstract:
Methods and apparatus are provided for selectively replicating a data structure in a low-latency memory. The memory includes multiple individual memory banks configured to store replicated copies of the same data structure. Upon receiving a request to access the stored data structure, a low-latency memory access controller selects one of the memory banks, then accesses the stored data from the selected memory bank. Selection of a memory bank can be accomplished using a thermometer technique comparing the relative availability of the different memory banks. Exemplary data structures that benefit from the resulting efficiencies include deterministic finite automata (DFA) graphs and other data structures that are loaded (i.e., read) more often than they are stored (i.e., written).
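Selection by relative availability can be read as "pick the least-loaded replica". A minimal sketch under that reading; the bank count and the outstanding-request metric are invented for illustration.

```c
#include <stdio.h>

#define NBANKS 4

static int pending[NBANKS] = {2, 0, 3, 1};   /* in-flight requests per bank */

/* Compare the relative availability of the replicas and
   return the index of the least-busy memory bank. */
static int select_bank(void) {
    int best = 0;
    for (int b = 1; b < NBANKS; b++)
        if (pending[b] < pending[best]) best = b;
    return best;
}

int main(void) {
    int b = select_bank();
    pending[b]++;                            /* issue the read to that replica */
    printf("read replicated DFA graph from bank %d\n", b);
    return 0;
}
```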
Abstract:
An improved content search mechanism uses a graph that includes intelligent nodes, avoiding the overhead of post-processing and improving the overall performance of a content processing application. An intelligent node is similar to a node in a DFA graph but includes a command. The command in the intelligent node allows additional state for the node to be generated and checked. This additional state allows the content search mechanism to traverse the same node with two different interpretations. By generating state for the node, the graph of nodes does not become exponential. It also allows a user function to be called upon reaching a node, which can perform any desired user tasks, including modifying the input data or position.
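The "additional state" idea is easiest to see as a node carrying a command hook plus a scratch field: the same node can then be interpreted differently on different visits, and the hook may even move the scan position. A minimal sketch; the structure, callback signature, and the visit-counting command are illustrative, not the patent's design.

```c
#include <stdio.h>

typedef struct node node_t;
typedef int (*node_cmd_t)(node_t *self, const char *input, int *pos);

struct node {
    int        next[256];  /* DFA transitions indexed by input byte      */
    node_cmd_t cmd;        /* optional command run on reaching the node  */
    int        state;      /* extra per-node state the command maintains */
};

/* Example command: count arrivals, so a first visit and a repeat
   visit to the same node can be interpreted differently. A user
   function hooked here could also rewrite input or move *pos. */
static int count_visits(node_t *self, const char *input, int *pos) {
    (void)input; (void)pos;
    return ++self->state;
}

static int walk(node_t *nodes, int cur, const char *input) {
    for (int pos = 0; input[pos]; pos++) {
        cur = nodes[cur].next[(unsigned char)input[pos]];
        if (nodes[cur].cmd)
            nodes[cur].cmd(&nodes[cur], input, &pos);
    }
    return cur;
}

int main(void) {
    static node_t nodes[2];            /* zero-initialized two-node graph */
    nodes[0].next['a'] = 1;            /* 'a' enters node 1               */
    nodes[1].next['a'] = 1;            /* node 1 self-loops on 'a'        */
    nodes[1].cmd = count_visits;
    walk(nodes, 0, "aa");
    printf("node 1 visited %d times\n", nodes[1].state);   /* prints 2 */
    return 0;
}
```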
Abstract:
A content aware application processing system is provided for allowing directed access to data stored in a non-cache memory, thereby bypassing cache-coherent memory. The processor includes a system interface to cache-coherent memory and a low-latency memory interface to a non-cache-coherent memory. The system interface directs memory access for ordinary load/store instructions executed by the processor to the cache-coherent memory. The low-latency memory interface directs memory access for non-ordinary load/store instructions executed by the processor to the non-cache memory, thereby bypassing the cache-coherent memory. The non-ordinary load/store instruction can be a coprocessor instruction. The memory can be a low-latency type memory. The processor can include a plurality of processor cores.
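The routing decision is by instruction kind rather than by address. A minimal sketch of that dispatch follows, with all function and enum names invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef enum { ORDINARY_LOAD, NONORDINARY_LOAD } op_kind_t;

/* Stand-ins for the two memory paths. */
static uint64_t coherent_read(uint64_t addr) {
    printf("cache-coherent path: 0x%llX\n", (unsigned long long)addr);
    return 0;
}
static uint64_t low_latency_read(uint64_t addr) {
    printf("low-latency path:    0x%llX\n", (unsigned long long)addr);
    return 0;
}

/* Non-ordinary (e.g. coprocessor) loads never touch the caches. */
static uint64_t issue_load(op_kind_t kind, uint64_t addr) {
    return kind == ORDINARY_LOAD ? coherent_read(addr)
                                 : low_latency_read(addr);
}

int main(void) {
    issue_load(ORDINARY_LOAD,    0x1000);  /* via cache-coherent memory */
    issue_load(NONORDINARY_LOAD, 0x2000);  /* bypasses coherent memory  */
    return 0;
}
```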
Abstract:
A processor for traversing deterministic finite automata (DFA) graphs with incoming packet data in real-time. The processor includes at least one processor core and a DFA module operating asynchronously to the at least one processor core for traversing at least one DFA graph stored in a non-cache memory with packet data stored in a cache-coherent memory.
Abstract:
A network services processor includes an input/output bridge that avoids unnecessary updates to memory when cache blocks storing processed packet data are no longer required. The input/output bridge monitors requests to free buffers in memory received from cores and I/O units in the network services processor. Instead of writing the cache block back to the buffer in memory that will be freed, the input/output bridge issues don't-write-back commands to a cache controller to clear the dirty bit for the selected cache block, thus avoiding wasteful write-backs from cache to memory. After the dirty bit is cleared, the buffer in memory is freed, that is, made available for allocation to store data for another packet.
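The saving rests on one observation: a dirty cache block belonging to a buffer that is about to be freed will never be read again, so writing it back is pure waste. A minimal sketch of clearing the dirty bit instead; the single-line "cache" and all names are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned long tag; bool valid, dirty; } cache_line_t;

/* Don't-write-back: drop the data by clearing the dirty bit,
   so the cache controller never issues the memory write. */
static void dont_write_back(cache_line_t *line) {
    line->dirty = false;
}

static void free_buffer(cache_line_t *line, unsigned long addr) {
    if (line->valid && line->tag == addr)
        dont_write_back(line);      /* suppress the wasteful writeback */
    printf("buffer 0x%lX returned to free pool\n", addr);
}

int main(void) {
    cache_line_t line = { 0x4000, true, true };  /* dirty processed packet */
    free_buffer(&line, 0x4000);
    return 0;
}
```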
Abstract:
A method and apparatus for ordering, synchronizing and scheduling work in a multi-core network services processor is provided. Each piece of work is identified by a tag that indicates how the work is to be synchronized and ordered. Throughput is increased by processing work having different tags in parallel on different processor cores. Packet processing can be broken up into different phases, each phase having a different tag dependent on ordering and synchronization constraints for the phase. A tag switch operation initiated by a core switches a tag dependent on the phase. A dedicated tag switch bus minimizes latency for the tag switch operation.
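A tag here is just a small value plus a constraint type; per-phase tag switching is what lets the next packet of the same flow enter a phase as soon as the current one leaves it. A minimal sketch, with the two constraint types and all field names invented for illustration.

```c
#include <stdio.h>

typedef enum { TAG_ORDERED, TAG_ATOMIC } tag_type_t;

typedef struct {
    unsigned   tag;     /* e.g. a hash of the packet's flow identifiers */
    tag_type_t type;    /* ordering/synchronization constraint          */
} work_t;

/* Phase change: switching the tag releases the old constraint and
   acquires the new one, so the next same-tag packet can proceed.
   Work with different tags runs in parallel on different cores. */
static void tag_switch(work_t *w, unsigned new_tag, tag_type_t new_type) {
    printf("switch tag %u -> %u\n", w->tag, new_tag);
    w->tag = new_tag;
    w->type = new_type;
}

int main(void) {
    work_t w = { 42, TAG_ATOMIC };        /* phase 1: exclusive per flow */
    tag_switch(&w, 43, TAG_ORDERED);      /* phase 2: ordered, parallel  */
    return 0;
}
```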