Abstract:
Systems and methods for adaptively mapping system memory address bits into an instruction tag and an index into the cache are disclosed. More particularly, hardware and software are disclosed for observing collisions that occur for a given mapping of system memory bits into a tag and an index. Based on the observations, an optimal mapping may be determined that minimizes collisions.
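As an illustrative sketch only (not part of the disclosed embodiments): the collision-observation idea can be modeled in C by replaying an address trace against candidate bit mappings and counting tag mismatches per cache set. The candidate_masks table, the direct-mapped cache model, and the synthetic trace below are all hypothetical:

#include <stdint.h>
#include <stdio.h>

#define CACHE_SETS 256        /* hypothetical direct-mapped cache */
#define NUM_CANDIDATES 2      /* hypothetical candidate bit mappings */

/* Gather the address bits selected by mask into a cache index
 * (a software stand-in for the hardware bit-mapping logic). */
static unsigned extract_index(uint64_t addr, uint64_t mask)
{
    unsigned idx = 0, bit = 0;
    for (unsigned b = 0; b < 64; b++)
        if (mask & (1ULL << b))
            idx |= (unsigned)((addr >> b) & 1u) << bit++;
    return idx % CACHE_SETS;
}

/* A collision is observed when an access maps to a set whose resident
 * tag differs from the incoming tag. */
static unsigned count_collisions(const uint64_t *trace, size_t n, uint64_t mask)
{
    uint64_t resident[CACHE_SETS];
    int valid[CACHE_SETS] = {0};
    unsigned collisions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned set = extract_index(trace[i], mask);
        uint64_t tag = trace[i] & ~mask;   /* remaining bits form the tag */
        if (valid[set] && resident[set] != tag)
            collisions++;
        resident[set] = tag;
        valid[set] = 1;
    }
    return collisions;
}

int main(void)
{
    /* Hypothetical candidates: low byte of the address vs. bits 8-15. */
    const uint64_t candidate_masks[NUM_CANDIDATES] = { 0xFFULL, 0xFF00ULL };
    uint64_t trace[1024];
    for (size_t i = 0; i < 1024; i++)
        trace[i] = (uint64_t)i * 4096;     /* strided accesses that alias */

    unsigned best = 0, best_collisions = ~0u;
    for (unsigned c = 0; c < NUM_CANDIDATES; c++) {
        unsigned col = count_collisions(trace, 1024, candidate_masks[c]);
        printf("mapping %u: %u collisions\n", c, col);
        if (col < best_collisions) { best_collisions = col; best = c; }
    }
    printf("selected mapping %u\n", best);  /* mapping minimizing collisions */
    return 0;
}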
Abstract:
Systems and methods for distributing thread instructions in the pipeline of a multi-threading digital processor are disclosed. More particularly, hardware and software are disclosed for successively selecting threads in an ordered sequence for execution in the processor pipeline. If a thread to be selected cannot execute, then a complementary thread is selected for execution.
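A minimal C sketch of the selection rule, assuming a hypothetical can_execute() stall predicate and a pairing in which a thread's complement differs in the low bit of its id; both assumptions are illustrative, not taken from the disclosure:

#include <stdbool.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Hypothetical stall model: thread i is ready unless stalled[i] is set. */
static bool stalled[NUM_THREADS] = { false, true, false, false };

static bool can_execute(int tid) { return !stalled[tid]; }

/* Complementary thread: modeled here as the thread whose id differs in
 * the low bit, so threads pair as (0,1) and (2,3). */
static int complement(int tid) { return tid ^ 1; }

/* Successively select threads in an ordered (round-robin) sequence;
 * if the scheduled thread cannot execute, its complement is chosen. */
static int select_thread(int *next)
{
    int tid = *next;
    *next = (*next + 1) % NUM_THREADS;
    if (can_execute(tid))
        return tid;
    if (can_execute(complement(tid)))
        return complement(tid);
    return -1;                     /* no instruction issued this cycle */
}

int main(void)
{
    int next = 0;
    for (int cycle = 0; cycle < 8; cycle++) {
        int tid = select_thread(&next);
        printf("cycle %d: thread %d\n", cycle, tid);
    }
    return 0;
}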
Abstract:
Systems and methods for scheduling data packets in a network processor are disclosed. Embodiments provide a network processor that comprises a best-effort scheduler with a minimal calendar structure for addressing schedule control blocks. In one embodiment, a four-entry calendar structure provides for rate-limited weighted best-effort scheduling. Each of a plurality of different flows has associated schedule control blocks. Schedule control blocks are stored as linked lists in a last-in-first-out buffer. Each calendar entry is associated with a different linked list by storing in the calendar entry the address of the first-out schedule control block in the linked list. Each schedule control block has a counter and is assigned a rate limit according to the bandwidth priority of the flow to which the corresponding packet belongs. Each time a schedule control block is accessed from a last-in-first-out buffer storing the linked list, the scheduler generates a scheduling event and the counter of the schedule control block is incremented. When an incremented counter of a schedule control block equals its rate limit, the schedule control block is temporarily removed from further scheduling until a time interval concludes.
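A simplified C model of the rate-limited scheme, assuming that a schedule control block reaching its rate limit is parked on a hypothetical side list until the interval ends, and that flows map to calendar entries by flow id; the abstract does not specify these details:

#include <stdio.h>

#define CALENDAR_ENTRIES 4   /* four-entry calendar, as in the abstract */

/* Schedule control block (SCB): one per flow, linked into a list whose
 * first-out element's address is held by a calendar entry. */
struct scb {
    int flow_id;
    unsigned counter;        /* scheduling events seen this interval */
    unsigned rate_limit;     /* assigned per the flow's bandwidth priority */
    struct scb *next;
};

struct calendar {
    struct scb *head[CALENDAR_ENTRIES];  /* first-out SCB of each list */
};

/* Push an SCB onto a calendar entry's last-in-first-out list. */
static void push(struct calendar *cal, int e, struct scb *s)
{
    s->next = cal->head[e];
    cal->head[e] = s;
}

/* One scheduling event from calendar entry e: access the first-out SCB,
 * increment its counter, and requeue it unless it has reached its rate
 * limit, in which case it is parked until the interval concludes. */
static struct scb *schedule_event(struct calendar *cal, int e, struct scb **parked)
{
    struct scb *s = cal->head[e];
    if (!s)
        return NULL;
    cal->head[e] = s->next;
    s->counter++;
    if (s->counter >= s->rate_limit) {
        s->next = *parked;          /* removed from further scheduling */
        *parked = s;
    } else {
        push(cal, e, s);
    }
    return s;
}

/* When the time interval concludes, clear counters and restore parked
 * SCBs to their calendar entries. */
static void end_interval(struct calendar *cal, struct scb **parked)
{
    while (*parked) {
        struct scb *s = *parked;
        *parked = s->next;
        s->counter = 0;
        push(cal, s->flow_id % CALENDAR_ENTRIES, s);
    }
}

int main(void)
{
    struct calendar cal = { { NULL } };
    struct scb *parked = NULL;
    struct scb flows[3] = {
        { 0, 0, 3, NULL },   /* high-priority flow: rate limit 3 */
        { 1, 0, 2, NULL },
        { 2, 0, 1, NULL },   /* low-priority flow: rate limit 1 */
    };
    for (int i = 0; i < 3; i++)
        push(&cal, flows[i].flow_id % CALENDAR_ENTRIES, &flows[i]);

    for (int t = 0; t < 3; t++)
        for (int e = 0; e < CALENDAR_ENTRIES; e++) {
            struct scb *s = schedule_event(&cal, e, &parked);
            if (s)
                printf("event: flow %d (%u/%u)\n",
                       s->flow_id, s->counter, s->rate_limit);
        }
    end_interval(&cal, &parked);   /* interval concludes: all requeued */
    return 0;
}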
Abstract:
Systems and methods for scheduling data packets in a network processor are disclosed. Embodiments provide a network processor that comprises a best-effort scheduler with a minimal calendar structure for addressing schedule control blocks. In one embodiment, a three-entry calendar structure provides for weighted best-effort scheduling. Each of a plurality of different flows has an associated schedule control block. Schedule control blocks are stored as linked lists in a last-in-first-out buffer. Each calendar entry is associated with a different linked list by storing in the calendar entry the address of the first-out schedule control block in the linked list. Each schedule control block has a counter and is assigned a weight according to the bandwidth priority of the flow to which the corresponding packet belongs. Each time a schedule control block is accessed from a last-in-first-out buffer storing the linked list, the scheduler generates a scheduling event and the counter of the schedule control block is incremented. When an incremented counter of a schedule control block equals its weight, the schedule control block is temporarily removed from further scheduling.
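Under the assumption (not stated in the abstract) that a new round begins once every calendar entry's list has drained, the weighted variant can reuse the types from the rate-limited sketch above, with CALENDAR_ENTRIES set to 3 and the counter compared against a weight instead of a rate limit:

/* Weighted variant: identical list handling, but the counter is
 * compared against the flow's weight, and rather than waiting for a
 * timed interval, counters are cleared and parked SCBs requeued once
 * every calendar entry's list has drained. */
static int round_done(const struct calendar *cal)
{
    for (int e = 0; e < CALENDAR_ENTRIES; e++)
        if (cal->head[e])
            return 0;   /* some flow still has weight left this round */
    return 1;           /* all weights consumed: start the next round */
}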
Abstract:
Systems and methods for implementing multi-frame control blocks in a network processor are disclosed. Embodiments include systems and methods to reduce long latency memory access to less expensive memory such as DRAM. As a network processor in a network receives packets of data, the network processor forms a frame control block for each packet. The frame control block contains a pointer to a memory location where the packet data is stored, and is thereby associated with the packet. The network processor associates a plurality of frame control blocks together in a table control block that is stored in a control store. Each table control block comprises a pointer to a memory location of a next table control block in a chain of table control blocks. Because frame control blocks are stored and accessed in table control blocks, less frequent memory accesses may be needed to keep up with the frame rate of packet transmission.
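A minimal C sketch of the batching idea, with a hypothetical FCBS_PER_TCB factor of 8; field names and the allocation strategy are illustrative:

#include <stdint.h>
#include <stdlib.h>

#define FCBS_PER_TCB 8   /* hypothetical batching factor */

/* Frame control block (FCB): associates a packet with the memory
 * location where its data is stored. */
struct fcb {
    uint64_t packet_addr;
    uint32_t packet_len;
};

/* Table control block (TCB): groups several FCBs so a single
 * control-store access fetches metadata for many frames, plus a
 * pointer to the next TCB in the chain. */
struct tcb {
    struct fcb frames[FCBS_PER_TCB];
    unsigned count;
    struct tcb *next;
};

/* Append a frame; a long-latency control-store (DRAM) allocation is
 * needed only when the current TCB fills. Error handling omitted. */
static struct tcb *enqueue_frame(struct tcb *tail, uint64_t addr, uint32_t len)
{
    if (tail->count == FCBS_PER_TCB) {
        struct tcb *t = calloc(1, sizeof *t);
        tail->next = t;
        tail = t;
    }
    tail->frames[tail->count].packet_addr = addr;
    tail->frames[tail->count].packet_len = len;
    tail->count++;
    return tail;
}

int main(void)
{
    struct tcb *head = calloc(1, sizeof *head);
    struct tcb *tail = head;
    for (uint64_t i = 0; i < 20; i++)
        tail = enqueue_frame(tail, 0x1000 + i * 2048, 1500);
    unsigned tcbs = 0;                    /* 20 frames -> 3 chained TCBs */
    for (struct tcb *t = head; t; t = t->next)
        tcbs++;
    return tcbs == 3 ? 0 : 1;
}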
Abstract:
A mechanism is provided for merging, in a network processor, results from a parser with results from an external coprocessor that provides processing support requested by the parser. The mechanism enqueues in a result queue both parser results that need to be merged with a coprocessor result and parser results that do not. An additional queue enqueues the addresses of the result-queue entries where incomplete parser results are stored. The result from the coprocessor is received in a simple response register. The result queue management logic reads the coprocessor result from the response register and merges it with the corresponding incomplete parser result, read from the result queue at the address held in the additional queue.
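A simplified C model of the merge path, assuming a hypothetical fixed-size result queue and treating the response register as a plain 32-bit value; slot addressing and field names are illustrative:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RQ_SIZE 16

/* A result-queue slot; 'complete' stays false while a coprocessor
 * response is outstanding for this entry. */
struct result {
    uint32_t parser_data;
    uint32_t coproc_data;
    bool complete;
};

static struct result result_queue[RQ_SIZE];

/* Additional queue: addresses of result-queue slots awaiting a merge. */
static unsigned pending[RQ_SIZE];
static unsigned pend_head, pend_tail;

/* Enqueue a parser result; if coprocessor support was requested, the
 * slot's address is also enqueued on the additional queue. */
static void enqueue_result(unsigned addr, uint32_t data, bool needs_coproc)
{
    result_queue[addr].parser_data = data;
    result_queue[addr].complete = !needs_coproc;
    if (needs_coproc)
        pending[pend_tail++ % RQ_SIZE] = addr;
}

/* The queue-management logic reads the response register and merges it
 * into the incomplete result whose address heads the additional queue. */
static void merge_response(uint32_t response_register)
{
    unsigned addr = pending[pend_head++ % RQ_SIZE];
    result_queue[addr].coproc_data = response_register;
    result_queue[addr].complete = true;
}

int main(void)
{
    enqueue_result(0, 0xAAAA, true);    /* waits on the coprocessor */
    enqueue_result(1, 0xBBBB, false);   /* complete on arrival */
    merge_response(0x1234);             /* response register arrives */
    printf("slot 0: complete=%d coproc=0x%X\n",
           result_queue[0].complete, (unsigned)result_queue[0].coproc_data);
    return 0;
}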
Abstract:
A host Ethernet adapter (HEA) and method of managing network communications are provided. The HEA includes a host interface configured for communication with a multi-core processor over a processor bus. The host interface comprises a receive processing element including a receive processor, a receive buffer, and a scheduler for dispatching packets from the receive buffer to the receive processor; a send processing element including a send processor and a send buffer; and a completion queue scheduler (CQS) for dispatching completion queue elements (CQE) from the head of the completion queue (CQ) to threads of the multi-core processor in a network node mode. The method comprises: operatively coupling an Ethernet adapter to a multi-core processor system via a processor bus; selectively assigning a first plurality of packets to a first queue pair for servicing in an endpoint mode; running a device driver on the multi-core processor system, the device driver controlling the servicing of the first queue pair by dispatching the first plurality of packets to only one processor core of the multi-core processor system; selectively assigning a second plurality of packets to a second queue pair for servicing in a network node mode; and the Ethernet adapter controlling the servicing of the second queue pair by dispatching the second plurality of packets to multiple processor threads.
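An illustrative C sketch of the two dispatch policies, with round-robin standing in for the completion queue scheduler's thread selection; the mode names mirror the abstract, everything else is assumed:

#include <stdio.h>

#define NUM_THREADS 8

enum qp_mode { ENDPOINT_MODE, NETWORK_NODE_MODE };

struct queue_pair {
    enum qp_mode mode;
    int bound_core;          /* meaningful only in endpoint mode */
};

/* Endpoint-mode queue pairs are serviced by the single core the device
 * driver bound them to; network-node-mode queue pairs are spread across
 * processor threads (round-robin stands in for the CQS here). */
static int dispatch_target(const struct queue_pair *qp, unsigned *rr)
{
    if (qp->mode == ENDPOINT_MODE)
        return qp->bound_core;
    return (int)((*rr)++ % NUM_THREADS);
}

int main(void)
{
    struct queue_pair ep = { ENDPOINT_MODE, 2 };
    struct queue_pair nn = { NETWORK_NODE_MODE, -1 };
    unsigned rr = 0;
    for (int pkt = 0; pkt < 3; pkt++)
        printf("endpoint packet %d -> core %d\n",
               pkt, dispatch_target(&ep, &rr));
    for (int pkt = 0; pkt < 3; pkt++)
        printf("network-node packet %d -> thread %d\n",
               pkt, dispatch_target(&nn, &rr));
    return 0;
}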
Abstract:
Mechanisms for processing communications between data processing devices are provided. The mechanisms of the illustrative embodiments provide a set of techniques that sustains media speed by distributing transmit-side and receive-side processing over multiple processing cores. These techniques also enable designing multi-threaded network interface controller (NIC) hardware that efficiently hides the latency of direct memory access (DMA) operations associated with data packet transfers over an input/output (I/O) bus. Multiple processing cores may operate concurrently, using separate instances of a communication protocol stack and device drivers to process data packets for transmission, while separate hardware-implemented send queue managers in a network adapter process these data packets for transmission. Multiple hardware receive packet processors in the network adapter may be used, along with a flow classification engine, to route received data packets to the appropriate receive queues and processing cores for processing.
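A toy C example of flow classification, using a simple XOR hash as a stand-in for the flow classification engine so packets of one flow always reach the same receive queue; real hardware hashing differs:

#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 4

/* Hash header fields so every packet of a flow lands on the same
 * receive queue (preserving per-flow order) while distinct flows are
 * spread across cores. The XOR mix is illustrative only. */
static unsigned classify_flow(uint32_t src_ip, uint32_t dst_ip,
                              uint16_t src_port, uint16_t dst_port)
{
    uint32_t h = src_ip ^ dst_ip ^ (((uint32_t)src_port << 16) | dst_port);
    h ^= h >> 16;
    return h % NUM_CORES;
}

int main(void)
{
    printf("flow A -> receive queue %u\n",
           classify_flow(0x0A000001, 0x0A000002, 1234, 80));
    printf("flow B -> receive queue %u\n",
           classify_flow(0x0A000003, 0x0A000002, 5678, 443));
    return 0;
}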