Abstract:
An addressless merge command includes an identifier of an item of data, and a reference value, but no address. A first part of the item is stored in a first place. A second part is stored in a second place. To move the first part so that the first and second parts are merged, the command is sent across a bus to a device. The device translates the identifier into a first address ADR1, and uses ADR1 to read the first part. Stored in or with the first part is a second address ADR2 indicating where the second part is stored. The device extracts ADR2, and uses ADR1 and ADR2 to issue bus commands. Each bus command causes a piece of the first part to be moved. When the entire first part has been moved, the device returns the reference value to indicate that the merge command has been completed.
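A minimal C sketch of the device-side handling of such a merge command follows. The helper functions (lookup_id_to_addr, bus_read, bus_move, signal_done), the header layout holding ADR2 and the length, and the 64-byte piece size are all illustrative assumptions, not details taken from the abstract.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint32_t id;   /* identifier of the item of data (no address) */
        uint32_t ref;  /* reference value returned on completion */
    } merge_cmd_t;

    /* Assumed layout of metadata stored in or with the first part. */
    typedef struct {
        uint64_t adr2; /* second address ADR2: where the second part is stored */
        uint32_t len;  /* length of the first part in bytes */
    } first_part_hdr_t;

    extern uint64_t lookup_id_to_addr(uint32_t id);               /* identifier -> ADR1 */
    extern void bus_read(uint64_t addr, void *dst, size_t len);
    extern void bus_move(uint64_t src, uint64_t dst, size_t len); /* one bus command */
    extern void signal_done(uint32_t ref);

    #define PIECE_SIZE 64 /* assumed amount moved per bus command */

    void handle_merge(const merge_cmd_t *cmd)
    {
        uint64_t adr1 = lookup_id_to_addr(cmd->id); /* translate identifier to ADR1 */

        first_part_hdr_t hdr;
        bus_read(adr1, &hdr, sizeof hdr);           /* extract ADR2 from the first part */

        /* Issue one bus command per piece of the first part. Here the pieces
         * are placed immediately in front of the second part; the actual
         * merged layout is not specified in the abstract. */
        for (uint32_t off = 0; off < hdr.len; off += PIECE_SIZE) {
            size_t n = (hdr.len - off < PIECE_SIZE) ? hdr.len - off : PIECE_SIZE;
            bus_move(adr1 + sizeof hdr + off, hdr.adr2 - hdr.len + off, n);
        }

        signal_done(cmd->ref); /* entire first part moved: return the reference value */
    }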
Abstract:
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing the storing of packet portions into the memory, a packet engine is provided. The PDRSDs use a PPI addressing mode in communicating with the packet engine and in instructing the packet engine to store packet portions. A PDRSD requests a PPI from the packet engine and is allocated one; the PDRSD then tags the packet portion to be written with the PPI and sends the packet portion and the PPI to the packet engine. Once the packet portion has been processed, a PPI de-allocation command causes the packet engine to de-allocate the PPI so that the PPI is available for allocating in association with another packet portion.
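The PPI lifecycle from the PDRSD's point of view can be sketched in C as below; the function names (ppi_alloc_request, ppi_send, ppi_dealloc) and the 16-bit PPI width are assumptions for illustration.

    #include <stdint.h>
    #include <stddef.h>

    extern uint16_t ppi_alloc_request(void);  /* ask the packet engine for a PPI */
    extern void ppi_send(uint16_t ppi, const void *portion, size_t len); /* tagged write */
    extern void ppi_dealloc(uint16_t ppi);    /* free the PPI for reuse */

    void pdrsd_store_portion(const void *portion, size_t len)
    {
        /* 1. The PDRSD requests a PPI and is allocated one by the packet engine. */
        uint16_t ppi = ppi_alloc_request();

        /* 2. The portion is tagged with the PPI and sent to the packet engine,
         *    which manages where in the memory the portion is actually stored. */
        ppi_send(ppi, portion, len);

        /* 3. ...later, once a processing device has consumed the portion... */
        ppi_dealloc(ppi); /* PPI becomes available for another packet portion */
    }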
Abstract:
A Network Flow Processor (NFP) integrated circuit receives, via each of a plurality of physical MAC ports, PCP (Priority Code Point) flows. The NFP also maintains, for each of a plurality of virtual channels, a linked list of buffers. There is one port enqueue engine for each physical MAC port. For each PCP flow received via the physical MAC port associated with a port enqueue engine, the engine causes frame data of the flow to be loaded into one particular linked list of buffers. Each engine has a lookup table circuit that is configurable so that the relative priorities of the PCP flows are reordered as the PCP flows are assigned to virtual channels. A PCP flow with a higher PCP value can be assigned to a lower priority virtual channel, whereas a PCP flow with a lower PCP value can be assigned to a higher priority virtual channel.
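A sketch of the configurable lookup table circuit in C appears below; the table contents are hypothetical, and the point is only that the PCP-to-virtual-channel mapping is programmable per port enqueue engine, so relative priorities can be reordered.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_PCP 8 /* 3-bit Priority Code Point: values 0..7 */

    /* One configurable lookup table per port enqueue engine. These entries
     * are an example: PCP 7 (the highest PCP value) is assigned a lower
     * priority virtual channel than PCP 6. */
    static uint8_t pcp_to_vchan[NUM_PCP] = {
        0, 1, 2, 3, 4, 5, 7, 6
    };

    /* Select the virtual channel (and hence the linked list of buffers)
     * into which the frame data of a PCP flow is loaded. */
    static uint8_t classify(uint8_t pcp) { return pcp_to_vchan[pcp & 0x7]; }

    int main(void)
    {
        for (uint8_t pcp = 0; pcp < NUM_PCP; pcp++)
            printf("PCP %u -> virtual channel %u\n", pcp, classify(pcp));
        return 0;
    }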
Abstract:
A processor includes a hash register and a hash generating circuit. The hash generating circuit includes a novel programmable nonlinearizing function circuit as well as a modulo-2 multiplier, a first modulo-2 summer, a modulo-2 divider, and a second modulo-2 summer. The nonlinearizing function circuit receives a hash value from the hash register and performs a programmable nonlinearizing function, thereby generating a modified version of the hash value. In one example, the nonlinearizing function circuit includes a plurality of separately enableable S-box circuits. The multiplier multiplies the input data by a programmable multiplier value, thereby generating a product value. The first summer sums a first portion of the product value with the modified hash value. The divider divides the resulting sum by a fixed divisor value, thereby generating a remainder value. The second summer sums the remainder value and a second portion of the product value, thereby generating a hash result.
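The data path can be sketched in C as below, remembering that modulo-2 arithmetic is carry-less: summing is XOR, multiplication is carry-less multiplication, and division is polynomial division. The 16-bit widths, the example S-box, the divisor polynomial 0x11021, and the alignment of the first sum before division are all assumptions made for the sketch.

    #include <stdint.h>

    /* Carry-less (modulo-2) multiply of two 16-bit values -> 32-bit product. */
    static uint32_t clmul16(uint16_t a, uint16_t b)
    {
        uint32_t p = 0;
        for (int i = 0; i < 16; i++)
            if (b & (1u << i))
                p ^= (uint32_t)a << i; /* modulo-2 addition is XOR */
        return p;
    }

    /* Remainder of modulo-2 division by a fixed 17-bit divisor polynomial. */
    static uint16_t polymod(uint32_t v, uint32_t divisor)
    {
        for (int i = 31; i >= 16; i--)
            if (v & (1u << i))
                v ^= divisor << (i - 16);
        return (uint16_t)v;
    }

    /* Hypothetical programmable nonlinearizing function: a 4-bit S-box applied
     * per nibble, with each S-box stage separately enableable via enable_mask. */
    static uint16_t nonlinearize(uint16_t h, uint8_t enable_mask)
    {
        static const uint8_t sbox[16] = { /* example S-box, an assumption */
            0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2 };
        uint16_t out = 0;
        for (int n = 0; n < 4; n++) {
            uint8_t nib = (h >> (4 * n)) & 0xF;
            if (enable_mask & (1u << n)) nib = sbox[nib];
            out |= (uint16_t)nib << (4 * n);
        }
        return out;
    }

    uint16_t hash_step(uint16_t hash_reg, uint16_t input, uint16_t multiplier)
    {
        uint16_t modified = nonlinearize(hash_reg, 0xF);  /* programmable stage */
        uint32_t product  = clmul16(input, multiplier);   /* modulo-2 multiplier */
        uint16_t hi = (uint16_t)(product >> 16);          /* first portion */
        uint16_t lo = (uint16_t)product;                  /* second portion */
        uint32_t sum = (uint32_t)(hi ^ modified) << 16;   /* first summer, aligned */
        uint16_t rem = polymod(sum, 0x11021);             /* fixed divisor value */
        return rem ^ lo;                                  /* second summer */
    }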
Abstract:
A multi-processor includes a shared memory that stores a search key data set including multiple search keys, a processor, a Direct Memory Access (DMA) controller, and an Interlaken Look-Aside (ILA) interface circuit. The processor generates a descriptor that is sent to the DMA controller causing the DMA controller to read the search key data set. The DMA controller selects a single search key from the set and generates a lookup command message that is communicated to the ILA interface circuit. The ILA interface circuit generates an ILA packet that includes the single search key and sends the ILA packet to an external transactional memory device that generates a result data value. The result data value is communicated back to the DMA controller via the ILA interface circuit. The DMA controller stores the result data value in the shared memory and notifies the processor that the DMA process has completed.
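A C sketch of the DMA controller's side of this flow is given below; the descriptor layout, the 128-bit key size, and the helper names (shared_mem_read, ila_lookup, and so on) are assumptions. The loop reflects that the search key data set holds multiple keys, each selected and looked up in turn.

    #include <stdint.h>

    typedef struct {
        uint64_t key_set_addr; /* search key data set in the shared memory */
        uint32_t num_keys;     /* number of search keys in the set */
        uint64_t result_addr;  /* where result data values are stored */
    } dma_descriptor_t;

    extern void shared_mem_read(uint64_t addr, void *dst, uint32_t len);
    extern void shared_mem_write(uint64_t addr, const void *src, uint32_t len);
    extern uint64_t ila_lookup(const uint8_t key[16]); /* ILA packet out, result back */
    extern void notify_processor_done(void);

    /* DMA controller behavior, driven by a descriptor from the processor:
     * read the key set, issue one lookup command per key via the ILA
     * interface circuit, store the results, then notify the processor. */
    void dma_process(const dma_descriptor_t *d)
    {
        for (uint32_t i = 0; i < d->num_keys; i++) {
            uint8_t key[16]; /* assumed 128-bit search key */
            shared_mem_read(d->key_set_addr + 16u * i, key, sizeof key);
            uint64_t result = ila_lookup(key); /* external transactional memory */
            shared_mem_write(d->result_addr + 8u * i, &result, sizeof result);
        }
        notify_processor_done();
    }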
Abstract:
Incoming frame data is stored in a plurality of dual linked lists of buffers in a pipelined memory. The dual linked lists of buffers are maintained by a link manager. The link manager maintains, for each dual linked list of buffers, a first head pointer, a second head pointer, a first tail pointer, a second tail pointer, a head pointer active bit, and a tail pointer active bit. The first head and tail pointers are used to maintain the first linked list of the dual linked list. The second head and tail pointers are used to maintain the second linked list of the dual linked list. Due to the pipelined nature of the memory, the dual linked list system can be popped to supply dequeued values at a sustained rate of more than one value per read access latency period of the pipelined memory.
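A C sketch of the link manager state and of the alternating push/pop follows; the field names are assumptions. The essential idea is that while the pipelined memory is still returning the next pointer for one list's head, the other list's head can already be popped.

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint32_t head[2];     /* first and second head pointers */
        uint32_t tail[2];     /* first and second tail pointers */
        bool     head_active; /* head pointer active bit: list to pop next */
        bool     tail_active; /* tail pointer active bit: list to push next */
    } dual_list_t;

    extern uint32_t mem_read_next(uint32_t buf); /* pipelined: latency > 1 cycle */
    extern void mem_write_next(uint32_t buf, uint32_t nxt);

    /* Push: enqueue a buffer, alternating between the two linked lists.
     * Handling of an initially empty list is omitted for brevity. */
    void push(dual_list_t *q, uint32_t buf)
    {
        int t = q->tail_active;
        mem_write_next(q->tail[t], buf); /* link the old tail to the new buffer */
        q->tail[t] = buf;
        q->tail_active = !q->tail_active;
    }

    /* Pop: dequeue from the two linked lists alternately. While the next
     * pointer of one list's head is still being read out of the pipelined
     * memory, the other list's head can already be popped, sustaining more
     * than one dequeue per read access latency period. */
    uint32_t pop(dual_list_t *q)
    {
        int h = q->head_active;
        uint32_t buf = q->head[h];
        q->head[h] = mem_read_next(buf); /* latent read overlaps the next pop */
        q->head_active = !q->head_active;
        return buf;
    }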
Abstract:
A pipelined run-to-completion processor executes a conditional skip instruction. If a predicate condition as specified by a predicate code field of the skip instruction is true, then the skip instruction causes execution of a number of instructions following the skip instruction to be “skipped”. The number of instructions to be skipped is specified by a skip count field of the skip instruction. In some examples, the skip instruction includes a “flag don't touch” bit. If this bit is set, then neither the skip instruction nor any of the skipped instructions can change the values of the flags. Both the skip instruction and following instructions to be skipped are decoded one by one in sequence and pass through the processor pipeline, but the execution stage is prevented from carrying out the instruction operation of a following instruction if the predicate condition of the skip instruction was true.
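A C sketch of how an execute stage might implement this squashing is shown below; the state layout, the field names, and the update_flags_only helper are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        bool    carry, zero;    /* condition flags */
        uint8_t skip_remaining; /* following instructions left to squash */
        bool    flags_frozen;   /* "flag don't touch" in effect while skipping */
    } cpu_state_t;

    extern bool predicate_true(const cpu_state_t *s, uint8_t predicate_code);
    extern void execute_op(cpu_state_t *s, uint32_t instr);        /* full operation */
    extern void update_flags_only(cpu_state_t *s, uint32_t instr); /* hypothetical */

    void execute_stage(cpu_state_t *s, uint32_t instr,
                       bool is_skip, uint8_t predicate_code,
                       uint8_t skip_count, bool flag_dont_touch)
    {
        if (s->skip_remaining > 0) {
            /* A skipped instruction still passes through the pipeline, but
             * its operation is not carried out. Unless the flags are frozen
             * it may still update the flags. */
            if (!s->flags_frozen)
                update_flags_only(s, instr);
            if (--s->skip_remaining == 0)
                s->flags_frozen = false;
            return;
        }
        if (is_skip) {
            /* The skip instruction itself performs no data operation; with
             * "flag don't touch" set it also leaves the flags alone. */
            if (predicate_true(s, predicate_code)) {
                s->skip_remaining = skip_count;
                s->flags_frozen   = flag_dont_touch;
            }
            return;
        }
        execute_op(s, instr); /* normal, non-skipped instruction */
    }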
Abstract:
A pipelined run-to-completion processor includes no instruction counter and only fetches instructions either: as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor.
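The fetch scheme can be sketched in C as follows; the LUT contents, the 64-byte section size, and the packing of the table number and section offset within the initial fetch information value are assumptions.

    #include <stdint.h>

    #define NUM_TABLES 8

    /* LUT converting a table number into the base address of that table. */
    static const uint32_t table_base_lut[NUM_TABLES] = {
        0x0000, 0x0400, 0x0800, 0x0C00, 0x1000, 0x1400, 0x1800, 0x1C00
    };

    extern void fetch_block(uint32_t addr); /* loads a block of instructions */
    extern void start_clocks(void);
    extern void stop_clocks(void);
    extern void output_result(uint32_t value);

    /* Kick-start: an incoming value starts clocking and prompts the first
     * fetch. In this sketch the initial fetch information value packs a
     * table number (bits 7..4) and a section offset (bits 3..0). */
    void kick_start(uint32_t initial_fetch_info)
    {
        start_clocks();
        uint32_t table   = (initial_fetch_info >> 4) & (NUM_TABLES - 1);
        uint32_t section = initial_fetch_info & 0xF;
        fetch_block(table_base_lut[table] + section * 64); /* 64-byte sections */
    }

    /* A fetch instruction at the end of a section jumps to another section. */
    void do_fetch_instruction(uint32_t table, uint32_t section)
    {
        fetch_block(table_base_lut[table] + section * 64);
    }

    /* The finished instruction outputs the data value and stops the clocks. */
    void do_finished_instruction(uint32_t output_value)
    {
        output_result(output_value);
        stop_clocks();
    }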
Abstract:
A Network Interface Device (NID) of a web hosting server implements multiple virtual NIDs. A virtual NID is configured by configuration information in an appropriate one of a set of smaller blocks in a high-speed memory on the NID. There is a smaller block for each virtual NID. A virtual machine on the host can configure its virtual NID by writing configuration information into a larger block in PCIe address space. Circuitry on the NID detects whether the PCIe write is into the address space occupied by the larger blocks. If the write is into this space, then address translation circuitry converts the PCIe address into a smaller address that maps to the appropriate one of the smaller blocks associated with the virtual NID to be configured. If the PCIe write is not an access to one of the larger blocks, then the NID does not perform the address translation.
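A C sketch of the detection-and-translation step follows; the block sizes, the number of virtual NIDs, and the base of the region occupied by the larger blocks are illustrative assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define LARGE_BLOCK_SIZE 0x10000u    /* per-virtual-NID window in PCIe space */
    #define SMALL_BLOCK_SIZE 0x100u      /* per-virtual-NID block in NID memory */
    #define NUM_VNIDS        64u
    #define VNID_REGION_BASE 0x01000000u /* start of the larger blocks in PCIe space */

    /* Returns true and writes the translated address when the PCIe write
     * targets the region occupied by the larger blocks; otherwise the write
     * proceeds without address translation. */
    bool translate_pcie_write(uint64_t pcie_addr, uint32_t *local_addr)
    {
        uint64_t region_size = (uint64_t)NUM_VNIDS * LARGE_BLOCK_SIZE;
        if (pcie_addr < VNID_REGION_BASE ||
            pcie_addr >= VNID_REGION_BASE + region_size)
            return false; /* not an access to one of the larger blocks */

        uint64_t off   = pcie_addr - VNID_REGION_BASE;
        uint32_t vnid  = (uint32_t)(off / LARGE_BLOCK_SIZE); /* which virtual NID */
        uint32_t inner = (uint32_t)(off % LARGE_BLOCK_SIZE);

        /* Map into the smaller per-virtual-NID block in the high-speed
         * memory; only the low bits of the larger window are assumed to
         * carry configuration offsets in this sketch. */
        *local_addr = vnid * SMALL_BLOCK_SIZE + (inner % SMALL_BLOCK_SIZE);
        return true;
    }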