Abstract:
An egress packet modifier includes a script parser and a pipeline of processing stages. Rather than performing egress modifications using a processor that fetches and decodes and executes instructions in a classic processor fashion, and rather than storing a packet in memory and reading it out and modifying it and writing it back, the packet modifier pipeline processes the packet by passing parts of the packet through the pipeline. A processor identifies particular egress modifications to be performed by placing a script code at the beginning of the packet. The script parser then uses the code to identify a specific script of opcodes, where each opcode defines a modification. As a part passes through a stage, the stage can carry out the modification of such an opcode. As realized using a current semiconductor fabrication process, the packet modifier can modify 200M packets/second at a sustained rate of up to 100 gigabits/second.
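The script-code mechanism can be pictured with a small software model. This is only a sketch under stated assumptions: the opcode names (`op_strip`, `op_prepend`), the script table contents, and the one-byte script code are hypothetical, and the model applies each stage to the whole payload, whereas the abstract describes hardware stages that operate on parts of the packet as they pass through.

```python
# Hypothetical opcodes: each returns a "stage" that performs one modification.
def op_strip(n):
    """Stage that removes the first n bytes."""
    return lambda data: data[n:]

def op_prepend(header):
    """Stage that prepends a fixed header."""
    return lambda data: header + data

# Hypothetical script table: script code -> ordered list of opcodes.
SCRIPTS = {
    0x01: [op_strip(2), op_prepend(b"\xAA\xBB")],
    0x02: [],  # no modification
}

def modify(packet: bytes) -> bytes:
    """Read the script code at the start of the packet and run its script."""
    code, payload = packet[0], packet[1:]
    for stage in SCRIPTS[code]:  # the "parser" step: code selects the script
        payload = stage(payload)
    return payload
```

For instance, a packet whose first byte is `0x01` has two bytes stripped and the two-byte header `AA BB` prepended, while a packet with code `0x02` passes through unchanged apart from losing the code byte.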
Abstract:
A transactional memory (TM) receives a lookup command across a bus from a processor. The command includes a memory address, a starting bit position, and a mask size. In response to the command, the TM pulls an input value (IV). The memory address is used to read a word containing multiple result values (RVs) and multiple key values from memory. Each key value indicates a single RV to be output by the TM. A selecting circuit within the TM uses the starting bit position and mask size to select a portion of the IV. The portion of the IV is a key selector value. A key value is selected based upon the key selector value. An RV is selected based upon the key value. The key value is selected by a key selection circuit. The RV is selected by a result value selection circuit.
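The two-level selection can be sketched as follows. The function names and the list-based layout of the word are illustrative assumptions, as is counting the starting bit position from the least significant bit; the abstract does not specify bit ordering or word layout.

```python
def select_bits(iv: int, start_bit: int, mask_size: int) -> int:
    """Extract mask_size bits of iv starting at start_bit (LSB = bit 0, assumed)."""
    return (iv >> start_bit) & ((1 << mask_size) - 1)

def lookup(key_values, result_values, iv, start_bit, mask_size):
    """Model of the lookup: IV portion -> key value -> result value."""
    key_selector = select_bits(iv, start_bit, mask_size)  # selecting circuit
    key = key_values[key_selector]                        # key selection circuit
    return result_values[key]                             # result value selection circuit
```

With `key_values = [2, 0, 1, 1]` and `result_values = [10, 20, 30]`, an IV of `0b110100` with starting bit 2 and mask size 2 yields key selector 1, key value 0, and therefore result value 10.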
Abstract:
An entropy storage ring includes an input node, a plurality of serial-connected stages, and an output node. Each stage includes an XOR (or XNOR) circuit, a delay element having an input coupled to the XOR output, and a combinatorial circuit having an output coupled to a second input of the XOR. The combinatorial circuit may be a NAND, NOR, AND or OR gate. A first input of the XOR is the data input of the stage. The output of the delay element is the data output of the stage. A first input of the combinatorial circuit is coupled to receive an enable bit from a configuration register. A second input of the combinatorial circuit is coupled to the ring output node. In operation, a bit stream is supplied onto the ring input node. Feedback of multiple stages is enabled so that the bit stream undergoes complex permutation as it circulates.
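A rough discrete-time model of the stage wiring is given below. It assumes the XOR and AND variants, and it treats every delay element as a one-step register so that the ring can be stepped in software; the abstract does not say the delay elements are clocked, so this captures only the logical structure, not the analog timing.

```python
def step(state, enables, in_bit):
    """Advance the ring by one time step.

    state[i]   : current output of stage i's delay element (0 or 1)
    enables[i] : enable bit for stage i from the configuration register
    in_bit     : bit currently supplied on the ring input node
    """
    ring_out = state[-1]             # ring output node = last stage's output
    inputs = [in_bit] + state[:-1]   # each stage's data input is the previous stage's output
    # XOR of the data input with (enable AND ring output), then into the delay element
    return [d ^ (en & ring_out) for d, en in zip(inputs, enables)]
```

With all enable bits cleared the model degenerates into a plain shift register; setting enable bits on several stages mixes the ring output back into the stream at those points.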
Abstract:
A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and are taken off the rings. A put command causes the TM to put a value into a ring, provided the ring is not full. A get command causes the TM to take a value off a ring, provided the ring is not empty. A put with low priority command causes the TM to put a value into a ring, provided the ring has at least a predetermined amount of free buffer space. A get from a set of rings command causes the TM to get a value from the highest priority non-empty ring (of a specified set of rings).
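The four ring commands can be illustrated with a small software model. The class and function names, the low-priority threshold of two free slots, and the convention that rings are listed highest priority first are assumptions made for the sketch; the abstract only says the threshold is predetermined.

```python
class Ring:
    """Fixed-capacity ring with head and tail pointers."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # next slot to read
        self.tail = 0    # next slot to write
        self.count = 0   # number of stored values

    def free(self):
        return len(self.buf) - self.count

    def put(self, value, min_free=1):
        """Put a value if at least min_free slots are free; report success."""
        if self.free() < min_free:
            return False
        self.buf[self.tail] = value
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def get(self):
        """Take the oldest value, or return None if the ring is empty."""
        if self.count == 0:
            return None
        value = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return value

LOW_PRIORITY_MIN_FREE = 2  # hypothetical threshold

def put_low_priority(ring, value):
    """Put only if the ring keeps some headroom for normal-priority puts."""
    return ring.put(value, min_free=LOW_PRIORITY_MIN_FREE)

def get_from_set(rings):
    """Get from the first non-empty ring; rings are ordered highest priority first."""
    for ring in rings:
        if ring.count:
            return ring.get()
    return None
```

The low-priority put differs from the ordinary put only in the free-space threshold, which is why both share one `put` method in this sketch.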
Abstract:
A network flow processor integrated circuit includes a plurality of processors, a plurality of multi-threaded transactional memories (MTMs), and a configurable mesh posted transaction data bus. The configurable mesh posted transaction data bus includes a configurable command mesh and a configurable data mesh. Each of these configurable meshes includes crossbar switches and interconnecting links. A command bus transaction value issued by a processor can pass across the command mesh to an MTM. The command bus transaction value includes a reference value. The MTM uses the reference value to pull data across the configurable data mesh into the MTM. The MTM then uses the data to carry out the commanded transactional memory operation. Multiple such commands can pass across the posted transaction bus across different parts of the integrated circuit at the same time, and a single MTM can be carrying out multiple such operations at the same time.
Abstract:
A Network Interface Device (NID) of a web hosting server implements multiple virtual NIDs. For each virtual NID there is a block in a memory of a transactional memory on the NID. This block stores configuration information that configures the corresponding virtual NID. The NID also has a single managing processor that monitors configuration of the plurality of virtual NIDs. If there is a write into the memory space where the configuration information for the virtual NIDs is stored, then the transactional memory detects this write and in response sends an alert to the managing processor. The size and location of the memory space in the memory for which write alerts are to be generated is programmable. The content and destination of the alert is also programmable.
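A minimal sketch of the write-alert behavior follows. The class name, the constructor parameters, and the use of a Python list as the alert destination are all hypothetical stand-ins for the programmable region, alert content, and alert destination mentioned in the abstract.

```python
class WatchedMemory:
    """Memory that raises an alert on writes into a programmable region."""
    def __init__(self, size, watch_base, watch_size, alert_dest, alert_content):
        self.mem = [0] * size
        self.watch_base = watch_base        # programmable start of the watched region
        self.watch_size = watch_size        # programmable size of the watched region
        self.alert_dest = alert_dest        # programmable destination (a list here)
        self.alert_content = alert_content  # programmable alert content

    def write(self, addr, value):
        self.mem[addr] = value
        # Detect a write that falls inside the watched configuration space.
        if self.watch_base <= addr < self.watch_base + self.watch_size:
            self.alert_dest.append((self.alert_content, addr))
```

Here a write to an address inside the watched range appends an alert for the managing processor to consume, while writes elsewhere complete silently.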
Abstract:
An integrated circuit includes a processor and an exact-match flow table structure. A first packet is received onto the integrated circuit. The packet is determined to be of a first type. As a result of this determination, execution by the processor of a first sequence of instructions is initiated. This execution causes bits of the first packet to be concatenated and modified in a first way, thereby generating a first Flow Id. The first Flow Id is an exact-match for the Flow Id of a first stored flow entry. A second packet is received. It is of a second type. As a result, a second sequence of instructions is executed. This causes bits of the second packet to be concatenated and modified in a second way, thereby generating a second Flow Id. The second Flow Id is an exact-match for the Flow Id of a second stored flow entry.
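The idea of selecting a Flow Id recipe by packet type can be sketched as below. The byte ranges, the masking step, and the function names are purely hypothetical; the abstract does not say which bits are concatenated or how they are modified.

```python
def flow_id_type1(pkt: bytes) -> bytes:
    """Hypothetical first sequence: concatenate bytes 0-1 and 4-5, mask the last byte."""
    raw = pkt[0:2] + pkt[4:6]
    return raw[:-1] + bytes([raw[-1] & 0xF0])

def flow_id_type2(pkt: bytes) -> bytes:
    """Hypothetical second sequence: concatenate bytes 2-3 and 6-7 unmodified."""
    return pkt[2:4] + pkt[6:8]

# Packet type -> instruction sequence that builds the Flow Id.
SEQUENCES = {1: flow_id_type1, 2: flow_id_type2}

def classify(pkt_type: int, pkt: bytes, flow_table: dict):
    """Build the Flow Id for this packet type and look it up by exact match."""
    flow_id = SEQUENCES[pkt_type](pkt)
    return flow_table.get(flow_id)  # None when no stored entry matches exactly
```

Because the lookup is exact-match only, each recipe must produce precisely the bytes stored in the corresponding flow entry; there is no wildcard or longest-prefix fallback in this model.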
Abstract:
A chained Command/Push/Pull (CPP) bus command is output by a first device and is sent from a CPP bus master interface across a set of command conductors of a CPP bus to a second device. The chained CPP command includes a reference value. The second device decodes the command, in response determines a plurality of CPP commands, and outputs the plurality of CPP commands onto the CPP bus. The second device detects when the plurality of CPP commands have been completed, and in response returns the reference value back to the CPP bus master interface of the first device via a set of data conductors of the CPP bus. The reference value indicates to the first device that an overall operation of the chained CPP command has been completed.
Abstract:
A pipelined run-to-completion processor executes a conditional skip instruction. If a predicate condition as specified by a predicate code field of the skip instruction is true, then the skip instruction causes execution of a number of instructions following the skip instruction to be “skipped”. The number of instructions to be skipped is specified by a skip count field of the skip instruction. In some examples, the skip instruction includes a “flag don't touch” bit. If this bit is set, then neither the skip instruction nor any of the skipped instructions can change the values of the flags. Both the skip instruction and following instructions to be skipped are decoded one by one in sequence and pass through the processor pipeline, but the execution stage is prevented from carrying out the instruction operation of a following instruction if the predicate condition of the skip instruction was true.
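The skip behavior can be modeled with a toy interpreter. The instruction names (`skip_if_zero`, `add`), the single zero flag, and the accumulator are hypothetical, and the "flag don't touch" bit is left out of this sketch; the point is only that skipped instructions still pass through in sequence while their execution is suppressed.

```python
def run(program, zero_flag):
    """Run a list of (opcode, argument) pairs and return the accumulator."""
    acc = 0
    skip_remaining = 0
    for op, arg in program:          # every instruction is still decoded in order
        if skip_remaining:
            skip_remaining -= 1      # execute stage suppressed for this instruction
            continue
        if op == "skip_if_zero":
            if zero_flag:            # predicate condition is true
                skip_remaining = arg # skip count field
        elif op == "add":
            acc += arg
    return acc
```

For the program `add 1; skip_if_zero 2; add 10; add 100; add 1000`, the two instructions after the skip are suppressed when the zero flag is set, and all three additions execute when it is clear.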
Abstract:
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing and handling the storing of packet portions into the memory, a packet engine is provided. The PDRSDs use a PPI (Packet Portion Identifier) Addressing Mode (PAM) in communicating with the packet engine and in instructing the packet engine to store packet portions. The packet engine uses linear memory addressing to write the packet portions into the memory, and to read the packet portions from the memory.
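The translation from PPI-based addressing to linear addressing can be sketched as follows. The class name, the sequential allocation scheme, and the assumption that the packet engine itself hands out PPIs are illustrative only; the abstract states merely that the PDRSDs address packet portions by PPI while the packet engine uses linear addresses.

```python
class PacketEngine:
    """Maps (PPI, offset) requests onto linear addresses in one memory."""
    def __init__(self, mem_size):
        self.mem = bytearray(mem_size)
        self.base = {}       # PPI -> linear base address
        self.next_free = 0   # simple bump allocator (assumption)
        self.next_ppi = 0

    def alloc(self, length):
        """Reserve space for a packet portion and return its PPI."""
        if self.next_free + length > len(self.mem):
            raise MemoryError("no space left for packet portion")
        ppi = self.next_ppi
        self.next_ppi += 1
        self.base[ppi] = self.next_free
        self.next_free += length
        return ppi

    def write(self, ppi, offset, data):
        """Store data for the given PPI; the caller never sees the linear address."""
        addr = self.base[ppi] + offset
        self.mem[addr:addr + len(data)] = data

    def read(self, ppi, offset, length):
        """Read back bytes for the given PPI via the linear address."""
        addr = self.base[ppi] + offset
        return bytes(self.mem[addr:addr + length])
```

A caller that holds only a PPI can write and read its packet portion without knowing where in the shared memory the portion actually lives.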