Abstract:
A pipelined run-to-completion processor has a special tripwire bus port and executes a novel tripwire instruction. Execution of the tripwire instruction causes the processor to output a tripwire value onto the port during a clock cycle when the tripwire instruction is being executed. A first multi-bit value of the tripwire value is data that is output from registers, and/or flags, and/or pointers, and/or data values stored in the pipeline. A field of the tripwire instruction specifies what particular stored values will be output as the first multi-bit value. A second multi-bit value of the tripwire value is a number that identifies the particular processor that output the tripwire value. The processor has a TE enable/disable control bit. This bit is programmable by a special instruction to disable all tripwire instructions. If disabled, a tripwire instruction is fetched and decoded but does not cause the output of a tripwire value.
Abstract:
A general purpose PicoEngine Multi-Processor (PEMP) includes a hierarchically organized pool of small specialized picoengine processors and associated memories. A stream of data input values is received onto the PEMP. Each input data value is characterized, and from the characterization a task is determined. Picoengines are selected in a sequence. When the next picoengine in the sequence is available, it is then given the input data value along with an associated task assignment. The picoengine then performs the task. An output picoengine selector selects picoengines in the same sequence. If the next picoengine indicates that it has completed its assigned task, then the output value from the selected picoengine is output from the PEMP. By changing the sequence used, more or less of the processing power and memory resources of the pool is brought to bear on the incoming data stream. The PEMP automatically disables unused picoengines and memories.
Abstract:
An island-based network flow processor (IB-NFP) integrated circuit includes rectangular islands disposed in rows. In one example, the configurable mesh data bus is configurable to form a command/push/pull data bus over which multiple transactions can occur simultaneously on different parts of the integrated circuit. The rectangular islands of one row are oriented in staggered relation with respect to the rectangular islands of the next row. The left and right edges of islands in a row align with left and right edges of islands two rows down in the row structure. The data bus involves multiple meshes. In each mesh, the island has a centrally located crossbar switch and six radiating half links, and half links down to functional circuitry of the island. The staggered orientation of the islands, and the structure of the half links, allows half links of adjacent islands to align with one another.
Abstract:
A transactional memory (TM) receives a lookup command across a bus from a processor. The command includes a memory address. In response to the command, the TM pulls an input value (IV). The memory address is used to read a word containing multiple result values (RVs), multiple reference values, and multiple prefix values from memory. A selecting circuit within the TM uses a starting bit position and a mask size to select a portion of the IV. The portion of the IV is a lookup key value (LKV). Mask values are generated based on the prefix values. The LKV is masked by each mask value thereby generating multiple masked values that are compared to the reference values. Based on the comparison a lookup table generates a selector value that is used to select a result value. The selected result value is then communicated to the processor via the bus.
Abstract:
A chained Command/Push/Pull (CPP) bus command is output by a first device and is sent from a CPP bus master interface across a set of command conductors of a CPP bus to a second device. The chained CPP command includes a reference value. The second device decodes the command, in response determines a plurality of CPP commands, and outputs the plurality of CPP commands onto the CPP bus. The second device detects when the plurality of CPP commands have been completed, and in response returns the reference value back to the CPP bus master interface of the first device via a set of data conductors of the CPP bus. The reference value indicates to the first device that an overall operation of the chained CPP command has been completed.
Abstract:
Within a networking device, packet portions from multiple PDRSDs (Packet Data Receiving and Splitting Devices) are loaded into a single memory, so that the packet portions can later be processed by a processing device. Rather than the PDRSDs managing and handling the storing of packet portions into the memory, a packet engine is provided. The PDRSDs use a PPI (Packet Portion Identifier) Addressing Mode (PAM) in communicating with the packet engine and in instructing the packet engine to store packet portions. A PDRSD requests a PPI from the packet engine in a PPI allocation request, and is allocated a PPI by the packet engine in a PPI allocation response, and then tags the packet portion to be written with the PPI and sends the packet portion and the PPI to the packet engine.
Abstract:
In response to receiving a novel “Return Available PPI Credits” command from a credit-aware device, a packet engine sends a “Credit To Be Returned” (CTBR) value it maintains for that device back to the credit-aware device, and zeroes out its stored CTBR value. The credit-aware device adds the credits returned to a “Credits Available” value it maintains. The credit-aware device uses the “Credits Available” value to determine whether it can issue a PPI allocation request. The “Return Available PPI Credits” command does not result in any PPI allocation or de-allocation. In another novel aspect, the credit-aware device is permitted to issue one PPI allocation request to the packet engine when its recorded “Credits Available” value is zero or negative. If the PPI allocation request cannot be granted, then it is buffered in the packet engine, and is resubmitted within the packet engine, until the packet engine makes the PPI allocation.
Abstract:
A processor includes a hash register and a hash generating circuit. The hash generating circuit includes a novel programmable nonlinearizing function circuit as well as a modulo-2 multiplier, a first modulo-2 summer, a modulor-2 divider, and a second modulo-2 summer. The nonlinearizing function circuit receives a hash value from the hash register and performs a programmable nonlinearizing function, thereby generating a modified version of the hash value. In one example, the nonlinearizing function circuit includes a plurality of separately enableable S-box circuits. The multiplier multiplies the input data by a programmable multiplier value, thereby generating a product value. The first summer sums a first portion of the product value with the modified hash value. The divider divides the resulting sum by a fixed divisor value, thereby generating a remainder value. The second summer sums the remainder value and the second portion of the input data, thereby generating a hash result.
Abstract:
A novel declare instruction can be used in source code to declare a sub-pool of resource instances to be taken from the resource instances of a larger declared pool. Using such declare instructions, a hierarchy of pools and sub-pools can be declared. A novel allocate instruction can then be used in the source code to instruct a novel linker to make resource instance allocations from a desired pool or a desired sub-pool of the hierarchy. After compilation, the declare and allocate instructions appear in the object code. The linker uses the declare and allocate instructions in the object code to set up the hierarchy of pools and to make the indicated allocations of resource instances to symbols. After resource allocation, the linker replaces instances of a symbol in the object code with the address of the allocated resource instance, thereby generating executable code.
Abstract:
A lookup engine of a transactional memory (TM) has multiple hardware lookup structures, each usable to perform a different type of lookup. In response to a lookup command, the lookup engine reads a first block of first information from a memory unit. The first information configures the lookup engine to perform a first type of lookup, thereby identifying a first result value. If the first result value is not a final result value, then the lookup engine uses address information in the first result value to read a second block of second information. The second information configures the lookup engine to perform a second type of lookup, thereby identifying a second result value. This process repeats until a final result value is obtained. The type of lookup performed is determined by the result value of the preceding lookup and/or type information of the block of information for the next lookup.