Abstract:
A high-speed credit-based allocator circuit receives an allocation request to make an allocation to one of a set of processing entities. The allocator circuit maintains a chain of bubble sorting module circuits for the set, where each bubble sorting module circuit stores a resource value and an indication of a corresponding processing entity. A bubble sorting operation is performed so that the head of the chain tends to indicate the processing entity of the set that has the highest amount of the resource (credit) available. The requested allocation is made to the processing entity indicated by the head module circuit of the chain. The amount of the resource available to each processing entity is tracked by adjusting the resource values as allocations are made and as allocated tasks are completed. The allocator circuit is configurable to maintain multiple chains, thereby supporting credit-based allocations to multiple sets of processing entities.
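The behavior of the sorting chain can be sketched in software. The following C sketch (with a hypothetical chain length and field names, none of which come from the abstract) keeps one credit/entity pair per module, performs one adjacent compare-and-swap pass per cycle so the largest credit drifts toward the head, grants allocations from the head, and returns credit when a task completes.

```c
#include <stdio.h>

#define CHAIN_LEN 4   /* one sorting module per processing entity (assumed size) */

/* One bubble sorting module: a resource (credit) value plus the
 * identity of the processing entity it currently tracks. */
struct sort_module {
    int credits;
    int entity;
};

static struct sort_module chain[CHAIN_LEN] = {
    { 8, 0 }, { 3, 1 }, { 5, 2 }, { 2, 3 }
};

/* One bubble step: compare adjacent modules and swap so that higher
 * credit values drift toward the head (index 0).  Repeating this each
 * "cycle" makes the head tend to hold the entity with the most credit. */
static void bubble_step(void)
{
    for (int i = CHAIN_LEN - 1; i > 0; i--) {
        if (chain[i].credits > chain[i - 1].credits) {
            struct sort_module tmp = chain[i];
            chain[i] = chain[i - 1];
            chain[i - 1] = tmp;
        }
    }
}

/* Grant an allocation to the entity indicated by the head module and
 * charge it one credit.  Returns the entity chosen. */
static int allocate(void)
{
    chain[0].credits--;
    return chain[0].entity;
}

/* When an allocated task completes, return the credit to that entity. */
static void task_done(int entity)
{
    for (int i = 0; i < CHAIN_LEN; i++)
        if (chain[i].entity == entity)
            chain[i].credits++;
}

int main(void)
{
    bubble_step();
    int e = allocate();
    printf("allocated to entity %d\n", e);
    task_done(e);
    return 0;
}
```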
Abstract:
An overflow threshold value is stored for each of a plurality of virtual channels. A link manager maintains a buffer count for each virtual channel. If the buffer count for a virtual channel whose originating PCP flows were merged is detected to exceed that virtual channel's overflow threshold value, then a PFC (Priority Flow Control) pause frame is generated in which multiple ones of the priority class enable bits are set, indicating that multiple PCP flows should be paused. For the particular virtual channel that is overloaded, an Inverse PCP Remap LUT (IPRLUT) circuit performs inverse PCP mapping, including merging and/or reordering mapping, and outputs an indication of each PCP flow that is associated with the overloaded virtual channel. Associated physical MAC port circuitry uses this information to generate the PFC pause frame so that the appropriate multiple enable bits are set in the pause frame.
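A rough software model of the inverse mapping step is shown below. The PCP-to-virtual-channel table, the channel count, and the threshold values are all invented for illustration; the sketch only shows how an overloaded virtual channel can be mapped back to the set of PCP priority enable bits that belong in the PFC pause frame.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_PCP 8
#define NUM_VC  4   /* assumed number of virtual channels */

/* Forward PCP-to-virtual-channel remap: several PCP flows may be
 * merged onto one virtual channel (here PCP 4..7 share VC 3). */
static const uint8_t pcp_to_vc[NUM_PCP] = { 0, 0, 1, 2, 3, 3, 3, 3 };

static uint32_t buffer_count[NUM_VC];
static uint32_t overflow_threshold[NUM_VC] = { 100, 100, 100, 100 };

/* Inverse PCP remap (the role of the IPRLUT): given an overloaded
 * virtual channel, return a bitmap with one enable bit per PCP flow
 * that was merged onto that channel. */
static uint8_t inverse_pcp_remap(uint8_t vc)
{
    uint8_t enable_bits = 0;
    for (int pcp = 0; pcp < NUM_PCP; pcp++)
        if (pcp_to_vc[pcp] == vc)
            enable_bits |= (uint8_t)(1u << pcp);
    return enable_bits;
}

/* Check a virtual channel and, if overloaded, report the priority
 * class enable bits to place in the PFC pause frame. */
static void check_vc(uint8_t vc)
{
    if (buffer_count[vc] > overflow_threshold[vc]) {
        uint8_t enables = inverse_pcp_remap(vc);
        printf("VC %u overloaded: PFC pause frame enable bits = 0x%02x\n",
               vc, enables);
    }
}

int main(void)
{
    buffer_count[3] = 150;   /* simulate an overloaded merged channel */
    check_vc(3);
    return 0;
}
```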
Abstract:
A reconfigurable, scalable and flexible island-based network flow processor integrated circuit architecture includes a plurality of rectangular islands of identical shape and size. The islands are disposed in rows, and a configurable mesh command/push/pull data bus extends through all the islands. The integrated circuit includes first SerDes I/O blocks, an ingress MAC island that converts incoming symbols into packets, an ingress NBI island that analyzes packets and generates ingress packet descriptors, a microengine (ME) island that receives ingress packet descriptors and headers from the ingress NBI and analyzes the headers, a memory unit (MU) island that receives payloads from the ingress NBI and performs lookup operations and stores payloads, an egress NBI island that receives the header portions and the payload portions and egress descriptors and performs egress scheduling, and an egress MAC island that outputs packets to second SerDes I/O blocks.
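The ingress split described above can be illustrated with a toy model. In the C sketch below (buffer sizes, handles, and function names are all assumptions, not taken from the abstract), the ingress NBI step forwards the header portion in a descriptor while parking the payload in MU storage, and the egress NBI step reunites the two portions before output.

```c
#include <stdio.h>
#include <string.h>

/* Minimal model of the ingress split: the ingress NBI island sends the
 * header portion to an ME island for analysis and the payload portion
 * to an MU island for storage; the egress NBI later pairs them again. */
struct packet { char header[32]; char payload[128]; };

struct ingress_descriptor {
    char header[32];     /* carried toward the ME island */
    int  payload_handle; /* where the MU island stored the payload */
};

static char mu_store[8][128];   /* payload buffers inside the MU island */

static struct ingress_descriptor ingress_nbi(const struct packet *p, int handle)
{
    struct ingress_descriptor d;
    memcpy(d.header, p->header, sizeof d.header);
    memcpy(mu_store[handle], p->payload, sizeof mu_store[handle]);
    d.payload_handle = handle;
    return d;
}

static void egress_nbi(const struct ingress_descriptor *d)
{
    struct packet out;
    memcpy(out.header, d->header, sizeof out.header);
    memcpy(out.payload, mu_store[d->payload_handle], sizeof out.payload);
    printf("egress MAC outputs packet with header \"%s\"\n", out.header);
}

int main(void)
{
    struct packet p = { "hdr-example", "payload-bytes" };
    struct ingress_descriptor d = ingress_nbi(&p, 0);
    egress_nbi(&d);
    return 0;
}
```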
Abstract:
An integrated circuit receives a DDR (Double Data Rate) data signal and an associated DDR clock signal, and communicates those signals from integrated circuit input terminals a substantial distance across the integrated circuit to a subcircuit that then receives and uses the DDR data. Within the integrated circuit, a DDR retiming circuit receives the DDR data signal and the associated DDR clock signal from the terminals. The DDR retiming circuit splits the DDR data signal into two components, and then transmits those two components over the substantial distance toward the subcircuit. The two components are then recombined back into a single DDR data signal, and the DDR data signal and the DDR clock signal are supplied to the subcircuit in such a way that the setup and hold time requirements of the subcircuit are met.
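The abstract does not say how the DDR data signal is divided, but a common scheme is to separate the samples captured on rising clock edges from those captured on falling edges, route the two slower streams across the die, and interleave them again at the destination. The C sketch below models that assumed scheme.

```c
#include <stdio.h>

#define N 8   /* number of DDR samples in this example */

/* A DDR data stream carries one sample per clock edge.  The retiming
 * circuit splits it into a rising-edge component and a falling-edge
 * component, each of which changes only once per clock cycle and so
 * is easier to carry a long distance across the die. */
static void split_ddr(const int ddr[], int rise[], int fall[], int n)
{
    for (int i = 0; i < n / 2; i++) {
        rise[i] = ddr[2 * i];       /* sample taken on the rising edge  */
        fall[i] = ddr[2 * i + 1];   /* sample taken on the falling edge */
    }
}

/* At the far end the two components are interleaved again so the
 * subcircuit sees the original DDR sequence. */
static void recombine_ddr(const int rise[], const int fall[], int ddr[], int n)
{
    for (int i = 0; i < n / 2; i++) {
        ddr[2 * i]     = rise[i];
        ddr[2 * i + 1] = fall[i];
    }
}

int main(void)
{
    int ddr_in[N] = { 1, 0, 1, 1, 0, 0, 1, 0 };
    int rise[N / 2], fall[N / 2], ddr_out[N];

    split_ddr(ddr_in, rise, fall, N);
    recombine_ddr(rise, fall, ddr_out, N);

    for (int i = 0; i < N; i++)
        printf("%d", ddr_out[i]);
    printf("\n");
    return 0;
}
```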
Abstract:
A pipelined run-to-completion processor includes no instruction counter and fetches instructions only either as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor.
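A minimal software model of this fetch behavior is given below. The instruction encodings, section length, and table contents are invented for illustration; the sketch shows only the control flow: a table-number LUT supplies the base address, fetch instructions chain execution from section to section, and a finished instruction produces the output and stops the loop (the stand-in for clocking).

```c
#include <stdint.h>
#include <stdio.h>

/* Instruction encodings used by this sketch (assumed, not from the abstract). */
enum { OP_ADD = 0, OP_FETCH = 1, OP_FINISHED = 2 };

struct insn { uint8_t op; uint8_t arg; };

/* One table of code, organized as sections of four instructions. */
static const struct insn table0[] = {
    /* section 0 */
    { OP_ADD, 5 }, { OP_ADD, 2 }, { OP_FETCH, 1 }, { OP_ADD, 0 },
    /* section 1 */
    { OP_ADD, 1 }, { OP_FINISHED, 0 }, { OP_ADD, 0 }, { OP_ADD, 0 },
};

/* LUT that converts a table number from the initial fetch information
 * into the base address where that table is found. */
static const struct insn *table_base_lut[] = { table0 };

#define SECTION_LEN 4

/* The processor has no instruction counter: it only runs when kick-started
 * by an input value, fetches a block (section) of instructions, and keeps
 * running only while fetch instructions chain execution section to section. */
static int run(uint8_t table_num, uint8_t section, int input_value)
{
    const struct insn *base = table_base_lut[table_num];
    int acc = input_value;

    for (;;) {                                   /* "clocking" starts here */
        const struct insn *blk = base + section * SECTION_LEN;
        for (int i = 0; i < SECTION_LEN; i++) {
            switch (blk[i].op) {
            case OP_ADD:
                acc += blk[i].arg;
                break;
            case OP_FETCH:                        /* jump to another section */
                section = blk[i].arg;
                i = SECTION_LEN;                  /* leave this block */
                break;
            case OP_FINISHED:                     /* output and stop clocking */
                return acc;
            }
        }
    }
}

int main(void)
{
    printf("output data value = %d\n", run(0, 0, 10));
    return 0;
}
```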
Abstract:
A general purpose PicoEngine Multi-Processor (PEMP) includes a hierarchically organized pool of small specialized picoengine processors and associated memories. A stream of data input values is received onto the PEMP. Each input data value is characterized, and from the characterization a task is determined. Picoengines are selected in a sequence. When the next picoengine in the sequence is available, it is given the input data value along with an associated task assignment. The picoengine then performs the task. An output picoengine selector selects picoengines in the same sequence. If the next picoengine indicates that it has completed its assigned task, then the output value from the selected picoengine is output from the PEMP. By changing the sequence used, more or less of the processing power and memory resources of the pool are brought to bear on the incoming data stream. The PEMP automatically disables unused picoengines and memories.
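The in-order selection scheme can be sketched as follows. In this C model (pool size, task, and completion timing are all assumptions), an input selector assigns work to picoengines in a fixed sequence and an output selector walks the same sequence, so results leave the pool in arrival order.

```c
#include <stdio.h>

#define POOL_SIZE 4   /* number of picoengines enabled in the sequence */

/* Per-picoengine state in this sketch. */
struct picoengine {
    int busy;
    int done;
    int result;
};

static struct picoengine pool[POOL_SIZE];
static int in_sel;    /* input selector: next picoengine to be assigned work */
static int out_sel;   /* output selector: walks the same sequence */

/* Hand the next input data value (and its task) to the next picoengine
 * in the sequence, if that picoengine is available. Returns 0 if busy. */
static int assign(int input_value)
{
    struct picoengine *pe = &pool[in_sel];
    if (pe->busy)
        return 0;
    pe->busy = 1;
    pe->result = input_value * 2;   /* stand-in for the assigned task */
    pe->done = 1;                   /* in hardware this happens later */
    in_sel = (in_sel + 1) % POOL_SIZE;
    return 1;
}

/* Because outputs are collected in the same sequence as assignments,
 * results leave the pool in the order the inputs arrived. */
static int collect(int *out)
{
    struct picoengine *pe = &pool[out_sel];
    if (!pe->busy || !pe->done)
        return 0;
    *out = pe->result;
    pe->busy = 0;
    out_sel = (out_sel + 1) % POOL_SIZE;
    return 1;
}

int main(void)
{
    for (int v = 1; v <= 3; v++)
        assign(v);
    int r;
    while (collect(&r))
        printf("output: %d\n", r);
    return 0;
}
```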
Abstract:
A dispatcher circuit receives sets of instructions from an instructing entity. Instructions of the set that are of a first type are put into a first queue circuit, instructions that are of a second type are put into a second queue circuit, and so forth. The first queue circuit dispatches instructions of the first type to one or more processing engines and records when the instructions of the set are completed. When all the instructions of the set of the first type have been completed, the first queue circuit sends the second queue circuit a go signal, which causes the second queue circuit to dispatch instructions of the second type and to record when they have been completed. This process proceeds from queue circuit to queue circuit. When all the instructions of the set have been completed, the dispatcher circuit returns an "instructions done" indication to the original instructing entity.
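The queue-to-queue hand-off can be modeled in a few lines of C. The number of queues and the instruction encoding below are invented; the point of the sketch is that each queue's instructions are dispatched and counted as complete before the next queue is allowed to start, and that a final "instructions done" indication is produced only after the last queue finishes.

```c
#include <stdio.h>

#define MAX_PER_QUEUE 8
#define NUM_QUEUES    3   /* one queue circuit per instruction type (assumed) */

/* One queue circuit: it holds instructions of a single type and counts
 * how many of them have been completed by the processing engines. */
struct queue_circuit {
    int count;
    int completed;
    int insns[MAX_PER_QUEUE];
};

static struct queue_circuit queues[NUM_QUEUES];

/* The dispatcher sorts an incoming set of instructions by type. */
static void enqueue(int type, int insn)
{
    struct queue_circuit *q = &queues[type];
    q->insns[q->count++] = insn;
}

/* Dispatch one queue's instructions and record their completion; in
 * this sketch completion is immediate. */
static void dispatch_queue(struct queue_circuit *q)
{
    for (int i = 0; i < q->count; i++) {
        printf("dispatching instruction %d\n", q->insns[i]);
        q->completed++;                 /* engine reports completion */
    }
}

int main(void)
{
    enqueue(0, 10); enqueue(0, 11);     /* first-type instructions  */
    enqueue(1, 20);                     /* second-type instructions */
    enqueue(2, 30);                     /* third-type instructions  */

    /* Each queue sends the next queue a "go" only after all of its own
     * instructions are done; here that is the loop moving to the next i. */
    for (int i = 0; i < NUM_QUEUES; i++)
        dispatch_queue(&queues[i]);

    printf("instructions done\n");      /* returned to the instructing entity */
    return 0;
}
```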
Abstract:
A transactional memory receives a command, where the command includes an address and a novel DAT (Do Address Translation) bit. If the DAT bit is set and if the transactional memory is enabled to do address translations and if the command is for an access (read or write) of a memory of the transactional memory, then the transactional memory performs an address translation operation on the address of the command. Parameters of the address translation are programmable and are set up before the command is received. In one configuration, certain bits of the incoming address are deleted, and other bits are shifted in bit position, and a base address is ORed in, and a padding bit is added, thereby generating the translated address. The resulting translated address is then used to access the memory of the transactional memory to carry out the command.
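One possible translation, with made-up field positions, is sketched below; in the actual circuit the positions of the deleted bits, the shift amount, the base address, and the padding are programmable and are set up before the command arrives. The sketch deletes a four-bit field, closes the gap by shifting the upper bits down, ORs in a base address, and appends a zero padding bit.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative address translation: delete incoming address bits
 * [11:8], shift the bits above them down to close the gap, OR in a
 * programmed base address, and append one padding bit of 0 at the
 * bottom.  All field positions here are assumptions. */
static uint32_t translate(uint32_t addr, uint32_t base)
{
    uint32_t low  = addr & 0x00FFu;          /* bits [7:0] kept as-is      */
    uint32_t high = (addr >> 12) & 0xFFFFFu; /* bits above the deleted gap */

    uint32_t packed = (high << 8) | low;         /* deleted bits squeezed out */
    uint32_t translated = (base | packed) << 1;  /* OR in base, add pad bit   */
    return translated;
}

int main(void)
{
    uint32_t in   = 0x0001F2A5u;
    uint32_t base = 0x00400000u;
    printf("translated address = 0x%08X\n", translate(in, base));
    return 0;
}
```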
Abstract:
A transactional memory (TM) receives a lookup command across a bus from a processor. The command includes a memory address, a starting bit position, and a mask size. In response to the command, the TM pulls an input value (IV). The memory address is used to read a word containing multiple result values (RVs) and multiple threshold values (TVs) from memory. A selecting circuit within the TM uses the starting bit position and mask size to select a portion of the IV. The portion of the IV is a lookup key value (LKV). The multiple TVs define multiple lookup key ranges. The TM determines which lookup key range includes the LKV. An RV is selected based upon the lookup key range determined to include the LKV. The lookup key range is determined by a lookup key range identifier circuit, and the RV is selected by a result value selection circuit.
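The key extraction and range comparison can be expressed compactly in C. The word layout, number of ranges, and example values below are assumptions; the sketch shows how the starting bit position and mask size carve the lookup key value out of the input value, and how the threshold values partition the key space so that exactly one result value is selected.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_RANGES 4   /* one result value per lookup key range (assumed) */

/* A memory word in this sketch holds the threshold values and result
 * values together, as the abstract describes. */
struct lookup_word {
    uint32_t tv[NUM_RANGES - 1];   /* thresholds that bound the ranges */
    uint32_t rv[NUM_RANGES];       /* one result value per range       */
};

/* Select the lookup key value (LKV) out of the input value (IV) using
 * the starting bit position and mask size from the command. */
static uint32_t select_lkv(uint32_t iv, unsigned start_bit, unsigned mask_size)
{
    return (iv >> start_bit) & ((1u << mask_size) - 1u);
}

/* Find which lookup key range contains the LKV and return that range's RV. */
static uint32_t lookup(const struct lookup_word *w, uint32_t lkv)
{
    unsigned range = 0;
    while (range < NUM_RANGES - 1 && lkv >= w->tv[range])
        range++;
    return w->rv[range];
}

int main(void)
{
    struct lookup_word w = {
        .tv = { 10, 100, 1000 },
        .rv = { 0xA, 0xB, 0xC, 0xD },
    };
    uint32_t iv  = 0x00002500u;              /* pulled input value       */
    uint32_t lkv = select_lkv(iv, 8, 8);     /* start bit 8, mask size 8 */
    printf("LKV = %u, RV = 0x%X\n", lkv, lookup(&w, lkv));
    return 0;
}
```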
Abstract:
An automaton hardware engine employs a transition table organized into 2^n rows, where each row comprises a plurality of n-bit storage locations, and where each storage location can store at most one n-bit entry value. Each row corresponds to an automaton state. In one example, at least two NFAs are encoded into the table. The first NFA is indexed into the rows of the transition table in a first way, and the second NFA is indexed into the rows of the transition table in a second way. Due to this indexing, all rows are usable to store entry values that point to other rows.
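A small model of the shared table is shown below. The abstract does not describe how the two indexings differ, so the sketch assumes one NFA is indexed from row 0 upward and the other from the last row downward; entry values are n-bit row numbers, so every row can both hold entries and be pointed to.

```c
#include <stdint.h>
#include <stdio.h>

#define N          4              /* entry width in bits                  */
#define NUM_ROWS   (1u << N)      /* 2^n rows, one per automaton state    */
#define ROW_SLOTS  4              /* storage locations per row (assumed)  */

/* Each entry is an n-bit pointer to another row (the next state). */
static uint8_t transition_table[NUM_ROWS][ROW_SLOTS];

/* First NFA: its state k is indexed to row k (counting up from row 0).
 * Second NFA: its state k is indexed to row NUM_ROWS - 1 - k, i.e. it
 * grows from the other end of the table, so the two automatons can
 * share the table and every row remains usable for entry values. */
static unsigned row_of_nfa1(unsigned state) { return state; }
static unsigned row_of_nfa2(unsigned state) { return NUM_ROWS - 1 - state; }

static unsigned step(unsigned row, unsigned slot)
{
    return transition_table[row][slot];   /* follow the n-bit entry */
}

int main(void)
{
    /* NFA 1: state 0 --slot 0--> state 2 (row 2). */
    transition_table[row_of_nfa1(0)][0] = (uint8_t)row_of_nfa1(2);
    /* NFA 2: state 0 --slot 0--> state 1 (row 14 when NUM_ROWS is 16). */
    transition_table[row_of_nfa2(0)][0] = (uint8_t)row_of_nfa2(1);

    printf("NFA1 next row: %u\n", step(row_of_nfa1(0), 0));
    printf("NFA2 next row: %u\n", step(row_of_nfa2(0), 0));
    return 0;
}
```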