Abstract:
A transactional memory (TM) includes a control circuit pipeline and an associated memory unit. The memory unit stores a plurality of rings. The pipeline maintains, for each ring, a head pointer and a tail pointer. A ring operation stage of the pipeline maintains the pointers as values are put onto and taken off the rings. A “put” command causes the TM to put a value onto a ring, provided the ring is not full. A “get” command causes the TM to take a value off a ring, provided the ring is not empty. A “put with low priority” command causes the TM to put a value onto a ring, provided the ring has at least a predetermined amount of free buffer space. A “get from a set of rings” command causes the TM to take a value off the highest-priority non-empty ring of a specified set of rings.
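The four ring commands can be sketched as a minimal Python model. It assumes fixed-size rings and keeps an explicit per-ring occupancy count alongside the head and tail pointers. The abstract mentions only the pointers, so the count is a simplification made here. The class `RingMemory`, its method names, and the `low_priority_reserve` threshold are hypothetical, and rings passed to `get_from_set` are assumed to be listed highest priority first.

```python
class RingMemory:
    """Hypothetical model of several fixed-size rings held in one memory unit."""

    def __init__(self, ring_size, num_rings, low_priority_reserve):
        self.size = ring_size
        self.reserve = low_priority_reserve        # free slots required by a low-priority put
        self.slots = [[None] * ring_size for _ in range(num_rings)]
        self.head = [0] * num_rings                # next slot to take a value from
        self.tail = [0] * num_rings                # next slot to put a value into
        self.count = [0] * num_rings               # occupancy (simplifying assumption)

    def free_space(self, ring):
        return self.size - self.count[ring]

    def put(self, ring, value):
        """Put a value onto the ring unless it is full; return True on success."""
        if self.count[ring] == self.size:
            return False
        self.slots[ring][self.tail[ring]] = value
        self.tail[ring] = (self.tail[ring] + 1) % self.size
        self.count[ring] += 1
        return True

    def get(self, ring):
        """Take a value off the ring, or return None if it is empty."""
        if self.count[ring] == 0:
            return None
        value = self.slots[ring][self.head[ring]]
        self.head[ring] = (self.head[ring] + 1) % self.size
        self.count[ring] -= 1
        return value

    def put_low_priority(self, ring, value):
        """Put only if at least `reserve` slots are free before the put."""
        if self.free_space(ring) < self.reserve:
            return False
        return self.put(ring, value)

    def get_from_set(self, rings):
        """Get from the first non-empty ring; `rings` is ordered highest priority first."""
        for ring in rings:
            if self.count[ring] > 0:
                return ring, self.get(ring)
        return None
```

Keeping a count avoids the usual convention of leaving one slot permanently empty, which is otherwise needed to tell a full ring from an empty one when only head and tail pointers are stored.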
Abstract:
In response to receiving a “Return Available PPI Credits” command from a credit-aware (CA) device, a packet engine sends a “Credit To Be Returned” (CTBR) value it maintains for that device back to the CA device, and zeroes out its stored CTBR value. The CA device adds the credits returned to a “Credits Available” value it maintains. The CA device uses the “Credits Available” value to determine whether it can issue a PPI allocation request. The “Return Available PPI Credits” command does not result in any PPI allocation or de-allocation. In another aspect, the CA device issues one PPI allocation request to the packet engine when its recorded “Credits Available” value is zero or negative. If the PPI allocation request cannot be granted, then it is buffered in the packet engine, and is resubmitted within the packet engine, until the packet engine makes the PPI allocation.
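A minimal Python sketch of the credit exchange is shown below. It rests on two assumptions that the abstract does not state. First, each de-allocated PPI adds one credit to the CTBR value of the device that owned it. Second, buffered allocation requests are retried in arrival order whenever a PPI is freed. `PacketEngine`, `CreditAwareDevice`, and their methods are hypothetical names, and PPIs are modeled as plain integers.

```python
class PacketEngine:
    """Hypothetical packet-engine model; the CTBR accrual rule is an assumption."""

    def __init__(self, total_ppis):
        self.free_ppis = total_ppis
        self.next_ppi = 0
        self.ctbr = {}       # per-device "Credit To Be Returned" values
        self.pending = []    # buffered allocation requests (device names)
        self.granted = []    # (device, ppi) grants in the order they were made

    def request_allocation(self, device):
        self.pending.append(device)
        self._resubmit()

    def _resubmit(self):
        # Retry buffered requests in arrival order while PPIs are available.
        while self.pending and self.free_ppis > 0:
            device = self.pending.pop(0)
            self.free_ppis -= 1
            self.granted.append((device, self.next_ppi))
            self.next_ppi += 1

    def deallocate(self, device):
        # Assumption: a freed PPI becomes one credit owed to its device.
        self.free_ppis += 1
        self.ctbr[device] = self.ctbr.get(device, 0) + 1
        self._resubmit()

    def return_available_credits(self, device):
        # No PPI is allocated or freed here; only the CTBR value moves.
        credits = self.ctbr.get(device, 0)
        self.ctbr[device] = 0
        return credits


class CreditAwareDevice:
    def __init__(self, name, credits):
        self.name = name
        self.credits_available = credits

    def request_ppi(self, engine):
        # Simplest policy: issue a request only while credits remain.
        if self.credits_available <= 0:
            return False
        self.credits_available -= 1
        engine.request_allocation(self.name)
        return True

    def reclaim_credits(self, engine):
        # Models the "Return Available PPI Credits" command.
        self.credits_available += engine.return_available_credits(self.name)
```

`return_available_credits` changes only the CTBR bookkeeping and leaves the free-PPI count alone, which matches the statement that the command causes no allocation or de-allocation. The device in this sketch simply stops requesting at zero credits and does not model the single request issued at zero or negative credits. Buffering and resubmission inside the engine are modeled by `pending`.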
Abstract:
Circuitry to provide in-order packet delivery. A packet descriptor including a sequence number is received, and it is determined in which of three ranges the sequence number resides. Depending, at least in part, on that range, it is determined whether the packet descriptor is to be communicated to a scheduler, which causes the associated packet to be transmitted. If the sequence number resides in a first “flush” range, all associated packet descriptors are output. If the sequence number resides in a second “send” range, only the received packet descriptor is output. If the sequence number resides in a third “store and reorder” range and is the next in-order sequence number, the packet descriptor is output; if it is not the next in-order sequence number, the packet descriptor is stored in a buffer and a corresponding valid bit is set.
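The three-range decision can be sketched in Python. The range boundaries are assumptions. Forward offsets from the next expected sequence number that fall inside a reorder window of `window` entries form the “store and reorder” range. Larger forward offsets, up to half the sequence space, form the “flush” range, and backward offsets form the “send” range. After an in-order arrival the sketch also drains any consecutive stored descriptors, a step the abstract does not spell out. `Reorderer` and its fields are hypothetical, and `window` is assumed to be a power of two so that `seq % window` maps sequence numbers to buffer slots consistently across wrap-around.

```python
class Reorderer:
    """Hypothetical in-order delivery model with flush / send / store-and-reorder ranges."""

    def __init__(self, window, seq_bits=16):
        self.window = window                 # assumed power of two
        self.mod = 1 << seq_bits             # size of the sequence-number space
        self.next_seq = 0                    # next in-order sequence number
        self.buf = [None] * window           # stored descriptors
        self.valid = [False] * window        # one valid bit per slot

    def _drain(self, out):
        # Emit consecutive stored descriptors starting at next_seq.
        slot = self.next_seq % self.window
        while self.valid[slot]:
            out.append(self.buf[slot])
            self.valid[slot] = False
            self.next_seq = (self.next_seq + 1) % self.mod
            slot = self.next_seq % self.window

    def receive(self, seq, desc):
        """Return the descriptors to hand to the scheduler, in order."""
        out = []
        offset = (seq - self.next_seq) % self.mod
        if offset < self.window:                       # store-and-reorder range
            if offset == 0:
                out.append(desc)
                self.next_seq = (self.next_seq + 1) % self.mod
                self._drain(out)
            else:
                slot = seq % self.window
                self.buf[slot] = desc
                self.valid[slot] = True
        elif offset < self.mod // 2:                   # flush range (far ahead)
            for i in range(self.window):
                slot = (self.next_seq + i) % self.window
                if self.valid[slot]:
                    out.append(self.buf[slot])
                    self.valid[slot] = False
            out.append(desc)
            self.next_seq = (seq + 1) % self.mod
        else:                                          # send range (behind)
            out.append(desc)
        return out
```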
Abstract:
A transactional memory receives a command that includes an address and a novel DAT (Do Address Translation) bit. If the DAT bit is set, the transactional memory is enabled to do address translations, and the command is for an access (read or write) of a memory of the transactional memory, then the transactional memory performs an address translation operation on the address of the command. Parameters of the address translation are programmable and are set up before the command is received. In one configuration, certain bits of the incoming address are deleted, other bits are shifted in bit position, a base address is ORed in, and a padding bit is added, thereby generating the translated address. The resulting translated address is then used to access the memory of the transactional memory to carry out the command.
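The translation steps can be sketched in Python. The parameter set below is an assumed instance of the programmable configuration: a single contiguous deleted field, a left shift, a base address, and one padding bit inserted at a chosen position. `TranslationConfig`, `translate`, and the helper names are hypothetical. The steps are applied in the order the abstract lists them.

```python
from dataclasses import dataclass


@dataclass
class TranslationConfig:
    delete_lo: int      # lowest bit of the contiguous field to delete
    delete_width: int   # number of bits to delete
    shift: int          # left shift applied after deletion
    base: int           # base address ORed into the result
    pad_pos: int        # bit position at which a padding bit is inserted
    pad_value: int = 0  # value of the padding bit


def delete_field(addr, lo, width):
    """Remove bits [lo, lo + width) and close the gap."""
    low = addr & ((1 << lo) - 1)
    high = addr >> (lo + width)
    return (high << lo) | low


def insert_bit(addr, pos, value):
    """Insert one bit at `pos`, moving higher bits up by one position."""
    low = addr & ((1 << pos) - 1)
    high = addr >> pos
    return (high << (pos + 1)) | ((value & 1) << pos) | low


def translate(addr, dat_bit, translation_enabled, is_memory_access, cfg):
    """Return the address actually used to access memory for this command."""
    if not (dat_bit and translation_enabled and is_memory_access):
        return addr  # no translation: use the command's address as-is
    a = delete_field(addr, cfg.delete_lo, cfg.delete_width)
    a <<= cfg.shift
    a |= cfg.base
    return insert_bit(a, cfg.pad_pos, cfg.pad_value)
```

For example, with `delete_lo=4, delete_width=4, shift=2, base=0x10000, pad_pos=0`, the address `0xABC` becomes `0xAC` after deletion, `0x2B0` after the shift, `0x102B0` after ORing in the base, and `0x20560` after the padding bit is inserted.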
Abstract:
A multi-processor includes a pool of processors and a common packet buffer memory. Bytes of packet data of a packet are stored in the packet buffer memory. Each processor has an intelligent packet data register file. One processor is tasked with processing the packet, and its packet data register file caches a subset of the bytes. If the register file detects a packet data prefetch trigger condition and does not already store all of the bytes in a prefetch window, it prefetches the missing bytes before they are required for the execution of a subsequent instruction. The processor has instructions that configure prefetching, that enable it, and that disable it in certain ways.
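A minimal Python model of the register file's prefetch control follows. The methods `configure_prefetch`, `enable_prefetch`, and `disable_prefetch` stand in for the configuring, enabling, and disabling instructions, whose actual encodings are not given. The trigger condition, in which every packet-data read prefetches the `window` bytes that follow it, and the stall counter are assumptions made for illustration. `PacketDataRegFile` is a hypothetical name.

```python
class PacketDataRegFile:
    """Hypothetical register file caching a subset of bytes from a shared packet buffer."""

    def __init__(self, packet_buffer):
        self.buffer = packet_buffer      # common packet buffer memory
        self.cache = {}                  # offset -> byte value held locally
        self.prefetch_enabled = False
        self.window = 0
        self.stalls = 0                  # reads that had to wait for missing bytes

    def configure_prefetch(self, window):    # stands in for a "configure" instruction
        self.window = window

    def enable_prefetch(self):               # stands in for an "enable" instruction
        self.prefetch_enabled = True

    def disable_prefetch(self):              # stands in for a "disable" instruction
        self.prefetch_enabled = False

    def _fill(self, start, end):
        # Copy any uncached bytes in [start, end) from the packet buffer.
        for off in range(start, min(end, len(self.buffer))):
            self.cache.setdefault(off, self.buffer[off])

    def read(self, offset, length):
        """Supply bytes to the execute stage (assumes they lie inside the packet)."""
        if any(off not in self.cache for off in range(offset, offset + length)):
            self.stalls += 1
            self._fill(offset, offset + length)
        data = bytes(self.cache[off] for off in range(offset, offset + length))
        # Assumed trigger: each read prefetches the window that follows it.
        if self.prefetch_enabled:
            self._fill(offset + length, offset + length + self.window)
        return data
```

With prefetching enabled, a second sequential read finds its bytes already cached and does not stall. Once prefetching is disabled, each new region costs a stall.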
Abstract:
A pipelined run-to-completion processor can decode three instructions in three consecutive clock cycles, and can also execute the instructions in three consecutive clock cycles. The first instruction causes the ALU to generate a value, which execution of the first instruction then loads into a register of a register file. The second instruction accesses the register and, in a register file read stage, loads the value into predicate bits. The predicate bits are loaded in the clock cycle immediately following the clock cycle in which the second instruction was decoded. The third instruction is a conditional instruction that uses the values of the predicate bits as a predicate code to determine a predicate function. If a predicate condition (determined by applying the predicate function to flags) is true, then an instruction operation of the third instruction is carried out; otherwise it is not.
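The three-instruction sequence can be modeled in Python, leaving out pipeline timing. The 3-bit predicate-code table is a hypothetical ARM-style encoding (equal, not equal, carry set, and so on). The abstract specifies neither the real codes, the flag set, nor the width of the predicate bits. `Core`, `alu_add`, `load_predicate`, `compare`, and `conditional` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    z: bool = False  # zero
    n: bool = False  # negative
    c: bool = False  # carry (no borrow)


# Hypothetical 3-bit predicate codes, each selecting a predicate function of the flags.
PREDICATE_FUNCTIONS = {
    0b000: lambda f: f.z,        # equal
    0b001: lambda f: not f.z,    # not equal
    0b010: lambda f: f.c,        # carry set
    0b011: lambda f: not f.c,    # carry clear
    0b100: lambda f: f.n,        # negative
    0b101: lambda f: not f.n,    # non-negative
    0b110: lambda f: False,      # never
    0b111: lambda f: True,       # always
}


class Core:
    def __init__(self):
        self.regs = [0] * 8
        self.predicate_bits = 0b111   # default: always execute
        self.flags = Flags()

    def alu_add(self, rd, a, b):
        """Instruction 1: the ALU result is loaded into register rd."""
        self.regs[rd] = (a + b) & 0xFFFFFFFF

    def load_predicate(self, rs):
        """Instruction 2: copy the low bits of register rs into the predicate bits."""
        self.predicate_bits = self.regs[rs] & 0b111

    def compare(self, a, b):
        """Set flags from a - b (stands in for whatever instruction sets the flags)."""
        diff = a - b
        self.flags = Flags(z=diff == 0, n=diff < 0, c=a >= b)

    def conditional(self, operation):
        """Instruction 3: run `operation` only if the predicate condition holds."""
        if PREDICATE_FUNCTIONS[self.predicate_bits](self.flags):
            operation(self)
            return True
        return False
```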
Abstract:
A remote processor interacts with a transactional memory that has a memory, local BWC (Byte-Wise Compare) resources, and local NFA (Non-deterministic Finite Automaton) engine resources. The processor causes a byte stream to be transferred into the transactional memory, where it is stored in the memory. The processor then uses the BWC circuit to find a character signature in the byte stream. The processor obtains information about the character signature from the BWC circuit and, based on that information, uses the NFA engine to process the byte stream starting at a byte position determined at least in part by the results of the BWC circuit. From the time the byte stream is initially written into the transactional memory until the time the NFA engine completes, the byte stream is not read out of the transactional memory.
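The division of work between the BWC circuit and the NFA engine can be sketched in Python. Here `bytes.find` stands in for the byte-wise compare, and the NFA engine is a plain set-of-states simulator. The class and method names are hypothetical. The example NFA is the textbook automaton for `(a|b)*abb`, chosen only because it is genuinely nondeterministic. The processor-side function `scan` receives only offsets back, mirroring the point that the byte stream is never read out of the transactional memory.

```python
class TransactionalMemory:
    """Hypothetical TM holding a byte stream, with BWC and NFA resources."""

    def __init__(self):
        self.mem = b""

    def write(self, data):
        self.mem = bytes(data)

    def bwc_find(self, signature, start=0):
        """Byte-wise compare stand-in: offset of the signature, or -1."""
        return self.mem.find(signature, start)

    def nfa_run(self, start, nfa):
        """Simulate an NFA from `start`; return the offset just past the first match, or -1."""
        transitions, state0, accepting = nfa
        states = {state0}
        for pos in range(start, len(self.mem)):
            byte = self.mem[pos]
            states = {target
                      for s in states
                      for symbol, target in transitions.get(s, ())
                      if symbol == byte}
            if states & accepting:
                return pos + 1
            if not states:
                return -1
        return -1


def scan(tm, stream, signature, nfa):
    """Processor side: only offsets come back; the stream stays in the TM."""
    tm.write(stream)
    hit = tm.bwc_find(signature)
    if hit < 0:
        return None
    return tm.nfa_run(hit + len(signature), nfa)


# Textbook NFA for (a|b)*abb; state 3 accepts. Symbols are byte values.
A, B = ord("a"), ord("b")
ABB_NFA = ({0: [(A, 0), (B, 0), (A, 1)], 1: [(B, 2)], 2: [(B, 3)]}, 0, {3})
```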
Abstract:
A multi-processor includes a pool of processors and a common packet buffer memory. Bytes of packet data of a packet are stored in the packet buffer memory. Each processor has an intelligent packet data register file. One processor is tasked with processing the packet, and its packet data register file caches a subset of the bytes. Some instructions, when executed, require that the packet data register file supply the execute stage of the processor with certain bytes of the packet data. The register file detects a packet data prefetch trigger condition and, in response, determines whether any of the bytes in a prefetch window are not stored in it. If some are missing, it retrieves those bytes from the packet buffer memory so that it then holds all the bytes in the prefetch window. In one example, a subsequently executed instruction uses the prefetched packet data.
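The step that determines which bytes are missing and then retrieves only those can be sketched in Python. The sketch coalesces the missing offsets of the prefetch window into contiguous ranges and issues one retrieval per range. Two choices here are assumptions: the cached subset is represented as a dict from byte offset to byte value, and one request is issued per contiguous range. `missing_ranges` and `prefetch` are hypothetical names.

```python
def missing_ranges(cached, start, end):
    """Return contiguous (lo, hi) ranges in [start, end) whose offsets are not cached."""
    ranges = []
    lo = None
    for off in range(start, end):
        if off in cached:
            if lo is not None:
                ranges.append((lo, off))
                lo = None
        elif lo is None:
            lo = off
    if lo is not None:
        ranges.append((lo, end))
    return ranges


def prefetch(cache, packet_buffer, start, end):
    """Retrieve only the missing bytes of the window; return the requests issued."""
    end = min(end, len(packet_buffer))
    requests = missing_ranges(cache, start, end)
    for lo, hi in requests:
        for off in range(lo, hi):
            cache[off] = packet_buffer[off]
    return requests
```

After `prefetch` returns, every byte of the window is cached, so a later instruction that needs bytes in that window can be supplied without going back to the packet buffer memory.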
Abstract:
A transactional memory (TM) receives a lookup command across a bus from a processor. The command includes a memory address, a starting bit position, and a mask size. In response to the command, the TM pulls an input value (IV). The memory address is used to read from memory a word containing multiple result values (RVs) and multiple key values. Each key value indicates a single RV to be output by the TM. A selecting circuit within the TM uses the starting bit position and mask size to select a portion of the IV; this portion is a key selector value. A key selection circuit selects a key value based upon the key selector value, and a result value selection circuit then selects an RV based upon that key value.
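The two-level selection can be sketched in Python. The following details are assumptions consistent with the abstract: the word holds a list of key values followed by a list of result values, the key selector value indexes the key values, and the chosen key value indexes the result values. `select_bits` and `tm_lookup` are hypothetical names, memory is modeled as a dict from address to word, and the pulled IV is passed in directly.

```python
def select_bits(value, start_bit, mask_size):
    """Selecting circuit: extract `mask_size` bits of `value` starting at `start_bit`."""
    return (value >> start_bit) & ((1 << mask_size) - 1)


def tm_lookup(memory, address, start_bit, mask_size, input_value):
    """Return the result value chosen by the lookup command."""
    keys, results = memory[address]                               # read one word
    key_selector = select_bits(input_value, start_bit, mask_size)
    key = keys[key_selector]                                      # key selection circuit
    return results[key]                                           # result value selection circuit
```

For example, with `keys = [2, 0, 3, 1]` and an IV of `0b11010110`, a starting bit position of 6 and a mask size of 2 give a key selector of `0b11` (3). That selects key value 1, which in turn selects the second result value.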
Abstract:
An addressless merge command includes an identifier of an item of data and a reference value, but no address. A first part of the item is stored in a first place. A second part is stored in a second place. To move the first part so that the first and second parts are merged, the command is sent across a bus to a device. The device translates the identifier into a first address ADR1, and uses ADR1 to read the first part. Stored in or with the first part is a second address ADR2 indicating where the second part is stored. The device extracts ADR2, and uses ADR1 and ADR2 to issue bus commands. Each bus command causes a piece of the first part to be moved. When the entire first part has been moved, the device returns the reference value to indicate that the merge command has been completed.
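The device side of an addressless merge can be sketched in Python. The memory layout is an assumption. The first part begins with an 8-byte header holding ADR2 and the first part's length (little-endian), followed by its payload, and the payload is moved so that it ends immediately before ADR2. `MergeDevice`, its identifier table, and the `piece_size` of each bus command are hypothetical. Bus commands are recorded as `(source, destination, length)` tuples and carried out as byte copies.

```python
class MergeDevice:
    """Hypothetical device that services addressless merge commands."""

    def __init__(self, memory, id_table, piece_size):
        self.memory = memory          # bytearray standing in for both storage places
        self.id_table = id_table      # identifier -> ADR1
        self.piece_size = piece_size  # bytes moved by one bus command
        self.bus_commands = []        # (source, destination, length) log

    def merge(self, item_id, reference_value):
        adr1 = self.id_table[item_id]                     # translate identifier to ADR1
        # Assumed header at ADR1: 4-byte ADR2, then 4-byte length of the first part.
        adr2 = int.from_bytes(self.memory[adr1:adr1 + 4], "little")
        length = int.from_bytes(self.memory[adr1 + 4:adr1 + 8], "little")
        src = adr1 + 8                                    # payload follows the header
        dst = adr2 - length                               # assumed: land just before part 2
        for off in range(0, length, self.piece_size):
            n = min(self.piece_size, length - off)
            self.bus_commands.append((src + off, dst + off, n))
            self.memory[dst + off:dst + off + n] = self.memory[src + off:src + off + n]
        return reference_value                            # signals completion
```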