Abstract:
A method for multi-bit de-skewing of parallel bus signals is disclosed. The method includes receiving data comprising a multi-bit word and a training pattern. After a first control word of the training pattern is detected, the number of bits needed to de-skew each data bit of a multi-bit data word in each bit-line of a parallel bus is calculated. That per-bit-line count is transmitted to a bit delay line, and the system then outputs a de-skewed data word.
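As a rough illustration of the calculation, the sketch below models each bit-line as a serial bit stream and derives per-lane delays from where an assumed control word first appears; the pattern and all names are hypothetical, not taken from the disclosure.

```python
# Hypothetical model of the de-skew calculation: the offset at which a known
# control word appears in each bit-line gives that lane's skew, and each lane
# is delayed until all lanes align with the latest-arriving one.

CONTROL_WORD = [1, 0, 1, 1, 0, 0, 1, 0]  # assumed training-pattern marker

def find_control_word(lane_bits):
    """Return the bit offset at which the control word first appears."""
    n = len(CONTROL_WORD)
    for i in range(len(lane_bits) - n + 1):
        if lane_bits[i:i + n] == CONTROL_WORD:
            return i
    raise ValueError("control word not found")

def compute_deskew_delays(lanes):
    """Number of bits each lane's delay line must add to align all lanes."""
    offsets = [find_control_word(lane) for lane in lanes]
    latest = max(offsets)
    return [latest - off for off in offsets]

def apply_delay_lines(lanes, delays):
    """Model a per-lane bit delay line by prepending idle bits."""
    return [[0] * d + lane for lane, d in zip(lanes, delays)]

# Example: lane 1 arrives two bit times early relative to lane 0.
lanes = [[0, 0] + CONTROL_WORD + [1, 1, 0, 1],
         CONTROL_WORD + [1, 1, 0, 1, 0, 0]]
delays = compute_deskew_delays(lanes)        # [0, 2]
aligned = apply_delay_lines(lanes, delays)   # control words now line up
```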
Abstract:
Methods, apparatus, systems, and articles of manufacture are disclosed for high-throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to: determine sizes of data lanes in a partition of neural network weights; determine a slice size based on the size difference between a first data lane and a second data lane of the partition, the first data lane including first data and the second data lane including second data of a smaller size than the first data; cut a portion of the first data from the first data lane based on the slice size; and append the portion of the first data to the second data lane.
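A minimal sketch of the slicing step, assuming byte-oriented lanes and an even split of the size difference (the function name and the halving rule are illustrative, not the patent's):

```python
# Illustrative lane balancing: cut a slice from the longer data lane and
# append it to the shorter one, sized from the lanes' size difference.

def balance_lanes(first: bytes, second: bytes):
    """Assumes len(first) >= len(second); returns the rebalanced lanes."""
    slice_size = (len(first) - len(second)) // 2   # assumed sizing rule
    cut_point = len(first) - slice_size
    portion = first[cut_point:]          # portion cut from the first lane
    return first[:cut_point], second + portion

# Example: a 10-byte lane and a 4-byte lane end up at 7 bytes each.
a, b = balance_lanes(b"A" * 10, b"B" * 4)
assert len(a) == 7 and len(b) == 7
```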
Abstract:
Described is an apparatus that comprises: a first sequential unit; a first queue coupled in parallel to the first sequential unit such that the first queue and the first sequential unit receive a first input, the first sequential unit to double sample the first input; a first compare unit to receive an output from the first sequential unit; and a first selection unit controllable by a write pointer of a previous cycle, the first selection unit to receive outputs of each storage unit of the first queue and to generate an output for comparison by the first compare unit.
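The behavioral model below (single-bit data, fixed queue depth, one-cycle comparison; everything beyond the abstract's structure is assumed) shows how a flop and a shadow queue can double sample the same input, with a mux driven by the previous cycle's write pointer feeding the comparator:

```python
# Assumed behavioral model of the double-sampling arrangement.

QUEUE_DEPTH = 4

class DoubleSampler:
    def __init__(self):
        self.flop = 0                     # first sequential unit
        self.queue = [0] * QUEUE_DEPTH    # first queue, coupled in parallel
        self.wptr = 0                     # write pointer
        self.prev_wptr = 0                # write pointer of the previous cycle

    def clock(self, din):
        """One clock edge; returns True if the two samples still agree."""
        # selection unit: pick the queue entry written in the previous cycle
        selected = self.queue[self.prev_wptr]
        ok = (self.flop == selected)      # compare unit
        # double sample the new input into both paths
        self.flop = din
        self.queue[self.wptr] = din
        self.prev_wptr = self.wptr
        self.wptr = (self.wptr + 1) % QUEUE_DEPTH
        return ok
```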
Abstract:
Systems and methods for performing encoding and/or decoding can include an input data path that receives multiple input data values having an order (significance) with respect to one another. Each input data value can be applied to multiple compute paths (106-1 to 106-N), each of which can precompute multiple output values based on a different predetermined disparity value. Multiplexers (114-1 to 114-N) can output one precomputed output value according to a disparity value corresponding to a previous input data value in the order.
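A software sketch of the precompute-then-select structure follows; the encode rule is a toy stand-in, not a real disparity code such as 8b/10b:

```python
# Toy illustration of the compute paths and multiplexers: every candidate
# output is precomputed for each possible running disparity, then selected
# by the disparity produced by the previous value in the order.

def encode(value, running_disparity):
    """Stand-in for a disparity-dependent code table (illustrative only)."""
    code = value if running_disparity < 0 else value ^ 0xFF
    return code, -running_disparity      # toy rule: disparity alternates

def encode_block(values, initial_disparity=-1):
    # compute paths: precompute both candidates for every input in parallel
    candidates = [{d: encode(v, d) for d in (-1, +1)} for v in values]
    out, rd = [], initial_disparity
    for cand in candidates:
        code, rd = cand[rd]   # multiplexer: select by the previous disparity
        out.append(code)
    return out

print(encode_block([0x10, 0x11, 0x12]))   # [16, 238, 18]
```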
Abstract:
A circuit configured to provide a storage device comprising one or more virtual multiqueue FIFOs. The circuit is generally configured to operate at a preferred clock speed of a plurality of clock speeds.
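One plausible layout, modeled minimally below: a single storage array statically partitioned among per-queue head and tail pointers. The partitioning scheme is an assumption, and the patent's clock-speed selection is not modeled.

```python
# Assumed model: several virtual FIFOs sharing one physical storage array,
# each with its own head, tail, and occupancy count.

class VirtualMultiqueueFIFO:
    def __init__(self, num_queues, depth):
        self.depth = depth
        self.mem = [None] * (num_queues * depth)  # shared physical storage
        self.head = [0] * num_queues
        self.tail = [0] * num_queues
        self.count = [0] * num_queues

    def push(self, q, word):
        if self.count[q] == self.depth:
            raise OverflowError(f"virtual FIFO {q} is full")
        self.mem[q * self.depth + self.tail[q]] = word
        self.tail[q] = (self.tail[q] + 1) % self.depth
        self.count[q] += 1

    def pop(self, q):
        if self.count[q] == 0:
            raise IndexError(f"virtual FIFO {q} is empty")
        word = self.mem[q * self.depth + self.head[q]]
        self.head[q] = (self.head[q] + 1) % self.depth
        self.count[q] -= 1
        return word
```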
Abstract:
Techniques for resilient communication. A data path stores data to be transmitted over a link to a receiving node. An output stage is coupled between the data path and the link. The output stage includes double sampling mechanisms to preserve a copy of data transmitted over the link to the receiving node. Error detection circuitry is coupled with the output stage to detect transient timing errors in the data path or output stage. The error detection circuitry causes the output stage to send the copy of the data transmitted over the link in response to detecting an error.
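A behavioral sketch of the retransmission path; the single-entry shadow copy and the callback interface are assumptions made for illustration:

```python
# Assumed model: the output stage preserves a copy of the last word driven
# onto the link; on a detected transient error, the copy is sent again.

class ResilientOutputStage:
    def __init__(self, link_send):
        self.link_send = link_send    # drives the link to the receiving node
        self.shadow = None            # preserved copy (double sampling)

    def transmit(self, word):
        self.shadow = word
        self.link_send(word)

    def on_error_detected(self):
        """Error detection circuitry flags a transient timing error."""
        if self.shadow is not None:
            self.link_send(self.shadow)   # resend the preserved copy

sent = []
stage = ResilientOutputStage(sent.append)
stage.transmit(0xAB)
stage.on_error_detected()
assert sent == [0xAB, 0xAB]   # the copy was retransmitted
```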
Abstract:
An apparatus configured to extract in-band information, or to skip extraction of the in-band information and perform a look-ahead operation. The apparatus may be configured to switch between the extraction and the skipping of the extraction.
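One way to picture the switch, as a sketch with an assumed frame format (tuples of in-band information and payload; none of this framing comes from the abstract):

```python
# Assumed framing: each frame is (in_band, payload). A mode flag switches
# between extracting the in-band information and skipping extraction, while
# a look-ahead examines the next frame in either mode.

def process(frames, extract):
    out = []
    for i, (in_band, payload) in enumerate(frames):
        look_ahead = frames[i + 1] if i + 1 < len(frames) else None
        info = in_band if extract else None   # extract or skip extraction
        out.append((info, payload, look_ahead))
    return out

frames = [("hdr0", b"aa"), ("hdr1", b"bb")]
print(process(frames, extract=False))   # in-band info skipped, look-ahead kept
```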
Abstract:
A method, system and processor are provided for minimizing latency and loss of processor bandwidth in a pipelined processor when responding to an interrupt. The method advantageously avoids emptying and refilling the processor's instruction pipeline in order to service an interrupt request. Instead, a short sequence of instructions comprising the interrupt response is inserted into the pipeline. Normal pipeline operation stalls while the inserted instructions execute, but since flow is not disrupted the loss in bandwidth is not as great as if the pipeline were flushed. Furthermore, direct insertion of the instructions into the pipeline avoids the need for the processor to save its context and branch to an interrupt service routine in memory; this results in much faster response in servicing the interrupt, thereby reducing latency. In the preferred embodiment, the method applies to a pipelined processor having a RISC (Reduced Instruction Set Computer) architecture, which receives interrupt requests from one or more DMA memory controllers. The instructions inserted into the pipeline compute block address information for a DMA transfer. A system and processor implementing the method are disclosed, based on an enhancement of a conventional RISC processor design, and making use of registers and other existing logic resources within the processor. It is shown that the enhanced processor can respond to DMA interrupts with shorter latency and a smaller reduction in processor bandwidth than if conventional interrupt handling were used.
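To make the insertion idea concrete, here is a toy three-stage pipeline model; the stage count, instruction names, and the two-instruction response sequence are all invented for illustration:

```python
# Toy model: on an interrupt request, a short response sequence is injected
# at the fetch stage while normal fetch stalls; the pipeline is never flushed.

from collections import deque

def run(program, irq_cycles, response=("compute_dma_addr", "ack_irq")):
    fetch_q = deque(program)          # normal instruction stream
    pipe = deque([None] * 3)          # fetch -> decode -> execute
    inject, executed, cycle = deque(), [], 0
    while fetch_q or inject or any(pipe):
        if cycle in irq_cycles:
            inject.extend(response)               # insert interrupt response
        if inject:
            pipe.appendleft(inject.popleft())     # normal flow stalls
        elif fetch_q:
            pipe.appendleft(fetch_q.popleft())
        else:
            pipe.appendleft(None)                 # bubble while draining
        done = pipe.pop()
        if done is not None:
            executed.append(done)
        cycle += 1
    return executed

# The interrupted stream resumes without a flush or a context save:
print(run(["i0", "i1", "i2"], irq_cycles={1}))
# ['i0', 'compute_dma_addr', 'ack_irq', 'i1', 'i2']
```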
Abstract:
A method for writing and reading in-band information to and from a single storage element, comprising the steps of (A) receiving the in-band information, (B) storing data in either (i) a port information register when in a first state or (ii) a memory element when in a second state and (C) storing subsequent data in the memory element. The first state and the second state may be dependent upon a block position of the in-band information.
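A small state-machine sketch of steps (A) through (C), with the block framing assumed since the abstract does not fix how a block boundary is signaled:

```python
# Assumed model: in the first state the leading datum goes to the port
# information register; in the second state it goes to the memory element;
# subsequent data of the block always go to the memory element.

class InBandStore:
    FIRST, SECOND = 1, 2

    def __init__(self):
        self.port_info = None    # port information register
        self.memory = []         # memory element

    def write_block(self, block, state):
        data, rest = block[0], block[1:]     # (A) receive in-band information
        if state == self.FIRST:
            self.port_info = data            # (B)(i) first state
        else:
            self.memory.append(data)         # (B)(ii) second state
        self.memory.extend(rest)             # (C) subsequent data

store = InBandStore()
store.write_block(["hdr", "d0", "d1"], state=InBandStore.FIRST)
# store.port_info == "hdr", store.memory == ["d0", "d1"]
```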
Abstract:
Technologies for memory management of a neural network include a compute device to read a memory of the compute device to access connectivity data associated with a neuron of the neural network, the connectivity data being indicative of one or more network connections from the neuron, determine a memory address at which weights corresponding with the one or more network connections are stored, and access the corresponding weights from a memory location corresponding with the memory address.
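Sketched below with an assumed indexed layout; the table format, float32 weights, and addressing rule are all illustrative rather than the patent's:

```python
# Assumed layout: a per-neuron connectivity table gives the index and count
# of that neuron's connections; the weight address is derived from the index
# and the weights are read from that memory location.

import struct

WEIGHT_BYTES = 4   # assumed: one little-endian float32 per stored weight

def fetch_weights(neuron_id, connectivity, weight_memory, base_addr=0):
    """connectivity[n] -> (first_weight_index, number_of_connections)."""
    first, count = connectivity[neuron_id]        # connectivity data
    addr = base_addr + first * WEIGHT_BYTES       # derived memory address
    raw = weight_memory[addr:addr + count * WEIGHT_BYTES]
    return list(struct.unpack(f"<{count}f", raw))

# Example: neuron 1 has three connections whose weights start at index 2.
memory = struct.pack("<8f", *range(8))
print(fetch_weights(1, {0: (0, 2), 1: (2, 3)}, memory))   # [2.0, 3.0, 4.0]
```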