Abstract:
A circuit for controlling the operation of a memory system having different types of memory is described. The circuit comprises a first memory having a first type of memory element and having a first access time; a second memory having a second type of memory element and having a second access time, wherein the second type of memory element is different than the first type of memory element; a memory control circuit enabling access to the first memory and the second memory; a delay buffer coupled to the second memory to compensate for a difference in the first access time and the second access time; and a circuit for merging outputs of the first memory and delayed outputs of the second memory to generate ordered output data. A method of controlling the operation of a memory system is also disclosed.
Abstract:
Examples herein propose operating redundant ML models which have been trained using a boosting technique that considers hardware faults. The embodiments herein describe performing an evaluation process where the performance of a first ML model is measured in the presence of a hardware fault. The errors introduced by the hardware fault can then be used to train a second ML model. In one embodiment, a second evaluation process is performed where the combined performance of both the first and second trained ML models is measured in the presence of a hardware fault. The resulting errors can then be used when training a third ML model. In this manner, the three trained ML models are trained to be error aware. As a result, during operation, if a hardware fault occurs, the three ML models have better performance relative to three ML models that where not trained to be error aware.
Abstract:
A neural network system includes an input layer, one or more hidden layers, and an output layer. The input layer receives a training set including a sequence of batches and provides to its following layer output activations associated with the sequence of batches respectively. A first hidden layer receives, from its preceding layer, a first input activation associated with a first batch, receive a first input gradient associated with a second batch preceding the first batch, and provide, to its following layer a first output activation associated with the first batch based on the first input activation and first input gradient. The first and second batches have a delay factor associated with at least two batches. The output layer receives, from its preceding layer, a second input activation, and provide, to its preceding layer, a first output gradient based on the second input activation and the first training set.
Abstract:
In an example, an integrated circuit (IC) includes a receive circuit, a transmit circuit, and a control circuit. The receive circuit includes a receive data path and a receive control interface, the receive data path coupled to store received transmission control protocol (TCP) data for a plurality of TCP sessions in a respective plurality of receive buffers in an external memory circuit external to the IC. The transmit circuit includes a transmit data path and a transmit control interface, the transmit data path coupled to read TCP data to be transmitted for the plurality of TCP sessions from a respective plurality of transmit buffers in the external memory circuit. The control circuit is coupled to the receive control interface and the transmit control interface, the control circuit configured to maintain data structures to maintain TCP state information for the plurality of TCP sessions.
Abstract:
A method of processing data in an integrated circuit is described. The method comprises establishing a pipeline of processing blocks, wherein each processing block has a different function; coupling a data packet having data and meta-data to an input of the pipeline of processing blocks; and processing the data of the data packet using predetermined processing blocks based upon the meta-data. A device for processing data in an integrated circuit is also described.
Abstract:
A programmable IC includes a plurality of programmable resources, a plurality of shareable logic circuits coupled to the plurality of programmable resources, and a virtualization circuit. The plurality of programmable resources includes programmable logic circuits and programmable routing resources. The virtualization circuit is configured to manage sharing of the plurality of shareable logic circuits between a plurality of user designs implemented in the plurality of programmable resources. The user designs are communicatively isolated from one another on the programmable IC.
Abstract:
A neural network system includes an input layer, one or more hidden layers, and an output layer. A first layer circuit implements a first layer of the one or more hidden layers. The first layer includes a first weight space including one or more subgroups. A forward path circuit of the first layer circuit includes a multiply and accumulate circuit to receive an input from a layer preceding the first layer; and provide a first subgroup weighted sum using the input and a first plurality weights associated with a first subgroup. A scaling coefficient circuit provides a first scaling coefficient associated with the first subgroup, and applies the first scaling coefficient to the first subgroup weighted sum to generate a first subgroup scaled weighted sum. An activation circuit generates an activation based on the first subgroup scaled weighted sum and provide the activation to a layer following the first layer.
Abstract:
The coherent accelerator processor interface (CAPI) provides a high-performance when using heterogeneous compute architectures, but CAPI is not compatible with the advanced extensible interface (AXI) which is used by many accelerators. The examples herein describe an AXI-CAPI adapter (e.g., a hardware architecture) that converts AXI signals to CAPI signals and vice versus. In one example, the AXI-CAPI adapter includes four modules: a low-level shim, a high-level shim, an AXI full module, and an AXI Lite module which are organized in a hierarchy of hardware elements. Each of the modules outputs can output a different version of the AXI signals using the hierarchical structure.
Abstract:
In an example, a circuit of a neural network implemented in an integrated circuit (IC) includes a layer of hardware neurons, the layer including a plurality of inputs, a plurality of outputs, a plurality of weights, and a plurality of threshold values, each of the hardware neurons including: a logic circuit having inputs that receive first logic signals from at least a portion of the plurality of inputs and outputs that supply second logic signals corresponding to an exclusive NOR (XNOR) of the first logic signals and at least a portion of the plurality of weights; a counter circuit having inputs that receive the second logic signals and an output that supplies a count signal indicative of the number of the second logic signals having a predefined logic state; and a compare circuit having an input that receives the count signal and an output that supplies a logic signal having a logic state indicative of a comparison between the count signal and a threshold value of the plurality of threshold values; wherein the logic signal output by the compare circuit of each of the hardware neurons is provided as a respective one of the plurality of outputs.
Abstract:
A circuit for processing data is described. The circuit comprises an input for receiving a request for implementing a key-value store data transaction; a plurality of memory interfaces associated with different memory types enabling access to a plurality of memory devices associated with a key-value store; and a memory management circuit controlling the routing of data by way of the plurality of memory interfaces based upon a data transfer criterion.