Abstract:
A compute-near-memory binary neural network accelerator built with digital circuits is provided that achieves energy efficiency comparable to or better than that of a compute-near-memory binary neural network accelerator built with analog circuits. The digital accelerator is more process-scalable, robust to process, voltage, and temperature variations, and immune to circuit noise.
Abstract:
Techniques are provided for implementing a hybrid processing architecture comprising a general-purpose processor (CPU) coupled to an analog in-memory artificial intelligence (AI) processor. A hybrid processor implementing the techniques according to an embodiment includes an AI processor configured to perform analog in-memory computations based on neural network (NN) weighting factors and input data provided by the CPU. The AI processor includes one or more NN layers. The NN layers include digital access circuits to receive data and weighting factors and to provide computational results. The NN layers also include memory circuits to store data and weights, and further include bit line processors and cross bit line processors to perform analog dot product computations between columns of the data memory circuits and the weight factor memory circuits. Some of the NN layers are configured as convolutional NN layers and others are configured as fully connected NN layers, according to some embodiments.
Abstract:
Disclosed are a system, a device, and related methods for data manipulation, especially for SIMD operations such as permute, shift, and rotate. An apparatus includes a permute section that repositions data on sub-word boundaries and a shift section that repositions the data by distances smaller than the sub-word width. The sub-word width is configurable and selectable, and the permute section and shift section may operate on different boundary widths. In a first stage, the permute section repositions the data at the nearest sub-word boundary and, in a second stage, the shift section repositions the data to its final desired position. The shift section includes multiple stages arranged in a logarithmic cascade. Additionally, each shifter within each stage is highly connected, allowing fast and precise data movements.
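The two-stage idea can be illustrated with a small software model. The sketch below is only an assumption-laden illustration, not the disclosed circuit: it uses a left rotate as the example operation, fixes a 32-bit word with 8-bit sub-words, and the function names (`permute_rotate`, `log_rotate_bits`, `rotate_left`) are hypothetical. The coarse stage moves whole sub-words, and the fine stage applies the remaining bit distance through stages that each move by a power of two.

```python
def permute_rotate(subwords, k):
    """Stage 1 (coarse): rotate a list of sub-words left by k whole positions."""
    k %= len(subwords)
    return subwords[k:] + subwords[:k]


def log_rotate_bits(value, amount, total_bits):
    """Stage 2 (fine): rotate left by `amount` bits with a logarithmic cascade.

    Stage i rotates by 2**i bits only when bit i of `amount` is set, so a
    distance below 2**n needs at most n stages.
    """
    mask = (1 << total_bits) - 1
    stage = 0
    while (1 << stage) <= amount:
        step = 1 << stage
        if amount & step:
            value = ((value << step) | (value >> (total_bits - step))) & mask
        stage += 1
    return value


def rotate_left(value, n, total_bits=32, sub_width=8):
    """Rotate `value` left by n bits: permute on sub-word boundaries, then shift."""
    n %= total_bits
    coarse, fine = divmod(n, sub_width)  # whole sub-words, leftover bits
    count = total_bits // sub_width
    sub_mask = (1 << sub_width) - 1
    # Split into sub-words, most significant first.
    subwords = [(value >> (sub_width * (count - 1 - i))) & sub_mask
                for i in range(count)]
    subwords = permute_rotate(subwords, coarse)
    # Reassemble the permuted sub-words into one word.
    packed = 0
    for word in subwords:
        packed = (packed << sub_width) | word
    return log_rotate_bits(packed, fine, total_bits)
```

For example, `rotate_left(0x12345678, 12)` splits the distance into one 8-bit sub-word move and a 4-bit fine rotate. In this model the fine stage never has to cover more than `sub_width - 1` bits, which is what keeps the cascade short.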
Abstract:
Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, each containing multiple multipliers, together with logic to combine multipliers within each arithmetic block, across multiple arithmetic blocks, or both. In one example, one or more intermediate multipliers are smaller than the precisions supported by the arithmetic blocks that contain them.
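As a rough illustration of combining small multipliers into a wider one, the sketch below builds an unsigned 16x16 product from four 8x8 partial products and uses it in a two-precision MAC. The bit widths, the unsigned-only arithmetic, and the names `mul8`, `mul16_from_mul8`, and `mac` are assumptions chosen for the example; the abstract does not specify them.

```python
def mul8(a, b):
    """Model of one small unsigned 8x8 multiplier."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul16_from_mul8(a, b):
    """Unsigned 16x16 multiply composed from four 8x8 multipliers."""
    assert 0 <= a < 65536 and 0 <= b < 65536
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    # Each partial product is weighted by the position of its operand halves.
    return ((mul8(a_hi, b_hi) << 16)
            + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
            + mul8(a_lo, b_lo))


def mac(acc, pairs, precision):
    """Accumulate products of operand pairs at 8-bit or 16-bit precision."""
    if precision not in (8, 16):
        raise ValueError("this sketch models only 8-bit and 16-bit modes")
    for a, b in pairs:
        acc += mul8(a, b) if precision == 8 else mul16_from_mul8(a, b)
    return acc
```

In 8-bit mode the same four multipliers could serve four independent products, while in 16-bit mode they cooperate on one. The sketch models only the arithmetic, not how the hardware schedules the multipliers.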
Abstract:
Techniques and mechanisms are provided for performing circuit-switched routing and packet-switched routing for network communication. In an embodiment, a router evaluates control information of a packet received by the router to detect whether the packet includes data for a sideband communication. Based on the evaluation, the router selects from among a plurality of modes of the router, the plurality of modes including a first mode to route the packet for packet-switched communication of sideband data in a network. The plurality of modes also includes a second mode to configure a circuit-switched channel according to the packet. In another embodiment, the router determines a direction for routing a packet in a hierarchical network, wherein the direction is determined based on a level of the router in a hierarchy of the hierarchical network.
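The mode selection in the first embodiment can be sketched in software as follows. This is a minimal model under stated assumptions: the `is_sideband` control bit, the `Packet` fields, the `circuits` table, and the `Router.handle` interface are all hypothetical names introduced for illustration, and the hierarchical direction logic of the second embodiment is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PACKET_SWITCHED_SIDEBAND = 1  # first mode: route sideband data as a packet
    CIRCUIT_SETUP = 2             # second mode: configure a circuit-switched channel


@dataclass
class Packet:
    is_sideband: bool  # hypothetical control-information bit
    dest: int
    payload: bytes = b""


def select_mode(packet):
    """Choose a router mode from the packet's control information."""
    if packet.is_sideband:
        return Mode.PACKET_SWITCHED_SIDEBAND
    return Mode.CIRCUIT_SETUP


class Router:
    def __init__(self):
        self.circuits = {}   # dest -> reserved output port (hypothetical table)
        self.forwarded = []  # (output port, payload) pairs sent as packets

    def handle(self, packet, out_port):
        """Evaluate the packet and act according to the selected mode."""
        mode = select_mode(packet)
        if mode is Mode.PACKET_SWITCHED_SIDEBAND:
            self.forwarded.append((out_port, packet.payload))
        else:
            self.circuits[packet.dest] = out_port
        return mode
```

A sideband packet is forwarded hop by hop, while any other packet in this model is treated as a request that reserves a channel toward its destination.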