Abstract:
A method of performing convolution in a neural network with a variable dilation rate is provided. The method includes receiving a size of a first kernel and a dilation rate, determining a size of one or more disintegrated kernels based on the size of the first kernel, a baseline architecture of a memory, and the dilation rate, and determining an address of one or more blocks of an input image based on the dilation rate and one or more parameters associated with a size of the input image and the memory. Thereafter, the one or more blocks of the input image and the one or more disintegrated kernels are fetched from the memory, and an output image is obtained based on convolution of each of the one or more disintegrated kernels with the one or more blocks of the input image.
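The abstract leaves the disintegration and addressing scheme to the claims, but the underlying equivalence is well known: a convolution with dilation rate d equals d×d dense convolutions over strided sub-images ("blocks"), so the kernel never has to be expanded with zeros. The NumPy sketch below illustrates only that equivalence; the function names are illustrative and none of the memory-addressing details are modeled.

```python
import numpy as np

def conv2d_valid(x, kernel, dilation=1):
    """Direct 2-D convolution (valid padding, stride 1) with a dilation rate."""
    k, d = kernel.shape[0], dilation
    h_out = x.shape[0] - d * (k - 1)
    w_out = x.shape[1] - d * (k - 1)
    out = np.empty((h_out, w_out))
    for m in range(h_out):
        for n in range(w_out):
            out[m, n] = np.sum(kernel * x[m:m + d * k:d, n:n + d * k:d])
    return out

def dilated_conv_by_blocks(x, kernel, d):
    """Same result, computed as d*d dense convolutions over strided sub-images
    (the 'blocks'), so no zero-expanded kernel is ever formed."""
    k = kernel.shape[0]
    out = np.empty((x.shape[0] - d * (k - 1), x.shape[1] - d * (k - 1)))
    for i in range(d):
        for j in range(d):
            # Each block gathers every d-th pixel; a dense convolution on it
            # produces exactly the output pixels at the same (i, j) offsets.
            out[i::d, j::d] = conv2d_valid(x[i::d, j::d], kernel, dilation=1)
    return out

x = np.random.rand(16, 16)
kern = np.random.rand(3, 3)
assert np.allclose(conv2d_valid(x, kern, dilation=2),
                   dilated_conv_by_blocks(x, kern, 2))
```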
Abstract:
A system and a method for information acquisition of Wireless Sensor Network (WSN) data as a cloud-based service are provided. An apparatus in the system, which includes a WSN, a service cloud, and a device, includes a virtual sensor configured to receive data from a physical sensor in the WSN. The apparatus further includes a virtual sensor controller configured to receive a request for the data from the service cloud or the device and spawn a virtual machine (VM) based on the request. The apparatus further includes the VM, which is configured to transmit the data to the service cloud or the device.
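A minimal Python sketch of the control flow described above, with hypothetical class names (VirtualSensor, VirtualSensorController, VirtualMachine, Device); real VM spawning, WSN transport, and cloud APIs are elided.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualSensor:
    """Mirrors readings pushed from a physical sensor in the WSN."""
    sensor_id: str
    latest: dict = field(default_factory=dict)

    def on_physical_reading(self, reading: dict) -> None:
        self.latest = reading

@dataclass
class VirtualMachine:
    """Per-request worker that ships the virtual sensor's data out."""
    request_id: str
    source: VirtualSensor

    def transmit(self, destination) -> None:
        destination.receive(self.request_id, self.source.latest)

class Device:
    """Stand-in for the requesting device or service cloud."""
    def receive(self, request_id: str, data: dict) -> None:
        print(f"{request_id}: {data}")

class VirtualSensorController:
    """Spawns one VM per request arriving from the cloud or a device."""
    def __init__(self, sensors: dict):
        self.sensors = sensors

    def handle_request(self, request_id, sensor_id, destination):
        vm = VirtualMachine(request_id, self.sensors[sensor_id])
        vm.transmit(destination)
        return vm

sensor = VirtualSensor("temp-01")
sensor.on_physical_reading({"temp_c": 21.5})
controller = VirtualSensorController({"temp-01": sensor})
controller.handle_request("req-1", "temp-01", Device())
```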
Abstract:
A method for computing an inner product on binary data, ternary data, non-binary data, and non-ternary data using an electronic device is provided. The method includes calculating the inner product on the ternary data, designing a fused bitwise data path to support the inner product calculation on the binary data and the ternary data, designing an FPL data path to calculate an inner product between one of the non-binary data and the non-ternary data and one of the binary data and the ternary data, and distributing these inner product calculations across the fused bitwise data path and the FPL data path.
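One common way to realize a fused bitwise path for binary and ternary inner products is to encode each ternary vector (values in {-1, 0, +1}) as two bit-planes and reduce the dot product to AND/popcount operations; a binary vector ({-1, +1}) is then the special case whose two planes are complements, so the same path serves both. The sketch below shows that encoding; it is illustrative and does not model the FPL data path.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def encode_ternary(values):
    """Pack values in {-1, 0, +1} into two bit-planes (plus, minus)."""
    plus = minus = 0
    for i, v in enumerate(values):
        if v == 1:
            plus |= 1 << i
        elif v == -1:
            minus |= 1 << i
    return plus, minus

def ternary_dot(a, b):
    """Bitwise inner product of two ternary vectors: only AND and popcount.
    A binary vector is a ternary one whose 'minus' plane complements its
    'plus' plane, so this single path fuses both cases."""
    ap, am = a
    bp, bm = b
    return (popcount(ap & bp) + popcount(am & bm)
            - popcount(ap & bm) - popcount(am & bp))

a = [1, -1, 0, 1]
b = [1, 1, -1, 0]
assert ternary_dot(encode_ternary(a), encode_ternary(b)) == \
       sum(x * y for x, y in zip(a, b))
```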
Abstract:
Provided are methods and devices for processing graphics data in a graphics processing unit (GPU). The method of processing graphics data includes receiving, at a processor, a difference of Gaussians (DOG) layer of an image, detecting, from the received DOG layer, a candidate DOG layer of the image as an intermediate layer, detecting at least one extreme point by comparing values of the candidate DOG layer with values of a previous DOG layer and a next DOG layer, and storing the at least one extreme point in a buffer.
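A minimal NumPy sketch of the comparison step: a point of the candidate DOG layer is kept when it is a strict maximum or minimum over its 3x3x3 neighbourhood spanning the previous, candidate, and next layers. The function name and the Python list standing in for the buffer are illustrative.

```python
import numpy as np

def detect_extreme_points(prev_dog, cand_dog, next_dog):
    """Return (row, col) points of the candidate DOG layer that are strict
    extrema over their 3x3x3 neighbourhood across the three layers."""
    stack = np.stack([prev_dog, cand_dog, next_dog])   # shape (3, H, W)
    points = []                                        # the 'buffer'
    h, w = cand_dog.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = cand_dog[r, c]
            cube = stack[:, r - 1:r + 2, c - 1:c + 2]  # 27 values incl. centre
            # Strict extremum: centre matches the cube max (or min) and no
            # other neighbour ties it.
            if centre == cube.max() and (cube == centre).sum() == 1:
                points.append((r, c))
            elif centre == cube.min() and (cube == centre).sum() == 1:
                points.append((r, c))
    return points
```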
Abstract:
Provided are a method and system with deep learning model generation. The method includes identifying a plurality of connections in a neural network that is pre-associated with a deep learning model, generating a plurality of pruned neural networks by pruning different sets of one or more of the plurality of connections to respectively generate each of the plurality of pruned neural networks, generating a plurality of intermediate deep learning models by generating a respective intermediate deep learning model corresponding to each of the plurality of pruned neural networks, and selecting one of the plurality of intermediate deep learning models, having a determined greatest accuracy among the plurality of intermediate deep learning models, to be an optimized deep learning model.
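A sketch of the generate-and-select loop, assuming a toy setting where a model is a weight array, pruning is element masking, and `evaluate` is a caller-supplied accuracy function. The abstract does not specify how the pruned connection sets are chosen, so random sets are used here purely for illustration.

```python
import numpy as np

def candidate_masks(weights, prune_fraction=0.1, num_candidates=5, seed=0):
    """Yield masks that each zero out a different random set of connections."""
    rng = np.random.default_rng(seed)
    n = weights.size
    k = max(1, int(prune_fraction * n))
    for _ in range(num_candidates):
        mask = np.ones(n, dtype=bool)
        mask[rng.choice(n, size=k, replace=False)] = False
        yield mask.reshape(weights.shape)

def best_pruned_model(weights, evaluate):
    """Build one intermediate model per pruned network and keep the most
    accurate one as the optimized deep learning model."""
    best_w, best_acc = None, -np.inf
    for mask in candidate_masks(weights):
        pruned = weights * mask               # one intermediate model
        acc = evaluate(pruned)
        if acc > best_acc:
            best_w, best_acc = pruned, acc
    return best_w, best_acc

# Placeholder accuracy function for demonstration only.
w = np.random.randn(4, 4)
model, acc = best_pruned_model(w, evaluate=lambda pw: -np.abs(pw).sum())
```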
Abstract:
An apparatus includes a global memory and a systolic array. The global memory is configured to store and provide an input feature map (IFM) vector stream from an IFM tensor and a kernel vector stream from a kernel tensor. The systolic array is configured to receive the IFM vector stream and the kernel vector stream from the global memory. The systolic array is on-chip together with the global memory. The systolic array includes a plurality of processing elements (PEs) each having a plurality of vector units, each of the plurality of vector units being configured to perform a dot-product operation on at least one IFM vector of the IFM vector stream and at least one kernel vector of the kernel vector stream per unit clock cycle to generate a plurality of output feature maps (OFMs).
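A functional (not cycle- or dataflow-accurate) Python model of one PE: each vector unit consumes one IFM vector and one kernel vector per "cycle" and accumulates a dot product into its OFM partial sum. Class names and stream shapes are illustrative assumptions.

```python
import numpy as np

class VectorUnit:
    """One vector unit: one dot product per clock cycle, accumulated into an OFM."""
    def __init__(self):
        self.acc = 0.0

    def cycle(self, ifm_vec, kernel_vec):
        self.acc += float(np.dot(ifm_vec, kernel_vec))

class ProcessingElement:
    """A PE whose vector units share the incoming IFM vector but apply
    different kernel vectors, yielding one OFM partial sum per unit."""
    def __init__(self, num_units):
        self.units = [VectorUnit() for _ in range(num_units)]

    def cycle(self, ifm_vec, kernel_vecs):
        for unit, k_vec in zip(self.units, kernel_vecs):
            unit.cycle(ifm_vec, k_vec)

# Stream 4 cycles' worth of vectors through one PE with 2 vector units.
rng = np.random.default_rng(0)
ifm_stream = rng.random((4, 8))              # one IFM vector per cycle
kernel_stream = rng.random((4, 2, 8))        # two kernel vectors per cycle
pe = ProcessingElement(num_units=2)
for ifm_vec, k_vecs in zip(ifm_stream, kernel_stream):
    pe.cycle(ifm_vec, k_vecs)
ofms = [u.acc for u in pe.units]             # two accumulated OFM values
```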
Abstract:
A method and an apparatus for processing layers in a neural network fetch Input Feature Map (IFM) tiles of an IFM tensor and kernel tiles of a kernel tensor, perform a convolutional operation on the IFM tiles and the kernel tiles by exploiting IFM sparsity and kernel sparsity, and generate a plurality of Output Feature Map (OFM) tiles corresponding to the IFM tiles.
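A sketch of one tile's convolution that skips multiply-accumulates when either operand is zero, which is the effect of exploiting kernel and IFM sparsity; real hardware would use compressed operand formats rather than the scalar test shown here.

```python
import numpy as np

def sparse_tile_conv(ifm_tile, kernel_tile):
    """Valid 2-D convolution of one IFM tile with one kernel tile that skips
    every multiply-accumulate in which either operand element is zero."""
    kh, kw = kernel_tile.shape
    oh = ifm_tile.shape[0] - kh + 1
    ow = ifm_tile.shape[1] - kw + 1
    ofm_tile = np.zeros((oh, ow))
    nz_kernel = list(zip(*np.nonzero(kernel_tile)))    # kernel sparsity
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for p, q in nz_kernel:
                x = ifm_tile[r + p, c + q]
                if x != 0.0:                           # IFM sparsity
                    acc += x * kernel_tile[p, q]
            ofm_tile[r, c] = acc
    return ofm_tile
```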
Abstract:
A method and apparatus to construct a bounding volume hierarchy (BVH) tree include: generating 2-dimensional (2D) tiles including primitives; converting the 2D tiles into 3-dimensional (3D) tiles; and constructing the BVH tree based on the 3D tiles.
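A speculative sketch of the pipeline as described: primitives are bucketed into 2D screen tiles, each tile is extended with the depth extent of its primitives to become a 3D box, and a median-split BVH is built over those boxes. The tiling key, split rule, and data layout are assumptions, not the patented method.

```python
import numpy as np

def to_3d_tiles(primitives, tile_size=16.0):
    """Bucket primitive AABBs into 2-D screen tiles, then give each tile the
    depth range of its primitives, turning it into a 3-D box."""
    tiles = {}
    for lo, hi in primitives:                        # per-primitive AABB corners
        key = (int(lo[0] // tile_size), int(lo[1] // tile_size))
        box = tiles.setdefault(key, [np.full(3, np.inf), np.full(3, -np.inf)])
        box[0] = np.minimum(box[0], lo)
        box[1] = np.maximum(box[1], hi)
    return list(tiles.values())

def build_bvh(boxes):
    """Top-down median-split BVH over the 3-D tile boxes."""
    if len(boxes) == 1:
        return {"box": boxes[0], "leaf": True}
    lo = np.min([b[0] for b in boxes], axis=0)
    hi = np.max([b[1] for b in boxes], axis=0)
    axis = int(np.argmax(hi - lo))                   # split the longest axis
    boxes = sorted(boxes, key=lambda b: b[0][axis])
    mid = len(boxes) // 2
    return {"box": [lo, hi], "leaf": False,
            "left": build_bvh(boxes[:mid]), "right": build_bvh(boxes[mid:])}

prims = [(np.array([x, y, z]), np.array([x + 1, y + 1, z + 1]))
         for x, y, z in np.random.rand(20, 3) * 64]
bvh = build_bvh(to_3d_tiles(prims))
```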