Abstract:
Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a parallel sorting operation on input data. The neural engine circuit includes operation circuits and an accumulator circuit coupled to the outputs of the operation circuits. Each of the operation circuits operates in parallel and is configured to compare a field of a first record of a first set of records and a corresponding field of a second record of a second set of records to generate a comparison result on values in the field and the corresponding field. The accumulator circuit includes a record store storing records that are involved in the parallel sorting operation and a sideband register that stores the comparison results generated by the operation circuits.
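One step of the described operation can be sketched in software as a bank of parallel comparators whose results land in a sideband register. This is an illustrative model only; the function name, the record representation as dictionaries, and the compare-exchange interpretation of the result are assumptions, not details from the abstract.

```python
def parallel_compare(set_a, set_b, field):
    """Model one parallel step: each 'operation circuit' (lane) compares
    one record pair on `field`.

    Returns the sideband bits (one comparison result per lane) and the
    pairs ordered according to those bits.
    """
    # Sideband register: one comparison result per lane.
    sideband = [a[field] <= b[field] for a, b in zip(set_a, set_b)]
    # Compare-exchange: order each record pair using its sideband bit.
    merged = [
        (a, b) if keep else (b, a)
        for a, b, keep in zip(set_a, set_b, sideband)
    ]
    return sideband, merged
```

All lanes are independent, so in hardware every comparison in the list happens in the same cycle; the Python list comprehension stands in for that parallelism.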
Abstract:
Embodiments relate to a neural processor circuit including a plurality of neural engine circuits, a data buffer, and a kernel fetcher circuit. At least one of the neural engine circuits is configured to receive matrix elements of a matrix as at least a portion of the input data from the data buffer over multiple processing cycles. The at least one neural engine circuit further receives vector elements of a vector from the kernel fetcher circuit, wherein each of the vector elements is extracted and provided as a corresponding kernel to the at least one neural engine circuit in each of the processing cycles. The at least one neural engine circuit performs multiplication between the matrix and the vector as a convolution operation to produce at least one output channel of the output data.
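The multiply-over-cycles scheme above can be sketched as follows: in cycle j the engine gets one vector element (as a kernel) and one matrix column, multiplies, and accumulates. This is a minimal sketch of the dataflow, not the circuit; the name and the column-per-cycle ordering are assumptions.

```python
def matvec_as_convolution(matrix, vector):
    """Accumulate a matrix-vector product over simulated processing cycles.

    In 'cycle' j the engine receives column j of the matrix and vector
    element j (delivered as a kernel), multiplies, and accumulates into
    the output channel.
    """
    rows = len(matrix)
    acc = [0.0] * rows                      # accumulator state across cycles
    for j, v in enumerate(vector):          # one processing cycle per element
        for i in range(rows):
            acc[i] += matrix[i][j] * v      # multiply-accumulate
    return acc
```

After the final cycle the accumulator holds the full product, i.e. one output channel per matrix row.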
Abstract:
Embodiments relate to a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit. Each neural engine circuit includes a kernel extract circuit and a kernel multiply-add (MAD) circuit. The kernel extract circuit extracts uncompressed kernel data from the compressed kernel data. The kernel MAD circuit receives the uncompressed kernel data from the kernel extract circuit and performs neural network operations on a portion of input data using the uncompressed kernel data.
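The extract-then-multiply-add split can be illustrated with a toy decompressor feeding a MAD stage. The abstract does not specify the compression scheme, so run-length coding here is purely an illustrative stand-in, and both function names are assumptions.

```python
def extract_kernel(compressed):
    """Kernel extract stage: expand (value, run_length) pairs into
    uncompressed kernel coefficients.

    Run-length coding is an illustrative stand-in for whatever
    compression the kernel access circuit actually delivers.
    """
    kernel = []
    for value, run in compressed:
        kernel.extend([value] * run)
    return kernel

def mad(kernel, window):
    """Kernel MAD stage: multiply-add the uncompressed kernel against
    one window of input data."""
    return sum(k * x for k, x in zip(kernel, window))
```

Keeping kernels compressed until they reach each engine's extract stage reduces the bandwidth needed between the kernel access circuit and the engines.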
Abstract:
Embodiments relate to a neural processor circuit including neural engines, a buffer, and a kernel access circuit. The neural engines perform convolution operations on input data and kernel data to generate output data. The buffer is between the neural engines and a memory external to the neural processor circuit. The buffer stores input data for sending to the neural engines and output data received from the neural engines. The kernel access circuit receives one or more kernels from the memory external to the neural processor circuit. The neural processor circuit operates in one of multiple modes, at least one of which divides a convolution operation into multiple independent convolution operations for execution by the neural engines.
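One way a convolution can be divided into independent convolutions is by output channel: each engine convolves the same input with its own slice of the kernels, and the results are concatenated. This is one plausible split under stated assumptions (1-D valid convolution, round-robin kernel assignment); the abstract does not commit to a particular division.

```python
def split_convolution(inputs, kernels, num_engines):
    """Divide one convolution into independent per-engine convolutions.

    Illustrative split by output channel: kernels are assigned
    round-robin to engines, each engine runs its convolutions
    independently, and the outputs are gathered in original order.
    """
    def conv1d(x, k):
        # 1-D 'valid' convolution (no padding).
        n = len(x) - len(k) + 1
        return [sum(k[j] * x[i + j] for j in range(len(k))) for i in range(n)]

    per_engine = [kernels[e::num_engines] for e in range(num_engines)]
    outputs = [None] * len(kernels)
    for e, ks in enumerate(per_engine):          # each engine is independent
        for slot, k in enumerate(ks):
            outputs[e + slot * num_engines] = conv1d(inputs, k)
    return outputs
```

Because no engine depends on another's partial result, the per-engine convolutions can run fully in parallel.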
Abstract:
Embodiments relate to a neural processor circuit that includes multiple neural engine circuits, a data buffer, and a kernel fetcher circuit. At least one of the neural engine circuits receives multiple sub-channels of a portion of input data from the data buffer. The neural engine circuit further receives a kernel of the one or more kernels from the kernel fetcher circuit, wherein the kernel is decomposed into a corresponding sub-kernel for each sub-channel of the portion of the input data. The neural engine circuit performs a convolution operation on each sub-channel of the portion of the input data and the corresponding sub-kernel. The neural engine circuit accumulates the corresponding outputs of the convolution operation for each sub-channel to generate a single channel of the output data.
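The sub-channel scheme reduces to: convolve each sub-channel with its sub-kernel, then sum the per-sub-channel partials elementwise into one output channel. A minimal sketch, assuming 1-D valid convolutions and equal-length partial outputs (the function name is hypothetical):

```python
def subchannel_convolution(sub_channels, sub_kernels):
    """Convolve each sub-channel with its decomposed sub-kernel, then
    accumulate the partial outputs into a single output channel."""
    def conv1d(x, k):
        n = len(x) - len(k) + 1
        return [sum(k[j] * x[i + j] for j in range(len(k))) for i in range(n)]

    # One partial output per sub-channel / sub-kernel pair.
    partials = [conv1d(x, k) for x, k in zip(sub_channels, sub_kernels)]
    # Accumulate corresponding elements across the partials.
    return [sum(vals) for vals in zip(*partials)]
```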
Abstract:
An input rescale module for an image signal processor (ISP) that downscales sensor data in the horizontal and vertical dimensions. The module may demosaic the sensor data to generate RGB data. Horizontal filtering may be applied to horizontally downsize the RGB data. The RGB data is converted to YCC, chroma 4:4:4. The chroma 4:4:4 is then horizontally filtered to generate chroma 4:2:2. Dropping chrominance data by going to 4:2:2 may reduce hardware area cost and power usage in the vertical scaler. Vertical filtering may be applied separately to luma and chroma to vertically downsize the YCC data. Chroma may be filtered with stronger filters than luma. The chroma 4:2:2 data may then be horizontally interpolated to generate chroma 4:4:4 data. The YCC data is converted back to RGB, and the RGB data is remosaiced to generate downsampled sensor format data.
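The chroma round-trip at the heart of this pipeline (4:4:4 → 4:2:2 for vertical scaling, then back to 4:4:4) can be sketched on a single row of chroma samples. The 2-tap average and nearest-neighbour interpolation are illustrative filter choices only; the abstract does not specify tap counts.

```python
def chroma_444_to_422(cb):
    """Horizontally filter one row of 4:4:4 chroma down to 4:2:2
    (half the horizontal samples). A 2-tap average stands in for the
    real horizontal filter."""
    return [(cb[i] + cb[i + 1]) / 2 for i in range(0, len(cb) - 1, 2)]

def chroma_422_to_444(cb, width):
    """Horizontally interpolate one row of 4:2:2 chroma back to 4:4:4.
    Nearest-neighbour interpolation stands in for the real interpolator."""
    out = []
    for i in range(width):
        j = min(i // 2, len(cb) - 1)
        out.append(cb[j])
    return out
```

The payoff stated in the abstract is that the vertical scaler between these two steps only has to process half as many chroma samples per line, saving area and power.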
Abstract:
The present disclosure generally relates to systems and methods for image data processing. In certain embodiments, an image processing pipeline may detect and correct a defective pixel of image data acquired using an image sensor. The image processing pipeline may receive an input pixel of the image data acquired using the image sensor. The image processing pipeline may then identify a set of neighboring pixels having the same color component as the input pixel and remove two neighboring pixels from the set of neighboring pixels thereby generating a modified set of neighboring pixels. Here, the two neighboring pixels correspond to a maximum pixel value and a minimum pixel value of the set of neighboring pixels. The image processing pipeline may then determine a gradient for each neighboring pixel in the modified set of neighboring pixels and determine whether the input pixel includes a dynamic defect or a speckle based at least in part on the gradient for each neighboring pixel in the modified set of neighboring pixels.
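The detection steps above (trim the min and max neighbours, take per-neighbour gradients, decide from the gradients) can be sketched as a small classifier. The thresholding rule and both threshold parameters are assumptions; the abstract says only that the decision is based at least in part on the gradients.

```python
def classify_pixel(p, neighbors, dyn_thresh, count_thresh):
    """Decide whether input pixel `p` is defective, from same-color
    neighbours.

    Steps mirror the abstract: drop the min and max neighbours, compute
    a gradient per remaining neighbour, then apply a (hypothetical)
    count-of-large-gradients rule.
    """
    trimmed = sorted(neighbors)[1:-1]        # remove min and max neighbours
    gradients = [abs(p - n) for n in trimmed]
    # Hypothetical rule: defect if enough gradients exceed the threshold.
    exceeds = sum(g > dyn_thresh for g in gradients)
    return exceeds >= count_thresh
```

Trimming the extremes first means a single hot or dead neighbour cannot skew the gradients and cause a healthy pixel to be flagged.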
Abstract:
Embodiments relate to a neural processor that includes a plurality of neural engine circuits and one or more planar engine circuits. The plurality of neural engine circuits can perform convolution operations of input data of the neural engine circuits with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. The planar engine circuit generates an output from input data that corresponds to output of the neural engine circuits or a version of input data of the neural processor. The planar engine circuit can be configured to operate in multiple modes. In a pooling mode, the planar engine circuit reduces a spatial size of a version of the input data. In an elementwise mode, the planar engine circuit performs an elementwise operation on the input data. In a reduction mode, the planar engine circuit reduces the rank of a tensor.
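The three modes can be illustrated side by side with one toy operation each. The specific operations chosen (max pooling with window 2, elementwise addition, summing rows of a 2-D tensor) are examples of each mode's behaviour, not the full set the planar engine supports.

```python
def planar_engine(mode, data, other=None):
    """Illustrative planar-engine behaviour in each of its three modes."""
    if mode == "pooling":
        # Pooling mode: reduce spatial size (1-D max pool, window 2, stride 2).
        return [max(data[i], data[i + 1]) for i in range(0, len(data) - 1, 2)]
    if mode == "elementwise":
        # Elementwise mode: per-element operation on two equal-shape inputs.
        return [a + b for a, b in zip(data, other)]
    if mode == "reduction":
        # Reduction mode: reduce tensor rank (2-D in, 1-D out).
        return [sum(row) for row in data]
    raise ValueError(mode)
```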
Abstract:
Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a convolution operation on input data in a first mode and a parallel sorting operation on input data in a second mode. The neural engine circuit includes a plurality of operation circuits and an accumulator circuit coupled to the plurality of operation circuits. The plurality of operation circuits receives input data. In the first mode, the plurality of operation circuits performs multiply-add operations of a convolution on the input data using a kernel. In the second mode, the plurality of operation circuits performs a portion of a parallel sorting operation on the input data. In the first mode, the accumulator circuit receives and stores first results of the multiply-add operations. In the second mode, the accumulator circuit receives and stores second results of the parallel sorting operation.
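The dual-mode idea is that the same operation circuit and accumulator are repurposed between modes. A minimal sketch of one such circuit, assuming the sort mode's "second result" is a comparison bit held in place of a running sum (that interpretation, and the function name, are assumptions):

```python
def operation_circuit(mode, a, b, acc):
    """One dual-mode operation circuit.

    Mode 1: multiply-add step of a convolution; `acc` is the running sum.
    Mode 2: one comparison step of the parallel sort; the accumulator
    stores the comparison result instead of a sum.
    """
    if mode == 1:
        return acc + a * b          # first result: multiply-add
    if mode == 2:
        return int(a <= b)          # second result: comparison bit
    raise ValueError(mode)
```

Reusing the same datapath for both workloads avoids dedicating separate silicon to sorting.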
Abstract:
Embodiments relate to a neural processor circuit with scalable architecture for instantiating one or more neural networks. The neural processor circuit includes a data buffer coupled to a memory external to the neural processor circuit, and a plurality of neural engine circuits. To execute tasks that instantiate the neural networks, each neural engine circuit generates output data using input data and kernel coefficients. A neural processor circuit may include multiple neural engine circuits that are selectively activated or deactivated according to configuration data of the tasks. Furthermore, an electronic device may include multiple neural processor circuits that are selectively activated or deactivated to execute the tasks.